Help for package tidyEmoji

Type: Package
Title: Discover, Count and Score Emoji in Text
Version: 0.2.0
Description: A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) <doi:10.1371/journal.pone.0144296>, released under CC BY-SA 4.0.
License: GPL (≥ 3)
URL: https://pursuitofdatascience.github.io/tidyEmoji/
BugReports: https://github.com/PursuitOfDataScience/tidyEmoji/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 3.5.0)
Imports: dplyr (≥ 1.1.0), emoji, lifecycle, stats, tibble, tidyr, utils
Suggests: rmarkdown, knitr, testthat (≥ 3.0.0), ggplot2, readr, forcats, stringr
Config/testthat/edition:
VignetteBuilder: knitr
NeedsCompilation:
Packaged: 2026-06-16 21:03:52 UTC; youzhi
Author: Youzhi Yu [aut, cre]
Maintainer: Youzhi Yu <yuyouzhi666@icloud.com>
Repository: CRAN
Date/Publication: 2026-06-17 07:20:02 UTC

tidyEmoji: Discover, Count and Score Emoji in Text

Description

A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) doi:10.1371/journal.pone.0144296, released under CC BY-SA 4.0.

Author(s)

Maintainer: Youzhi Yu <yuyouzhi666@icloud.com>

Emoji category to unicode crosswalk

Description

A table with one row per Unicode category, listing every emoji glyph in that category as a single |-separated string.

Usage

category_unicode_crosswalk

Format

A data frame with two columns:

category: The Unicode category (10 categories).
unicodes: The emoji glyphs in the category, separated by |.
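
Because each unicodes value packs a whole category into one string, it usually needs splitting before use. The following is a minimal base-R sketch. It uses a hypothetical two-row stand-in (toy_crosswalk) with the same two columns rather than the real dataset, and the glyphs in it are illustrative only.

```r
# Hypothetical stand-in with the same columns as category_unicode_crosswalk.
toy_crosswalk <- data.frame(
  category = c("Smileys & Emotion", "Flags"),
  unicodes = c("\U0001f600|\U0001f603", "\U0001f3c1"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a literal separator, not as regex alternation.
glyphs <- strsplit(toy_crosswalk$unicodes, "|", fixed = TRUE)
names(glyphs) <- toy_crosswalk$category

# Number of glyphs listed for each category.
glyph_counts <- lengths(glyphs)
```

Without fixed = TRUE, strsplit() would interpret "|" as a regular expression and split the string into single characters.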

Source

Derived from the emojis table of the emoji package; rebuilt by data-raw/crosswalks.R.

Categorise each row by the emoji categories it contains

Description

emoji_categorize() keeps the rows of data that contain emoji and adds a .emoji_category column listing the distinct Unicode categories present in that row (for example "Smileys & Emotion"), separated by | when a row spans more than one category.

Usage

emoji_categorize(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data, as a tibble, filtered to the rows containing emoji and with an added .emoji_category column.

Examples

df <- data.frame(text = c("smile \U0001f600",
                          "flag \U0001f3c1\U0001f600",
                          "nothing"))
emoji_categorize(df, text)

Add a list-column of the emoji found in each row

Description

emoji_extract_nest() returns data unchanged except for an added list-column, .emoji_unicode, holding the emoji found in each row. Detection is grapheme-aware, so skin-tone modifiers and ZWJ sequences (for example family emoji) are kept intact as a single emoji.

Usage

emoji_extract_nest(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data with an added list-column .emoji_unicode.

Examples

df <- data.frame(text = c("hi \U0001f600\U0001f603", "none"))
emoji_extract_nest(df, text)
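
To see why grapheme-aware detection matters, it can help to look at how many code points sit behind a single visible emoji. The sketch below uses only base R (utf8ToInt()) and does not call the package; it illustrates the problem, not how the package detects emoji internally.

```r
# A family emoji: man, woman and girl joined by zero-width joiners (U+200D).
family <- "\U0001f468\u200d\U0001f469\u200d\U0001f467"
# A thumbs-up followed by a skin-tone modifier.
thumbs <- "\U0001f44d\U0001f3fd"

family_points <- utf8ToInt(family)
thumbs_points <- utf8ToInt(thumbs)

length(family_points)  # 5 code points behind one visible emoji
length(thumbs_points)  # 2 code points behind one visible emoji

# Listing the code points shows the joiners that hold the sequence together.
family_hex <- sprintf("U+%04X", family_points)
```

A detector that splits on individual code points would report three people and two joiners for the family emoji, which is the failure that keeping sequences intact avoids.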

Emoji counts per row, in long (tidy) form

Description

emoji_extract_unnest() returns one row per (row, emoji) pair, together with the number of times that emoji occurs in the row, and drops rows that contain no emoji. row_number is the position of the source entry in data.

Usage

emoji_extract_unnest(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with columns row_number, .emoji_unicode and .emoji_count.

Examples

df <- data.frame(text = c("hi \U0001f600\U0001f600", "none", "\U0001f44b"))
emoji_extract_unnest(df, text)

Keep only the rows whose text contains emoji

Description

emoji_filter() returns the rows of data whose text column contains at least one emoji, preserving every original column. emoji_tweets() is a synonym retained for backward compatibility.

Usage

emoji_filter(data, text)

emoji_tweets(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble containing only the rows with at least one emoji.

Examples

df <- data.frame(text = c("hi \U0001f600", "no emoji", "bye \U0001f44b"))
emoji_filter(df, text)

Frequency of every emoji in a text column

Description

emoji_frequency() counts how often each emoji appears across the whole text column (an entry containing the same emoji twice contributes 2) and returns a tibble sorted by descending count, with each emoji's name, shortcode and category (the group column).

Usage

emoji_frequency(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with columns emoji, name, shortcode, group and n.

Examples

df <- data.frame(text = c("\U0001f600\U0001f600", "\U0001f621"))
emoji_frequency(df, text)
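
The counting rule (every occurrence counts, so a repeated emoji in one entry contributes more than once) can be sketched in base R. The list below is a hypothetical stand-in for emoji already extracted per entry; this is not the package's implementation and omits the name, shortcode and group columns.

```r
# Hypothetical per-entry emoji, shaped like a list-column of extracted emoji.
emoji_by_entry <- list(
  c("\U0001f600", "\U0001f600"),  # the same emoji twice in one entry
  "\U0001f621",
  character(0)                    # an entry without emoji
)

# Flatten all occurrences, tabulate them, and sort by descending count.
counts <- sort(table(unlist(emoji_by_entry)), decreasing = TRUE)

freq <- data.frame(
  emoji = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
```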

Score the sentiment of the emoji in each row

Description

emoji_sentiment() adds the mean emoji sentiment of each row, based on the Emoji Sentiment Ranking lexicon (see emoji_sentiment_lexicon). Scores range from -1 (negative) through 0 (neutral) to +1 (positive). Rows that contain no emoji, or whose emoji are absent from the lexicon, receive NA.

Usage

emoji_sentiment(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data, as a tibble, with added columns .emoji_n (the number of emoji in the row) and .emoji_sentiment (the mean sentiment of the emoji that appear in the lexicon).

References

Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296

Examples

df <- data.frame(text = c("love it \U0001f60d", "awful \U0001f621", "meh"))
emoji_sentiment(df, text)
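
The NA rules described above can be illustrated with a small base-R sketch. The scores in toy_scores are invented for illustration and are not values from emoji_sentiment_lexicon; the list of emoji per row is likewise a hypothetical stand-in for the detection step.

```r
# Hypothetical sentiment scores keyed by emoji (NOT real lexicon values).
toy_scores <- c(0.7, -0.2)
names(toy_scores) <- c("\U0001f60d", "\U0001f621")

# Hypothetical per-row emoji, as a detection step might produce them.
emoji_by_row <- list(
  c("\U0001f60d", "\U0001f621"),  # both present in the toy lexicon
  c("\U0001f60d", "\U0001f9a4"),  # second emoji missing from the toy lexicon
  "\U0001f9a4",                   # only an unscored emoji
  character(0)                    # no emoji at all
)

# Mean over the emoji that have a score; NA when none of them do.
row_sentiment <- vapply(emoji_by_row, function(e) {
  scores <- toy_scores[e]
  scores <- scores[!is.na(scores)]
  if (length(scores) == 0) NA_real_ else mean(scores)
}, numeric(1))

# The emoji count covers every emoji in the row, scored or not.
emoji_n <- lengths(emoji_by_row)
```

In this sketch the second row averages over the one scored emoji only, while the last two rows both end up as NA for different reasons.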

Emoji Sentiment Ranking lexicon

Description

Sentiment scores for emoji, from the Emoji Sentiment Ranking 1.0, computed from ~70,000 tweets in 13 European languages annotated for sentiment. The sentiment_score is (positive - negative) / occurrences, ranging from -1 (negative) to +1 (positive); sentiment_label is derived from its sign.

Usage

emoji_sentiment_lexicon

Format

A data frame with one row per emoji and the columns:

emoji: The emoji glyph.
occurrences: Number of times the emoji was observed.
position: Mean position of the emoji within its text (0-1).
negative, neutral, positive: Annotation counts for each class.
sentiment_score: Sentiment score from -1 to 1.
sentiment_label: "negative", "neutral" or "positive".
unicode_name: The official Unicode character name.
unicode_block: The Unicode block.
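
The sentiment_score formula given in the Description can be checked by hand. The sketch below uses made-up annotation counts, not values from the lexicon, and assumes that occurrences equals the sum of the three annotation counts.

```r
# Hypothetical annotation counts for a single emoji.
negative <- 10
neutral  <- 30
positive <- 60

# Assumption: every occurrence carries exactly one of the three labels.
occurrences <- negative + neutral + positive

sentiment_score <- (positive - negative) / occurrences

# Derive the label from the sign of the score: -1, 0, 1 map to indices 1, 2, 3.
labels <- c("negative", "neutral", "positive")
sentiment_label <- labels[sign(sentiment_score) + 2]
```

Neutral annotations do not enter the numerator, but they do enlarge the denominator, so an emoji used mostly in neutral text is pulled toward 0.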

Source

Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296. Data from https://hdl.handle.net/11356/1048, released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence. Processed by data-raw/emoji_sentiment_lexicon.R.

Summarise emoji presence in a text column

Description

emoji_summary() reports how many entries in a text column contain at least one emoji, alongside the total number of entries. An entry is counted once regardless of how many emoji it holds.

Usage

emoji_summary(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A one-row tibble with columns emoji_tweets (entries containing at least one emoji) and total_tweets (all entries).

Examples

df <- data.frame(text = c("I love R \U0001f600",
                          "no emoji here",
                          "flags \U0001f3c1\U0001f600"))
emoji_summary(df, text)
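
The rule that an entry is counted once however many emoji it holds can be sketched in base R. The list is a hypothetical stand-in for emoji already extracted from the three entries in the example above; this is not the package's implementation.

```r
# Hypothetical per-entry emoji for the three example entries.
emoji_by_entry <- list(
  "\U0001f600",
  character(0),
  c("\U0001f3c1", "\U0001f600")
)

# An entry counts once if it holds any emoji at all.
has_emoji <- lengths(emoji_by_entry) > 0

summary_counts <- c(
  emoji_tweets = sum(has_emoji),
  total_tweets = length(has_emoji)
)
```

The third entry holds two emoji but adds only one to emoji_tweets.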

Tidy emoji tokens, one row per occurrence with metadata

Description

emoji_tokens() expands data to one row per emoji occurrence (in reading order), keeping the original columns and adding the glyph together with its name, category and sentiment score. This mirrors the one-token-per-row shape familiar from tidy text mining and is convenient for counting, joining and plotting.

Usage

emoji_tokens(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with the original columns plus .emoji, .emoji_name, .emoji_category and .emoji_sentiment. Rows without emoji are dropped.

Examples

df <- data.frame(id = 1:2, text = c("great \U0001f600", "bad \U0001f621"))
emoji_tokens(df, text)
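
The one-row-per-occurrence shape can be sketched in base R by repeating each source row once per emoji it contains. The data frame and the per-row emoji list are hypothetical, and the sketch adds only the glyph column, not the name, category or sentiment metadata.

```r
df <- data.frame(
  id = 1:3,
  text = c("great", "bad", "plain"),
  stringsAsFactors = FALSE
)

# Hypothetical emoji found in each row, in reading order.
emoji_by_row <- list(
  c("\U0001f600", "\U0001f44d"),
  "\U0001f621",
  character(0)
)

# Repeat each row index once per emoji; rows without emoji repeat zero times.
idx <- rep(seq_len(nrow(df)), times = lengths(emoji_by_row))

tokens <- df[idx, , drop = FALSE]
tokens$.emoji <- unlist(emoji_by_row)
rownames(tokens) <- NULL
```

Row 1 appears twice, row 2 once, and row 3 is dropped, which matches the behaviour described in the Value section.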

Emoji name, unicode and category crosswalk

Description

A table with one row per emoji name: each emoji glyph appears once for every GitHub-style name it is known by, so a single glyph can occur on several rows (for example the grinning face is both "grinning" and "grinning_face").

Usage

emoji_unicode_crosswalk

Format

A data frame with three columns:

emoji_name: The emoji name / shortcode (e.g. "grinning").
unicode: The emoji glyph.
emoji_category: The Unicode category the emoji belongs to.

Source

Derived from the emojis table of the emoji package; rebuilt by data-raw/crosswalks.R.

The most frequent emoji in a text column

Description

top_n_emojis() returns the n most frequent emoji. By default each emoji (unicode) appears on a single row; set duplicated = TRUE to list every name an emoji is known by, so glyphs that share several names occupy several rows.

Usage

top_n_emojis(
  data,
  text,
  n = 20,
  duplicated = FALSE,
  duplicated_unicode = lifecycle::deprecated()
)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

n

Number of emoji to return. Default 20.

duplicated

If TRUE, emoji with several names occupy several rows. Default FALSE.

duplicated_unicode

Deprecated. Use duplicated instead.

Value

A tibble with columns emoji_name, unicode, emoji_category and n.

Examples

df <- data.frame(text = c("\U0001f600\U0001f600\U0001f3c1", "\U0001f621"))
top_n_emojis(df, text, n = 2)
