Package {tidyEmoji}


Type: Package
Title: Discover, Count and Score Emoji in Text
Version: 0.2.0
Description: A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) <doi:10.1371/journal.pone.0144296>, released under CC BY-SA 4.0.
License: GPL (≥ 3)
URL: https://pursuitofdatascience.github.io/tidyEmoji/
BugReports: https://github.com/PursuitOfDataScience/tidyEmoji/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Depends: R (≥ 3.5.0)
Imports: dplyr (≥ 1.1.0), emoji, lifecycle, stats, tibble, tidyr, utils
Suggests: rmarkdown, knitr, testthat (≥ 3.0.0), ggplot2, readr, forcats, stringr
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-06-16 21:03:52 UTC; youzhi
Author: Youzhi Yu [aut, cre]
Maintainer: Youzhi Yu <yuyouzhi666@icloud.com>
Repository: CRAN
Date/Publication: 2026-06-17 07:20:02 UTC

tidyEmoji: Discover, Count and Score Emoji in Text

Description

A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) doi:10.1371/journal.pone.0144296, released under CC BY-SA 4.0.

Author(s)

Maintainer: Youzhi Yu <yuyouzhi666@icloud.com>

See Also

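Useful links:

https://pursuitofdatascience.github.io/tidyEmoji/

Report bugs at https://github.com/PursuitOfDataScience/tidyEmoji/issues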

Emoji category to unicode crosswalk

Description

A table with one row per Unicode category, listing every emoji glyph in that category as a single |-separated string.

Usage

category_unicode_crosswalk

Format

A data frame with two columns:

category

The Unicode category (10 categories).

unicodes

The emoji glyphs in the category, separated by |.

Source

Derived from the emojis table of the emoji package; rebuilt by data-raw/crosswalks.R.
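
Because each category's glyphs are stored as one |-separated string, they usually need splitting before use. The sketch below uses a small hand-made stand-in (the data frame crosswalk and its contents are illustrative, not the shipped data) and base R only.

```r
# Hypothetical stand-in with the documented two-column layout; the real
# category_unicode_crosswalk ships with the package and is much larger.
crosswalk <- data.frame(
  category = c("Smileys & Emotion", "Flags"),
  unicodes = c("\U0001f600|\U0001f603|\U0001f621", "\U0001f3c1|\U0001f6a9"),
  stringsAsFactors = FALSE
)

# "|" is a regex metacharacter, so split on it literally with fixed = TRUE.
glyphs <- strsplit(crosswalk$unicodes, "|", fixed = TRUE)

# Reshape to one row per (category, glyph) pair.
long <- data.frame(
  category = rep(crosswalk$category, lengths(glyphs)),
  unicode  = unlist(glyphs),
  stringsAsFactors = FALSE
)
long
```

Without fixed = TRUE, strsplit() would treat "|" as regex alternation and split the string into individual characters.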


Categorise each row by the emoji categories it contains

Description

emoji_categorize() keeps the rows of data that contain emoji and adds a .emoji_category column listing the distinct Unicode categories present in that row (for example "Smileys & Emotion"), separated by | when a row spans more than one category.

Usage

emoji_categorize(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data, as a tibble, filtered to the rows containing emoji and with an added .emoji_category column.

Examples

df <- data.frame(text = c("smile \U0001f600",
                          "flag \U0001f3c1\U0001f600",
                          "nothing"))
emoji_categorize(df, text)
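
The way several categories collapse into one |-separated value can be sketched in base R. This is only an illustration of the documented output shape, not the package's implementation: the lookup category_of and the hand-written list found are assumptions, and the order of categories inside the string may differ in the real function.

```r
# Hypothetical glyph-to-category lookup (illustrative only).
category_of <- setNames(
  c("Smileys & Emotion", "Flags"),
  c("\U0001f600", "\U0001f3c1")
)

# Emoji found in each of three rows, written by hand; row 3 has none.
found <- list("\U0001f600", c("\U0001f3c1", "\U0001f600"), character(0))

# Keep only rows with emoji, then paste their distinct categories together.
has_emoji <- lengths(found) > 0
emoji_category <- vapply(found[has_emoji], function(emoji) {
  paste(unique(unname(category_of[emoji])), collapse = "|")
}, character(1))
emoji_category
```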

Add a list-column of the emoji found in each row

Description

emoji_extract_nest() returns data unchanged except for an added list-column, .emoji_unicode, holding the emoji found in each row. Detection is grapheme-aware, so skin-tone modifiers and ZWJ sequences (for example family emoji) are kept intact as a single emoji.

Usage

emoji_extract_nest(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data with an added list-column .emoji_unicode.

See Also

emoji_extract_unnest() for a long, counted form and emoji_tokens() for one row per emoji with metadata.

Examples

df <- data.frame(text = c("hi \U0001f600\U0001f603", "none"))
emoji_extract_nest(df, text)
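
To see why grapheme-aware detection matters, the base R sketch below counts the code points behind two emoji that each display as a single glyph. It does not call the package; the variable names are illustrative.

```r
wave_medium <- "\U0001f44b\U0001f3fd"                        # waving hand + medium skin tone
family      <- "\U0001f468\u200d\U0001f469\u200d\U0001f467"  # man, ZWJ, woman, ZWJ, girl

# utf8ToInt() returns one integer per Unicode code point.
length(utf8ToInt(wave_medium))  # 2
length(utf8ToInt(family))       # 5

# The joiner between the family members is U+200D (ZERO WIDTH JOINER).
sprintf("U+%04X", utf8ToInt(family)[2])  # "U+200D"
```

A detector that worked one code point at a time would report the skin-tone modifier and each family member separately, which is the splitting that grapheme-aware detection avoids.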

Emoji counts per row, in long (tidy) form

Description

emoji_extract_unnest() returns one row per (row, emoji) pair with a count, dropping rows that contain no emoji. The row_number column gives each entry's position in the original data, so gaps in row_number correspond to rows without emoji.

Usage

emoji_extract_unnest(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with columns row_number, .emoji_unicode and .emoji_count.

Examples

df <- data.frame(text = c("hi \U0001f600\U0001f600", "none", "\U0001f44b"))
emoji_extract_unnest(df, text)
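
The long, counted shape can be reproduced by hand in base R, which may help when checking results. This is a sketch of the output layout only, assuming the emoji per row are already known (the list found is written by hand rather than detected).

```r
# Emoji per row, written by hand to mirror the example above.
found <- list(c("\U0001f600", "\U0001f600"), character(0), "\U0001f44b")

row_number <- rep(seq_along(found), lengths(found))
emoji      <- unlist(found)

# Count each (row, emoji) pair; rows with no emoji contribute nothing.
counts <- aggregate(
  list(.emoji_count = rep(1L, length(emoji))),
  by = list(row_number = row_number, .emoji_unicode = emoji),
  FUN = sum
)
counts[order(counts$row_number), ]
```

Row 2 has no emoji, so row_number jumps from 1 to 3 in the result.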

Keep only the rows whose text contains emoji

Description

emoji_filter() returns the rows of data whose text column contains at least one emoji, preserving every original column. emoji_tweets() is a synonym retained for backward compatibility.

Usage

emoji_filter(data, text)

emoji_tweets(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble containing only the rows with at least one emoji.

Examples

df <- data.frame(text = c("hi \U0001f600", "no emoji", "bye \U0001f44b"))
emoji_filter(df, text)

Frequency of every emoji in a text column

Description

emoji_frequency() counts how often each emoji appears across the whole text column (an entry containing the same emoji twice contributes 2) and returns a tibble sorted by descending count, with each emoji's name, shortcode and category.

Usage

emoji_frequency(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with columns emoji, name, shortcode, group (the emoji's category) and n (its count across the column).

See Also

top_n_emojis() for just the most frequent emoji.

Examples

df <- data.frame(text = c("\U0001f600\U0001f600", "\U0001f621"))
emoji_frequency(df, text)
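
The counting rule (an entry containing the same emoji twice contributes 2) can be illustrated with base R alone. The sketch assumes the emoji have already been extracted into the hand-written list found; it omits the name, shortcode and group columns.

```r
# Emoji per entry, written by hand to mirror the example above.
found <- list(c("\U0001f600", "\U0001f600"), "\U0001f621")

# Pool every occurrence, tabulate, and sort by descending count.
freq <- sort(table(unlist(found)), decreasing = TRUE)
freq_df <- data.frame(
  emoji = names(freq),
  n     = as.integer(freq),
  stringsAsFactors = FALSE
)
freq_df
```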

Score the sentiment of the emoji in each row

Description

emoji_sentiment() adds the mean emoji sentiment of each row, based on the Emoji Sentiment Ranking lexicon (see emoji_sentiment_lexicon). Scores range from -1 (negative) through 0 (neutral) to +1 (positive). Rows that contain no emoji, or whose emoji are absent from the lexicon, receive NA.

Usage

emoji_sentiment(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

data, as a tibble, with added columns .emoji_n (the number of emoji in the row) and .emoji_sentiment (the mean sentiment of the emoji that appear in the lexicon).

References

Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296

See Also

emoji_sentiment_lexicon for the underlying scores.

Examples

df <- data.frame(text = c("love it \U0001f60d", "awful \U0001f621", "meh"))
emoji_sentiment(df, text)
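
The averaging and NA rules described above can be sketched in base R. This is one reading of the documented behaviour, not the package's code: the toy lexicon and its scores are made up, and the list found is written by hand.

```r
# Toy lexicon: these scores are invented for illustration and are not
# taken from the Emoji Sentiment Ranking.
lexicon <- setNames(c(0.7, -0.2), c("\U0001f60d", "\U0001f621"))

# Emoji per row, written by hand; the second emoji of row 1 is deliberately
# absent from the toy lexicon, and row 3 has no emoji at all.
found <- list(c("\U0001f60d", "\U0001f3c1"), "\U0001f621", character(0))

row_score <- vapply(found, function(emoji) {
  scores <- lexicon[emoji]             # NA for emoji missing from the lexicon
  scores <- scores[!is.na(scores)]
  if (length(scores) == 0) NA_real_ else mean(scores)
}, numeric(1))

data.frame(.emoji_n = lengths(found), .emoji_sentiment = row_score)
```

Row 1 counts two emoji but averages only the one found in the lexicon, while row 3 receives NA.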

Emoji Sentiment Ranking lexicon

Description

Sentiment scores for emoji, from the Emoji Sentiment Ranking 1.0, computed from ~70,000 tweets in 13 European languages annotated for sentiment. The sentiment_score is (positive - negative) / occurrences, ranging from -1 (negative) to +1 (positive); sentiment_label is derived from its sign.

Usage

emoji_sentiment_lexicon

Format

A data frame with one row per emoji and the columns:

emoji

The emoji glyph.

occurrences

Number of times the emoji was observed.

position

Mean position of the emoji within its text (0-1).

negative, neutral, positive

Annotation counts for each class.

sentiment_score

Sentiment score from -1 to 1.

sentiment_label

"negative", "neutral" or "positive".

unicode_name

The official Unicode character name.

unicode_block

The Unicode block.

Source

Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296. Data from https://hdl.handle.net/11356/1048, released under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence. Processed by data-raw/emoji_sentiment_lexicon.R.
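
As a worked example of the score definition, the sketch below applies (positive - negative) / occurrences to hypothetical counts. The numbers are invented, and the sketch assumes occurrences equals the sum of the three annotation counts.

```r
# Hypothetical annotation counts for a single emoji (not real lexicon values).
negative <- 10
neutral  <- 30
positive <- 60
occurrences <- negative + neutral + positive  # assumed to be the class total

sentiment_score <- (positive - negative) / occurrences  # 0.5

# Label derived from the sign of the score: -1, 0, 1 map to positions 1, 2, 3.
sentiment_label <- c("negative", "neutral", "positive")[sign(sentiment_score) + 2]
sentiment_label  # "positive"
```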


Summarise emoji presence in a text column

Description

emoji_summary() reports how many entries in a text column contain at least one emoji, alongside the total number of entries. An entry is counted once regardless of how many emoji it holds.

Usage

emoji_summary(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A one-row tibble with columns emoji_tweets (entries containing at least one emoji) and total_tweets (all entries).

See Also

emoji_filter() to keep the emoji-bearing rows themselves.

Examples

df <- data.frame(text = c("I love R \U0001f600",
                          "no emoji here",
                          "flags \U0001f3c1\U0001f600"))
emoji_summary(df, text)
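
The "counted once per entry" rule can be checked by hand. The base R sketch below assumes the emoji per entry are already known (the list found is hand-written) and only reproduces the two documented columns.

```r
# Emoji per entry, written by hand to mirror the example above.
found <- list("\U0001f600", character(0), c("\U0001f3c1", "\U0001f600"))

# An entry counts once if it holds any emoji, however many there are.
summary_row <- data.frame(
  emoji_tweets = sum(lengths(found) > 0),
  total_tweets = length(found)
)
summary_row
```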

Tidy emoji tokens, one row per occurrence with metadata

Description

emoji_tokens() expands data to one row per emoji occurrence (in reading order), keeping the original columns and adding the glyph together with its name, category and sentiment score. This mirrors the one-token-per-row shape familiar from tidy text mining and is convenient for counting, joining and plotting.

Usage

emoji_tokens(data, text)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

Value

A tibble with the original columns plus .emoji, .emoji_name, .emoji_category and .emoji_sentiment. Rows without emoji are dropped.

See Also

emoji_frequency() for corpus-level counts and emoji_sentiment() for per-row sentiment.

Examples

df <- data.frame(id = 1:2, text = c("great \U0001f600", "bad \U0001f621"))
emoji_tokens(df, text)
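
The one-row-per-occurrence expansion can be sketched in base R by repeating each original row once per emoji it contains. This illustrates the shape only: the list found is written by hand, and the name, category and sentiment columns are left out.

```r
df <- data.frame(
  id   = 1:2,
  text = c("great \U0001f600\U0001f600", "bad \U0001f621"),
  stringsAsFactors = FALSE
)

# Emoji per row in reading order, written by hand for illustration.
found <- list(c("\U0001f600", "\U0001f600"), "\U0001f621")

# Repeat each row index once per emoji, then attach the glyphs.
idx    <- rep(seq_len(nrow(df)), lengths(found))
tokens <- df[idx, , drop = FALSE]
tokens$.emoji <- unlist(found)
rownames(tokens) <- NULL
tokens
```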

Emoji name, unicode and category crosswalk

Description

A table with one row per emoji name: each emoji glyph appears once for every GitHub-style name it is known by, so the same unicode value can occur on several rows (for example, the grinning face is listed as both "grinning" and "grinning_face").

Usage

emoji_unicode_crosswalk

Format

A data frame with three columns:

emoji_name

The emoji name / shortcode (e.g. "grinning").

unicode

The emoji glyph.

emoji_category

The Unicode category the emoji belongs to.

Source

Derived from the emojis table of the emoji package; rebuilt by data-raw/crosswalks.R.
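
Since a glyph can occupy several rows, joins against this table can multiply rows. The sketch below shows two base R ways to reduce it to one entry per glyph, using a hypothetical three-row excerpt rather than the shipped data.

```r
# Hypothetical excerpt with the documented three-column layout.
crosswalk <- data.frame(
  emoji_name     = c("grinning", "grinning_face", "rage"),
  unicode        = c("\U0001f600", "\U0001f600", "\U0001f621"),
  emoji_category = "Smileys & Emotion",
  stringsAsFactors = FALSE
)

# Option 1: keep only the first name listed for each glyph.
one_per_glyph <- crosswalk[!duplicated(crosswalk$unicode), ]

# Option 2: keep every name, grouped by glyph.
names_by_glyph <- split(crosswalk$emoji_name, crosswalk$unicode)
```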


The most frequent emoji in a text column

Description

top_n_emojis() returns the n most frequent emoji. By default each emoji (unicode) appears on a single row; set duplicated = TRUE to list every name an emoji is known by, so glyphs that share several names occupy several rows.

Usage

top_n_emojis(
  data,
  text,
  n = 20,
  duplicated = FALSE,
  duplicated_unicode = lifecycle::deprecated()
)

Arguments

data

A data frame or tibble containing a text column.

text

The text column to scan, supplied unquoted.

n

Number of emoji to return. Default 20.

duplicated

If TRUE, emoji with several names occupy several rows. Default FALSE.

duplicated_unicode

[Deprecated] Use duplicated instead.

Value

A tibble with columns emoji_name, unicode, emoji_category and n.

See Also

emoji_frequency() for the full distribution.

Examples

df <- data.frame(text = c("\U0001f600\U0001f600\U0001f3c1", "\U0001f621"))
top_n_emojis(df, text, n = 2)
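
The effect of listing every name can be sketched in base R: take the top glyphs by count, then join a name table in which one glyph has two names. This is one plausible reading of duplicated = TRUE, not the package's implementation; the vector found and the crosswalk excerpt are hypothetical.

```r
# Emoji occurrences pooled across a column, written by hand.
found <- c("\U0001f600", "\U0001f600", "\U0001f600",
           "\U0001f3c1", "\U0001f3c1", "\U0001f621")

# Count per glyph and keep the two most frequent.
counts <- as.data.frame(table(unicode = found),
                        responseName = "n", stringsAsFactors = FALSE)
top <- head(counts[order(-counts$n), ], 2)

# Hypothetical name table in which the grinning face has two names.
crosswalk <- data.frame(
  emoji_name = c("grinning", "grinning_face", "checkered_flag"),
  unicode    = c("\U0001f600", "\U0001f600", "\U0001f3c1"),
  stringsAsFactors = FALSE
)

# Joining on the glyph gives one row per name, so two glyphs become three rows.
with_names <- merge(top, crosswalk, by = "unicode")
```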
