| Type: | Package |
| Title: | Discover, Count and Score Emoji in Text |
| Version: | 0.2.0 |
| Description: | A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) <doi:10.1371/journal.pone.0144296>, released under CC BY-SA 4.0. |
| License: | GPL (≥ 3) |
| URL: | https://pursuitofdatascience.github.io/tidyEmoji/ |
| BugReports: | https://github.com/PursuitOfDataScience/tidyEmoji/issues |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 3.5.0) |
| Imports: | dplyr (≥ 1.1.0), emoji, lifecycle, stats, tibble, tidyr, utils |
| Suggests: | rmarkdown, knitr, testthat (≥ 3.0.0), ggplot2, readr, forcats, stringr |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-06-16 21:03:52 UTC; youzhi |
| Author: | Youzhi Yu [aut, cre] |
| Maintainer: | Youzhi Yu <yuyouzhi666@icloud.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-17 07:20:02 UTC |
tidyEmoji: Discover, Count and Score Emoji in Text
Description
A tidy toolkit for working with the emoji in any text column, such as social-media posts, product reviews, chat logs or survey responses. Unicode is awkward to handle and not every code point is an emoji, which makes emoji statistics fiddly to obtain. 'tidyEmoji' extracts, counts, categorises and sentiment-scores emoji with grapheme-aware detection (so skin-tone and multi-person sequences stay intact), returning tidy data frames that slot straight into a 'tidyverse' workflow. The bundled emoji sentiment lexicon is from the Emoji Sentiment Ranking of Kralj Novak et al. (2015) doi:10.1371/journal.pone.0144296, released under CC BY-SA 4.0.
Author(s)
Maintainer: Youzhi Yu <yuyouzhi666@icloud.com>
See Also
Useful links:
Report bugs at https://github.com/PursuitOfDataScience/tidyEmoji/issues
Emoji category to unicode crosswalk
Description
A table with one row per Unicode category, listing every emoji glyph in that
category as a single |-separated string.
Usage
category_unicode_crosswalk
Format
A data frame with two columns:
- category
The Unicode category (10 categories).
- unicodes
The emoji glyphs in the category, separated by |.
Source
Derived from the emojis table of the emoji package; rebuilt by
data-raw/crosswalks.R.
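Because each row packs a whole category into one string, splitting on the literal | recovers the individual glyphs. The following is a minimal base-R sketch on a toy data frame that only mimics the shape of the crosswalk; the two rows are illustrative, not the shipped data.

```r
# Toy stand-in for category_unicode_crosswalk (illustrative rows only).
crosswalk <- data.frame(
  category = c("Smileys & Emotion", "Flags"),
  unicodes = c("\U0001f600|\U0001f603", "\U0001f3c1"),
  stringsAsFactors = FALSE
)

# "|" is a regex metacharacter, so split on it as a fixed string.
glyphs <- strsplit(crosswalk$unicodes, "|", fixed = TRUE)

# One row per (category, glyph) pair.
long <- data.frame(
  category = rep(crosswalk$category, lengths(glyphs)),
  unicode  = unlist(glyphs),
  stringsAsFactors = FALSE
)
long
```

Passing fixed = TRUE matters here: without it, "|" is read as regex alternation and the string is split between every character.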
Categorise each row by the emoji categories it contains
Description
emoji_categorize() keeps the rows of data that contain emoji and adds a
.emoji_category column listing the distinct Unicode categories present in
that row (for example "Smileys & Emotion"), separated by | when a row spans
more than one category.
Usage
emoji_categorize(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
data, as a tibble, filtered to the rows containing emoji and with an
added .emoji_category column.
Examples
df <- data.frame(text = c("smile \U0001f600",
"flag \U0001f3c1\U0001f600",
"nothing"))
emoji_categorize(df, text)
Add a list-column of the emoji found in each row
Description
emoji_extract_nest() returns data unchanged except for an added
list-column, .emoji_unicode, holding the emoji found in each row. Detection
is grapheme-aware, so skin-tone modifiers and ZWJ sequences (for example
family emoji) are kept intact as a single emoji.
Usage
emoji_extract_nest(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
data with an added list-column .emoji_unicode.
See Also
emoji_extract_unnest() for a long, counted form and
emoji_tokens() for one row per emoji with metadata.
Examples
df <- data.frame(text = c("hi \U0001f600\U0001f603", "none"))
emoji_extract_nest(df, text)
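To see why grapheme-aware detection matters, it can help to count the code points behind a single visible emoji. This base-R sketch does not use the package and only illustrates the problem; the two example sequences are chosen for illustration.

```r
# A waving hand followed by a skin-tone modifier: one visible emoji.
wave_tone <- "\U0001f44b\U0001f3fd"

# Man, woman and girl joined by zero-width joiners (U+200D): one family emoji.
family <- "\U0001f468\u200d\U0001f469\u200d\U0001f467"

# Number of Unicode code points in each string.
n_wave   <- length(utf8ToInt(wave_tone))  # 2
n_family <- length(utf8ToInt(family))     # 5
c(wave = n_wave, family = n_family)
```

Splitting such strings per code point would report two and five separate symbols, which is the failure mode that keeping sequences intact avoids.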
Emoji counts per row, in long (tidy) form
Description
emoji_extract_unnest() returns one row per distinct emoji in each input row,
together with a count, and drops rows that contain no emoji. row_number is
the position of the originating row in data.
Usage
emoji_extract_unnest(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
A tibble with columns row_number, .emoji_unicode and
.emoji_count.
Examples
df <- data.frame(text = c("hi \U0001f600\U0001f600", "none", "\U0001f44b"))
emoji_extract_unnest(df, text)
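The long form can be pictured as a per-row tally of already-extracted emoji. Below is a base-R sketch of that shape, assuming the emoji have been extracted into a list (the found object is hypothetical); it is not the package's implementation.

```r
# Hypothetical per-row extraction result: row 2 has no emoji.
found <- list(c("\U0001f600", "\U0001f600"), character(0), "\U0001f44b")

# Tally each row separately, skipping rows without emoji.
pieces <- lapply(seq_along(found), function(i) {
  if (length(found[[i]]) == 0) return(NULL)
  tab <- table(found[[i]])
  data.frame(
    row_number     = i,
    .emoji_unicode = names(tab),
    .emoji_count   = as.integer(tab),
    stringsAsFactors = FALSE
  )
})

# Drop the empty rows and stack the rest.
counts <- do.call(rbind, Filter(Negate(is.null), pieces))
counts
```

Row 2 is absent from the result, so row_number jumps from 1 to 3, matching the description of dropped rows.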
Keep only the rows whose text contains emoji
Description
emoji_filter() returns the rows of data whose text column contains at
least one emoji, preserving every original column. emoji_tweets() is a
synonym retained for backward compatibility.
Usage
emoji_filter(data, text)
emoji_tweets(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
A tibble containing only the rows with at least one emoji.
Examples
df <- data.frame(text = c("hi \U0001f600", "no emoji", "bye \U0001f44b"))
emoji_filter(df, text)
Frequency of every emoji in a text column
Description
emoji_frequency() counts how often each emoji appears across the whole text
column (an entry containing the same emoji twice contributes 2) and returns a
tibble sorted by descending count, with each emoji's name, shortcode and
category.
Usage
emoji_frequency(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
A tibble with columns emoji, name, shortcode, group (the emoji's category) and n (its count).
See Also
top_n_emojis() for just the most frequent emoji.
Examples
df <- data.frame(text = c("\U0001f600\U0001f600", "\U0001f621"))
emoji_frequency(df, text)
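The counting rule (every occurrence counts, so a repeated emoji in one entry adds to the total) can be sketched in base R. The found list is a hypothetical stand-in for extracted emoji; the package's own output also carries name, shortcode and category columns that this sketch omits.

```r
# Hypothetical extraction result for three entries.
found <- list(c("\U0001f600", "\U0001f600"), "\U0001f621", character(0))

# Pool every occurrence across entries, then tabulate.
freq <- as.data.frame(
  table(emoji = unlist(found)),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Sort by descending count.
freq <- freq[order(-freq$n), ]
freq
```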
Score the sentiment of the emoji in each row
Description
emoji_sentiment() adds the mean emoji sentiment of each row, based on the
Emoji Sentiment Ranking lexicon (see emoji_sentiment_lexicon). Scores range
from -1 (negative) through 0 (neutral) to +1 (positive). Rows that contain no
emoji, or whose emoji are absent from the lexicon, receive NA.
Usage
emoji_sentiment(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
data, as a tibble, with added columns .emoji_n (the number of
emoji in the row) and .emoji_sentiment (the mean sentiment of the emoji
that appear in the lexicon).
References
Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296
See Also
emoji_sentiment_lexicon for the underlying scores.
Examples
df <- data.frame(text = c("love it \U0001f60d", "awful \U0001f621", "meh"))
emoji_sentiment(df, text)
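The NA rule is easiest to see on a toy case. The base-R sketch below uses a made-up two-entry lexicon (the scores are invented for illustration and are not taken from emoji_sentiment_lexicon) and a hypothetical found list; it mirrors the description above rather than the package's code.

```r
# Invented scores, for illustration only.
lexicon <- c("\U0001f60d" = 0.5, "\U0001f621" = -0.25)

# Hypothetical extraction result: row 3 has no emoji,
# row 4 holds an emoji that is missing from the toy lexicon.
found <- list(
  "\U0001f60d",
  c("\U0001f621", "\U0001f60d"),
  character(0),
  "\U0001f9ea"
)

# Number of emoji per row.
emoji_n <- lengths(found)

# Mean score over the emoji that appear in the lexicon, NA if none do.
row_mean <- vapply(found, function(e) {
  s <- lexicon[e]
  s <- s[!is.na(s)]
  if (length(s) == 0) NA_real_ else mean(s)
}, numeric(1))

data.frame(.emoji_n = emoji_n, .emoji_sentiment = row_mean)
```

Row 4 shows that a row can have a non-zero emoji count and still receive NA, because none of its emoji are scored.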
Emoji Sentiment Ranking lexicon
Description
Sentiment scores for emoji, from the Emoji Sentiment Ranking 1.0, computed
from ~70,000 tweets in 13 European languages annotated for sentiment. The
sentiment_score is (positive - negative) / occurrences, ranging from -1
(negative) to +1 (positive); sentiment_label is derived from its sign.
Usage
emoji_sentiment_lexicon
Format
A data frame with one row per emoji and the columns:
- emoji
The emoji glyph.
- occurrences
Number of times the emoji was observed.
- position
Mean relative position of the emoji within its text, from 0 (start) to 1 (end).
- negative, neutral, positive
Annotation counts for each class.
- sentiment_score
Sentiment score from -1 to 1.
- sentiment_label
"negative", "neutral" or "positive".
- unicode_name
The official Unicode character name.
- unicode_block
The Unicode block.
Source
Kralj Novak P, Smailovic J, Sluban B, Mozetic I (2015) Sentiment of
Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296.
Data from https://hdl.handle.net/11356/1048, released under the
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
licence. Processed by data-raw/emoji_sentiment_lexicon.R.
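As a worked example of the score formula, the base-R sketch below recomputes sentiment_score and sentiment_label from annotation counts. The counts are invented, and the sketch assumes that occurrences equals the sum of the three class counts.

```r
# Invented annotation counts for three hypothetical emoji.
lexicon <- data.frame(
  negative = c(10, 20, 15),
  neutral  = c(30, 10, 0),
  positive = c(60, 20, 5)
)

# Assumption: every occurrence received exactly one annotation.
lexicon$occurrences <- with(lexicon, negative + neutral + positive)

# (positive - negative) / occurrences, bounded by -1 and +1.
lexicon$sentiment_score <- with(lexicon, (positive - negative) / occurrences)

# Label from the sign of the score: -1, 0, +1 map to positions 1, 2, 3.
labels <- c("negative", "neutral", "positive")
lexicon$sentiment_label <- labels[sign(lexicon$sentiment_score) + 2]
lexicon
```

The first row gives (60 - 10) / 100 = 0.5, the second (20 - 20) / 50 = 0 and the third (5 - 15) / 20 = -0.5.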
Summarise emoji presence in a text column
Description
emoji_summary() reports how many entries in a text column contain at least
one emoji, alongside the total number of entries. An entry is counted once
regardless of how many emoji it holds.
Usage
emoji_summary(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
A one-row tibble with columns emoji_tweets (entries containing at
least one emoji) and total_tweets (all entries).
See Also
emoji_filter() to keep the emoji-bearing rows themselves.
Examples
df <- data.frame(text = c("I love R \U0001f600",
"no emoji here",
"flags \U0001f3c1\U0001f600"))
emoji_summary(df, text)
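The two counts reduce to a presence test per entry. Here is a base-R sketch on a hypothetical found list of extracted emoji, shown only to make the "counted once" rule concrete.

```r
# Hypothetical extraction result mirroring the three entries above.
found <- list("\U0001f600", character(0), c("\U0001f3c1", "\U0001f600"))

# An entry counts once if it holds at least one emoji.
summary_row <- data.frame(
  emoji_tweets = sum(lengths(found) > 0),
  total_tweets = length(found)
)
summary_row
```

The third entry holds two emoji but still adds only one to emoji_tweets.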
Tidy emoji tokens, one row per occurrence with metadata
Description
emoji_tokens() expands data to one row per emoji occurrence (in reading
order), keeping the original columns and adding the glyph together with its
name, category and sentiment score. This mirrors the one-token-per-row shape
familiar from tidy text mining and is convenient for counting, joining and
plotting.
Usage
emoji_tokens(data, text)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
Value
A tibble with the original columns plus .emoji, .emoji_name,
.emoji_category and .emoji_sentiment. Rows without emoji are dropped.
See Also
emoji_frequency() for corpus-level counts and emoji_sentiment()
for per-row sentiment.
Examples
df <- data.frame(id = 1:2, text = c("great \U0001f600", "bad \U0001f621"))
emoji_tokens(df, text)
Emoji name, unicode and category crosswalk
Description
A table with one row per emoji name: each emoji glyph appears once for every GitHub-style name it is known by, so a single unicode can occur on several rows (for example the grinning face is both "grinning" and "grinning_face").
Usage
emoji_unicode_crosswalk
Format
A data frame with three columns:
- emoji_name
The emoji name / shortcode (e.g. "grinning").
- unicode
The emoji glyph.
- emoji_category
The Unicode category the emoji belongs to.
Source
Derived from the emojis table of the emoji package; rebuilt by
data-raw/crosswalks.R.
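Since a glyph can occupy several rows, joins against this table may multiply rows unless it is first reduced to one row per glyph. The following base-R sketch works on a toy table with the same three columns (the rows are illustrative, not the shipped data).

```r
# Toy stand-in for emoji_unicode_crosswalk (illustrative rows only).
crosswalk <- data.frame(
  emoji_name     = c("grinning", "grinning_face", "rage"),
  unicode        = c("\U0001f600", "\U0001f600", "\U0001f621"),
  emoji_category = "Smileys & Emotion",
  stringsAsFactors = FALSE
)

# How many names does each glyph have?
names_per_glyph <- table(crosswalk$unicode)

# Keep the first name listed for each glyph.
one_per_glyph <- crosswalk[!duplicated(crosswalk$unicode), ]
one_per_glyph
```

Keeping the first name is an arbitrary choice made for this sketch; any other rule for picking a single name would serve equally well.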
The most frequent emoji in a text column
Description
top_n_emojis() returns the n most frequent emoji. By default each emoji
(unicode) appears on a single row; set duplicated = TRUE to list every name
an emoji is known by, so glyphs that share several names occupy several rows.
Usage
top_n_emojis(
data,
text,
n = 20,
duplicated = FALSE,
duplicated_unicode = lifecycle::deprecated()
)
Arguments
data |
A data frame or tibble containing a text column. |
text |
The text column to scan, supplied unquoted. |
n |
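Number of emoji to return. Default 20. |
duplicated |
If TRUE, list every name an emoji is known by, so glyphs that share several names occupy several rows. Default FALSE. |
duplicated_unicode |
Deprecated (defaults to lifecycle::deprecated()). |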
Value
A tibble with columns emoji_name, unicode, emoji_category and
n.
See Also
emoji_frequency() for the full distribution.
Examples
df <- data.frame(text = c("\U0001f600\U0001f600\U0001f3c1", "\U0001f621"))
top_n_emojis(df, text, n = 2)