tidyEmoji helps you discover, count, categorise and sentiment-score the emoji in any text column (social-media posts, product reviews, chat logs, survey responses, support tickets) and hands the result back as tidy data frames that drop straight into a tidyverse workflow.
Unicode is awkward to work with, and not every code point is an emoji, which makes emoji statistics fiddly. tidyEmoji takes care of that, including grapheme-aware detection, so an emoji carrying a skin-tone modifier (such as 🏽) and a multi-person sequence (👨‍👩‍👧‍👦) are each treated as a single emoji rather than being split apart.
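To see why grapheme awareness matters, the sketch below counts code points and grapheme clusters for two emoji sequences. It uses base R and the stringi package rather than tidyEmoji itself, so it illustrates the general idea and says nothing about the package's internals. The thumbs-up example and the variable names are chosen here purely for illustration, and the grapheme counts depend on the ICU version bundled with stringi.

```r
library(stringi)

# Illustrative inputs (not taken from tidyEmoji): a thumbs-up with a
# medium skin-tone modifier, and a four-person family joined by
# zero-width joiners (U+200D).
thumbs <- "\U0001F44D\U0001F3FD"
family <- "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

# Code points: what a naive per-character split would see.
code_points <- nchar(c(thumbs, family), type = "chars")
code_points
#> [1] 2 7

# Grapheme clusters: user-perceived characters, as segmented by ICU.
# With a reasonably recent ICU each sequence should count as one.
graphemes <- stri_count_boundaries(c(thumbs, family), type = "character")
graphemes
```

Counting by code point would report nine "characters" across the two inputs, which is the kind of over-count that grapheme-aware detection avoids.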
install.packages("tidyEmoji")
# development version
# install.packages("devtools")
devtools::install_github("PursuitOfDataScience/tidyEmoji")
library(tidyEmoji)
library(dplyr)
reviews <- data.frame(text = c("Best purchase ever \U0001f600\U0001f60d",
                               "It broke after a day \U0001f621",
                               "Does the job.",
                               "Wearing my mask \U0001f637\U0001f637",
                               "Shipped fast \U0001f3c1\U0001f600"))
reviews %>% emoji_summary(text) # entries with emoji vs. total
#> # A tibble: 1 × 2
#>   emoji_tweets total_tweets
#>          <int>        <int>
#> 1            4            5
reviews %>% emoji_filter(text) # keep only the rows that have emoji
#> # A tibble: 4 × 1
#>   text
#>   <chr>
#> 1 Best purchase ever 😀😍
#> 2 It broke after a day 😡
#> 3 Wearing my mask 😷😷
#> 4 Shipped fast 🏁😀
reviews %>% emoji_frequency(text) # every emoji, with name + category
#> # A tibble: 5 × 5
#>   emoji name                         shortcode      group                 n
#>   <chr> <chr>                        <chr>          <chr>             <int>
#> 1 😀    grinning face                grinning       Smileys & Emotion     2
#> 2 😷    face with medical mask       mask           Smileys & Emotion     2
#> 3 🏁    chequered flag               checkered_flag Flags                 1
#> 4 😍    smiling face with heart-eyes heart_eyes     Smileys & Emotion     1
#> 5 😡    enraged face                 rage           Smileys & Emotion     1
reviews %>% top_n_emojis(text, n = 3) # just the most frequent
#> # A tibble: 3 × 4
#>   emoji_name     unicode emoji_category        n
#>   <chr>          <chr>   <chr>             <int>
#> 1 grinning       😀      Smileys & Emotion     2
#> 2 mask           😷      Smileys & Emotion     2
#> 3 checkered_flag 🏁      Flags                 1
emoji_tokens() gives one tidy row per emoji occurrence,
with its name, category and sentiment, ready to count, join or
plot.
reviews %>% emoji_tokens(text)
#> # A tibble: 7 × 5
#>   text                    .emoji .emoji_name    .emoji_category .emoji_sentiment
#>   <chr>                   <chr>  <chr>          <chr>                      <dbl>
#> 1 Best purchase ever 😀😍 😀     grinning face  Smileys & Emot…            0.572
#> 2 Best purchase ever 😀😍 😍     smiling face … Smileys & Emot…            0.678
#> 3 It broke after a day 😡 😡     enraged face   Smileys & Emot…           -0.173
#> 4 Wearing my mask 😷😷    😷     face with med… Smileys & Emot…           -0.171
#> 5 Wearing my mask 😷😷    😷     face with med… Smileys & Emot…           -0.171
#> 6 Shipped fast 🏁😀       🏁     chequered flag Flags                      0.571
#> 7 Shipped fast 🏁😀       😀     grinning face  Smileys & Emot…            0.572
reviews %>% emoji_categorize(text) # which Unicode categories each row spans
#> # A tibble: 4 × 2
#>   text                    .emoji_category
#>   <chr>                   <chr>
#> 1 Best purchase ever 😀😍 Smileys & Emotion
#> 2 It broke after a day 😡 Smileys & Emotion
#> 3 Wearing my mask 😷😷    Smileys & Emotion
#> 4 Shipped fast 🏁😀       Flags|Smileys & Emotion
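Rows that span several categories come back as one pipe-delimited string, as in the last row of the output. If one row per text and category is more convenient, something like the following may work. It is a minimal sketch that assumes the `.emoji_category` column has the shape shown in the printed output; `categorised` is a hand-built stand-in for that output (a hypothetical name), so the snippet runs without tidyEmoji installed.

```r
library(dplyr)
library(tidyr)

# Hand-built stand-in mirroring two rows of the emoji_categorize() output.
categorised <- tibble(
  text = c("It broke after a day \U0001f621",
           "Shipped fast \U0001f3c1\U0001f600"),
  .emoji_category = c("Smileys & Emotion", "Flags|Smileys & Emotion")
)

# Split on the literal pipe so each category gets its own row.
long <- categorised %>%
  separate_rows(.emoji_category, sep = "\\|")

# Count how many texts touch each category.
category_counts <- long %>% count(.emoji_category, sort = TRUE)
category_counts
```

The long form is easier to group, join or plot by category than the delimited string.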
reviews %>% emoji_sentiment(text) # mean emoji sentiment per row (-1 to +1)
#> # A tibble: 5 × 3
#>   text                    .emoji_n .emoji_sentiment
#>   <chr>                      <int>            <dbl>
#> 1 Best purchase ever 😀😍        2            0.625
#> 2 It broke after a day 😡        1           -0.173
#> 3 Does the job.                  0               NA
#> 4 Wearing my mask 😷😷           2           -0.171
#> 5 Shipped fast 🏁😀              2            0.572
emoji_sentiment() uses the bundled Emoji
Sentiment Ranking lexicon (Kralj Novak et al., 2015). See the
package vignette for a fuller tour.
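The per-row scores can be related back to the token-level values with a short dplyr sketch. It assumes, as the comment on the emoji_sentiment() call suggests, that a row's score is the plain mean of its emoji scores. The `tokens` tibble is a hand-built stand-in (a hypothetical name) that copies the rounded values printed by emoji_tokens(), so results may differ from the package's output in the last digit.

```r
library(dplyr)

# Stand-in for the emoji_tokens() output: one row per emoji occurrence,
# with sentiment values copied (rounded) from the printed table.
tokens <- tibble(
  id = c(1, 1, 2, 4, 4, 5, 5),   # row of `reviews` each emoji came from
  .emoji_sentiment = c(0.572, 0.678, -0.173, -0.171, -0.171, 0.571, 0.572)
)

# Average the emoji scores within each review.
row_scores <- tokens %>%
  group_by(id) %>%
  summarise(.emoji_n = n(),
            .emoji_sentiment = mean(.emoji_sentiment))
row_scores
```

Review 3 contains no emoji, so it contributes no tokens and is absent from this summary, whereas emoji_sentiment() keeps it with a count of 0 and a score of NA.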