This vignette shows how to train a BERTopic model from R and persist
it to disk along with the R-side extras (probabilities, reduced
embeddings, and dynamic topic outputs). Set eval = TRUE for
the chunks you want to run.
Python environment selection and checks are handled in the hidden setup chunk at the top of the vignette.
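A minimal sketch of what such a setup chunk typically does, assuming a `reticulate`-based workflow (the environment name is illustrative):

```r
library(reticulate)

# Point reticulate at an environment that has bertopic installed;
# adjust the name to your own virtualenv or conda env.
# use_virtualenv("r-bertopic", required = TRUE)

# Fail early with a clear message if the Python side is missing.
if (!py_module_available("bertopic")) {
  stop("Python module 'bertopic' is not available in the selected environment.")
}
```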
Below, the German sample dataframe is used for topic analysis.
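Before training, the documents are pulled out of the dataframe as a character vector. A minimal sketch, assuming the sample dataframe `df` stores the article text in a column named `text` (the column name is illustrative):

```r
# Extract the documents to model; BERTopic expects one string per document.
docs <- df$text

# Basic sanity checks before handing the texts to Python.
stopifnot(is.character(docs), length(docs) > 0, !anyNA(docs))
```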
The train_bertopic_model() function is a convenience
wrapper. For more options and parameter fine-tuning, see the other
vignette (topics_spiegel.Rmd) or the Quarto file
(inst/extdata/topics_spiegel.qmd). The full set of arguments of
train_bertopic_model() is documented in its help file.
topic_model <- train_bertopic_model(
docs = docs,
top_n_words = 50L, # set integer number of top words
embedding_model = "Qwen/Qwen3-Embedding-0.6B", # choose your (multilingual) model from huggingface.co
embedding_show_progress = TRUE,
timestamps = df$date, # set this to NULL if not applicable with your data
classes = df$genre, # set this to NULL if not applicable with your data
representation_model = "keybert" # keyword generation for each topic
)

Note the warning that BERTopic emits about pickle serialization:

> BERTopic WARNING: When you use `pickle` to save/load a BERTopic model, please make sure that the environments in which you save and load the model are exactly the same. The version of BERTopic, its dependencies, and Python need to remain the same.
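One way to sidestep the pickle caveat is BERTopic's safetensors serialization. A sketch, assuming `topic_model` exposes the underlying Python BERTopic object via reticulate (whether `train_bertopic_model()` returns such a handle directly is an assumption here):

```r
# Save with safetensors instead of pickle; the embedding model is stored
# as a pointer (its Hugging Face name) rather than serialized weights.
topic_model$save(
  "bertopic_model",
  serialization = "safetensors",
  save_embedding_model = "Qwen/Qwen3-Embedding-0.6B"
)
```

With this format, the model directory can be reloaded in a different environment without the strict version-matching that pickle requires.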