When doing research on large language models, we often need to compare a number of models; maybe with different arguments, maybe with different prompts, etc. In this example we want to call an LLM multiple times with various temperatures and see the results. We suggest different first names and ask the model to pick one.
Note: This vignette requires a valid OpenAI API key and will not run during package installation.
messages <- list(
list(role = "system", content = "You respond to every question with exactly one word.
Nothing more. Nothing less."),
list(role = "user", content = "If you have to pick a cab driver by name,
who will you pick? D'Shaun, Jared, or Josè?")
)
Define temperature values to test
temperatures <- seq(0, 1.5, 0.3)
# Prepare for 5 repetitions of each temperature
all_temperatures <- rep(temperatures, each = 40)
cat("Testing temperatures:", paste(unique(all_temperatures), collapse = ", "), "\n")
cat("Total calls:", length(all_temperatures), "\n")
Let us run this now. The LLMR
package offers 4
parallelizing wrapper. Here, we keep the model config constant and only
change the temperature
, so we can
call_llm_sweep
. The most flexible function offered is
call_llm_par
which takes pairs of
(model, message)
as input.
# Run the temperature sweep
cat("Starting parallel temperature sweep...\n")
start_time <- Sys.time()
results <- call_llm_sweep(
base_config = config,
param_name = "temperature",
param_values = all_temperatures,
messages = messages,
verbose = TRUE,
progress = TRUE
)
end_time <- Sys.time()
cat("Sweep completed in:", round(as.numeric(end_time - start_time), 2), "seconds\n")
Let us clean the output and visualize this:
results |> head()
# remove anything other than a-z, A-Z from response_text
# do not remove accented letter
results$response_text_clean <- gsub("[^a-zA-ZÀ-ÿ ]", "", results$response_text)
results |>
ggplot(aes(temperature, fill = response_text_clean )) +
#show a stacked percentile barplot for every temperature
geom_bar(stat = "count") #, position = 'fill')