
This function creates and submits a batch of messages to the Ollama API. Unlike other batch functions, this function waits for the batch to finish and retrieves the results. The advantage compared to sending single messages via chat() is that Ollama handles large parallel requests more quickly than many individual chat requests.
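
For example, a minimal sketch of a batch call, assuming a running local Ollama server; llm_message() and get_reply() are assumed here as the package's helpers for constructing LLMMessage objects and extracting replies:

prompts <- c(
  "Summarise the plot of Hamlet in one sentence.",
  "Summarise the plot of Macbeth in one sentence."
)

# One LLMMessage conversation history per prompt (llm_message() assumed)
msgs <- lapply(prompts, llm_message)

# Submit all conversations as a single batch and wait for the results
results <- send_ollama_batch(msgs, .model = "gemma2")

# Each element is the input conversation with the assistant's reply added
replies <- sapply(results, get_reply)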

Usage

send_ollama_batch(
  .llms,
  .model = "gemma2",
  .stream = FALSE,
  .seed = NULL,
  .json_schema = NULL,
  .temperature = NULL,
  .num_ctx = 2048,
  .num_predict = NULL,
  .top_k = NULL,
  .top_p = NULL,
  .min_p = NULL,
  .mirostat = NULL,
  .mirostat_eta = NULL,
  .mirostat_tau = NULL,
  .repeat_last_n = NULL,
  .repeat_penalty = NULL,
  .tfs_z = NULL,
  .stop = NULL,
  .ollama_server = "http://localhost:11434",
  .timeout = 120,
  .keep_alive = NULL,
  .dry_run = FALSE
)

Arguments

.llms

A list of LLMMessage objects containing conversation histories.

.model

Character string specifying the Ollama model to use (default: "gemma2")

.stream

Logical; whether to stream the response (default: FALSE)

.seed

Integer; seed for reproducible generation (default: NULL)

.json_schema

A JSON schema object, given as an R list, to enforce the output structure (default: NULL); see the sketch after this argument list

.temperature

Float between 0 and 2; controls randomness in responses (default: NULL)

.num_ctx

Integer; sets the context window size (default: 2048)

.num_predict

Integer; maximum number of tokens to predict (default: NULL)

.top_k

Integer; controls diversity by limiting top tokens considered (default: NULL)

.top_p

Float between 0 and 1; nucleus sampling threshold (default: NULL)

.min_p

Float between 0 and 1; minimum probability threshold (default: NULL)

.mirostat

Integer (0, 1, or 2); enables the Mirostat sampling algorithm (default: NULL)

.mirostat_eta

Float; Mirostat learning rate (default: NULL)

.mirostat_tau

Float; Mirostat target entropy (default: NULL)

.repeat_last_n

Integer; tokens to look back for repetition (default: NULL)

.repeat_penalty

Float; penalty for repeated tokens (default: NULL)

.tfs_z

Float; tail free sampling parameter (default: NULL)

.stop

Character; custom stop sequence(s) (default: NULL)

.ollama_server

String; Ollama API endpoint (default: "http://localhost:11434")

.timeout

Integer; API request timeout in seconds (default: 120)

.keep_alive

Character; how long the Ollama model should stay loaded in memory after the request (default: NULL, which uses Ollama's default of 5 minutes)

.dry_run

Logical; if TRUE, returns the request object without executing it (default: FALSE)
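
As a brief sketch of how the request-shaping arguments combine (the schema below is hypothetical; .dry_run = TRUE returns the request object without contacting the server):

# A JSON schema, given as an R list, to enforce structured output
schema <- list(
  type = "object",
  properties = list(
    sentiment = list(type = "string"),
    score     = list(type = "number")
  ),
  required = list("sentiment", "score")
)

# Inspect the request that would be sent, without executing it
req <- send_ollama_batch(msgs, .model = "gemma2",
                         .json_schema = schema, .dry_run = TRUE)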

Value

A list of updated LLMMessage objects, each with the assistant's response added if successful.

Details

The function provides extensive control over the generation process through various parameters, as illustrated in the sketch after this list:

  • Temperature (0-2): Higher values increase creativity; lower values make responses more focused

  • Top-k/Top-p: Control diversity of generated text

  • Mirostat: Advanced sampling algorithm for maintaining consistent complexity

  • Repeat penalties: Prevent repetitive text

  • Context window: Control how much previous conversation is considered
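
A hypothetical tuning sketch; the parameter values are illustrative only, not recommendations:

results <- send_ollama_batch(
  msgs,
  .model          = "gemma2",
  .temperature    = 0.2,  # lower values give more focused answers
  .top_k          = 40,   # limit sampling to the 40 most likely tokens
  .top_p          = 0.9,  # nucleus sampling threshold
  .repeat_penalty = 1.1,  # penalise repeated tokens
  .num_ctx        = 8192  # consider a larger context window
)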