
This function sends a message history to the Azure OpenAI Chat Completions API and returns the assistant's reply. It is a work in progress and has not been fully tested.

Usage

azure_openai_chat(
  .llm,
  .endpoint_url = Sys.getenv("AZURE_ENDPOINT_URL"),
  .deployment = "gpt-4o-mini",
  .api_version = "2024-08-01-preview",
  .max_completion_tokens = NULL,
  .frequency_penalty = NULL,
  .logit_bias = NULL,
  .logprobs = FALSE,
  .top_logprobs = NULL,
  .presence_penalty = NULL,
  .seed = NULL,
  .stop = NULL,
  .stream = FALSE,
  .temperature = NULL,
  .top_p = NULL,
  .timeout = 60,
  .verbose = FALSE,
  .json_schema = NULL,
  .dry_run = FALSE,
  .max_tries = 3
)

Arguments

.llm

An LLMMessage object containing the conversation history.

.endpoint_url

Base URL for the API (default: Sys.getenv("AZURE_ENDPOINT_URL")).

.deployment

The identifier of the model that is deployed (default: "gpt-4o-mini").

.api_version

The API version to use (default: "2024-08-01-preview").

.max_completion_tokens

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.

.frequency_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.

.logit_bias

A named list modifying the likelihood of specified tokens appearing in the completion.

.logprobs

Whether to return log probabilities of the output tokens (default: FALSE).

.top_logprobs

An integer between 0 and 20 specifying the number of most likely tokens to return at each token position. Requires .logprobs = TRUE.

.presence_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.

.seed

If specified, the system will make a best effort to sample deterministically.

.stop

Up to 4 sequences where the API will stop generating further tokens.

.stream

If TRUE, the reply is streamed to the console as it is generated (default: FALSE).

.temperature

What sampling temperature to use, between 0 and 2. Higher values make the output more random.

.top_p

An alternative to sampling with temperature, called nucleus sampling.

.timeout

Request timeout in seconds (default: 60).

.verbose

Whether to show additional information after the API call (default: FALSE).

.json_schema

A JSON schema object, given as an R list, that enforces the output structure. If defined, it takes precedence over JSON mode.

.dry_run

If TRUE, perform a dry run and return the request object (default: FALSE).

.max_tries

Maximum number of times to attempt the request (default: 3).
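
Several of the arguments above take structured R values rather than scalars. The following is a minimal sketch of how such values might be built in base R. The token IDs, the stop sequences, and the schema contents are hypothetical, and the outer name/schema wrapper around the JSON schema is an assumption about what .json_schema expects, not something this page confirms.

```r
# Hypothetical token IDs mapped to bias values; real IDs depend on the
# tokenizer of the deployed model.
logit_bias <- list("1734" = -5, "9906" = 5)

# Stop sequences; the argument description allows up to 4.
stop_sequences <- c("\n\n", "END")

# A JSON schema written as a nested R list.
# ASSUMPTION: the name/schema wrapper is illustrative only.
address_schema <- list(
  name = "address",
  schema = list(
    type = "object",
    properties = list(
      street = list(type = "string"),
      city   = list(type = "string"),
      zip    = list(type = "string")
    ),
    # A list (not a character vector) so a one-element value would still
    # serialize as a JSON array.
    required = list("street", "city", "zip"),
    additionalProperties = FALSE
  )
)

# Sanity checks against what the argument descriptions state.
stopifnot(
  length(stop_sequences) <= 4,
  !is.null(names(logit_bias)),
  is.list(address_schema$schema$properties)
)
```

Values built this way could then be supplied as .logit_bias = logit_bias, .stop = stop_sequences, and .json_schema = address_schema in a call to azure_openai_chat().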

Value

A new LLMMessage object containing the original messages plus the assistant's response.

Examples

if (FALSE) { # \dontrun{
# Basic usage
msg <- llm_message("What is R programming?")
result <- azure_openai_chat(msg)

# With custom parameters
result2 <- azure_openai_chat(msg,
                 .deployment = "gpt-4o-mini",
                 .temperature = 0.7,
                 .max_completion_tokens = 1000)
} # }
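
The .dry_run argument can be used to inspect a request before anything is sent. The sketch below assumes that the package providing azure_openai_chat() and llm_message() is attached and that AZURE_ENDPOINT_URL, along with whatever credentials the package expects, is configured. The can_try guard and the error handling are illustrative additions, and the exact class of the returned request object is not specified on this page.

```r
# Only attempt the dry run when both functions are available and an
# endpoint is configured; otherwise skip quietly.
can_try <- exists("azure_openai_chat") &&
  exists("llm_message") &&
  nzchar(Sys.getenv("AZURE_ENDPOINT_URL"))

if (can_try) {
  msg <- llm_message("What is R programming?")

  # With .dry_run = TRUE the function is documented to return the request
  # object instead of performing the call.
  req <- tryCatch(
    azure_openai_chat(msg, .temperature = 0.2, .dry_run = TRUE),
    error = function(e) {
      message("Dry run failed: ", conditionMessage(e))
      NULL
    }
  )

  # Show the top-level structure of whatever was returned.
  if (!is.null(req)) str(req, max.level = 1)
}
```

Checking the request this way makes it easier to confirm the deployment name, API version, and sampling parameters before spending tokens on a real call.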