Calls the /v1/reranking endpoint of a running llama.cpp server to score
documents by relevance to a query. Useful for building fully-local RAG
pipelines (embed → cosine search → rerank → chat, all with llamacpp()).
Usage
llamacpp_rerank(
  .query,
  .documents,
  .model = "local-model",
  .server = Sys.getenv("LLAMACPP_SERVER", "http://localhost:8080"),
  .api_key = Sys.getenv("LLAMACPP_API_KEY", ""),
  .timeout = 60,
  .max_tries = 3,
  .dry_run = FALSE
)

Arguments
- .query
A single query string.
- .documents
A character vector of documents to rerank.
- .model
The model name (default: "local-model").
- .server
Base URL of the llama.cpp server. Defaults to the LLAMACPP_SERVER env var, falling back to "http://localhost:8080".
- .api_key
API key for the server (default: LLAMACPP_API_KEY env var).
- .timeout
Request timeout in seconds (default: 60).
- .max_tries
Maximum number of request attempts on failure (default: 3).
- .dry_run
If TRUE, returns the request object without executing it (default: FALSE).
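
A minimal usage sketch, assuming a llama.cpp server with a reranking-capable model is already listening on port 8080 (the query and documents here are illustrative, not from the package):

```r
# Hypothetical example: requires a running llama.cpp server started with a
# reranker model, e.g. `llama-server -m <reranker-model> --rerank --port 8080`.
docs <- c(
  "Mars is often called the Red Planet due to iron oxide on its surface.",
  "Venus is the second planet from the Sun.",
  "The Great Wall of China is a fortification in northern China."
)

ranked <- llamacpp_rerank(
  .query     = "Which planet is known as the Red Planet?",
  .documents = docs,
  .server    = "http://localhost:8080"
)

# With .dry_run = TRUE the function returns the request object without
# contacting the server, which is useful for inspecting the payload:
req <- llamacpp_rerank(
  .query     = "Which planet is known as the Red Planet?",
  .documents = docs,
  .dry_run   = TRUE
)
```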
