OpenAICompatible
Languages: English · 中文
OpenAICompatible is one of the three protocol-level model request plugins (see Models Overview). It handles any endpoint that speaks the OpenAI Chat Completions API — which today covers most commercial providers and most local model servers.
Settings
from agently import Agently
Agently.set_settings("OpenAICompatible", {
"base_url": "https://api.openai.com/v1",
"api_key": "${ENV.OPENAI_API_KEY}",
"model": "${ENV.OPENAI_MODEL}",
})| Key | Meaning |
|---|---|
base_url | the API root, e.g. https://api.openai.com/v1 |
api_key | bearer token; omit for local servers that don't require auth |
model | provider-specific model name |
model_type | "chat" (default) or "completion" for legacy completion endpoints |
request_retry | transient transport retry policy; defaults to {"max_attempts": 2} and only retries before output starts |
request_options | extra dict forwarded to the underlying HTTP client (timeouts, headers) |
The full set lives in the agently/builtins/plugins/ModelRequester/OpenAICompatible/ package. The public plugin class is exported from plugin.py, while request building, credentials, transport, handler binding, and response mapping live under its private modules/ package.
Responses API variant
Some providers (and OpenAI itself for newer models) speak the Responses API rather than Chat Completions. Agently has a sibling plugin:
Agently.set_settings("OpenAIResponsesCompatible", {
"base_url": "https://api.openai.com/v1",
"api_key": "${ENV.OPENAI_API_KEY}",
"model": "${ENV.OPENAI_RESPONSES_MODEL}",
})OpenAIResponsesCompatible is a sibling of OpenAICompatible; pick whichever matches the protocol your endpoint exposes. Both plugins directly implement ModelRequester; neither plugin inherits from the other.
What "OpenAI-compatible" actually covers
A provider qualifies as OpenAI-compatible when its endpoint:
- Accepts a JSON body with
messages: [{"role": ..., "content": ...}, ...]. - Returns either a JSON response or an SSE stream of token deltas.
- Uses standard fields like
model,temperature,max_tokens,tools, etc.
Providers that fit:
- OpenAI / Azure OpenAI
- DeepSeek (
https://api.deepseek.com/v1) - Qwen / DashScope's compatibility mode (
https://dashscope.aliyuncs.com/compatible-mode/v1) - Kimi / Moonshot (
https://api.moonshot.cn/v1) - GLM (
https://open.bigmodel.cn/api/paas/v4/) - MiniMax, Doubao, ERNIE — most ship an OpenAI-compatible mode
- SiliconFlow, Groq — both expose OpenAI-compatible endpoints
- Gemini — via OpenAI-compat endpoint
- Ollama (local) —
http://127.0.0.1:11434/v1 - vLLM, LM Studio, llama.cpp server (local)
- Most internal gateways teams build over commercial models
For per-provider recipes, see Providers.
Per-agent overrides
Agent-level settings override the global preset:
agent = Agently.create_agent()
agent.set_settings("OpenAICompatible", {"model": "${ENV.OPENAI_MODEL_FAST}"})You can also set request-level overrides via the request chain — see Settings.
Streaming and tools
OpenAICompatible handles both streaming responses (used by get_generator(...) / get_async_generator(...)) and tool calling (used by the action runtime). You don't need to enable these per-provider — they're on as the protocol allows.
If a particular provider doesn't fully implement OpenAI semantics for one of these (e.g., a quirky streaming format), the underlying plugin tries to be tolerant; report concrete cases via issues.
For transient transport failures such as a connection reset or provider-side disconnect before any output is emitted, OpenAICompatible retries the same request once by default. This does not change the selected model, prompt, or structured output format. Set "request_retry": {"max_attempts": 1} or "request_retry": False to disable that replay. Once output has started, Agently does not replay the stream automatically, because doing so could duplicate partial content.
See also
- Models Overview — protocol picking
- AnthropicCompatible — the other protocol plugin
- Providers — recipes per provider
- Model Setup — quickstart-level walkthrough