Skip to content

OpenAICompatible

Languages: English · 中文

OpenAICompatible is one of the three protocol-level model request plugins (see Models Overview). It handles any endpoint that speaks the OpenAI Chat Completions API — which today covers most commercial providers and most local model servers.

Settings

python
from agently import Agently

Agently.set_settings("OpenAICompatible", {
    "base_url": "https://api.openai.com/v1",
    "api_key": "${ENV.OPENAI_API_KEY}",
    "model": "${ENV.OPENAI_MODEL}",
})
KeyMeaning
base_urlthe API root, e.g. https://api.openai.com/v1
api_keybearer token; omit for local servers that don't require auth
modelprovider-specific model name
model_type"chat" (default) or "completion" for legacy completion endpoints
request_retrytransient transport retry policy; defaults to {"max_attempts": 2} and only retries before output starts
request_optionsextra dict forwarded to the underlying HTTP client (timeouts, headers)

The full set lives in the agently/builtins/plugins/ModelRequester/OpenAICompatible/ package. The public plugin class is exported from plugin.py, while request building, credentials, transport, handler binding, and response mapping live under its private modules/ package.

Responses API variant

Some providers (and OpenAI itself for newer models) speak the Responses API rather than Chat Completions. Agently has a sibling plugin:

python
Agently.set_settings("OpenAIResponsesCompatible", {
    "base_url": "https://api.openai.com/v1",
    "api_key": "${ENV.OPENAI_API_KEY}",
    "model": "${ENV.OPENAI_RESPONSES_MODEL}",
})

OpenAIResponsesCompatible is a sibling of OpenAICompatible; pick whichever matches the protocol your endpoint exposes. Both plugins directly implement ModelRequester; neither plugin inherits from the other.

What "OpenAI-compatible" actually covers

A provider qualifies as OpenAI-compatible when its endpoint:

  • Accepts a JSON body with messages: [{"role": ..., "content": ...}, ...].
  • Returns either a JSON response or an SSE stream of token deltas.
  • Uses standard fields like model, temperature, max_tokens, tools, etc.

Providers that fit:

  • OpenAI / Azure OpenAI
  • DeepSeek (https://api.deepseek.com/v1)
  • Qwen / DashScope's compatibility mode (https://dashscope.aliyuncs.com/compatible-mode/v1)
  • Kimi / Moonshot (https://api.moonshot.cn/v1)
  • GLM (https://open.bigmodel.cn/api/paas/v4/)
  • MiniMax, Doubao, ERNIE — most ship an OpenAI-compatible mode
  • SiliconFlow, Groq — both expose OpenAI-compatible endpoints
  • Gemini — via OpenAI-compat endpoint
  • Ollama (local) — http://127.0.0.1:11434/v1
  • vLLM, LM Studio, llama.cpp server (local)
  • Most internal gateways teams build over commercial models

For per-provider recipes, see Providers.

Per-agent overrides

Agent-level settings override the global preset:

python
agent = Agently.create_agent()
agent.set_settings("OpenAICompatible", {"model": "${ENV.OPENAI_MODEL_FAST}"})

You can also set request-level overrides via the request chain — see Settings.

Streaming and tools

OpenAICompatible handles both streaming responses (used by get_generator(...) / get_async_generator(...)) and tool calling (used by the action runtime). You don't need to enable these per-provider — they're on as the protocol allows.

If a particular provider doesn't fully implement OpenAI semantics for one of these (e.g., a quirky streaming format), the underlying plugin tries to be tolerant; report concrete cases via issues.

For transient transport failures such as a connection reset or provider-side disconnect before any output is emitted, OpenAICompatible retries the same request once by default. This does not change the selected model, prompt, or structured output format. Set "request_retry": {"max_attempts": 1} or "request_retry": False to disable that replay. Once output has started, Agently does not replay the stream automatically, because doing so could duplicate partial content.

See also