From PoC to Production


An agent prototype often proves only that the model can answer once. Production asks a different question: can the team make the same capability callable, reviewable, recoverable, observable, and safe to change?

Staged Acceptance

Stage	Goal	Acceptance question	Docs
0. Minimal prototype	Run model and business input	Is model setup reproducible and failure understandable?	Quickstart, Model Setup
1. Stable output	Make results consumable by systems	Are fields, types, required checks, validation, and retry explicit?	Output Control
2. Streaming experience	Let UI/services see progress early	Is instant stream used for temporary UI while final `get_data()` writes durable state?	Model Response
3. Service exposure	Expose HTTP / SSE / WebSocket	Does the entry layer only validate, call, and project results? Is it async-first?	Async First, FastAPI
4. External action	Call business functions, MCP, search, browse, or tools	Are schema, visible scope, logs, errors, and audit clear?	Actions Overview
5. Controlled environment	Host MCP servers, scripts, or sandbox dependencies	Are lifecycle, permission, health, and failure semantics owned?	Execution Environment
6. Long workflow	Handle branches, concurrency, approval, waits, recovery	Can external systems use the execution id, runtime stream, pause/resume, and close snapshot?	TriggerFlow
7. Multi-role collaboration	Let specialist roles work on one task	Are role duties, Sub-Flow boundaries, Workspace topology, and aggregate fields clear?	Multi-Role Collaboration
8. Interaction layer	Show process to users, frontends, or IM	Are product events, channels, waiting states, and final state separate?	Interaction Layer and Active Tasks
9. Proactive tasks	Let timers, webhooks, queues, or workers move tasks	Does the scheduler, queue, or worker own triggering and idempotency?	Interaction Layer and Active Tasks
10. State and evidence	Keep artifacts, observations, decisions, checkpoints	Are large artifacts stored in Workspace, with execution state holding only refs?	Workspace, Long-Running State
11. Observability and eval	Prove runtime facts and quality changes	Are RuntimeEvent, DevTools, eval cases, or replay evidence in place?	Observability, Production Governance
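Stage 1 asks whether fields, types, required checks, validation, and retry are explicit. The sketch below illustrates that acceptance question with the standard library only. `CONTRACT`, `validate`, `call_with_contract`, and `ContractError` are hypothetical names, and `call_model` stands in for whatever actually invokes the model; this is not Agently's Output Control API, just the shape of the check it should pass.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical output contract: field name -> (expected type, required?)
CONTRACT: dict[str, tuple[type, bool]] = {
    "title": (str, True),
    "score": (float, True),
    "tags": (list, False),
}


class ContractError(ValueError):
    """Raised when a model result does not satisfy the output contract."""


def validate(result: Any, contract: dict[str, tuple[type, bool]]) -> dict[str, Any]:
    """Check required fields and types; return the result unchanged if it passes."""
    if not isinstance(result, dict):
        raise ContractError(f"expected a dict, got {type(result).__name__}")
    for field, (expected, required) in contract.items():
        if field not in result:
            if required:
                raise ContractError(f"missing required field: {field}")
            continue  # optional field may be absent
        if not isinstance(result[field], expected):
            raise ContractError(
                f"field {field!r} should be {expected.__name__}, "
                f"got {type(result[field]).__name__}"
            )
    return result


def call_with_contract(
    call_model: Callable[[], Any],
    contract: dict[str, tuple[type, bool]],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Call the (hypothetical) model until its output passes the contract."""
    last_error: ContractError | None = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(), contract)
        except ContractError as exc:
            last_error = exc  # remember why this attempt failed, then retry
    raise ContractError(f"gave up after {max_attempts} attempts: {last_error}")
```

The point is that downstream systems only ever receive a dict that passed `validate`; a result that keeps failing surfaces as an explicit error instead of a half-parsed natural-language fragment.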

Minimal Enterprise Topology

Module	Owns
Gateway / API	Auth, tenant, request validation, HTTP/SSE/WebSocket
Agent Core	Agent definition, AgentExecution, prompt and output contracts
Tool Adapters	Business APIs, MCP clients, local functions, search, browse
Execution Environment	MCP servers, scripts, databases, browsers, sandboxes
Flow Orchestrator	TriggerFlow definitions, executions, runtime stream, pause/resume
Interaction Layer	Product process events, channel gateways, waiting hints, final notifications
Scheduler / Worker	Timers, webhooks, queues, idempotency, proactive task progress
Workspace Store	Observations, artifacts, decisions, checkpoints, ContextPack source
Eval / Observer	RuntimeEvent, DevTools, eval cases, replay, quality judgment

Early systems can run these modules in one process. Keeping the responsibilities separate makes later service splits much easier.
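As a sketch of running two of these modules in one process while keeping their responsibilities apart, the example below separates a Gateway (tenant check, request validation, result projection) from an Agent Core behind a `Protocol`. `AgentCore`, `EchoAgentCore`, and `Gateway` are hypothetical names, and `EchoAgentCore` only simulates an async model call; the real Agent Core would wrap an agent execution.

```python
from __future__ import annotations

import asyncio
from typing import Any, Protocol


class AgentCore(Protocol):
    """Interface owned by the Agent Core module (agent, prompt, output contract)."""

    async def run(self, tenant: str, question: str) -> dict[str, Any]: ...


class EchoAgentCore:
    """Hypothetical stand-in for the Agent Core; no real model is called."""

    async def run(self, tenant: str, question: str) -> dict[str, Any]:
        await asyncio.sleep(0)  # placeholder for an async model call
        return {"answer": f"[{tenant}] {question}", "internal_trace": ["step-1"]}


class Gateway:
    """Owns tenant and request validation; it only validates, calls, and projects."""

    def __init__(self, core: AgentCore, allowed_tenants: set[str]) -> None:
        self.core = core
        self.allowed_tenants = allowed_tenants

    async def handle(self, request: dict[str, Any]) -> dict[str, Any]:
        tenant = request.get("tenant")
        question = request.get("question")
        if tenant not in self.allowed_tenants:
            return {"status": 403, "error": "unknown tenant"}
        if not isinstance(question, str) or not question.strip():
            return {"status": 400, "error": "question must be a non-empty string"}
        result = await self.core.run(tenant, question)
        # Project only public fields; internal details stay inside Agent Core.
        return {"status": 200, "answer": result["answer"]}
```

For example, `asyncio.run(Gateway(EchoAgentCore(), {"acme"}).handle({"tenant": "acme", "question": "hi"}))` returns only the status and answer. Because the Gateway depends on the `AgentCore` interface rather than a concrete class, the core can later move behind a network call without changing the entry layer.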

Pre-Launch Checklist

Check	Passing standard
Output contract	Business systems consume structured data, not natural-language fragments
Streaming boundary	Instant stream is temporary UI/progress state; durable writes use final data
Action boundary	Callable capabilities are registered through Actions / MCP / built-in Actions and leave records
Environment boundary	Scripts, MCP servers, browsers, SQLite, Node.js, and sandboxes have lifecycle owners
Permissions and approvals	High-risk or irreversible actions have policy, approval, or fail-closed paths
Long workflow	Waits, recovery, and human intervention have execution handles and interrupt semantics
Multi-role collaboration	Role agents, Sub-Flow, Workspace topology, and aggregate fields are explicit
Interaction layer	UI/IM consumes product events, not parser paths, chunk names, or RuntimeEvent payloads
Proactive tasks	Scheduler / webhook / queue is separated from HTTP request handlers and has idempotency
State	Large artifacts, downloads, reports, and logs stay out of prompt and execution state
Recovery	Checkpoint, save/load, runtime resource reinjection, and external resume entry are clear
Observability	Requests, Actions, ExecutionEnvironment, and TriggerFlow events can be traced
Regression	Representative inputs, output contract checks, business rules, or model judges are repeatable
Governance	Cost, timeout, retry, idempotency, safety, and audit each have an owner
Updates	Release notes, examples, website docs, and Agently-Skills guidance can be updated together

What Can Wait

Capability	When it can wait
TriggerFlow	The task is a single request with no branch, concurrency, wait, recovery, or process stream
Multi-role collaboration	One AgentExecution already returns a stable structured result
Proactive task	Work is only triggered by an immediate user request
Dynamic Task	The task graph is not submitted as data by a model or business system
Workspace	There is no cross-turn evidence, artifact, checkpoint, or long-term context
Execution Environment	The system only calls ordinary local functions or already-hosted APIs
DevTools / Eval	A very early PoC can start with logs and human review, but production needs runtime evidence
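One way to keep this table actionable in a design review is to encode its conditions as data. The sketch below is a hypothetical helper, not part of any library: `TaskProfile` lists the table's conditions as booleans, and `deferrable` returns the capabilities that the table says can wait for that profile.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical task description mirroring the table's conditions."""

    single_request: bool = True
    needs_branch_wait_or_recovery: bool = False
    one_execution_is_stable: bool = True
    only_user_triggered: bool = True
    graph_submitted_as_data: bool = False
    needs_cross_turn_state: bool = False
    only_local_or_hosted_calls: bool = True
    is_early_poc: bool = True


def deferrable(p: TaskProfile) -> list[str]:
    """Return the capabilities that can wait, in table order."""
    rules = {
        "TriggerFlow": p.single_request and not p.needs_branch_wait_or_recovery,
        "Multi-role collaboration": p.one_execution_is_stable,
        "Proactive task": p.only_user_triggered,
        "Dynamic Task": not p.graph_submitted_as_data,
        "Workspace": not p.needs_cross_turn_state,
        "Execution Environment": p.only_local_or_hosted_calls,
        "DevTools / Eval": p.is_early_poc,
    }
    return [name for name, can_wait in rules.items() if can_wait]
```

For example, a task that needs cross-turn evidence and is past the PoC stage keeps TriggerFlow deferrable but drops Workspace and DevTools / Eval from the list.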
