From PoC to Production
An agent prototype often proves that the model can answer once. Production asks a different question: can the team make the same capability callable, reviewable, recoverable, observable, and safe to change?
Staged Acceptance
| Stage | Goal | Acceptance question | Docs |
|---|---|---|---|
| 0. Minimal prototype | Run the model on real business input | Is the model setup reproducible, and are failures understandable? | Quickstart, Model Setup |
| 1. Stable output | Make results consumable by other systems | Are fields, types, required-field checks, validation, and retry explicit? | Output Control |
| 2. Streaming experience | Let UIs and services see progress early | Is the instant stream used only for temporary UI state, while the final `get_data()` result writes durable state? | Model Response |
| 3. Service exposure | Expose HTTP / SSE / WebSocket | Does the entry layer only validate, call, and project results? Is it async-first? | Async First, FastAPI |
| 4. External action | Call business functions, MCP tools, search, browse, or other tools | Are schemas, visibility scope, logs, errors, and audit trails clear? | Actions Overview |
| 5. Controlled environment | Host MCP servers, scripts, or sandbox dependencies | Do lifecycle, permissions, health checks, and failure semantics each have an owner? | Execution Environment |
| 6. Long workflow | Handle branches, concurrency, approvals, waits, and recovery | Can external systems use the execution id, runtime stream, pause/resume, and close snapshot? | TriggerFlow |
| 7. Multi-role collaboration | Let specialist roles work on one task | Are role duties, Sub-Flow boundaries, Workspace topology, and aggregate fields clear? | Multi-Role Collaboration |
| 8. Interaction layer | Show progress to users, frontends, or IM channels | Are product events, channels, waiting states, and final state kept separate? | Interaction Layer and Active Tasks |
| 9. Proactive tasks | Let timers, webhooks, queues, or workers advance tasks | Does the scheduler / queue / worker own triggering and idempotency? | Interaction Layer and Active Tasks |
| 10. State and evidence | Keep artifacts, observations, decisions, and checkpoints | Are large artifacts stored in Workspace, with execution state holding only refs? | Workspace, Long-Running State |
| 11. Observability and eval | Prove runtime facts and quality changes | Are RuntimeEvent, DevTools, eval cases, or replay evidence in place? | Observability, Production Governance |
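Stage 1 is the first point where a prototype stops being a demo. A minimal, framework-agnostic sketch of what "explicit fields, types, required checks, validation, and retry" can look like follows. Everything here is an assumption for illustration: `call_model` is a hypothetical callable that returns raw model text, and the field names and the 1-to-5 priority rule are invented. Agently's Output Control has its own mechanisms, which this sketch does not reproduce.

```python
import json
from typing import Callable

# Hypothetical output contract for a ticket-triage agent; field names are illustrative only.
REQUIRED_FIELDS = {"category": str, "priority": int, "summary": str}


class OutputContractError(ValueError):
    """Raised when model output does not satisfy the output contract."""


def validate_output(raw: str) -> dict:
    """Parse raw model text as JSON, then check required fields, types, and one business rule."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputContractError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise OutputContractError("top-level value must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise OutputContractError(f"missing required field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise OutputContractError(f"field {name!r} must be {expected_type.__name__}")
    if not 1 <= data["priority"] <= 5:  # hypothetical business rule
        raise OutputContractError("priority must be between 1 and 5")
    return data


def run_with_contract(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until its output passes validation or the attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate_output(raw)
        except OutputContractError as exc:
            last_error = exc
            # Feed the rejection reason back so the next attempt can correct itself.
            prompt = f"{prompt}\n\nYour previous output was rejected ({exc}). Return only valid JSON."
    raise OutputContractError(f"gave up after {max_attempts} attempts: {last_error}")


# Usage with a fake model that fails once, then returns a valid object.
replies = iter(["not json", '{"category": "billing", "priority": 2, "summary": "Refund request"}'])
result = run_with_contract(lambda _prompt: next(replies), "Triage this ticket: ...")
print(result["category"])  # billing
```

The point of the design is that downstream code only ever sees a dict that passed the contract; a failure after the last attempt surfaces as one typed error rather than a half-parsed natural-language fragment.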
Minimal Enterprise Topology
| Module | Owns |
|---|---|
| Gateway / API | Auth, tenant, request validation, HTTP/SSE/WebSocket |
| Agent Core | Agent definition, AgentExecution, prompt and output contracts |
| Tool Adapters | Business APIs, MCP clients, local functions, search, browse |
| Execution Environment | MCP servers, scripts, databases, browsers, sandboxes |
| Flow Orchestrator | TriggerFlow definitions, executions, runtime stream, pause/resume |
| Interaction Layer | Product process events, channel gateways, waiting hints, final notifications |
| Scheduler / Worker | Timers, webhooks, queues, idempotency, proactive task progress |
| Workspace Store | Observations, artifacts, decisions, checkpoints, ContextPack source |
| Eval / Observer | RuntimeEvent, DevTools, eval cases, replay, quality judgment |
Early systems can run all of these modules in one process. Keeping their responsibilities separate in code, even then, makes it much easier to split them into services later.
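A small sketch of that idea, assuming nothing about Agently's own classes: three modules from the table (Gateway / API, Agent Core, Workspace Store) live in one process but only talk through narrow `typing.Protocol` interfaces. `AgentCore`, `WorkspaceStore`, `FakeAgentCore`, `InMemoryWorkspace`, and `Gateway` are hypothetical names. The gateway follows the stage 3 rule of only validating, calling, and projecting, and it is async-first.

```python
import asyncio
from typing import Protocol


class WorkspaceStore(Protocol):
    """Owns artifacts and evidence; hands back a reference, not the payload."""

    async def put_artifact(self, tenant_id: str, content: str) -> str: ...


class AgentCore(Protocol):
    """Owns the agent definition, prompts, and output contract."""

    async def triage(self, tenant_id: str, text: str) -> dict: ...


class InMemoryWorkspace:
    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    async def put_artifact(self, tenant_id: str, content: str) -> str:
        ref = f"{tenant_id}/artifact-{len(self._items) + 1}"
        self._items[ref] = content
        return ref


class FakeAgentCore:
    """Stand-in for a real agent; a production version would call the model here."""

    def __init__(self, workspace: WorkspaceStore) -> None:
        self.workspace = workspace

    async def triage(self, tenant_id: str, text: str) -> dict:
        await asyncio.sleep(0)  # placeholder for an async model call
        evidence_ref = await self.workspace.put_artifact(tenant_id, text)
        return {"category": "billing", "priority": 2, "evidence_ref": evidence_ref}


class Gateway:
    """Entry layer: validate, call, project. No prompts or business rules live here."""

    def __init__(self, core: AgentCore) -> None:
        self.core = core

    async def handle(self, payload: dict) -> dict:
        tenant_id = payload.get("tenant_id")
        text = payload.get("text")
        # 1. Validate at the edge.
        if not isinstance(tenant_id, str) or not isinstance(text, str) or not text.strip():
            return {"status": 400, "error": "tenant_id and non-empty text are required"}
        # 2. Call the agent core.
        result = await self.core.triage(tenant_id, text)
        # 3. Project only the fields the caller needs.
        return {"status": 200, "category": result["category"], "evidence_ref": result["evidence_ref"]}


async def main() -> None:
    gateway = Gateway(FakeAgentCore(InMemoryWorkspace()))
    print(await gateway.handle({"tenant_id": "acme", "text": "Please refund order 42"}))


if __name__ == "__main__":
    asyncio.run(main())
```

Because the gateway depends only on the `AgentCore` protocol, the in-process `FakeAgentCore` could later be swapped for a client that calls a separate agent service without touching the HTTP layer.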
Pre-Launch Checklist
| Check | Passing standard |
|---|---|
| Output contract | Business systems consume structured data, not natural-language fragments |
| Streaming boundary | Instant stream is temporary UI/progress state; durable writes use final data |
| Action boundary | Callable capabilities are registered through Actions / MCP / built-in Actions and leave records |
| Environment boundary | Scripts, MCP servers, browsers, SQLite, Node.js, and sandboxes have lifecycle owners |
| Permissions and approvals | High-risk or irreversible actions have policy, approval, or fail-closed paths |
| Long workflow | Waits, recovery, and human intervention have execution handles and interrupt semantics |
| Multi-role collaboration | Role agents, Sub-Flow, Workspace topology, and aggregate fields are explicit |
| Interaction layer | UI/IM consumes product events, not parser paths, chunk names, or RuntimeEvent payloads |
| Proactive tasks | Scheduler / webhook / queue handling is separate from HTTP request handlers and enforces idempotency |
| State | Large artifacts, downloads, reports, and logs stay out of prompts and execution state |
| Recovery | Checkpoint, save/load, runtime resource reinjection, and external resume entry are clear |
| Observability | Requests, Actions, ExecutionEnvironment, and TriggerFlow events can be traced |
| Regression | Runs over representative inputs, with output contract checks, business rules, or model judges, are repeatable |
| Governance | Cost, timeout, retry, idempotency, safety, and audit each have an owner |
| Updates | Release notes, examples, website docs, and Agently-Skills guidance can be updated together |
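The proactive-tasks check asks that the scheduler, webhook, or queue side owns idempotency, because webhooks and queues commonly redeliver the same event. A minimal sketch, assuming an in-memory dict as the record of completed work: `Worker`, `idempotency_key`, and the `webhook:orders` source name are hypothetical. A real deployment would need a durable store with an atomic claim (for example a unique key in a database), and if a provider supplies its own event id, that id is usually a better key than a payload hash.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


def idempotency_key(source: str, event: dict) -> str:
    """Derive a stable key from the trigger source and a canonical JSON form of the event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{source}:{canonical}".encode()).hexdigest()


@dataclass
class Worker:
    handler: Callable[[dict], str]
    # Assumption: in-memory only. Not safe across processes or concurrent workers,
    # and a crash between running the handler and recording the result allows a rerun.
    completed: dict[str, str] = field(default_factory=dict)

    def process(self, source: str, event: dict) -> str:
        key = idempotency_key(source, event)
        if key in self.completed:
            # Duplicate delivery: return the recorded result instead of re-running the task.
            return self.completed[key]
        result = self.handler(event)
        self.completed[key] = result
        return result


# Usage: the same webhook event delivered twice (with different key order) runs the task once.
calls: list[int] = []


def start_review(event: dict) -> str:
    calls.append(event["order_id"])
    return f"review started for {event['order_id']}"


worker = Worker(handler=start_review)
worker.process("webhook:orders", {"order_id": 42, "status": "paid"})
worker.process("webhook:orders", {"status": "paid", "order_id": 42})
print(len(calls))  # 1
```

Keeping this logic in the worker, rather than in an HTTP handler, is what lets the same task be triggered by a timer, a webhook, or a queue message without each entry point reimplementing deduplication.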
What Can Wait
| Capability | When it can wait |
|---|---|
| TriggerFlow | The task is a single request with no branch, concurrency, wait, recovery, or process stream |
| Multi-role collaboration | One AgentExecution already returns a stable structured result |
| Proactive task | Work is only triggered by an immediate user request |
| Dynamic Task | The task graph is not submitted as data by a model or business system |
| Workspace | There is no cross-turn evidence, artifact, checkpoint, or long-term context |
| Execution Environment | The system only calls ordinary local functions or already-hosted APIs |
| DevTools / Eval | A very early PoC can start with logs and human review, but production needs runtime evidence |
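Even while DevTools and a full eval setup wait, the regression check in the pre-launch list can start small: a fixed set of representative inputs, an output contract check, and a business rule, run the same way every time. The sketch below assumes the agent is any callable returning a dict; `EvalCase`, `run_regression`, and the priority rule are hypothetical, and a model judge could be added as one more check per case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    input_text: str
    expected_category: str


def run_regression(agent: Callable[[str], dict], cases: list[EvalCase]) -> dict:
    """Run every case and collect contract, business-rule, and expectation failures."""
    failures = []
    for case in cases:
        output = agent(case.input_text)
        problems = []
        # Contract check: priority must be present and an int (bool excluded).
        priority = output.get("priority")
        if isinstance(priority, bool) or not isinstance(priority, int):
            problems.append("priority missing or not an int")
        elif not 1 <= priority <= 5:  # hypothetical business rule
            problems.append("priority out of range")
        # Expectation check against the labelled case.
        if output.get("category") != case.expected_category:
            problems.append(f"expected {case.expected_category!r}, got {output.get('category')!r}")
        if problems:
            failures.append({"case": case.name, "problems": problems})
    return {"total": len(cases), "failed": len(failures), "failures": failures}


# Usage with a keyword-based stub standing in for the real agent.
def stub_agent(text: str) -> dict:
    return {"category": "billing" if "refund" in text.lower() else "other", "priority": 3}


report = run_regression(
    stub_agent,
    [
        EvalCase("refund", "I want a refund", "billing"),
        EvalCase("login", "Cannot log in", "account"),
    ],
)
print(report["failed"])  # 1
```

Storing the case list alongside the code means a prompt or model change can be checked against the same inputs before release, which is the kind of repeatable evidence that later grows into replay and eval tooling.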