From PoC to Production

An agent prototype often proves that the model can answer once. Production asks a different question: can the team make the same capability callable, reviewable, recoverable, observable, and safe to change?

Staged Acceptance

| Stage | Goal | Acceptance question | Docs |
| --- | --- | --- | --- |
| 0. Minimal prototype | Run model and business input | Is model setup reproducible and failure understandable? | Quickstart, Model Setup |
| 1. Stable output | Make results consumable by systems | Are fields, types, required checks, validation, and retry explicit? | Output Control |
| 2. Streaming experience | Let UI/services see progress early | Is instant stream used for temporary UI while final get_data() writes durable state? | Model Response |
| 3. Service exposure | Expose HTTP / SSE / WebSocket | Does the entry layer only validate, call, and project results? Is it async-first? | Async First, FastAPI |
| 4. External action | Call business functions, MCP, search, browse, or tools | Are schema, visible scope, logs, errors, and audit clear? | Actions Overview |
| 5. Controlled environment | Host MCP servers, scripts, or sandbox dependencies | Are lifecycle, permission, health, and failure semantics owned? | Execution Environment |
| 6. Long workflow | Handle branches, concurrency, approval, waits, recovery | Can external systems use execution id, runtime stream, pause/resume, close snapshot? | TriggerFlow |
| 7. Multi-role collaboration | Let specialist roles work on one task | Are role duties, Sub-Flow boundaries, Workspace topology, and aggregate fields clear? | Multi-Role Collaboration |
| 8. Interaction layer | Show process to users, frontends, or IM | Are product events, channels, waiting states, and final state separate? | Interaction Layer and Active Tasks |
| 9. Proactive tasks | Let timers, webhooks, queues, or workers move tasks | Does scheduler / queue / worker own trigger and idempotency? | Interaction Layer and Active Tasks |
| 10. State and evidence | Keep artifacts, observations, decisions, checkpoints | Are large artifacts in Workspace and execution state limited to refs? | Workspace, Long-Running State |
| 11. Observability and eval | Prove runtime facts and quality changes | Are RuntimeEvent, DevTools, eval cases, or replay evidence in place? | Observability, Production Governance |
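
As an illustration of the stage 1 acceptance question (explicit fields, types, required checks, validation, and retry), here is a minimal sketch in plain Python. It does not use the Agently API; the field names, `ContractError`, `validate`, and `call_with_retry` are all hypothetical, and `call_model` stands in for whatever function returns raw model text.

```python
import json
from typing import Callable

# Hypothetical output contract: field name -> required Python type.
REQUIRED_FIELDS = {"ticket_id": str, "priority": int, "summary": str}


class ContractError(ValueError):
    """Raised when model output does not satisfy the output contract."""


def validate(raw: str) -> dict:
    """Parse raw model text and check required fields and their types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ContractError("top-level value must be an object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ContractError(f"missing required field: {name}")
        if not isinstance(data[name], expected):
            raise ContractError(f"field {name} must be {expected.__name__}")
    return data


def call_with_retry(call_model: Callable[[str], str], prompt: str,
                    max_attempts: int = 3) -> dict:
    """Call the model until its output passes validation or attempts run out."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate(raw)
        except ContractError as exc:
            last_error = exc
            # Feed the rejection reason back so the next attempt can correct it.
            prompt = f"{prompt}\n\nPrevious output was rejected: {exc}"
    raise ContractError(f"gave up after {max_attempts} attempts: {last_error}")


# Usage with a stand-in model that fails once, then returns valid JSON.
replies = iter([
    "not json",
    '{"ticket_id": "T-1", "priority": 2, "summary": "Login fails"}',
])
result = call_with_retry(lambda prompt: next(replies), "Classify this ticket")
```

The point of the sketch is that the retry limit and the rejection reasons are explicit, so a downstream system either receives data that matches the contract or a single, explainable failure.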

Minimal Enterprise Topology

| Module | Owns |
| --- | --- |
| Gateway / API | Auth, tenant, request validation, HTTP/SSE/WebSocket |
| Agent Core | Agent definition, AgentExecution, prompt and output contracts |
| Tool Adapters | Business APIs, MCP clients, local functions, search, browse |
| Execution Environment | MCP servers, scripts, databases, browsers, sandboxes |
| Flow Orchestrator | TriggerFlow definitions, executions, runtime stream, pause/resume |
| Interaction Layer | Product process events, channel gateways, waiting hints, final notifications |
| Scheduler / Worker | Timers, webhooks, queues, idempotency, proactive task progress |
| Workspace Store | Observations, artifacts, decisions, checkpoints, ContextPack source |
| Eval / Observer | RuntimeEvent, DevTools, eval cases, replay, quality judgment |

Early systems can run these modules in one process. Keeping the responsibilities separate makes later service splits much easier.
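
One way to keep responsibilities separate inside a single process is to have modules depend on small interfaces rather than on each other's concrete classes. The sketch below assumes nothing about Agently itself: `AgentCore`, `WorkspaceStore`, `Gateway`, the stand-in implementations, and the `workspace://` reference scheme are hypothetical names chosen for illustration.

```python
from typing import Protocol


class AgentCore(Protocol):
    """Hypothetical interface: owns agent execution and output contracts."""
    def run(self, task: str) -> dict: ...


class WorkspaceStore(Protocol):
    """Hypothetical interface: owns artifacts and hands back references."""
    def put(self, key: str, artifact: bytes) -> str: ...


class InMemoryWorkspace:
    """Single-process stand-in; a later split could swap in a remote store."""
    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, artifact: bytes) -> str:
        self._items[key] = artifact
        return f"workspace://{key}"  # made-up reference format


class EchoAgent:
    """Stand-in agent that returns a fixed structured result."""
    def run(self, task: str) -> dict:
        return {"task": task, "report": f"report for {task}"}


class Gateway:
    """Entry layer: validates the request, calls the core, projects the result."""
    def __init__(self, agent: AgentCore, workspace: WorkspaceStore) -> None:
        self._agent = agent
        self._workspace = workspace

    def handle(self, request: dict) -> dict:
        task = request.get("task")
        if not isinstance(task, str) or not task.strip():
            return {"status": 400, "error": "task must be a non-empty string"}
        result = self._agent.run(task)
        # Store the large artifact and return only a reference to the caller.
        ref = self._workspace.put(task, result["report"].encode("utf-8"))
        return {"status": 200, "report_ref": ref}


gateway = Gateway(EchoAgent(), InMemoryWorkspace())
response = gateway.handle({"task": "audit-42"})
```

Because `Gateway` only sees the two interfaces, moving the workspace or the agent behind a network boundary later would change the injected implementation, not the entry layer.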

Pre-Launch Checklist

| Check | Passing standard |
| --- | --- |
| Output contract | Business systems consume structured data, not natural-language fragments |
| Streaming boundary | Instant stream is temporary UI/progress state; durable writes use final data |
| Action boundary | Callable capabilities are registered through Actions / MCP / built-in Actions and leave records |
| Environment boundary | Scripts, MCP servers, browsers, SQLite, Node.js, and sandboxes have lifecycle owners |
| Permissions and approvals | High-risk or irreversible actions have policy, approval, or fail-closed paths |
| Long workflow | Waits, recovery, and human intervention have execution handles and interrupt semantics |
| Multi-role collaboration | Role agents, Sub-Flow, Workspace topology, and aggregate fields are explicit |
| Interaction layer | UI/IM consumes product events, not parser paths, chunk names, or RuntimeEvent payloads |
| Proactive tasks | Scheduler / webhook / queue is separated from HTTP request handlers and has idempotency |
| State | Large artifacts, downloads, reports, and logs stay out of prompt and execution state |
| Recovery | Checkpoint, save/load, runtime resource reinjection, and external resume entry are clear |
| Observability | Requests, Actions, ExecutionEnvironment, and TriggerFlow events can be traced |
| Regression | Representative inputs, output contract checks, business rules, or model judges are repeatable |
| Governance | Cost, timeout, retry, idempotency, safety, and audit each have an owner |
| Updates | Release notes, examples, website docs, and Agently-Skills guidance can be updated together |
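
Two of the checks above, fail-closed approval for high-risk actions and idempotency for proactive tasks, can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not Agently code: the action names, `ActionDenied`, `run_action`, and `IdempotentWorker` are hypothetical, and the in-memory dictionary stands in for what would need to be a durable, atomic store in production.

```python
from typing import Callable, Optional

HIGH_RISK_ACTIONS = {"refund", "delete_account"}  # hypothetical action names


class ActionDenied(Exception):
    """Raised when a high-risk action is not explicitly approved."""


def run_action(name: str, approver: Optional[Callable[[str], bool]] = None) -> str:
    """Run an action; high-risk actions require explicit approval."""
    if name in HIGH_RISK_ACTIONS:
        # Fail closed: a missing approver, an approver error, or any answer
        # other than True all result in denial.
        try:
            approved = approver is not None and approver(name) is True
        except Exception:
            approved = False
        if not approved:
            raise ActionDenied(f"{name} requires approval")
    return f"{name}: done"


class IdempotentWorker:
    """Runs each delivery once per idempotency key (single-threaded sketch)."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}
        self.runs = 0  # counts real executions, not replays

    def handle(self, idempotency_key: str, action: str,
               approver: Optional[Callable[[str], bool]] = None) -> str:
        if idempotency_key in self._results:
            # Duplicate delivery (retry, redelivered webhook): replay the result.
            return self._results[idempotency_key]
        result = run_action(action, approver)
        self.runs += 1
        self._results[idempotency_key] = result
        return result


worker = IdempotentWorker()
first = worker.handle("evt-1", "send_digest")
second = worker.handle("evt-1", "send_digest")  # same key, not executed again
```

Denied actions are not recorded under their key in this sketch, so a later delivery with approval can still run; whether that is the right policy depends on the business rule.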

What Can Wait

| Capability | When it can wait |
| --- | --- |
| TriggerFlow | The task is a single request with no branch, concurrency, wait, recovery, or process stream |
| Multi-role collaboration | One AgentExecution already returns a stable structured result |
| Proactive task | Work is only triggered by an immediate user request |
| Dynamic Task | The task graph is not submitted as data by a model or business system |
| Workspace | There is no cross-turn evidence, artifact, checkpoint, or long-term context |
| Execution Environment | The system only calls ordinary local functions or already-hosted APIs |
| DevTools / Eval | A very early PoC can start with logs and human review, but production needs runtime evidence |

Playbook Entrypoints