Enterprise Agent Evaluation
This page is not an API catalog. It helps a team decide whether an AI agent prototype can enter a real business system.
Readers usually arrive in one of three roles, each with a different question:
| Role | What they care about | Evidence they need |
|---|---|---|
| Business owner | Whether the agent can deliver reviewable business results | Output fields, process steps, exception handling, and final artifacts |
| Technical lead | Whether the prototype can fit existing systems | API boundary, permissions, logs, state, service exposure, and operations hooks |
| Developer | Where to start and when to add more capabilities | Recommended APIs, capability boundaries, and staged acceptance checks |
Agently is useful when these engineering concerns must be handled on one delivery path rather than patched around a prompt.
Six Evaluation Questions
| Question | Agently entrypoint | Passing signal |
|---|---|---|
| Can model results enter a business system? | Output Control, Model Response | Fields, types, required checks, validation, and retry behavior are explicit; writes use get_data() / async_get_data() rather than ad hoc natural-language parsing |
| Can the UI or service show progress before the final answer? | Instant structured streaming | get_generator(type="instant") / get_async_generator(type="instant") is used for temporary UI state; final writes still use get_data() |
| Can the model call external capabilities safely? | Actions, MCP | Tool schema, visible tool scope, action records, error shape, and audit position are clear |
| Is the execution environment controlled? | Execution Environment | MCP servers, scripts, SQLite, Node.js, browser sessions, or sandboxes have lifecycle, permission, and health-check owners |
| Can long workflows be watched, paused, and resumed? | TriggerFlow | Branching, fan-out, sub-flow, runtime stream, pause/resume, close snapshot, and execution status are traceable |
| Can cross-turn evidence be retained and recalled? | Workspace, Context Engineering | Observations, artifacts, decisions, and checkpoints live in Workspace; execution state keeps refs rather than large blobs |
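The first row's passing signal can be illustrated without any framework call. The sketch below is plain Python, not Agently's API: TICKET_SCHEMA, validate, request_with_retry, and fake_model are hypothetical names, and fake_model stands in for whatever returns the already-parsed result in a real project. It shows only the idea that fields, types, required checks, and retry behavior are written down explicitly before anything is stored.

```python
from typing import Any, Callable

# Hypothetical schema: field name -> (expected type, required?)
TICKET_SCHEMA = {
    "intent": (str, True),
    "priority": (int, True),
    "summary": (str, False),
}

def validate(data: Any, schema: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the data passed."""
    if not isinstance(data, dict):
        return ["result is not an object"]
    problems = []
    for name, (expected_type, required) in schema.items():
        if name not in data:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(data[name], expected_type):
            problems.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return problems

def request_with_retry(call_model: Callable[[], Any], schema: dict, max_attempts: int = 3) -> dict:
    """Call the model until the result validates or the attempts run out."""
    last_problems: list[str] = []
    for _ in range(max_attempts):
        data = call_model()
        last_problems = validate(data, schema)
        if not last_problems:
            return data
    raise ValueError(f"no valid result after {max_attempts} attempts: {last_problems}")

# Stand-in for a real structured model call; the first reply lacks a required field.
_replies = iter([
    {"intent": "refund"},
    {"intent": "refund", "priority": 2},
])

def fake_model() -> Any:
    return next(_replies)

result = request_with_retry(fake_model, TICKET_SCHEMA)
print(result)
```

Only the validated dictionary reaches the caller; the incomplete first reply is retried rather than repaired by parsing free text.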
Avoid Premature Upgrades
Many agent projects fail because they are built on the wrong capability layer, not because the model is weak.
| Symptom | Better choice |
|---|---|
| Designing a complex workflow while output fields still drift | First stabilize output schema, ensure, validation, and result readers |
| Starting with TriggerFlow for a single turn | Stay in AgentExecution / request layer first |
| Using Dynamic Task when the task graph is not an input | Use Dynamic Task only when a model or business system submits DAG data |
| Treating MCP as a complete permission system | MCP standardizes connection and capability supply; Host / Action Runtime still owns visibility, identity, redaction, and audit |
| Treating a Skill as a script runner | Skills are selectable behavior assets; execution should map back to Actions, ExecutionEnvironment, TriggerFlow, or Dynamic Task |
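The MCP row above says that visibility, identity, and audit stay with the host even when a protocol supplies the tools. A minimal sketch of that split follows, assuming a hypothetical ToolHost class written in plain Python. It is not Agently's Action Runtime and not an MCP client; the tool names, roles, and record shape are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolHost:
    """Hypothetical host layer: decides which tools a role sees and records every call."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    scopes: dict[str, set[str]] = field(default_factory=dict)  # role -> visible tool names
    audit_log: list[dict] = field(default_factory=list)

    def visible_tools(self, role: str) -> list[str]:
        """List only the registered tools that this role is allowed to see."""
        allowed = self.scopes.get(role, set())
        return [name for name in sorted(self.tools) if name in allowed]

    def call(self, role: str, name: str, **kwargs: Any) -> dict:
        """Run a tool if the role may see it; always return one uniform record shape."""
        if name not in self.visible_tools(role):
            record = {"role": role, "tool": name, "ok": False, "error": "not_visible"}
        else:
            try:
                output = self.tools[name](**kwargs)
                record = {"role": role, "tool": name, "ok": True, "result": output}
            except Exception as exc:
                record = {"role": role, "tool": name, "ok": False, "error": type(exc).__name__}
        self.audit_log.append(record)  # every attempt is audited, including denials
        return record

host = ToolHost(
    tools={
        "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
        "issue_refund": lambda order_id: {"order_id": order_id, "refunded": True},
    },
    scopes={"support_agent": {"lookup_order"}},
)

ok = host.call("support_agent", "lookup_order", order_id="A-1")
denied = host.call("support_agent", "issue_refund", order_id="A-1")
print(ok["ok"], denied["error"], len(host.audit_log))
```

The tool supplier never decides who may call what; the host filters the visible scope and keeps the audit trail, which is the boundary the table describes.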
A Practical Validation Path
- Use Quickstart to run a minimal structured request.
- Use Output Control to fix fields, required checks, and business validation.
- Use Model Response to confirm text / data / meta / stream are read from the same response.
- Use Actions Overview to connect a real or mocked business capability and inspect action records.
- Add Execution Environment when a capability needs a managed process, sandbox, or external dependency.
- Add TriggerFlow when the flow has branches, concurrency, approvals, waits, recovery, or process visibility.
- Add Workspace when the task needs cross-turn evidence, artifacts, and checkpoints.
- Before service delivery, cover FastAPI Service Exposure and Observability.
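The path above works as a sequence of gates: a team moves to the next step only when the current one has evidence behind it. One way to keep that honest is to record the gates as data. The sketch below is an assumption about how a team might track this, not part of Agently; the evidence keys and the helper first_blocked_stage are hypothetical, and only the first three steps are shown.

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[], bool]]

def first_blocked_stage(stages: list[Stage]) -> Optional[str]:
    """Return the name of the first stage whose check fails, or None if all pass."""
    for name, check in stages:
        if not check():
            return name
    return None

# Hypothetical evidence a team might collect while walking the path.
evidence = {
    "structured_request_runs": True,
    "fields_validated": True,
    "action_records_inspected": False,
}

# Stage names follow the list above; each check reads one piece of evidence.
stages: list[Stage] = [
    ("Quickstart", lambda: evidence["structured_request_runs"]),
    ("Output Control", lambda: evidence["fields_validated"]),
    ("Actions Overview", lambda: evidence["action_records_inspected"]),
]

blocked = first_blocked_stage(stages)
print(blocked)  # Actions Overview
```

Because the checks run in order, later layers such as TriggerFlow or Workspace are not considered until the earlier gates pass.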
Use Scenarios for the First Decision
| Scenario | First capability layer | May upgrade to |
|---|---|---|
| Customer ticket triage, intent detection, form extraction | Output Control + Model Response | Instant stream, Actions |
| Industry news, research reports, document collection | Structured output + TriggerFlow | Workspace, runtime stream |
| Natural-language control of business systems | Output schema + Actions | ExecutionEnvironment, approval, TriggerFlow |
| Long document review, contract analysis, policy checking | Routing + task/dependency plan + reflection | TriggerFlow, Workspace, human approval |
| Reusable capability package selection | Skills Executor | Dynamic Task, TriggerFlow-backed staged/react execution |
After this page, read Enterprise Agent System Roadmap for architecture planning, or Scenario-to-Capability Mapping if you already have a business goal.