Key takeaways
- Pick the orchestration shape first, the framework second. The five shapes that matter in 2026 are graph, roles, handoffs, typed-IO, and enterprise runtime. The shape decides how your team will think about every future change.
- For a small team shipping its first production agent, the safest default is LangGraph: explicit graph nodes, durable checkpoints, and human-in-the-loop pauses are exactly the affordances you reach for the second a production agent misbehaves.
- If the goal is a working multi-agent demo by Friday and not a load-bearing production system, CrewAI’s role-based DSL is the fastest path. Plan to either upgrade to Flows for control flow or migrate later when the demo grows up.
- The OpenAI Agents SDK is the cleanest fit if you are already standardised on OpenAI models and want first-party handoffs, guardrails, and tracing without pulling in LangChain. The Anthropic Agent SDK plays the equivalent role inside the Claude stack.
- Pydantic AI rewards teams that already write typed Python and want agents that look like ordinary functions; Microsoft Agent Framework rewards teams already inside Azure with .NET and Python parity.
- The framework you regret is rarely the one with the wrong logo. It is the one whose control flow you can no longer read after eight weeks of feature creep. Optimise for legibility over cleverness.
Why this decision matters more than it looks
The agent framework you pick is not just a library. It is the abstraction your team will think in for every future feature: every retry, every tool call, every “why did the agent do that?” debugging session goes through it. Switching mid-project is expensive not because the rewrite is hard, but because every existing pattern, every observability hook, and every team-internal heuristic has to be re-learned. Pick the shape you can live with for a year.
The good news for small teams in 2026 is that the category has settled into five clear shapes. Once you know which shape fits your team’s default mental model — graphs, roles, handoffs, typed schemas, or an enterprise runtime — the framework choice inside that shape is almost mechanical.
What an agent framework actually has to give you
Strip the vendor decks and the job is to provide five things:
- A control-flow primitive. The thing you reach for to express “first do A, then if B is true do C otherwise loop”. This is a directed graph, a role with tasks, an agent with handoffs, or a typed function — pick the one your team writes the cleanest pseudocode in.
- A durable state layer. Long-running agents fail mid-run. The framework should be able to pause, persist, and resume without your code having to know it crashed. Anthropic’s pattern catalogue is explicit that this is the cost of admitting non-trivial agent loops into production.
- Tool integration. Function calling, MCP servers, retrieval, computer use. Anything that touches the outside world goes through here, with a uniform error surface.
- Tracing and evals. If you cannot see what the agent did and score it against a golden dataset, you cannot ship safely. Frameworks differ wildly on whether tracing is first-party (OpenAI Agents SDK, LangGraph via LangSmith) or BYO.
- A human-in-the-loop affordance. An explicit way to pause for approval before an irreversible action. This is the most common production retrofit; pick a framework where the affordance already exists.
A framework that does only the first two well is a prototype scaffold, not a production foundation. The frameworks below all do all five — but with very different defaults.
The 2026 framework matrix
- Framework — Shape — Model lock-in — Durable state — Sweet spot for a small team
- LangGraph — Graph (explicit nodes and edges) — Model-agnostic — First-party checkpointing and durable execution; HITL pause built in — Production agents that need audit trails, retries, and human approval gates
- CrewAI — Roles (Crews) plus Flows for control — Model-agnostic — Flows provide state and event-driven execution; Crews are stateless per task — Demos and prototypes where role-based decomposition is the most natural framing
- OpenAI Agents SDK — Handoffs between agents — OpenAI-first; community shims exist — Sessions layer with SQLAlchemy, SQLite, Redis, and other backends — Teams already on OpenAI who want first-party tracing and guardrails without LangChain
- Anthropic Agent SDK — Tool-use loop, optionally with sub-agents — Claude-first — Application-managed; framework is intentionally thin — Teams already on Claude who want minimal abstractions over the tool-use loop
- Pydantic AI — Typed function calls with validated inputs and outputs — Model-agnostic — Application-managed; integrates with existing Python persistence — Typed Python codebases where agents should look like ordinary, testable functions
- Microsoft Agent Framework — Enterprise runtime with .NET and Python parity — Model-agnostic; first-class Azure AI Foundry — Built-in workflow durability and observability — Teams inside Azure or .NET who want first-party tracing, policy, and identity
Two non-obvious entries deserve flagging. The Anthropic Agent SDK is intentionally minimal — it leans on the “build simple things first” argument from Anthropic’s own engineering write-up, so if you are looking for a framework that does a lot for you, it is not that. And the OpenAI Agents SDK’s “sessions” layer is the production-shaped affordance that the rest of the SDK relies on; if you skip it your agent has no memory across calls.
A decision checklist that fits on one page
- If your situation is… — Start with… — Reason
- First production agent, small team, want a clear audit trail — LangGraph — Explicit graph nodes plus checkpoints are the easiest abstraction to debug six months in. Human-in-the-loop is a first-class concept, not a retrofit.
- Demo or prototype on a tight deadline, multi-agent decomposition feels natural — CrewAI — The role-based DSL gets you to a working crew in an afternoon. Move to Flows for anything that needs deterministic control flow.
- Already standardised on OpenAI models, want first-party tracing — OpenAI Agents SDK — Handoffs, guardrails, sessions, and tracing all line up with the OpenAI platform tools; no third-party glue needed.
- Already standardised on Claude, want minimal abstractions — Anthropic Agent SDK — Stays close to the tool-use loop and the patterns from Anthropic’s “Building Effective Agents” write-up. Pay the cost in glue if you need durable state.
- Typed Python codebase, want agents that look like normal functions — Pydantic AI — Pydantic schemas at the boundary make agents testable like any other unit; you trade some framework affordances for legibility.
- On Azure and .NET, want first-party identity, policy, and observability — Microsoft Agent Framework — Multi-language runtime with first-party Azure AI Foundry integration; the natural pick when the enterprise stack is already chosen.
Mistakes to skip on the way
- Picking the framework before picking the shape. Two teams can use the same framework and end up with very different systems. The expensive choice is the shape — once you know whether your control flow is naturally a graph or a role or a handoff, the framework picks itself.
- Treating an autonomous agent as the default. Anthropic’s pattern catalogue is explicit that workflows — prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser — cover most production use cases more reliably than a single open-ended agent. Default to a workflow; reach for an agent only when the task genuinely needs open-ended tool use.
- Skipping durable state until later. The first long-running agent that crashes in production is the moment you wish you had checkpointing. Wire it in week one; retrofitting it later means rewriting your control-flow graph.
- Conflating tracing and evals. Tracing tells you what happened; evals tell you whether it was correct. A framework that gives you only tracing is half a production stack. Pair every framework choice with an evaluation harness from day one.
- Locking your agent code to one model vendor by accident. Even when you have settled on OpenAI or Claude, keep the model-call site behind a thin adapter. The cost of changing models is mostly retesting; the cost of changing frameworks because you cannot change models is far higher.
Sources
- Anthropic Engineering — Building Effective Agents — used for the workflows-vs-agents distinction, the five workflow patterns (prompt chaining, routing, parallelisation, orchestrator-workers, evaluator-optimiser) plus the autonomous-agent pattern, and the “simple, composable patterns over complex frameworks” framing that runs through this guide.
- LangGraph official documentation — LangChain docs — used for the LangGraph row in the matrix: durable execution that persists through failures, human-in-the-loop as a first-class concept, short-term and long-term memory, and observability via LangSmith.
- OpenAI Agents SDK — openai/openai-agents-python on GitHub — used for the OpenAI Agents SDK row: handoffs as the coordination primitive, guardrails for input/output validation, built-in tracing, and the sessions persistence layer with SQLAlchemy, SQLite, Redis, MongoDB, and Dapr backends.
- CrewAI official documentation — used for the CrewAI row: Flows as the backbone for state management and event-driven control flow, Crews as collaborating role-playing agents, and the standalone-framework positioning (no LangChain dependency).
Related reading
- AI Agent Observability for Small Teams in 2026: A Practical Buyer’s Guide
- Agent Harnesses for Coding Agents in Small Teams (2026)
- Prompt Caching for Production LLM Apps in 2026: An Honest Cost-Control Playbook
- Spec-Driven Development for Small Teams in 2026 — When It Pays Off, When It’s Overkill