Build Multi-Agent Systems in Python: CrewAI vs LangGraph vs AutoGen
A single AI agent can answer questions and call tools. But when you need one agent to research, another to analyze, and a third to write — you need multi-agent orchestration. Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. This isn't hype. It's the AI equivalent of the microservices revolution.
This guide walks you through building multi-agent systems with the three dominant Python frameworks in 2026: CrewAI, LangGraph, and AutoGen. You'll get runnable code, architecture patterns, and an honest comparison to help you pick the right tool.
📋 What You'll Need
- Python 3.10+ installed
- An OpenAI API key (or Anthropic/other LLM provider)
- pip or uv for package management
- Basic Python knowledge — classes, decorators, async/await
- A use case in mind — even a simple one like "research a topic and write a summary"
🧠 When Do You Actually Need Multiple Agents?
Not every problem needs a multi-agent system. Microsoft's Azure Architecture Center defines three levels of complexity:
| Level | What It Is | When to Use |
|---|---|---|
| Direct model call | Single LLM call with a good prompt | Classification, summarization, translation |
| Single agent + tools | One agent that reasons and selects tools | Most domain-specific tasks (order lookup, DB queries) |
| Multi-agent orchestration | Specialized agents coordinating together | Cross-domain problems, parallel specialization, security boundaries |
Start with a single agent and good prompt engineering. Add tools before adding agents. Graduate to multi-agent patterns only when you hit clear limits — like prompt complexity blowing up, tool overload, or needing distinct security boundaries per domain.
🏗️ Five Orchestration Patterns You Should Know
Before picking a framework, understand the patterns. Every multi-agent system uses one (or a combination) of these:
Sequential (Pipeline)
Agents process tasks in a fixed order. Agent 1's output feeds Agent 2, and so on. Think: draft → review → polish.
```
┌──────────┐     ┌──────────┐     ┌──────────┐
│ Research │────►│ Analyze  │────►│  Write   │
└──────────┘     └──────────┘     └──────────┘
```
Best for: Progressive refinement workflows with clear dependencies.
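Stripped of any framework, the pattern is just function composition. A minimal sketch with the LLM calls stubbed out as plain functions (the stub outputs are illustrative):

```python
# Each "agent" is a function that transforms the running payload.
# The bodies are stubs; swap in real model calls as needed.
def research(topic: str) -> str:
    return f"findings about {topic}"        # stub for an LLM call

def analyze(findings: str) -> str:
    return f"analysis of {findings}"        # stub for an LLM call

def write(analysis: str) -> str:
    return f"article based on {analysis}"   # stub for an LLM call

def run_pipeline(topic: str) -> str:
    # Fixed order: each agent's output feeds the next.
    result = topic
    for agent in (research, analyze, write):
        result = agent(result)
    return result

print(run_pipeline("multi-agent frameworks"))
```

Every framework below ultimately compiles down to a loop like this; the value they add is state handling, retries, and observability around it.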
Concurrent (Fan-out/Fan-in)
Multiple agents process the same input simultaneously. Results get aggregated at the end. Think: four analysts evaluating the same stock from different angles.
Best for: Tasks that benefit from multiple perspectives or time-sensitive parallel processing.
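A minimal fan-out/fan-in sketch in plain `asyncio`; the `analyst` coroutine stands in for a real LLM call, and the angle names are illustrative:

```python
import asyncio

async def analyst(angle: str, ticker: str) -> str:
    # Stub for an LLM call; every analyst sees the same input.
    await asyncio.sleep(0)  # yield control, as a real API call would
    return f"{angle} view on {ticker}"

async def evaluate(ticker: str) -> str:
    angles = ["fundamental", "technical", "sentiment", "macro"]
    # Fan-out: all analysts run concurrently on the same input.
    views = await asyncio.gather(*(analyst(a, ticker) for a in angles))
    # Fan-in: aggregate the parallel results into one report.
    return " | ".join(views)

print(asyncio.run(evaluate("ACME")))
```

With real API calls, the wall-clock time is roughly that of the slowest analyst rather than the sum of all four.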
Supervisor (Hierarchical)
A boss agent decides which worker agents to invoke and when. The supervisor plans, delegates, and aggregates results.
Best for: Complex workflows where the execution path depends on intermediate results.
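The routing decision is the heart of the pattern. A toy sketch where a keyword check stands in for the supervisor's LLM call (agent names and messages are illustrative):

```python
# Worker functions stand in for LLM-backed agents with narrow specialties.
def billing_agent(task: str) -> str:
    return f"billing handled: {task}"

def tech_agent(task: str) -> str:
    return f"tech support handled: {task}"

def supervisor(task: str) -> str:
    # In a real system this routing decision is itself an LLM call;
    # a keyword check stands in for it here.
    worker = billing_agent if "invoice" in task.lower() else tech_agent
    # Delegate, then aggregate the worker's result into a final answer.
    return f"supervisor summary: {worker(task)}"

print(supervisor("Question about my invoice"))
```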
Handoff (State-Driven)
The active agent changes dynamically. Each agent can transfer control to another via tool calling. Think: customer support escalation.
Best for: Sequential workflows with staged constraints — like triage → diagnosis → resolution.
Group Chat (Roundtable)
Multiple agents participate in a shared conversation thread. A chat manager coordinates who speaks next.
Best for: Brainstorming, consensus-building, or maker-checker validation loops.
🚀 CrewAI: The Fastest Path to Multi-Agent Systems
CrewAI uses a role-based model inspired by real teams. You define agents like job descriptions — each has a role, goal, and backstory. It's the most beginner-friendly framework and gets you to a working system in under an hour.
Install and Set Up
```bash
pip install 'crewai[tools]'
```
Set your API key:
```bash
export OPENAI_API_KEY="sk-your-key-here"
```
Build a Research + Writing Crew
```python
from crewai import Agent, Task, Crew

# Define specialized agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate, current information on the given topic",
    backstory="You specialize in discovering key facts from "
              "multiple sources and synthesizing them clearly.",
    allow_delegation=False,
)

writer = Agent(
    role="Technical Writer",
    goal="Create clear, actionable content from research findings",
    backstory="You translate complex technical data into "
              "well-structured articles that developers love.",
    allow_delegation=False,
)

# Define tasks
research_task = Task(
    description="Research the current state of {topic}. "
                "Find key players, pricing, and recent developments.",
    expected_output="A structured summary with 5-7 key findings",
    agent=researcher,
)

writing_task = Task(
    description="Write a concise technical overview based on "
                "the research findings. Include a comparison table.",
    expected_output="A 500-word article with headers and a table",
    agent=writer,
)

# Create and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True,
)

result = crew.kickoff(inputs={"topic": "multi-agent AI frameworks"})
print(result)
```
Adding Custom Tools
```python
from crewai import Agent
from crewai.tools import tool

@tool("Search the web")
def web_search(query: str) -> str:
    """Searches the web and returns top results."""
    # Replace with your preferred search API
    import requests
    response = requests.get(
        "https://api.search.example/v1/search",
        params={"q": query},
        timeout=10,
    )
    # Join the top hits into one string to match the declared return type
    results = response.json()["results"][:5]
    return "\n".join(str(r) for r in results)

# Attach the tool to an agent
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate information using web search",
    backstory="Expert at finding and synthesizing information.",
    tools=[web_search],
    allow_delegation=False,
)
```
CrewAI Key Config Options
| Option | Purpose | Default |
|---|---|---|
| `allow_delegation` | Let agent ask other agents for help | `True` |
| `verbose` | Detailed logging of agent actions | `False` |
| `max_iter` | Maximum reasoning iterations | `25` |
| `process` | `Process.sequential` or `Process.hierarchical` | `sequential` |
| `max_rpm` | API rate limit per minute | `None` |
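Putting several of these options together: a hedged configuration sketch of a hierarchical crew, reusing the `researcher`/`writer` agents and tasks defined earlier. Parameter names follow the CrewAI docs, but `manager_llm` and accepted values vary across versions, so verify against your install:

```python
from crewai import Crew, Process

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.hierarchical,  # a manager agent plans and delegates
    manager_llm="gpt-4o",          # model the auto-created manager uses
    max_rpm=30,                    # throttle API requests per minute
    verbose=True,
)
```

In hierarchical mode the manager decides task order and delegation at runtime, so set `allow_delegation` deliberately on each agent rather than relying on the default.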
🔷 LangGraph: Production-Grade Graph Orchestration
LangGraph models agent workflows as directed graphs. Nodes are agents or actions. Edges are transitions. It's the most powerful framework for complex workflows — and the steepest learning curve.
Community benchmarks have reported 30-40% lower latency than alternatives on complex workflows, credited to LangGraph's parallel execution of independent nodes within the graph — treat the exact figure as workload-dependent.
Install
```bash
pip install langgraph langchain-openai
```
Build a Research Pipeline with State
```python
from typing import TypedDict

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

# Define shared state
class PipelineState(TypedDict):
    topic: str
    research: str
    critique: str
    final_draft: str
    revision_count: int

llm = ChatOpenAI(model="gpt-4o")

# Define agent nodes
def researcher(state: PipelineState) -> dict:
    response = llm.invoke(
        f"Research this topic thoroughly: {state['topic']}. "
        f"Provide 5-7 key findings with sources."
    )
    return {"research": response.content}

def writer(state: PipelineState) -> dict:
    critique = state.get("critique", "No previous feedback.")
    response = llm.invoke(
        f"Write a technical overview based on this research:\n"
        f"{state['research']}\n\n"
        f"Previous feedback to address: {critique}"
    )
    return {"final_draft": response.content}

def critic(state: PipelineState) -> dict:
    response = llm.invoke(
        f"Review this draft for accuracy and clarity:\n"
        f"{state['final_draft']}\n\n"
        f"List specific improvements needed, or say APPROVED."
    )
    return {
        "critique": response.content,
        "revision_count": state.get("revision_count", 0) + 1,
    }

# Routing logic
def should_revise(state: PipelineState) -> str:
    if state.get("revision_count", 0) >= 3:
        return "done"
    if "APPROVED" in state.get("critique", ""):
        return "done"
    return "revise"

# Build the graph
graph = StateGraph(PipelineState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("critic", critic)

graph.set_entry_point("researcher")
graph.add_edge("researcher", "writer")
graph.add_edge("writer", "critic")
graph.add_conditional_edges("critic", should_revise, {
    "revise": "writer",
    "done": END,
})

pipeline = graph.compile()

# Run it
result = pipeline.invoke({"topic": "multi-agent systems", "revision_count": 0})
print(result["final_draft"])
```
The key advantage here: state is explicit and inspectable. At any point you can see exactly what's in PipelineState. When a run fails, LangSmith lets you replay it with modified inputs directly from the UI.
```
┌────────────┐     ┌────────┐     ┌────────┐
│ Researcher │────►│ Writer │────►│ Critic │
└────────────┘     └───▲────┘     └───┬────┘
                       │    revise    │
                       └──────────────┘
                                      │ approved
                                      ▼
                                    [END]
```

The conditional edge closes the loop: drafts cycle back to the writer until the critic says APPROVED or the run hits the three-revision cap in should_revise.
🤖 AutoGen: Conversation-First Multi-Agent Systems
AutoGen (by Microsoft) treats agent coordination as a conversation. Agents talk to each other in a chat thread, with different selection strategies determining who speaks next. It's the most natural fit for debate, review, and interactive Q&A systems.
Install
```bash
pip install autogen-agentchat 'autogen-ext[openai]'
```
Build a Conversational Agent Team
```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o")

researcher = AssistantAgent(
    "researcher",
    model_client=model,
    system_message="You are a research analyst. Investigate topics "
                   "thoroughly and present key findings.",
)

critic = AssistantAgent(
    "critic",
    model_client=model,
    system_message="You review research for accuracy and gaps. "
                   "Say APPROVED when the research is solid.",
)

termination = TextMentionTermination("APPROVED")

team = RoundRobinGroupChat(
    [researcher, critic],
    termination_condition=termination,
    max_turns=6,
)

async def main():
    result = await team.run(
        task="Research the current state of multi-agent AI frameworks"
    )
    print(result)

asyncio.run(main())
```
The AutoGen Gotcha
AutoGen's LLM-driven speaker selection (SelectorGroupChat in the current AgentChat API; speaker_selection_method='auto' in classic AutoGen) uses a model call to decide who speaks next. In theory, that sounds smart. In practice, it's unpredictable — agents can talk past each other or get stuck in loops. Use RoundRobinGroupChat for predictable flows, or implement custom selection logic.
⚖️ Framework Comparison: CrewAI vs LangGraph vs AutoGen
Here's a head-to-head comparison based on real-world testing across the same pipeline (researcher → writer → critic with revision loops):
| Aspect | CrewAI | LangGraph | AutoGen |
|---|---|---|---|
| Mental model | Job descriptions | Directed graph | Conversation thread |
| Learning curve | 🟢 Easy | 🔴 Steep | 🟡 Moderate |
| Time to first agent | 🥇 ~30 min | 🥉 ~2 hours | 🥈 ~1 hour |
| Cycle/loop support | ⚠️ Workarounds needed | ✅ Native | ⚠️ Difficult |
| State management | Implicit | ✅ Explicit & typed | Implicit |
| Parallel execution | ⚠️ Limited | ✅ Native | ⚠️ Limited |
| Debugging | Verbose logging | ✅ LangSmith (excellent) | AutoGen Studio |
| Production readiness | 🟡 Medium | 🟢 High | 🟡 Medium |
| MCP support | ✅ Yes | ✅ Yes | ✅ Yes |
| Community (GitHub stars) | ~25k | ~15k | ~40k |
| Active development | 🟢 Very active | 🟢 Very active | 🟡 Maintenance mode |
Which Framework Should You Pick?
Visualize your workflow. That tells you which framework fits:
- Looks like a flowchart with loops? → LangGraph
- Looks like a conversation thread? → AutoGen
- Looks like a job description board? → CrewAI
For most teams starting out, CrewAI gets you to production 40% faster than LangGraph for standard workflows. But if you need cycles, conditional branching, or explicit state management, LangGraph is worth the investment.
🔧 Practical Tips for Production Multi-Agent Systems
These lessons come from developers debugging agent failures weekly:
Set Revision Caps
Any loop between agents needs a hard limit. Without one, a critic agent can send a writer into 10+ revision cycles, burning tokens and time.
```python
# LangGraph: cap in routing logic
def should_revise(state):
    if state["revision_count"] >= 3:
        return "done"  # Force exit after 3 revisions
    return "revise"
```
Validate Agent Outputs
The orchestrator should check output quality before passing it downstream. A bad research summary propagates errors through the entire pipeline.
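A cheap structural gate is often enough to catch the worst failures before they propagate. A sketch with illustrative thresholds (tune them for your own pipeline):

```python
def validate_research(summary: str) -> list[str]:
    """Cheap structural checks to run before a summary moves downstream.
    Thresholds and rules here are illustrative, not a standard."""
    problems = []
    if len(summary.split()) < 50:
        problems.append("summary too short")
    if "http" not in summary:
        problems.append("no sources cited")
    if "as an ai" in summary.lower():
        problems.append("model boilerplate leaked into output")
    return problems

issues = validate_research("Too short, no sources.")
if issues:
    print("Rejecting output:", issues)  # re-run the agent instead of propagating
```

Run the gate in the orchestrator, not in the agent: the component that routes outputs is the one that should refuse bad ones.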
Keep Agent Prompts Focused
Each agent should have a narrow, specific role. An agent that "researches AND writes AND reviews" is just a single agent with extra steps. The whole point is specialization.
Monitor Costs
Multi-agent systems multiply your API costs. Every agent call is at least one LLM invocation, and revision loops amplify this. Track token usage per agent per run.
| Pipeline | Agents | Avg. API Cost/Run |
|---|---|---|
| Simple (research → write) | 2 | ~$0.15 |
| With critic loop (1-2 revisions) | 3 | ~$0.80 |
| With critic loop (runaway) | 3 | ~$4.00+ |
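A small helper makes per-agent tracking concrete. The token counts and per-million-token prices below are illustrative stand-ins, not official rates:

```python
def run_cost(usage: dict, in_price: float = 2.50, out_price: float = 10.00) -> dict:
    """Per-agent cost in USD for one run. `usage` maps agent name to
    (input_tokens, output_tokens); prices are USD per 1M tokens."""
    return {
        agent: round((inp * in_price + out * out_price) / 1_000_000, 4)
        for agent, (inp, out) in usage.items()
    }

costs = run_cost({
    "researcher": (8_000, 2_000),
    "writer": (12_000, 3_000),
    "critic": (15_000, 1_000),
})
print(costs)
print("total:", round(sum(costs.values()), 4))
```

Logging a dict like this per run is enough to spot a runaway revision loop: the critic's share of the total suddenly dominates.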
Use MCP for Tool Connectivity
Model Context Protocol (MCP) provides a universal interface for connecting agents to external tools. All three frameworks support it. Instead of writing custom tool integrations per framework, build one MCP server and connect any agent to it.
🔮 What's Next
- 🛠️ Build your first crew — Start with CrewAI and a simple two-agent pipeline. Pick something boring like "research and summarize a topic."
- 📊 Add observability — Integrate LangSmith or similar tracing to debug agent behavior before it costs you.
- 🔌 Learn MCP — Read our guide on MCP Servers: Model Context Protocol Explained to build universal tool connections.
- 🏗️ Try graph orchestration — Once comfortable with sequential agents, rebuild your pipeline in LangGraph with conditional edges and revision loops.
- 🔒 Add guardrails — Implement output validation, cost limits, and iteration caps before going to production.
Want to see how AI agents work in coding workflows? Check out our AI Coding Agents Compared and Deterministic Agentic Workflows Guide.