Build Multi-Agent Systems in Python: CrewAI vs LangGraph vs AutoGen

 

A single AI agent can answer questions and call tools. But when you need one agent to research, another to analyze, and a third to write — you need multi-agent orchestration. Gartner reported a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. This isn't hype. It's the AI equivalent of the microservices revolution.

This guide walks you through building multi-agent systems with the three dominant Python frameworks in 2026: CrewAI, LangGraph, and AutoGen. You'll get runnable code, architecture patterns, and an honest comparison to help you pick the right tool.


📋 What You'll Need

  • Python 3.10+ installed
  • An OpenAI API key (or Anthropic/other LLM provider)
  • pip or uv for package management
  • Basic Python knowledge — classes, decorators, async/await
  • A use case in mind — even a simple one like "research a topic and write a summary"

🧠 When Do You Actually Need Multiple Agents?

Not every problem needs a multi-agent system. Microsoft's Azure Architecture Center defines three levels of complexity:

| Level | What It Is | When to Use |
| --- | --- | --- |
| Direct model call | Single LLM call with a good prompt | Classification, summarization, translation |
| Single agent + tools | One agent that reasons and selects tools | Most domain-specific tasks (order lookup, DB queries) |
| Multi-agent orchestration | Specialized agents coordinating together | Cross-domain problems, parallel specialization, security boundaries |

Start with a single agent and good prompt engineering. Add tools before adding agents. Graduate to multi-agent patterns only when you hit clear limits — like prompt complexity blowing up, tool overload, or needing distinct security boundaries per domain.

Tip: If your single agent's system prompt exceeds 2,000 tokens of instructions, that's a signal to split into specialized agents.

🏗️ Five Orchestration Patterns You Should Know

Before picking a framework, understand the patterns. Every multi-agent system uses one (or a combination) of these:

Sequential (Pipeline)

Agents process tasks in a fixed order. Agent 1's output feeds Agent 2, and so on. Think: draft → review → polish.

┌──────────┐     ┌──────────┐     ┌──────────┐
│ Research │────►│ Analyze  │────►│  Write   │
└──────────┘     └──────────┘     └──────────┘

Best for: Progressive refinement workflows with clear dependencies.
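Stripped of any framework, the sequential pattern is just function composition. A minimal sketch, with stub functions standing in for real LLM-backed agents:

```python
def research(topic: str) -> str:
    # Stub: a real agent would call an LLM (and tools) here
    return f"findings about {topic}"

def analyze(findings: str) -> str:
    return f"analysis of {findings}"

def write(analysis: str) -> str:
    return f"article based on {analysis}"

def pipeline(topic: str) -> str:
    # Each stage consumes the previous stage's output, in fixed order
    return write(analyze(research(topic)))

print(pipeline("multi-agent systems"))
```

Every framework in this guide ultimately builds this composition for you, plus state, retries, and observability.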

Concurrent (Fan-out/Fan-in)

Multiple agents process the same input simultaneously. Results get aggregated at the end. Think: four analysts evaluating the same stock from different angles.

Best for: Tasks that benefit from multiple perspectives or time-sensitive parallel processing.
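Because agent calls are I/O-bound, fan-out/fan-in maps naturally onto `asyncio.gather`. A framework-free sketch with stub analysts:

```python
import asyncio

async def analyst(angle: str, stock: str) -> str:
    # Stub: a real agent would make an async LLM call here
    await asyncio.sleep(0)  # stands in for network latency
    return f"{angle} view of {stock}"

async def fan_out_fan_in(stock: str) -> str:
    angles = ["technical", "fundamental", "sentiment", "macro"]
    # Fan out: all analysts evaluate the same input concurrently
    results = await asyncio.gather(*(analyst(a, stock) for a in angles))
    # Fan in: aggregate the independent perspectives
    return "\n".join(results)

print(asyncio.run(fan_out_fan_in("ACME")))
```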

Supervisor (Hierarchical)

A boss agent decides which worker agents to invoke and when. The supervisor plans, delegates, and aggregates results.

Best for: Complex workflows where the execution path depends on intermediate results.
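At its core, a supervisor is a router plus an aggregator. This sketch replaces the LLM planner with a trivial rule so it stays self-contained; the worker names and dispatch logic are illustrative, not a real framework API:

```python
def search_worker(task: str) -> str:
    return f"search results for {task}"

def math_worker(task: str) -> str:
    return f"computed answer for {task}"

WORKERS = {"search": search_worker, "math": math_worker}

def supervisor(task: str) -> str:
    # Stub planner: a real supervisor would ask an LLM which worker fits
    worker = "math" if any(ch.isdigit() for ch in task) else "search"
    result = WORKERS[worker](task)
    # Aggregate step: here we just label which worker produced the output
    return f"[{worker}] {result}"

print(supervisor("what is 2 + 2"))
```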

Handoff (State-Driven)

The active agent changes dynamically. Each agent can transfer control to another via tool calling. Think: customer support escalation.

Best for: Sequential workflows with staged constraints — like triage → diagnosis → resolution.
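The handoff mechanism reduces to each agent returning its output plus the name of the next agent, with a loop driving the transfer. A minimal sketch (stub agents, hypothetical stage names):

```python
def triage(ticket: str):
    # A real agent would decide the handoff target via tool calling
    return f"triaged: {ticket}", "diagnosis"

def diagnosis(ticket: str):
    return f"diagnosed: {ticket}", "resolution"

def resolution(ticket: str):
    return f"resolved: {ticket}", None  # None = no further handoff

AGENTS = {"triage": triage, "diagnosis": diagnosis, "resolution": resolution}

def run_handoff(ticket: str, start: str = "triage") -> str:
    current = start
    while current is not None:
        # The active agent processes the ticket and names its successor
        ticket, current = AGENTS[current](ticket)
    return ticket

print(run_handoff("printer offline"))
```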

Group Chat (Roundtable)

Multiple agents participate in a shared conversation thread. A chat manager coordinates who speaks next.

Best for: Brainstorming, consensus-building, or maker-checker validation loops.


🚀 CrewAI: The Fastest Path to Multi-Agent Systems

CrewAI uses a role-based model inspired by real teams. You define agents like job descriptions — each has a role, goal, and backstory. It's the most beginner-friendly framework and gets you to a working system in under an hour.

Install and Set Up

pip install 'crewai[tools]'

Set your API key:

export OPENAI_API_KEY="sk-your-key-here"

Build a Research + Writing Crew

from crewai import Agent, Task, Crew

# Define specialized agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate, current information on the given topic",
    backstory="You specialize in discovering key facts from "
              "multiple sources and synthesizing them clearly.",
    allow_delegation=False,
)

writer = Agent(
    role="Technical Writer",
    goal="Create clear, actionable content from research findings",
    backstory="You translate complex technical data into "
              "well-structured articles that developers love.",
    allow_delegation=False,
)

# Define tasks
research_task = Task(
    description="Research the current state of {topic}. "
                "Find key players, pricing, and recent developments.",
    expected_output="A structured summary with 5-7 key findings",
    agent=researcher,
)

writing_task = Task(
    description="Write a concise technical overview based on "
                "the research findings. Include a comparison table.",
    expected_output="A 500-word article with headers and a table",
    agent=writer,
)

# Create and run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    verbose=True,
)

result = crew.kickoff(inputs={"topic": "multi-agent AI frameworks"})
print(result)

Adding Custom Tools

from crewai.tools import tool

@tool("Search the web")
def web_search(query: str) -> str:
    """Searches the web and returns top results."""
    # Replace with your preferred search API
    import requests
    response = requests.get(
        "https://api.search.example/v1/search",
        params={"q": query},
        timeout=10,
    )
    results = response.json()["results"][:5]
    # Return a string, as the tool's signature promises
    return "\n".join(str(r) for r in results)

# Attach the tool to an agent
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find accurate information using web search",
    backstory="Expert at finding and synthesizing information.",
    tools=[web_search],
    allow_delegation=False,
)

CrewAI Key Config Options

| Option | Purpose | Default |
| --- | --- | --- |
| allow_delegation | Let agent ask other agents for help | True |
| verbose | Detailed logging of agent actions | False |
| max_iter | Maximum reasoning iterations | 25 |
| process | Process.sequential or Process.hierarchical | sequential |
| max_rpm | API rate limit per minute | None |

🔷 LangGraph: Production-Grade Graph Orchestration

LangGraph models agent workflows as directed graphs. Nodes are agents or actions. Edges are transitions. It's the most powerful framework for complex workflows — and the steepest learning curve.

LangGraph benchmarks show 30-40% lower latency than alternatives for complex workflows, thanks to efficient parallel execution within its graph structure.

Install

pip install langgraph langchain-openai

Build a Research Pipeline with State

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI

# Define shared state
class PipelineState(TypedDict):
    topic: str
    research: str
    critique: str
    final_draft: str
    revision_count: int

llm = ChatOpenAI(model="gpt-4o")

# Define agent nodes
def researcher(state: PipelineState) -> dict:
    response = llm.invoke(
        f"Research this topic thoroughly: {state['topic']}. "
        f"Provide 5-7 key findings with sources."
    )
    return {"research": response.content}

def writer(state: PipelineState) -> dict:
    critique = state.get("critique", "No previous feedback.")
    response = llm.invoke(
        f"Write a technical overview based on this research:\n"
        f"{state['research']}\n\n"
        f"Previous feedback to address: {critique}"
    )
    return {"final_draft": response.content}

def critic(state: PipelineState) -> dict:
    response = llm.invoke(
        f"Review this draft for accuracy and clarity:\n"
        f"{state['final_draft']}\n\n"
        f"List specific improvements needed, or say APPROVED."
    )
    return {
        "critique": response.content,
        "revision_count": state.get("revision_count", 0) + 1,
    }

# Routing logic
def should_revise(state: PipelineState) -> str:
    if state.get("revision_count", 0) >= 3:
        return "done"
    if "APPROVED" in state.get("critique", ""):
        return "done"
    return "revise"

# Build the graph
graph = StateGraph(PipelineState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("critic", critic)

graph.set_entry_point("researcher")
graph.add_edge("researcher", "writer")
graph.add_edge("writer", "critic")
graph.add_conditional_edges("critic", should_revise, {
    "revise": "writer",
    "done": END,
})

pipeline = graph.compile()

# Run it
result = pipeline.invoke({"topic": "multi-agent systems", "revision_count": 0})
print(result["final_draft"])

The key advantage here: state is explicit and inspectable. At any point you can see exactly what's in PipelineState. When a run fails, LangSmith lets you replay it with modified inputs directly from the UI.

┌────────────┐     ┌────────┐     ┌────────┐
│ Researcher │────►│ Writer │────►│ Critic │
└────────────┘     └───▲────┘     └───┬────┘
                       │    revise    │
                       └──────────────┘
                                      │ approved
                                      ▼
                                    [END]
Warning: Without a revision cap, critique loops can spiral. One test run generated 11 revision cycles and burned $4 in API calls. Always set a max_revisions limit.

🤖 AutoGen: Conversation-First Multi-Agent Systems

AutoGen (by Microsoft) treats agent coordination as a conversation. Agents talk to each other in a chat thread, with different selection strategies determining who speaks next. It's the most natural fit for debate, review, and interactive Q&A systems.

Install

pip install autogen-agentchat 'autogen-ext[openai]'

Build a Conversational Agent Team

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o")

researcher = AssistantAgent(
    "researcher",
    model_client=model,
    system_message="You are a research analyst. Investigate topics "
                   "thoroughly and present key findings.",
)

critic = AssistantAgent(
    "critic",
    model_client=model,
    system_message="You review research for accuracy and gaps. "
                   "Say APPROVED when the research is solid.",
)

termination = TextMentionTermination("APPROVED")

team = RoundRobinGroupChat(
    [researcher, critic],
    termination_condition=termination,
    max_turns=6,
)

async def main():
    result = await team.run(
        task="Research the current state of multi-agent AI frameworks"
    )
    print(result)

asyncio.run(main())

The AutoGen Gotcha

AutoGen's LLM-driven speaker selection (the SelectorGroupChat team in the current API, or speaker_selection_method='auto' in the legacy one) uses an extra LLM call to decide who speaks next. In theory, that sounds smart. In practice, it's unpredictable — agents can talk past each other or get stuck in loops. Use RoundRobinGroupChat for predictable flows, or implement custom selection logic.
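Deterministic selection is easy to sketch without the framework: ignore the conversation and cycle through a fixed speaking order. The function shape below is illustrative, not AutoGen's exact callback signature:

```python
from itertools import cycle

def make_round_robin(agent_names):
    # Deterministic speaker selection: ignore conversation content and
    # simply rotate through the agents in a fixed order
    order = cycle(agent_names)

    def select_next(conversation_history):
        return next(order)

    return select_next

pick = make_round_robin(["researcher", "critic"])
print([pick([]) for _ in range(4)])  # alternates researcher / critic
```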

Important: Microsoft has shifted strategic focus to the broader Microsoft Agent Framework. AutoGen still gets bug fixes and security patches, but major new feature development has slowed. Factor this into long-term decisions.

⚖️ Framework Comparison: CrewAI vs LangGraph vs AutoGen

Here's a head-to-head comparison based on real-world testing across the same pipeline (researcher → writer → critic with revision loops):

| Aspect | CrewAI | LangGraph | AutoGen |
| --- | --- | --- | --- |
| Mental model | Job descriptions | Directed graph | Conversation thread |
| Learning curve | 🟢 Easy | 🔴 Steep | 🟡 Moderate |
| Time to first agent | 🥇 ~30 min | 🥉 ~2 hours | 🥈 ~1 hour |
| Cycle/loop support | ⚠️ Workarounds needed | ✅ Native | ⚠️ Difficult |
| State management | Implicit | ✅ Explicit & typed | Implicit |
| Parallel execution | ⚠️ Limited | ✅ Native | ⚠️ Limited |
| Debugging | Verbose logging | ✅ LangSmith (excellent) | AutoGen Studio |
| Production readiness | 🟡 Medium | 🟢 High | 🟡 Medium |
| MCP support | ✅ Yes | ✅ Yes | ✅ Yes |
| Community (GitHub stars) | ~25k | ~15k | ~40k |
| Active development | 🟢 Very active | 🟢 Very active | 🟡 Maintenance mode |

Which Framework Should You Pick?

Visualize your workflow. That tells you which framework fits:

  • Looks like a flowchart with loops? → LangGraph
  • Looks like a conversation thread? → AutoGen
  • Looks like a job description board? → CrewAI

For most teams starting out, CrewAI gets you to production 40% faster than LangGraph for standard workflows. But if you need cycles, conditional branching, or explicit state management, LangGraph is worth the investment.


🔧 Practical Tips for Production Multi-Agent Systems

These lessons come from developers debugging agent failures weekly:

Set Revision Caps

Any loop between agents needs a hard limit. Without one, a critic agent can send a writer into 10+ revision cycles, burning tokens and time.

# LangGraph: cap in routing logic
def should_revise(state):
    if state["revision_count"] >= 3:
        return "done"  # Force exit after 3 revisions
    return "revise"
Validate Agent Outputs

The orchestrator should check output quality before passing it downstream. A bad research summary propagates errors through the entire pipeline.
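Even simple structural checks catch a lot. A sketch of a validator — the thresholds and the "findings are bullet lines" convention are assumptions you'd tune to your own task format:

```python
def validate_research(summary: str, min_findings: int = 5) -> list[str]:
    """Return a list of problems; an empty list means the output passes."""
    problems = []
    if len(summary.split()) < 50:
        problems.append("summary too short to be a real research pass")
    # Assumed convention: findings are bullet lines starting with '-'
    findings = [ln for ln in summary.splitlines() if ln.lstrip().startswith("-")]
    if len(findings) < min_findings:
        problems.append(f"expected >= {min_findings} findings, got {len(findings)}")
    return problems

print(validate_research("- one finding"))
```

If the list is non-empty, the orchestrator can re-prompt the agent instead of handing bad input to the next stage.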

Keep Agent Prompts Focused

Each agent should have a narrow, specific role. An agent that "researches AND writes AND reviews" is just a single agent with extra steps. The whole point is specialization.

Monitor Costs

Multi-agent systems multiply your API costs. Every agent call is at least one LLM invocation, and revision loops amplify this. Track token usage per agent per run.

| Pipeline | Agents | Avg. API Cost/Run |
| --- | --- | --- |
| Simple (research → write) | 2 | ~$0.15 |
| With critic loop (1-2 revisions) | 3 | ~$0.80 |
| With critic loop (runaway) | 3 | ~$4.00+ |
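Per-agent tracking can be as simple as accumulating token counts and multiplying by your provider's rates. A sketch — the prices here are illustrative placeholders, not any provider's actual pricing:

```python
from collections import defaultdict

# Assumed illustrative prices per 1K tokens; check your provider's rates
PRICE_PER_1K = {"input": 0.0025, "output": 0.01}

class CostTracker:
    def __init__(self):
        self.tokens = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, agent: str, input_tokens: int, output_tokens: int):
        # Call this after every LLM invocation, per agent
        self.tokens[agent]["input"] += input_tokens
        self.tokens[agent]["output"] += output_tokens

    def cost(self, agent: str) -> float:
        t = self.tokens[agent]
        return (t["input"] * PRICE_PER_1K["input"]
                + t["output"] * PRICE_PER_1K["output"]) / 1000

tracker = CostTracker()
tracker.record("researcher", input_tokens=1200, output_tokens=800)
tracker.record("critic", input_tokens=2000, output_tokens=300)
print(f"researcher: ${tracker.cost('researcher'):.4f}")
```

Per-agent breakdowns make runaway loops obvious: a critic whose cost dwarfs the rest of the pipeline is a revision cycle gone wrong.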

Use MCP for Tool Connectivity

Model Context Protocol (MCP) provides a universal interface for connecting agents to external tools. All three frameworks support it. Instead of writing custom tool integrations per framework, build one MCP server and connect any agent to it.


🔮 What's Next

  • 🛠️ Build your first crew — Start with CrewAI and a simple two-agent pipeline. Pick something boring like "research and summarize a topic."
  • 📊 Add observability — Integrate LangSmith or similar tracing to debug agent behavior before it costs you.
  • 🔌 Learn MCP — Read our guide on MCP Servers: Model Context Protocol Explained to build universal tool connections.
  • 🏗️ Try graph orchestration — Once comfortable with sequential agents, rebuild your pipeline in LangGraph with conditional edges and revision loops.
  • 🔒 Add guardrails — Implement output validation, cost limits, and iteration caps before going to production.

Want to see how AI agents work in coding workflows? Check out our AI Coding Agents Compared and Deterministic Agentic Workflows Guide.




