Planning Agents

A Planning Agent is an AI agent that creates a structured plan of the steps needed to complete a task before it takes any action. Instead of deciding the next step one at a time, a planning agent considers the whole task upfront, then executes each step in sequence.

Planning agents are most useful for complex, multi-step tasks where the order of actions matters.

Why Planning Is Important

Consider this task: "Research the top 5 AI companies, summarise each one's latest product, and create a comparison table."

Without Planning (Reactive Agent)

Step 1: Searches "top AI companies" → gets a list
Step 2: Randomly picks one, searches for its product
Step 3: Realises it needs all 5 — searches again
Step 4: Gets confused about what's been done
Step 5: Misses some companies, duplicates others
Result: Incomplete, disorganised output

With Planning (Planning Agent)

Plan created upfront:
  1. Search for the top 5 AI companies
  2. For each company (loop):
     a. Search for their latest product
     b. Summarise the product in 2-3 sentences
  3. Create a comparison table from all summaries
  4. Present the final table

Execution: Each step runs in order → Complete, structured output
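
To make the plan above concrete, here is a minimal sketch that expands the "for each company" loop into a flat, ordered list of steps. The function name expand_plan and the company names are placeholders for illustration only. In a real agent the company list is known only after step 1 has run.

```python
def expand_plan(companies: list[str]) -> list[str]:
    """Expand the 'for each company' loop into a flat, ordered step list."""
    steps = ["Search for the top 5 AI companies"]
    for name in companies:
        steps.append(f"Search for the latest product from {name}")
        steps.append(f"Summarise the {name} product in 2-3 sentences")
    steps.append("Create a comparison table from all summaries")
    steps.append("Present the final table")
    return steps


# Placeholder names, not real search results.
companies = ["Company A", "Company B", "Company C", "Company D", "Company E"]
plan = expand_plan(companies)
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Because every company gets exactly one search step and one summary step, the agent cannot skip or duplicate a company the way the reactive trace does.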

Two Main Planning Architectures

Architecture 1 — Plan-and-Execute

The agent generates the full plan before any tool is called, then executes the steps one by one.

Phase 1 — PLAN:
  LLM receives the task and generates a step-by-step plan
  Plan: ["Step 1: ...", "Step 2: ...", "Step 3: ..."]

Phase 2 — EXECUTE:
  Agent loops through each step in the plan
  For each step → calls appropriate tools → stores results

Phase 3 — SYNTHESIZE:
  Agent combines all results into a final answer
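
The three phases can be sketched as a small skeleton with no LLM calls at all. The names plan_and_execute, fake_planner, fake_executor and fake_synthesizer are hypothetical stand-ins; a real agent would replace them with model and tool calls.

```python
from typing import Callable


def plan_and_execute(
    task: str,
    planner: Callable[[str], list[str]],
    executor: Callable[[str, list[str]], str],
    synthesizer: Callable[[str, list[str]], str],
) -> str:
    # Phase 1: the whole plan exists before any step runs.
    plan = planner(task)
    # Phase 2: run steps in order, passing earlier results as context.
    results: list[str] = []
    for step in plan:
        results.append(executor(step, results))
    # Phase 3: combine all step results into one answer.
    return synthesizer(task, results)


# Stand-in components (hypothetical); a real agent would call an LLM and tools.
def fake_planner(task: str) -> list[str]:
    return [f"Step 1: look up '{task}'", "Step 2: summarise the findings"]


def fake_executor(step: str, previous: list[str]) -> str:
    return f"done({step}) with {len(previous)} earlier result(s)"


def fake_synthesizer(task: str, results: list[str]) -> str:
    return f"Answer to '{task}': " + " | ".join(results)


answer = plan_and_execute("demo task", fake_planner, fake_executor, fake_synthesizer)
print(answer)
```

Passing the three components in as functions keeps the control flow testable without network access.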

Architecture 2 — Dynamic Planning (ReAct + Planning)

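The agent creates an initial plan but can update it based on what it discovers during execution. This is more flexible than Plan-and-Execute but harder to implement, because the agent must decide when to replan and keep the revised plan consistent with the work already done.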

Initial Plan: ["Step 1", "Step 2", "Step 3"]

After Step 1: Agent discovers Step 2 needs to be split into two steps
Updated Plan: ["Step 1 ✓", "Step 2a", "Step 2b", "Step 3"]
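
The replanning loop might look like the sketch below, which reproduces the Step 2 split shown above. run_dynamic_plan, fake_execute and fake_replan are hypothetical names; in a real agent both the executor and the replanner would be LLM calls.

```python
from collections import deque


def run_dynamic_plan(initial_plan, execute, replan):
    """Run steps in order, letting `replan` rewrite the remaining steps."""
    pending = deque(initial_plan)
    completed = []
    while pending:
        step = pending.popleft()
        result = execute(step)
        completed.append((step, result))
        # The replanner sees progress so far and returns the new remaining plan.
        pending = deque(replan(completed, list(pending)))
    return completed


# Stand-ins (hypothetical): a real agent would ask an LLM in both functions.
def fake_execute(step):
    return f"result of {step}"


def fake_replan(completed, remaining):
    last_step, _ = completed[-1]
    # Mirror the example above: after Step 1, split Step 2 into 2a and 2b.
    if last_step == "Step 1" and remaining and remaining[0] == "Step 2":
        return ["Step 2a", "Step 2b"] + remaining[1:]
    return remaining


history = run_dynamic_plan(["Step 1", "Step 2", "Step 3"], fake_execute, fake_replan)
print([step for step, _ in history])
# → ['Step 1', 'Step 2a', 'Step 2b', 'Step 3']
```

A production version would also cap the total number of steps, since a replanner that keeps adding work would otherwise never terminate.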

Implementing a Plan-and-Execute Agent

# planning_agent.py

import os
import json
from dotenv import load_dotenv
import openai
from tools import TOOL_MAP, TOOL_DEFINITIONS

load_dotenv()
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# ─── Step 1: Planner ──────────────────────────────────────────────
PLANNER_PROMPT = """You are a task planner. 
Break down the given task into clear, numbered steps.
Each step should be a specific, actionable instruction.
Format the plan as a JSON array of strings.

Example:
Task: Find the price of iPhone 15 in India and convert it to USD
Plan: [
  "Search for iPhone 15 price in India",
  "Get the current INR to USD exchange rate",
  "Calculate the USD equivalent of the INR price",
  "Present both prices clearly"
]

Only output the JSON array. Nothing else."""


def create_plan(task: str) -> list:
    """Use the LLM to create a step-by-step plan for the task."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": PLANNER_PROMPT},
            {"role": "user",   "content": f"Task: {task}"}
        ],
        temperature=0.1,
        max_tokens=500
    )

    # content can be None (for example on a refusal), so fall back to ""
    plan_text = response.choices[0].message.content or ""
    # Strip markdown code fences if present
    plan_text = plan_text.replace("```json", "").replace("```", "").strip()
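    try:
        plan = json.loads(plan_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Planner did not return valid JSON: {plan_text!r}") from exc
    # Fail early if the model returned something other than a non-empty array
    if not isinstance(plan, list) or not plan:
        raise ValueError("Planner must return a non-empty JSON array of steps")
    return [str(step) for step in plan]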


# ─── Step 2: Executor ─────────────────────────────────────────────
EXECUTOR_PROMPT = """You are a task executor.
Execute the given step using the available tools if needed.
If a tool is required, call it. 
Return the result of executing this step clearly and concisely.
Available tools: web_search, calculate"""


def execute_step(step: str, context: str) -> str:
    """Execute a single step in the plan using tools if needed."""
    messages = [
        {"role": "system", "content": EXECUTOR_PROMPT},
        {"role": "user", "content": f"""
Context from previous steps:
{context}

Current step to execute: {step}

Execute this step now."""}
    ]

    for _ in range(3):  # Up to 3 tool calls per step
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_DEFINITIONS,
            tool_choice="auto",
            temperature=0.2,
            max_tokens=600
        )

        message = response.choices[0].message
        messages.append(message)

        if message.tool_calls:
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
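                # Arguments arrive as a JSON string and may be malformed
                try:
                    tool_args = json.loads(tool_call.function.arguments)
                    if tool_name in TOOL_MAP:
                        result = TOOL_MAP[tool_name](**tool_args)
                    else:
                        result = json.dumps({"error": f"Unknown tool: {tool_name}"})
                except Exception as exc:
                    # Report the failure to the model instead of crashing the run
                    result = json.dumps({"error": str(exc)})

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    # Tool output must be sent back as a string
                    "content": result if isinstance(result, str) else json.dumps(result)
                })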
        else:
            return message.content or ""  # Step complete

    return "Step could not be completed"


# ─── Step 3: Synthesizer ──────────────────────────────────────────
SYNTHESIZER_PROMPT = """You are a synthesis expert.
Given a task and the results from executing each step of a plan,
create a clear, well-structured final answer for the user.
Make the response concise and actionable."""


def synthesize_results(task: str, plan: list, results: list) -> str:
    """Combine all step results into a final coherent answer."""
    step_results = "\n".join(
        f"Step {i}: {step}\nResult: {result}"
        for i, (step, result) in enumerate(zip(plan, results), start=1)
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYNTHESIZER_PROMPT},
            {"role": "user", "content": f"""
Original Task: {task}

Step-by-Step Results:
{step_results}

Please synthesise these results into a final, clear answer."""}
        ],
        temperature=0.3,
        max_tokens=800
    )

    return response.choices[0].message.content


# ─── Main Planning Agent ──────────────────────────────────────────
def run_planning_agent(task: str) -> str:
    print(f"\n{'='*55}")
    print(f"Task: {task}")
    print(f"{'='*55}")

    # Phase 1: Create plan
    print("\n📋 CREATING PLAN...")
    plan = create_plan(task)
    for i, step in enumerate(plan):
        print(f"  {i+1}. {step}")

    # Phase 2: Execute each step
    print("\n⚙️ EXECUTING STEPS...")
    results = []
    context = ""

    for i, step in enumerate(plan):
        print(f"\n  Executing Step {i+1}: {step}")
        result = execute_step(step, context)
        results.append(result)
        context += f"\nStep {i+1} ({step}): {result}"
        print(f"  Result: {result[:200]}...")

    # Phase 3: Synthesize
    print("\n✅ SYNTHESIZING FINAL ANSWER...")
    final_answer = synthesize_results(task, plan, results)

    print(f"\n{'='*55}")
    print("FINAL ANSWER:")
    print(final_answer)
    print(f"{'='*55}\n")

    return final_answer


# Test
if __name__ == "__main__":
    run_planning_agent(
        "What is LangChain and what are its three main use cases?"
    )
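
The listing imports TOOL_MAP and TOOL_DEFINITIONS from a tools module that is not shown. Below is a minimal sketch of what such a module might contain, assuming the two tool names mentioned in EXECUTOR_PROMPT and tools that return JSON strings. Everything here is an assumption: the web_search body is a placeholder with no real search backend, and the helper names _eval and _definition are invented for this sketch.

```python
# tools.py (hypothetical sketch; the original module is not shown)
import ast
import json
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("Unsupported expression")


def calculate(expression: str) -> str:
    """Evaluate basic arithmetic and return the outcome as a JSON string."""
    try:
        value = _eval(ast.parse(expression, mode="eval").body)
        return json.dumps({"result": value})
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return json.dumps({"error": str(exc)})


def web_search(query: str) -> str:
    """Placeholder: wire this to whichever search API you use."""
    return json.dumps({"query": query, "results": [], "note": "search not configured"})


TOOL_MAP = {"web_search": web_search, "calculate": calculate}


def _definition(name: str, description: str, param: str) -> dict:
    """Build one function-tool entry with a single required string parameter."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {param: {"type": "string"}},
                "required": [param],
            },
        },
    }


TOOL_DEFINITIONS = [
    _definition("web_search", "Search the web for a query.", "query"),
    _definition("calculate", "Evaluate an arithmetic expression.", "expression"),
]

print(calculate("2 + 3 * 4"))  # → {"result": 14}
```

The calculator walks the syntax tree rather than calling eval(), so an expression such as a function call comes back as an error object instead of being executed.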

Sample Output

=======================================================
Task: What is LangChain and what are its three main use cases?
=======================================================

📋 CREATING PLAN...
  1. Search for what LangChain is
  2. Search for LangChain main use cases
  3. Summarise and combine the findings

⚙️ EXECUTING STEPS...

  Executing Step 1: Search for what LangChain is
  Result: LangChain is an open-source framework for building applications with LLMs...

  Executing Step 2: Search for LangChain main use cases
  Result: LangChain is commonly used for: RAG (retrieval-augmented generation),
          AI Agents, and automated pipelines...

  Executing Step 3: Summarise and combine the findings
  Result: Combined summary prepared...

✅ SYNTHESIZING FINAL ANSWER...

FINAL ANSWER:
LangChain is an open-source Python framework that simplifies building applications
powered by Large Language Models (LLMs).

Its three main use cases are:
1. RAG (Retrieval-Augmented Generation): Connecting LLMs to external documents
   and knowledge bases for accurate, grounded answers.
2. AI Agents: Building autonomous agents that can reason, plan, and use tools.
3. Automated Pipelines: Chaining multiple LLM calls, tools, and logic into
   automated workflows with minimal code.

When to Use a Planning Agent

Use a Planning Agent when...            Use a simple ReAct agent when...
-----------------------------------     -----------------------------------
Task has 4+ distinct steps              Task can be done in 1–3 tool calls
Order of steps matters critically       Task is open-ended exploration
Result quality depends on structure     Real-time responsiveness is needed
Task needs parallel execution           Task is conversational
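
The "parallel execution" row relies on the plan making independent steps visible up front. The agent listing in this article runs its steps sequentially, so the following is only a sketch of how independent steps could be fanned out with the standard library. run_independent_steps and fake_execute are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent_steps(steps, execute, max_workers=4):
    """Run steps that do not depend on each other, keeping plan order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `steps`.
        return list(pool.map(execute, steps))


# Stand-in executor (hypothetical); a real one would call an LLM or a tool.
def fake_execute(step):
    return f"summary for {step}"


steps = [f"Research Company {label}" for label in "ABCDE"]
results = run_independent_steps(steps, fake_execute)
print(results[0])  # → summary for Research Company A
```

Threads fit here because each step mostly waits on network calls. Steps that need an earlier step's result still have to run in sequence.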

Summary

Planning Agents break complex tasks into structured, executable steps before taking any action. The Plan-and-Execute architecture separates planning from execution: a dedicated Planner LLM call produces the steps, an Executor runs each one, and a Synthesizer assembles the final output. This separation keeps multi-part tasks organised and easier to inspect, which makes the pattern a good fit for AI automation workflows where the order of steps matters.
