# Multi-Agent Meeting Framework Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a lightweight, Unix-style multi-agent collaboration system with two CLI tools: `agent_builder.py` (generate Agent configs) and `run_meeting.py` (orchestrate meetings).

**Architecture:** Thin LLM Protocol abstraction over anthropic/openai SDKs. Stateless Agents communicate via a shared whiteboard (Pydantic models). PM Agent acts as a JSON-outputting state machine router. Hard `max_rounds` limit prevents infinite loops.

**Tech Stack:** Python 3.11+, pydantic>=2.0, anthropic SDK, openai SDK, pyyaml, pytest
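The control flow in the architecture above can be sketched as follows. This is a hypothetical outline only; the real `run_pm`/`run_agent` signatures and the `MeetingState` whiteboard are specified in Tasks 2, 5, and 6:

```python
# Sketch of the meeting event loop: the PM routes each round, and a hard
# max_rounds cap guarantees termination. All names here are illustrative.
from dataclasses import dataclass


@dataclass
class Decision:
    next_action: str              # "CALL_AGENT" or "FINISH"
    target_agent: str = ""
    prompt_for_agent: str = ""
    final_report: str = ""


def meeting_loop(topic, pm, agents, max_rounds=5):
    scratchpad = []  # shared whiteboard: (agent_name, content) pairs
    for _ in range(max_rounds):  # hard limit prevents infinite loops
        decision = pm(topic, scratchpad)
        if decision.next_action == "FINISH":
            return decision.final_report
        # PM chose an agent; run it and post its reply to the whiteboard.
        reply = agents[decision.target_agent](decision.prompt_for_agent)
        scratchpad.append((decision.target_agent, reply))
    return "Max rounds reached without a final report."
```

The key design point shown here: Agents are stateless callables, and all cross-round context lives on the shared scratchpad.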

---

## File Map

| File | Responsibility |
|------|---------------|
| `src/__init__.py` | Package marker |
| `src/models.py` | All Pydantic models: AgentConfig, ScratchpadEntry, MeetingState, PMDecision |
| `src/llm_client.py` | LLMClient Protocol, AnthropicClient, OpenAIClient, create_client factory |
| `src/config_loader.py` | Load Agent configs from JSON/YAML files |
| `src/agent.py` | Stateless Agent executor (run_agent) |
| `src/pm.py` | PM Agent router (run_pm, PM_SYSTEM_PROMPT, build_pm_system_prompt) |
| `src/meeting.py` | Meeting event loop (run_meeting), terminal output, report saving |
| `agent_builder.py` | CLI 1: Generate Agent config from natural language |
| `run_meeting.py` | CLI 2: Run a meeting |
| `tests/test_models.py` | Model validation tests |
| `tests/test_llm_client.py` | LLM client factory tests |
| `tests/test_config_loader.py` | Config loading tests |
| `tests/test_agent.py` | Agent executor tests |
| `tests/test_pm.py` | PM router tests |
| `tests/test_meeting.py` | Meeting loop integration tests |
| `requirements.txt` | Dependencies |
| `README.md` | Usage instructions and demo |

---

### Task 1: Project Scaffolding and Dependencies

**Files:**
- Create: `src/__init__.py`
- Create: `tests/__init__.py`
- Create: `requirements.txt`
- Create: `agents/.gitkeep`
- Create: `reports/.gitkeep`

- [ ] **Step 1: Create directory structure**

```bash
mkdir -p src tests agents reports
```

- [ ] **Step 2: Create package markers**

`src/__init__.py`:
```python
"""Multi-Agent Meeting Framework."""
```

`tests/__init__.py`:
```python
"""Tests for multi-agent meeting framework."""
```

- [ ] **Step 3: Create requirements.txt**

```
pydantic>=2.0
anthropic>=0.40.0
openai>=1.0
pyyaml>=6.0
pytest>=8.0
```

- [ ] **Step 4: Create .gitkeep files**

```bash
touch agents/.gitkeep reports/.gitkeep
```

- [ ] **Step 5: Install dependencies**

```bash
python -m pip install -r requirements.txt
```

- [ ] **Step 6: Verify pytest works**

Run: `python -m pytest --co -q`
Expected: `no tests collected` (no collection errors; exit code 5 is normal when zero tests exist)

- [ ] **Step 7: Commit**

```bash
git add src/ tests/ agents/.gitkeep reports/.gitkeep requirements.txt
git commit -m "chore: scaffold project structure and dependencies"
```

---

### Task 2: Pydantic Data Models

**Files:**
- Create: `src/models.py`
- Create: `tests/test_models.py`

- [ ] **Step 1: Write failing tests for AgentConfig**

`tests/test_models.py`:
```python
"""Tests for Pydantic data models."""

from src.models import AgentConfig


class TestAgentConfig:
    """Tests for AgentConfig model."""

    def test_create_minimal_agent_config(self) -> None:
        """AgentConfig requires name, role, system_prompt."""
        config = AgentConfig(
            name="test_agent",
            role="Tester",
            system_prompt="You are a tester.",
        )
        assert config.name == "test_agent"
        assert config.role == "Tester"
        assert config.system_prompt == "You are a tester."
        assert config.input_schema == ""
        assert config.output_schema == ""

    def test_create_full_agent_config(self) -> None:
        """AgentConfig with all fields populated."""
        config = AgentConfig(
            name="full_agent",
            role="Full Role",
            system_prompt="Full prompt.",
            input_schema="expects a question string",
            output_schema="returns analysis string",
        )
        assert config.input_schema == "expects a question string"
        assert config.output_schema == "returns analysis string"

    def test_agent_config_serialization(self) -> None:
        """AgentConfig round-trips through JSON."""
        config = AgentConfig(
            name="test",
            role="Role",
            system_prompt="Prompt.",
        )
        data = config.model_dump_json()
        restored = AgentConfig.model_validate_json(data)
        assert restored == config
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_models.py -v`
Expected: FAIL with `ModuleNotFoundError: No module named 'src.models'`

- [ ] **Step 3: Implement AgentConfig**

`src/models.py`:
```python
"""Pydantic data models for the multi-agent meeting framework."""

from __future__ import annotations

from datetime import datetime
from typing import Literal

from pydantic import BaseModel, model_validator


class AgentConfig(BaseModel):
    """Configuration for a single Agent."""

    name: str
    role: str
    system_prompt: str
    input_schema: str = ""
    output_schema: str = ""
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_models.py::TestAgentConfig -v`
Expected: 3 passed

- [ ] **Step 5: Write failing tests for ScratchpadEntry and MeetingState**

Append to `tests/test_models.py`:
```python
from datetime import datetime

from src.models import AgentConfig, MeetingState, ScratchpadEntry


class TestScratchpadEntry:
    """Tests for ScratchpadEntry model."""

    def test_create_scratchpad_entry(self) -> None:
        """ScratchpadEntry stores agent name, content, timestamp."""
        now = datetime(2026, 3, 30, 10, 0, 0)
        entry = ScratchpadEntry(
            agent_name="architect",
            content="We should use microservices.",
            timestamp=now,
        )
        assert entry.agent_name == "architect"
        assert entry.content == "We should use microservices."
        assert entry.timestamp == now


class TestMeetingState:
    """Tests for MeetingState model."""

    def test_create_default_meeting_state(self) -> None:
        """MeetingState with defaults has empty scratchpad and round 0."""
        state = MeetingState(topic="Test topic")
        assert state.topic == "Test topic"
        assert state.scratchpad == []
        assert state.current_round == 0
        assert state.max_rounds == 5

    def test_meeting_state_with_entries(self) -> None:
        """MeetingState can hold scratchpad entries."""
        entry = ScratchpadEntry(
            agent_name="test",
            content="Hello",
            timestamp=datetime(2026, 3, 30, 10, 0, 0),
        )
        state = MeetingState(topic="Topic", scratchpad=[entry])
        assert len(state.scratchpad) == 1
        assert state.scratchpad[0].agent_name == "test"
```

- [ ] **Step 6: Run tests to verify they pass** (models already defined in step 3 skeleton, but we need ScratchpadEntry and MeetingState)

Run: `python -m pytest tests/test_models.py -v`
Expected: FAIL — `ScratchpadEntry` and `MeetingState` not yet importable

- [ ] **Step 7: Implement ScratchpadEntry and MeetingState**

Add to `src/models.py` after `AgentConfig`:
```python
class ScratchpadEntry(BaseModel):
    """A single entry on the shared whiteboard."""

    agent_name: str
    content: str
    timestamp: datetime


class MeetingState(BaseModel):
    """The shared whiteboard maintaining all meeting context."""

    topic: str
    scratchpad: list[ScratchpadEntry] = []
    current_round: int = 0
    max_rounds: int = 5
```

- [ ] **Step 8: Run tests to verify they pass**

Run: `python -m pytest tests/test_models.py -v`
Expected: 6 passed

- [ ] **Step 9: Write failing tests for PMDecision**

Append to `tests/test_models.py`:
```python
import pytest
from pydantic import ValidationError

from src.models import PMDecision


class TestPMDecision:
    """Tests for PMDecision model with conditional validation."""

    def test_call_agent_decision(self) -> None:
        """CALL_AGENT requires target_agent and prompt_for_agent."""
        decision = PMDecision(
            analysis="Need architect input.",
            next_action="CALL_AGENT",
            target_agent="architect",
            prompt_for_agent="What about scalability?",
        )
        assert decision.next_action == "CALL_AGENT"
        assert decision.target_agent == "architect"

    def test_finish_decision(self) -> None:
        """FINISH requires final_report."""
        decision = PMDecision(
            analysis="All topics covered.",
            next_action="FINISH",
            final_report="# Meeting Report\n\nDone.",
        )
        assert decision.next_action == "FINISH"
        assert decision.final_report is not None

    def test_call_agent_missing_target_raises(self) -> None:
        """CALL_AGENT without target_agent raises ValidationError."""
        with pytest.raises(ValidationError, match="target_agent"):
            PMDecision(
                analysis="Need input.",
                next_action="CALL_AGENT",
                prompt_for_agent="Question?",
            )

    def test_call_agent_missing_prompt_raises(self) -> None:
        """CALL_AGENT without prompt_for_agent raises ValidationError."""
        with pytest.raises(ValidationError, match="prompt_for_agent"):
            PMDecision(
                analysis="Need input.",
                next_action="CALL_AGENT",
                target_agent="architect",
            )

    def test_finish_missing_report_raises(self) -> None:
        """FINISH without final_report raises ValidationError."""
        with pytest.raises(ValidationError, match="final_report"):
            PMDecision(
                analysis="Done.",
                next_action="FINISH",
            )

    def test_invalid_next_action_raises(self) -> None:
        """Invalid next_action value raises ValidationError."""
        with pytest.raises(ValidationError):
            PMDecision(
                analysis="Hmm.",
                next_action="INVALID",
            )
```

- [ ] **Step 10: Run tests to verify they fail**

Run: `python -m pytest tests/test_models.py::TestPMDecision -v`
Expected: FAIL — `PMDecision` not importable

- [ ] **Step 11: Implement PMDecision with conditional validation**

Add to `src/models.py`:
```python
class PMDecision(BaseModel):
    """Structured output from the PM Agent each round."""

    analysis: str
    next_action: Literal["CALL_AGENT", "FINISH"]
    target_agent: str | None = None
    prompt_for_agent: str | None = None
    final_report: str | None = None

    @model_validator(mode="after")
    def validate_fields_for_action(self) -> PMDecision:
        """Ensure required fields are present based on next_action."""
        if self.next_action == "CALL_AGENT":
            if not self.target_agent:
                raise ValueError("target_agent is required when next_action is CALL_AGENT")
            if not self.prompt_for_agent:
                raise ValueError("prompt_for_agent is required when next_action is CALL_AGENT")
        elif self.next_action == "FINISH":
            if not self.final_report:
                raise ValueError("final_report is required when next_action is FINISH")
        return self
```

- [ ] **Step 12: Run all model tests**

Run: `python -m pytest tests/test_models.py -v`
Expected: 12 passed

- [ ] **Step 13: Commit**

```bash
git add src/models.py tests/test_models.py
git commit -m "feat: add Pydantic data models (AgentConfig, MeetingState, PMDecision)"
```

---

### Task 3: LLM Client Abstraction

**Files:**
- Create: `src/llm_client.py`
- Create: `tests/test_llm_client.py`

- [ ] **Step 1: Write failing tests for LLM client factory**

`tests/test_llm_client.py`:
```python
"""Tests for LLM client abstraction layer."""

from __future__ import annotations

import os
from unittest.mock import MagicMock, patch

import pytest

from src.llm_client import AnthropicClient, OpenAIClient, create_client


class TestCreateClient:
    """Tests for the create_client factory function."""

    @patch.dict(os.environ, {"LLM_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": "test-key"})
    def test_create_anthropic_client(self) -> None:
        """Default provider creates AnthropicClient."""
        client = create_client()
        assert isinstance(client, AnthropicClient)

    @patch.dict(os.environ, {"LLM_PROVIDER": "openai", "OPENAI_API_KEY": "test-key"})
    def test_create_openai_client(self) -> None:
        """Provider 'openai' creates OpenAIClient."""
        client = create_client()
        assert isinstance(client, OpenAIClient)

    @patch.dict(os.environ, {"LLM_PROVIDER": "invalid"}, clear=False)
    def test_invalid_provider_raises(self) -> None:
        """Unknown provider raises ValueError."""
        with pytest.raises(ValueError, match="Unknown LLM provider"):
            create_client()

    @patch.dict(os.environ, {}, clear=True)
    def test_default_provider_is_anthropic(self) -> None:
        """No LLM_PROVIDER env var defaults to anthropic."""
        # Patch the SDK so no real API key or network call is needed.
        with patch("src.llm_client.anthropic") as mock_anthropic:
            mock_anthropic.Anthropic.return_value = MagicMock()
            client = create_client()
            assert isinstance(client, AnthropicClient)


class TestAnthropicClient:
    """Tests for AnthropicClient.chat method."""

    def test_chat_calls_api_correctly(self) -> None:
        """AnthropicClient.chat sends system and messages to API."""
        mock_sdk = MagicMock()
        mock_response = MagicMock()
        mock_response.content = [MagicMock(text="Hello from Claude")]
        mock_sdk.messages.create.return_value = mock_response

        client = AnthropicClient(client=mock_sdk, model="claude-sonnet-4-20250514")
        result = client.chat(
            system="You are helpful.",
            messages=[{"role": "user", "content": "Hi"}],
        )

        assert result == "Hello from Claude"
        mock_sdk.messages.create.assert_called_once_with(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system="You are helpful.",
            messages=[{"role": "user", "content": "Hi"}],
        )


class TestOpenAIClient:
    """Tests for OpenAIClient.chat method."""

    def test_chat_calls_api_correctly(self) -> None:
        """OpenAIClient.chat sends system and messages to API."""
        mock_sdk = MagicMock()
        mock_choice = MagicMock()
        mock_choice.message.content = "Hello from GPT"
        mock_sdk.chat.completions.create.return_value = MagicMock(choices=[mock_choice])

        client = OpenAIClient(client=mock_sdk, model="gpt-4o")
        result = client.chat(
            system="You are helpful.",
            messages=[{"role": "user", "content": "Hi"}],
        )

        assert result == "Hello from GPT"
        mock_sdk.chat.completions.create.assert_called_once_with(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "You are helpful."},
                {"role": "user", "content": "Hi"},
            ],
        )
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_llm_client.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement LLM client layer**

`src/llm_client.py`:
```python
"""LLM client abstraction layer with Protocol pattern."""

from __future__ import annotations

import os
from typing import Protocol

import anthropic
import openai


class LLMClient(Protocol):
    """Protocol defining the LLM client interface."""

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Send a chat request and return the assistant's text response."""
        ...


class AnthropicClient:
    """LLM client wrapping the Anthropic SDK."""

    def __init__(self, client: anthropic.Anthropic, model: str) -> None:
        self._client = client
        self._model = model

    @property
    def client(self) -> anthropic.Anthropic:
        """Expose underlying SDK client for testing."""
        return self._client

    @property
    def model(self) -> str:
        """Expose model name for testing."""
        return self._model

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Send a chat request via Anthropic API."""
        response = self._client.messages.create(
            model=self._model,
            max_tokens=4096,
            system=system,
            messages=messages,
        )
        return response.content[0].text


class OpenAIClient:
    """LLM client wrapping the OpenAI SDK."""

    def __init__(self, client: openai.OpenAI, model: str) -> None:
        self._client = client
        self._model = model

    @property
    def client(self) -> openai.OpenAI:
        """Expose underlying SDK client for testing."""
        return self._client

    @property
    def model(self) -> str:
        """Expose model name for testing."""
        return self._model

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Send a chat request via OpenAI API."""
        full_messages = [{"role": "system", "content": system}, *messages]
        response = self._client.chat.completions.create(
            model=self._model,
            messages=full_messages,
        )
        return response.choices[0].message.content


_DEFAULT_MODELS: dict[str, str] = {
    "anthropic": "claude-sonnet-4-20250514",
    "openai": "gpt-4o",
}


def create_client() -> AnthropicClient | OpenAIClient:
    """Create an LLM client based on environment variables.

    Env vars:
        LLM_PROVIDER: "anthropic" (default) or "openai"
        LLM_MODEL: model name (optional, has sensible defaults)
    """
    provider = os.environ.get("LLM_PROVIDER", "anthropic")
    model = os.environ.get("LLM_MODEL", _DEFAULT_MODELS.get(provider, ""))

    if provider == "anthropic":
        sdk_client = anthropic.Anthropic()
        return AnthropicClient(client=sdk_client, model=model)
    elif provider == "openai":
        sdk_client = openai.OpenAI()
        return OpenAIClient(client=sdk_client, model=model)
    else:
        raise ValueError(f"Unknown LLM provider: {provider}. Use 'anthropic' or 'openai'.")
```
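Provider selection for `create_client()` is environment-driven. For a local run the variables might look like this (values illustrative; an API key for the chosen provider must also be set):

```bash
# Illustrative environment for create_client(); provider defaults to anthropic.
export LLM_PROVIDER=openai        # or "anthropic" (the default)
export LLM_MODEL=gpt-4o           # optional; falls back to a per-provider default
export OPENAI_API_KEY=sk-...      # read implicitly by the OpenAI SDK constructor
```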

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_llm_client.py -v`
Expected: 6 passed

- [ ] **Step 5: Commit**

```bash
git add src/llm_client.py tests/test_llm_client.py
git commit -m "feat: add LLM client abstraction (Protocol + Anthropic/OpenAI)"
```

---

### Task 4: Config Loader

**Files:**
- Create: `src/config_loader.py`
- Create: `tests/test_config_loader.py`

- [ ] **Step 1: Write failing tests**

`tests/test_config_loader.py`:
```python
"""Tests for Agent config loader."""

from __future__ import annotations

import json
from pathlib import Path

import pytest
import yaml

from src.config_loader import load_agent_config, load_agents
from src.models import AgentConfig


@pytest.fixture()
def agents_dir(tmp_path: Path) -> Path:
    """Create a temporary agents directory with test configs."""
    agent_data = {
        "name": "test_architect",
        "role": "Software Architect",
        "system_prompt": "You are an architect.",
    }

    # Write JSON config
    json_path = tmp_path / "test_architect.json"
    json_path.write_text(json.dumps(agent_data), encoding="utf-8")

    # Write YAML config
    yaml_data = {
        "name": "test_devops",
        "role": "DevOps Engineer",
        "system_prompt": "You are a DevOps engineer.",
    }
    yaml_path = tmp_path / "test_devops.yaml"
    yaml_path.write_text(yaml.dump(yaml_data), encoding="utf-8")

    return tmp_path


class TestLoadAgentConfig:
    """Tests for load_agent_config function."""

    def test_load_json_config(self, agents_dir: Path) -> None:
        """Load Agent config from JSON file."""
        config = load_agent_config(agents_dir / "test_architect.json")
        assert config.name == "test_architect"
        assert config.role == "Software Architect"

    def test_load_yaml_config(self, agents_dir: Path) -> None:
        """Load Agent config from YAML file."""
        config = load_agent_config(agents_dir / "test_devops.yaml")
        assert config.name == "test_devops"
        assert config.role == "DevOps Engineer"

    def test_load_nonexistent_file_raises(self, tmp_path: Path) -> None:
        """Loading a nonexistent file raises FileNotFoundError."""
        with pytest.raises(FileNotFoundError):
            load_agent_config(tmp_path / "nonexistent.json")

    def test_load_unsupported_extension_raises(self, tmp_path: Path) -> None:
        """Loading a file with unsupported extension raises ValueError."""
        bad_file = tmp_path / "agent.txt"
        bad_file.write_text("{}", encoding="utf-8")
        with pytest.raises(ValueError, match="Unsupported file format"):
            load_agent_config(bad_file)


class TestLoadAgents:
    """Tests for load_agents function."""

    def test_load_multiple_agents(self, agents_dir: Path) -> None:
        """Load multiple agents by name."""
        agents = load_agents(["test_architect", "test_devops"], agents_dir=agents_dir)
        assert "test_architect" in agents
        assert "test_devops" in agents
        assert isinstance(agents["test_architect"], AgentConfig)

    def test_load_missing_agent_raises(self, agents_dir: Path) -> None:
        """Loading a non-existent agent name raises FileNotFoundError."""
        with pytest.raises(FileNotFoundError, match="not_here"):
            load_agents(["not_here"], agents_dir=agents_dir)
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_config_loader.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement config_loader**

`src/config_loader.py`:
```python
"""Load Agent configuration files from JSON or YAML."""

from __future__ import annotations

import json
from pathlib import Path

import yaml

from src.models import AgentConfig

_SUPPORTED_EXTENSIONS = (".json", ".yaml", ".yml")


def load_agent_config(path: Path) -> AgentConfig:
    """Load a single Agent config from a JSON or YAML file.

    Args:
        path: Path to the config file.

    Returns:
        Parsed AgentConfig.

    Raises:
        FileNotFoundError: If the file does not exist.
        ValueError: If the file extension is not supported.
    """
    if not path.exists():
        raise FileNotFoundError(f"Agent config not found: {path}")

    suffix = path.suffix.lower()
    if suffix not in _SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported file format: '{suffix}'. "
            f"Use one of: {', '.join(_SUPPORTED_EXTENSIONS)}"
        )

    raw_text = path.read_text(encoding="utf-8")

    if suffix == ".json":
        data = json.loads(raw_text)
    else:
        data = yaml.safe_load(raw_text)

    return AgentConfig(**data)


def load_agents(
    names: list[str],
    agents_dir: Path = Path("agents"),
) -> dict[str, AgentConfig]:
    """Load multiple Agent configs by name.

    Searches agents_dir for files matching each name with supported extensions.

    Args:
        names: List of agent names to load.
        agents_dir: Directory containing agent config files.

    Returns:
        Dict mapping agent name to AgentConfig.

    Raises:
        FileNotFoundError: If an agent config cannot be found.
    """
    agents: dict[str, AgentConfig] = {}

    for name in names:
        found = False
        for ext in _SUPPORTED_EXTENSIONS:
            candidate = agents_dir / f"{name}{ext}"
            if candidate.exists():
                agents[name] = load_agent_config(candidate)
                found = True
                break

        if not found:
            available = [
                p.stem
                for p in agents_dir.iterdir()
                if p.suffix.lower() in _SUPPORTED_EXTENSIONS
            ]
            raise FileNotFoundError(
                f"Agent config not found for '{name}'. "
                f"Available agents: {', '.join(sorted(available)) or '(none)'}"
            )

    return agents
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_config_loader.py -v`
Expected: 6 passed

- [ ] **Step 5: Commit**

```bash
git add src/config_loader.py tests/test_config_loader.py
git commit -m "feat: add config loader for JSON/YAML agent files"
```

---

### Task 5: Stateless Agent Executor

**Files:**
- Create: `src/agent.py`
- Create: `tests/test_agent.py`

- [ ] **Step 1: Write failing tests**

`tests/test_agent.py`:
```python
"""Tests for the stateless Agent executor."""

from __future__ import annotations

from src.agent import format_scratchpad_summary, run_agent
from src.models import AgentConfig


class MockLLMClient:
    """Mock LLM client that returns a canned response."""

    def __init__(self, response: str) -> None:
        self.response = response
        self.last_system: str = ""
        self.last_messages: list[dict[str, str]] = []

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Record call and return canned response."""
        self.last_system = system
        self.last_messages = messages
        return self.response


class TestRunAgent:
    """Tests for run_agent function."""

    def test_run_agent_returns_llm_response(self) -> None:
        """run_agent passes config and prompt to LLM, returns response."""
        config = AgentConfig(
            name="architect",
            role="Software Architect",
            system_prompt="You are an architect.",
        )
        mock_client = MockLLMClient(response="Use microservices.")

        result = run_agent(
            config=config,
            prompt="What architecture should we use?",
            scratchpad_summary="Topic: database migration",
            client=mock_client,
        )

        assert result == "Use microservices."

    def test_run_agent_uses_config_system_prompt(self) -> None:
        """run_agent sends the Agent's system_prompt as the system message."""
        config = AgentConfig(
            name="devops",
            role="DevOps Engineer",
            system_prompt="You focus on deployment.",
        )
        mock_client = MockLLMClient(response="OK")

        run_agent(
            config=config,
            prompt="How to deploy?",
            scratchpad_summary="Topic: deployment",
            client=mock_client,
        )

        assert "You focus on deployment." in mock_client.last_system

    def test_run_agent_includes_scratchpad_in_message(self) -> None:
        """run_agent includes scratchpad summary in the user message."""
        config = AgentConfig(
            name="test",
            role="Tester",
            system_prompt="You test things.",
        )
        mock_client = MockLLMClient(response="Noted.")

        run_agent(
            config=config,
            prompt="Review this?",
            scratchpad_summary="Previous: architect said use microservices.",
            client=mock_client,
        )

        user_message = mock_client.last_messages[0]["content"]
        assert "Previous: architect said use microservices." in user_message
        assert "Review this?" in user_message


class TestFormatScratchpadSummary:
    """Tests for format_scratchpad_summary helper."""

    def test_empty_scratchpad(self) -> None:
        """Empty scratchpad returns 'No discussion yet.'"""
        result = format_scratchpad_summary([])
        assert result == "No discussion yet."

    def test_scratchpad_with_entries(self) -> None:
        """Scratchpad entries are formatted as agent: content."""
        from datetime import datetime

        from src.models import ScratchpadEntry

        entries = [
            ScratchpadEntry(
                agent_name="architect",
                content="Use microservices.",
                timestamp=datetime(2026, 3, 30, 10, 0, 0),
            ),
            ScratchpadEntry(
                agent_name="devops",
                content="That increases complexity.",
                timestamp=datetime(2026, 3, 30, 10, 1, 0),
            ),
        ]
        result = format_scratchpad_summary(entries)
        assert "[architect]" in result
        assert "Use microservices." in result
        assert "[devops]" in result
        assert "That increases complexity." in result
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_agent.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement agent.py**

`src/agent.py`:
```python
"""Stateless Agent executor."""

from __future__ import annotations

from typing import TYPE_CHECKING

from src.models import AgentConfig, ScratchpadEntry

if TYPE_CHECKING:
    from src.llm_client import LLMClient


def format_scratchpad_summary(entries: list[ScratchpadEntry]) -> str:
    """Format scratchpad entries into a readable summary.

    Args:
        entries: List of scratchpad entries.

    Returns:
        Formatted string summary of the whiteboard.
    """
    if not entries:
        return "No discussion yet."

    lines: list[str] = []
    for entry in entries:
        lines.append(f"[{entry.agent_name}]: {entry.content}")
    return "\n".join(lines)


def run_agent(
    config: AgentConfig,
    prompt: str,
    scratchpad_summary: str,
    client: LLMClient,
) -> str:
    """Execute a single Agent turn.

    The Agent sees its own system_prompt, a summary of the current
    whiteboard, and the specific prompt from PM.

    Args:
        config: The Agent's configuration.
        prompt: The specific question or instruction from PM.
        scratchpad_summary: Summary of the current whiteboard state.
        client: LLM client to use for inference.

    Returns:
        The Agent's text response.
    """
    user_content = (
        f"## Current Meeting Whiteboard\n\n{scratchpad_summary}\n\n"
        f"## Your Task\n\n{prompt}"
    )

    return client.chat(
        system=config.system_prompt,
        messages=[{"role": "user", "content": user_content}],
    )
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_agent.py -v`
Expected: 5 passed

- [ ] **Step 5: Commit**

```bash
git add src/agent.py tests/test_agent.py
git commit -m "feat: add stateless agent executor"
```

---

### Task 6: PM Agent Router

**Files:**
- Create: `src/pm.py`
- Create: `tests/test_pm.py`

- [ ] **Step 1: Write failing tests**

`tests/test_pm.py`:
```python
"""Tests for the PM Agent router."""

from __future__ import annotations

import json

import pytest

from src.models import AgentConfig, MeetingState, PMDecision
from src.pm import build_pm_system_prompt, run_pm


class MockLLMClient:
    """Mock LLM client returning a canned JSON response."""

    def __init__(self, response: str) -> None:
        self.response = response
        self.call_count = 0
        self.last_system: str = ""

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Record call and return canned response."""
        self.call_count += 1
        self.last_system = system
        return self.response


SAMPLE_AGENTS: dict[str, AgentConfig] = {
    "architect": AgentConfig(
        name="architect",
        role="Software Architect",
        system_prompt="You are an architect.",
    ),
    "devops": AgentConfig(
        name="devops",
        role="DevOps Engineer",
        system_prompt="You are DevOps.",
    ),
}


class TestBuildPMSystemPrompt:
    """Tests for build_pm_system_prompt."""

    def test_includes_agent_names_and_roles(self) -> None:
        """PM system prompt lists available agents with roles."""
        prompt = build_pm_system_prompt(SAMPLE_AGENTS)
        assert "architect" in prompt
        assert "Software Architect" in prompt
        assert "devops" in prompt
        assert "DevOps Engineer" in prompt

    def test_includes_json_schema_instruction(self) -> None:
        """PM system prompt instructs JSON output."""
        prompt = build_pm_system_prompt(SAMPLE_AGENTS)
        assert "JSON" in prompt
        assert "CALL_AGENT" in prompt
        assert "FINISH" in prompt


class TestRunPM:
    """Tests for run_pm function."""

    def test_run_pm_returns_call_agent_decision(self) -> None:
        """PM returns a valid CALL_AGENT decision."""
        decision_json = json.dumps({
            "analysis": "Need architect input.",
            "next_action": "CALL_AGENT",
            "target_agent": "architect",
            "prompt_for_agent": "What about scalability?",
        })
        mock_client = MockLLMClient(response=decision_json)
        state = MeetingState(topic="Database migration")

        decision = run_pm(state, SAMPLE_AGENTS, mock_client)

        assert isinstance(decision, PMDecision)
        assert decision.next_action == "CALL_AGENT"
        assert decision.target_agent == "architect"

    def test_run_pm_returns_finish_decision(self) -> None:
        """PM returns a valid FINISH decision."""
        decision_json = json.dumps({
            "analysis": "All covered.",
            "next_action": "FINISH",
            "final_report": "# Report\n\nDone.",
        })
        mock_client = MockLLMClient(response=decision_json)
        state = MeetingState(topic="Test")

        decision = run_pm(state, SAMPLE_AGENTS, mock_client)

        assert decision.next_action == "FINISH"
        assert "Report" in decision.final_report

    def test_run_pm_retries_on_invalid_json(self) -> None:
        """PM retries when LLM returns invalid JSON, up to max retries."""
        call_count = 0

        class RetryMockClient:
            def chat(self, system: str, messages: list[dict[str, str]]) -> str:
                nonlocal call_count
                call_count += 1
                if call_count < 3:
                    return "This is not JSON"
                return json.dumps({
                    "analysis": "OK",
                    "next_action": "FINISH",
                    "final_report": "# Done",
                })

        state = MeetingState(topic="Test")
        decision = run_pm(state, SAMPLE_AGENTS, RetryMockClient())

        assert decision.next_action == "FINISH"
        assert call_count == 3

    def test_run_pm_raises_after_max_retries(self) -> None:
        """PM raises RuntimeError after exhausting retries."""
        mock_client = MockLLMClient(response="not json at all")
        state = MeetingState(topic="Test")

        with pytest.raises(RuntimeError, match="Failed to get valid PM decision"):
            run_pm(state, SAMPLE_AGENTS, mock_client)

    def test_run_pm_validates_target_agent_exists(self) -> None:
        """PM retries if target_agent is not in available agents."""
        call_count = 0

        class AgentValidationMockClient:
            def chat(self, system: str, messages: list[dict[str, str]]) -> str:
                nonlocal call_count
                call_count += 1
                if call_count == 1:
                    return json.dumps({
                        "analysis": "Need input.",
                        "next_action": "CALL_AGENT",
                        "target_agent": "nonexistent_agent",
                        "prompt_for_agent": "Hello?",
                    })
                return json.dumps({
                    "analysis": "OK.",
                    "next_action": "CALL_AGENT",
                    "target_agent": "architect",
                    "prompt_for_agent": "What do you think?",
                })

        state = MeetingState(topic="Test")
        decision = run_pm(state, SAMPLE_AGENTS, AgentValidationMockClient())

        assert decision.target_agent == "architect"
        assert call_count == 2
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_pm.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement pm.py**

`src/pm.py`:
```python
"""PM Agent router - routes discussion between participant Agents."""

from __future__ import annotations

import json
from typing import TYPE_CHECKING

from pydantic import ValidationError

from src.agent import format_scratchpad_summary
from src.models import AgentConfig, MeetingState, PMDecision

if TYPE_CHECKING:
    from src.llm_client import LLMClient

_MAX_RETRIES = 3


def build_pm_system_prompt(available_agents: dict[str, AgentConfig]) -> str:
    """Build the PM Agent's system prompt with available agents listed.

    Args:
        available_agents: Dict of agent name to AgentConfig.

    Returns:
        Complete system prompt for the PM Agent.
    """
    agent_list = "\n".join(
        f"- {name}: {config.role}" for name, config in available_agents.items()
    )

    return f"""You are a meeting facilitator (PM). Your job is to:
1. Analyze the current whiteboard state
2. Decide which Agent should speak next, or if the meeting should end
3. Output ONLY valid JSON matching the schema below

Available Agents:
{agent_list}

You MUST output ONLY a JSON object with these fields:
- "analysis": brief analysis of current whiteboard state
- "next_action": either "CALL_AGENT" or "FINISH"
- "target_agent": name of the agent to call (required if CALL_AGENT)
- "prompt_for_agent": specific question for that agent (required if CALL_AGENT)
- "final_report": complete Markdown meeting report (required if FINISH)

Rules:
- Call each agent at least once before finishing, if possible.
- When all perspectives are covered, output FINISH with a comprehensive report.
- The target_agent MUST be one of the available agents listed above.
- Output ONLY the JSON object, no other text."""


def run_pm(
    state: MeetingState,
    available_agents: dict[str, AgentConfig],
    client: LLMClient,
) -> PMDecision:
    """Run PM Agent to get the next routing decision.

    Sends whiteboard state to the LLM, parses response as PMDecision.
    Retries up to _MAX_RETRIES times on JSON parse or validation failure.

    Args:
        state: Current meeting state with whiteboard.
        available_agents: Dict of available agent configs.
        client: LLM client for inference.

    Returns:
        Validated PMDecision.

    Raises:
        RuntimeError: If all retries are exhausted.
    """
    system_prompt = build_pm_system_prompt(available_agents)
    scratchpad_text = format_scratchpad_summary(state.scratchpad)

    user_content = (
        f"## Meeting Topic\n\n{state.topic}\n\n"
        f"## Current Whiteboard (Round {state.current_round})\n\n{scratchpad_text}"
    )

    messages: list[dict[str, str]] = [{"role": "user", "content": user_content}]
    last_error = ""
    raw_response = ""

    for _attempt in range(_MAX_RETRIES):
        if last_error:
            # Echo the failed assistant turn before the correction so the
            # roles alternate and the model can see exactly what it got wrong.
            messages.append({"role": "assistant", "content": raw_response})
            messages.append({
                "role": "user",
                "content": f"Your previous response was invalid: {last_error}. Please output ONLY valid JSON.",
            })

        raw_response = client.chat(system=system_prompt, messages=messages)

        try:
            # Strip potential markdown code fences
            cleaned = raw_response.strip()
            if cleaned.startswith("```"):
                cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else cleaned
                if cleaned.endswith("```"):
                    cleaned = cleaned[:-3]
                cleaned = cleaned.strip()

            data = json.loads(cleaned)
            decision = PMDecision(**data)

            # Validate target_agent exists
            if (
                decision.next_action == "CALL_AGENT"
                and decision.target_agent not in available_agents
            ):
                last_error = (
                    f"target_agent '{decision.target_agent}' is not available. "
                    f"Choose from: {', '.join(available_agents.keys())}"
                )
                continue

            return decision

        except (json.JSONDecodeError, ValidationError) as exc:
            last_error = str(exc)
            continue

    raise RuntimeError(
        f"Failed to get valid PM decision after {_MAX_RETRIES} attempts. "
        f"Last error: {last_error}"
    )
```
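
The fence-stripping logic above reappears in `agent_builder.py` (Task 8). If the duplication bothers you, a shared helper could live in a small utility module (a sketch; the module name `src/json_utils.py` is a suggestion, not part of the plan):

```python
"""Shared helpers for parsing JSON out of LLM responses."""

import json
from typing import Any


def strip_code_fences(text: str) -> str:
    """Remove one wrapping Markdown code fence, if present."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line, which may carry a language tag.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        if cleaned.endswith("```"):
            cleaned = cleaned[:-3]
    return cleaned.strip()


def parse_json_response(raw: str) -> Any:
    """Parse an LLM response that may be wrapped in a code fence."""
    return json.loads(strip_code_fences(raw))
```

Both `run_pm` and `build_agent_config` could then call `parse_json_response(raw_response)` instead of inlining the stripping.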

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_pm.py -v`
Expected: 7 passed

- [ ] **Step 5: Commit**

```bash
git add src/pm.py tests/test_pm.py
git commit -m "feat: add PM agent router with retry and validation"
```

---

### Task 7: Meeting Event Loop

**Files:**
- Create: `src/meeting.py`
- Create: `tests/test_meeting.py`

- [ ] **Step 1: Write failing tests**

`tests/test_meeting.py`:
```python
"""Tests for the meeting event loop."""

from __future__ import annotations

import json
from pathlib import Path

from src.meeting import run_meeting, save_report
from src.models import AgentConfig

SAMPLE_AGENTS: dict[str, AgentConfig] = {
    "architect": AgentConfig(
        name="architect",
        role="Software Architect",
        system_prompt="You are an architect.",
    ),
    "devops": AgentConfig(
        name="devops",
        role="DevOps Engineer",
        system_prompt="You are DevOps.",
    ),
}


class SequentialMockClient:
    """Mock client that returns different responses in sequence.

    Alternates between PM decisions and Agent responses.
    """

    def __init__(self, responses: list[str]) -> None:
        self.responses = responses
        self.call_index = 0

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        """Return next response in sequence."""
        if self.call_index < len(self.responses):
            response = self.responses[self.call_index]
            self.call_index += 1
            return response
        return json.dumps({
            "analysis": "Forced finish.",
            "next_action": "FINISH",
            "final_report": "# Fallback Report",
        })


class TestRunMeeting:
    """Tests for run_meeting function."""

    def test_simple_meeting_flow(self) -> None:
        """Meeting calls one agent then finishes."""
        responses = [
            # Round 1: PM calls architect
            json.dumps({
                "analysis": "Need architect input.",
                "next_action": "CALL_AGENT",
                "target_agent": "architect",
                "prompt_for_agent": "What architecture?",
            }),
            # Round 1: Architect responds
            "Use microservices for scalability.",
            # Round 2: PM finishes
            json.dumps({
                "analysis": "Got architect input.",
                "next_action": "FINISH",
                "final_report": "# Meeting Report\n\nArchitect recommends microservices.",
            }),
        ]
        mock_client = SequentialMockClient(responses)

        report = run_meeting(
            topic="Database migration",
            agent_configs=SAMPLE_AGENTS,
            client=mock_client,
            max_rounds=5,
        )

        assert "Meeting Report" in report
        assert "microservices" in report

    def test_max_rounds_forces_finish(self) -> None:
        """Meeting forces finish when max_rounds is reached."""
        # PM always calls agents, never finishes
        responses = []
        for i in range(10):
            responses.append(json.dumps({
                "analysis": f"Round {i}.",
                "next_action": "CALL_AGENT",
                "target_agent": "architect",
                "prompt_for_agent": "Thoughts?",
            }))
            responses.append("Some response.")

        # Final forced finish response from PM
        responses.append(json.dumps({
            "analysis": "Forced to finish.",
            "next_action": "FINISH",
            "final_report": "# Forced Report\n\nMax rounds reached.",
        }))

        mock_client = SequentialMockClient(responses)

        report = run_meeting(
            topic="Test",
            agent_configs=SAMPLE_AGENTS,
            client=mock_client,
            max_rounds=2,
        )

        # Either the PM obeyed the forced finish or the safety fallback fired;
        # both paths must still produce a report.
        assert "Report" in report

    def test_meeting_with_multiple_agents(self) -> None:
        """Meeting calls multiple agents before finishing."""
        responses = [
            # PM calls architect
            json.dumps({
                "analysis": "Start with architect.",
                "next_action": "CALL_AGENT",
                "target_agent": "architect",
                "prompt_for_agent": "Architecture thoughts?",
            }),
            "Use event-driven architecture.",
            # PM calls devops
            json.dumps({
                "analysis": "Now DevOps.",
                "next_action": "CALL_AGENT",
                "target_agent": "devops",
                "prompt_for_agent": "Deployment concerns?",
            }),
            "Need Kubernetes.",
            # PM finishes
            json.dumps({
                "analysis": "All covered.",
                "next_action": "FINISH",
                "final_report": "# Full Report\n\nArch: event-driven. DevOps: K8s.",
            }),
        ]
        mock_client = SequentialMockClient(responses)

        report = run_meeting(
            topic="New system",
            agent_configs=SAMPLE_AGENTS,
            client=mock_client,
            max_rounds=5,
        )

        assert "Full Report" in report


class TestSaveReport:
    """Tests for save_report function."""

    def test_save_report_creates_file(self, tmp_path: Path) -> None:
        """save_report writes Markdown to the output directory."""
        report = "# Test Report\n\nContent here."
        path = save_report(report, topic="test-topic", output_dir=tmp_path)

        assert path.exists()
        assert path.suffix == ".md"
        assert "test-topic" in path.name
        assert path.read_text(encoding="utf-8") == report
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_meeting.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement meeting.py**

`src/meeting.py`:
```python
"""Meeting event loop - orchestrates the multi-agent discussion."""

from __future__ import annotations

import re
from datetime import datetime
from pathlib import Path
from typing import TYPE_CHECKING

from src.agent import format_scratchpad_summary, run_agent
from src.models import AgentConfig, MeetingState, ScratchpadEntry
from src.pm import run_pm

if TYPE_CHECKING:
    from src.llm_client import LLMClient

# ANSI color codes for terminal output
_COLORS = {
    "pm": "\033[1;34m",      # Bold blue
    "agent": "\033[1;32m",   # Bold green
    "system": "\033[1;33m",  # Bold yellow
    "reset": "\033[0m",
}


def _print_colored(label: str, message: str, color_key: str) -> None:
    """Print a colored message to the terminal.

    Args:
        label: The speaker label.
        message: The message content.
        color_key: Key into _COLORS dict.
    """
    color = _COLORS.get(color_key, "")
    reset = _COLORS["reset"]
    print(f"{color}[{label}]{reset} {message}")


def _slugify(text: str) -> str:
    """Convert text to a filename-safe slug.

    Args:
        text: Input text.

    Returns:
        Lowercase slug with hyphens.
    """
    slug = re.sub(r"[^\w\s-]", "", text.lower())
    slug = re.sub(r"[\s_]+", "-", slug)
    return slug[:50].strip("-")


def save_report(report: str, topic: str, output_dir: Path = Path("reports")) -> Path:
    """Save meeting report as a Markdown file.

    Args:
        report: The Markdown report content.
        topic: Meeting topic (used in filename).
        output_dir: Directory to save the report.

    Returns:
        Path to the saved report file.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    slug = _slugify(topic)
    filename = f"{timestamp}-{slug}.md"
    path = output_dir / filename
    path.write_text(report, encoding="utf-8")
    return path


def run_meeting(
    topic: str,
    agent_configs: dict[str, AgentConfig],
    client: LLMClient,
    max_rounds: int = 5,
    output_dir: Path = Path("reports"),
) -> str:
    """Execute the full meeting loop.

    1. Initialize MeetingState with topic.
    2. Loop: PM decides next action, Agent responds, whiteboard updates.
    3. Force finish if max_rounds reached.
    4. Return final Markdown report.

    Args:
        topic: The meeting discussion topic.
        agent_configs: Dict of agent name to AgentConfig.
        client: LLM client for inference.
        max_rounds: Maximum number of discussion rounds.
        output_dir: Reserved for report output; the CLI saves the report
            via save_report after this function returns.

    Returns:
        The final Markdown meeting report.
    """
    state = MeetingState(topic=topic, max_rounds=max_rounds)

    _print_colored("SYSTEM", f"Meeting started: {topic}", "system")
    _print_colored("SYSTEM", f"Participants: {', '.join(agent_configs.keys())}", "system")
    print()

    while state.current_round < state.max_rounds:
        state.current_round += 1
        _print_colored("SYSTEM", f"--- Round {state.current_round}/{state.max_rounds} ---", "system")

        # Check if this is the final forced round
        if state.current_round == state.max_rounds:
            # Add instruction to force finish on last round
            force_entry = ScratchpadEntry(
                agent_name="SYSTEM",
                content="MAX ROUNDS REACHED. You MUST set next_action to FINISH and provide a final_report.",
                timestamp=datetime.now(),
            )
            state.scratchpad.append(force_entry)

        # PM decides
        decision = run_pm(state, agent_configs, client)
        _print_colored("PM", f"Analysis: {decision.analysis}", "pm")

        if decision.next_action == "FINISH":
            _print_colored("PM", "Meeting concluded.", "pm")
            print()
            report = decision.final_report or "# Meeting Report\n\nNo report generated."
            return report

        # Call target agent
        target_name = decision.target_agent
        target_config = agent_configs[target_name]
        _print_colored("PM", f"Calling {target_name}: {decision.prompt_for_agent}", "pm")

        scratchpad_summary = format_scratchpad_summary(state.scratchpad)
        agent_response = run_agent(
            config=target_config,
            prompt=decision.prompt_for_agent,
            scratchpad_summary=scratchpad_summary,
            client=client,
        )

        _print_colored(target_name.upper(), agent_response, "agent")
        print()

        # Append to whiteboard
        entry = ScratchpadEntry(
            agent_name=target_name,
            content=agent_response,
            timestamp=datetime.now(),
        )
        state.scratchpad.append(entry)

    # Should not reach here normally, but safety fallback
    return "# Meeting Report\n\nMeeting ended after maximum rounds without PM conclusion."
```
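
The ANSI escapes above will leak into logs when stdout is piped to a file. One possible guard (a sketch; the `NO_COLOR` environment variable is a community convention, not something the plan requires):

```python
import os
import sys


def supports_color(stream=sys.stdout) -> bool:
    """Return True only when the stream is an interactive terminal
    and the user has not opted out via the NO_COLOR convention."""
    if os.environ.get("NO_COLOR"):
        return False
    return hasattr(stream, "isatty") and stream.isatty()
```

`_print_colored` could check this once and substitute empty strings for the escape codes when it returns False.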

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_meeting.py -v`
Expected: 4 passed

- [ ] **Step 5: Commit**

```bash
git add src/meeting.py tests/test_meeting.py
git commit -m "feat: add meeting event loop with terminal output and report saving"
```

---

### Task 8: Agent Builder CLI

**Files:**
- Create: `agent_builder.py`

- [ ] **Step 1: Write failing test**

Create a new test file for the CLI:

`tests/test_agent_builder.py`:
```python
"""Tests for agent_builder CLI."""

from __future__ import annotations

import json
from pathlib import Path

from agent_builder import build_agent_config, AGENT_BUILDER_SYSTEM_PROMPT
from src.models import AgentConfig


class MockLLMClient:
    """Mock LLM client for agent builder."""

    def __init__(self, response: str) -> None:
        self.response = response

    def chat(self, system: str, messages: list[dict[str, str]]) -> str:
        return self.response


class TestBuildAgentConfig:
    """Tests for build_agent_config function."""

    def test_build_agent_config_returns_agent_config(self) -> None:
        """build_agent_config returns a valid AgentConfig."""
        llm_response = json.dumps({
            "role": "Chief Product Officer",
            "system_prompt": "You are a demanding product manager focused on simplicity.",
        })
        mock_client = MockLLMClient(response=llm_response)

        config = build_agent_config(
            name="steve_jobs",
            description="Mimics Steve Jobs' demanding product perspective",
            client=mock_client,
        )

        assert isinstance(config, AgentConfig)
        assert config.name == "steve_jobs"
        assert config.role == "Chief Product Officer"
        assert "simplicity" in config.system_prompt

    def test_build_agent_config_saves_json(self, tmp_path: Path) -> None:
        """build_agent_config saves config as JSON file."""
        llm_response = json.dumps({
            "role": "Architect",
            "system_prompt": "You are an architect.",
        })
        mock_client = MockLLMClient(response=llm_response)

        config = build_agent_config(
            name="arch",
            description="Software architect",
            client=mock_client,
            output_dir=tmp_path,
            fmt="json",
        )

        output_path = tmp_path / "arch.json"
        assert output_path.exists()
        loaded = json.loads(output_path.read_text(encoding="utf-8"))
        assert loaded["name"] == "arch"

    def test_build_agent_config_saves_yaml(self, tmp_path: Path) -> None:
        """build_agent_config saves config as YAML file."""
        import yaml

        llm_response = json.dumps({
            "role": "DevOps",
            "system_prompt": "You are DevOps.",
        })
        mock_client = MockLLMClient(response=llm_response)

        config = build_agent_config(
            name="devops",
            description="DevOps engineer",
            client=mock_client,
            output_dir=tmp_path,
            fmt="yaml",
        )

        output_path = tmp_path / "devops.yaml"
        assert output_path.exists()
        loaded = yaml.safe_load(output_path.read_text(encoding="utf-8"))
        assert loaded["name"] == "devops"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_agent_builder.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement agent_builder.py**

`agent_builder.py`:
```python
#!/usr/bin/env python3
"""CLI tool to generate Agent configuration files from natural language descriptions."""

from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import TYPE_CHECKING

import yaml

from src.llm_client import create_client
from src.models import AgentConfig

if TYPE_CHECKING:
    from src.llm_client import LLMClient

AGENT_BUILDER_SYSTEM_PROMPT = """You are an expert at creating AI agent personas.
Given a natural language description, generate a JSON object with exactly these fields:
- "role": A concise job title for the agent (e.g., "Chief Product Officer")
- "system_prompt": A detailed system prompt that captures the persona's expertise, communication style, and focus areas. The prompt should instruct the agent on how to behave in a meeting context.

Output ONLY the JSON object, no other text.

Example input: "Mimics Steve Jobs' demanding product perspective"
Example output:
{
  "role": "Chief Product Officer",
  "system_prompt": "You are an extremely demanding product manager who obsesses over simplicity and user experience. In meetings, you challenge every assumption, push for 10x better solutions, and reject anything that feels mediocre. You focus on: user delight, simplicity, integration of hardware and software thinking, and saying no to 1000 things."
}"""


def build_agent_config(
    name: str,
    description: str,
    client: LLMClient,
    output_dir: Path = Path("agents"),
    fmt: str = "json",
) -> AgentConfig:
    """Generate an AgentConfig from a natural language description.

    Args:
        name: Unique name for the agent.
        description: Natural language description of the agent persona.
        client: LLM client for inference.
        output_dir: Directory to save the config file.
        fmt: Output format, "json" or "yaml".

    Returns:
        The generated AgentConfig.
    """
    raw_response = client.chat(
        system=AGENT_BUILDER_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": description}],
    )

    # Parse LLM response
    cleaned = raw_response.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else cleaned
        if cleaned.endswith("```"):
            cleaned = cleaned[:-3]
        cleaned = cleaned.strip()

    data = json.loads(cleaned)

    config = AgentConfig(
        name=name,
        role=data["role"],
        system_prompt=data["system_prompt"],
    )

    # Save to file
    output_dir.mkdir(parents=True, exist_ok=True)
    ext = "yaml" if fmt == "yaml" else "json"
    output_path = output_dir / f"{name}.{ext}"

    config_dict = config.model_dump()
    if fmt == "yaml":
        output_path.write_text(
            yaml.dump(config_dict, allow_unicode=True, default_flow_style=False),
            encoding="utf-8",
        )
    else:
        output_path.write_text(
            json.dumps(config_dict, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )

    return config


def main() -> None:
    """CLI entry point for agent_builder."""
    parser = argparse.ArgumentParser(
        description="Generate an Agent config from a natural language description.",
    )
    parser.add_argument(
        "--description",
        required=True,
        help="Natural language description of the agent persona.",
    )
    parser.add_argument(
        "--name",
        required=True,
        help="Unique name for the agent.",
    )
    parser.add_argument(
        "--output-dir",
        default="agents",
        help="Directory to save the config file (default: agents/).",
    )
    parser.add_argument(
        "--format",
        choices=["json", "yaml"],
        default="json",
        help="Output format (default: json).",
    )
    args = parser.parse_args()

    client = create_client()
    config = build_agent_config(
        name=args.name,
        description=args.description,
        client=client,
        output_dir=Path(args.output_dir),
        fmt=args.format,
    )

    print(f"Agent '{config.name}' ({config.role}) saved to {args.output_dir}/{config.name}.{args.format}")


if __name__ == "__main__":
    main()
```
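
For reference, a run like `python agent_builder.py --description "Mimics Steve Jobs' demanding product perspective" --name steve_jobs` would save something like the following to `agents/steve_jobs.json` (illustrative only — the role and prompt come from the LLM, but the `name` field is always injected by `build_agent_config`):

```json
{
  "name": "steve_jobs",
  "role": "Chief Product Officer",
  "system_prompt": "You are an extremely demanding product manager who obsesses over simplicity and user experience. ..."
}
```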

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_agent_builder.py -v`
Expected: 3 passed

- [ ] **Step 5: Commit**

```bash
git add agent_builder.py tests/test_agent_builder.py
git commit -m "feat: add agent_builder CLI for generating agent configs"
```

---

### Task 9: Run Meeting CLI

**Files:**
- Create: `run_meeting.py`

- [ ] **Step 1: Write failing test**

`tests/test_run_meeting_cli.py`:
```python
"""Tests for run_meeting CLI argument parsing."""

from __future__ import annotations

from run_meeting import parse_args


class TestParseArgs:
    """Tests for CLI argument parsing."""

    def test_parse_required_args(self) -> None:
        """Parse minimum required arguments."""
        args = parse_args(["--topic", "Test topic", "--agents", "arch,devops"])
        assert args.topic == "Test topic"
        assert args.agents == "arch,devops"

    def test_parse_optional_args(self) -> None:
        """Parse all optional arguments."""
        args = parse_args([
            "--topic", "Test",
            "--agents", "a,b",
            "--max-rounds", "10",
            "--agents-dir", "/custom/agents",
            "--output-dir", "/custom/reports",
        ])
        assert args.max_rounds == 10
        assert args.agents_dir == "/custom/agents"
        assert args.output_dir == "/custom/reports"

    def test_default_values(self) -> None:
        """Default values are set correctly."""
        args = parse_args(["--topic", "T", "--agents", "a"])
        assert args.max_rounds == 5
        assert args.agents_dir == "agents"
        assert args.output_dir == "reports"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `python -m pytest tests/test_run_meeting_cli.py -v`
Expected: FAIL with `ModuleNotFoundError`

- [ ] **Step 3: Implement run_meeting.py**

`run_meeting.py`:
```python
#!/usr/bin/env python3
"""CLI tool to run a multi-agent meeting."""

from __future__ import annotations

import argparse
import sys
from pathlib import Path

from src.config_loader import load_agents
from src.llm_client import create_client
from src.meeting import run_meeting, save_report


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse command-line arguments.

    Args:
        argv: Argument list (defaults to sys.argv[1:]).

    Returns:
        Parsed arguments namespace.
    """
    parser = argparse.ArgumentParser(
        description="Run a multi-agent meeting on a given topic.",
    )
    parser.add_argument(
        "--topic",
        required=True,
        help="The discussion topic for the meeting.",
    )
    parser.add_argument(
        "--agents",
        required=True,
        help="Comma-separated list of agent names to participate.",
    )
    parser.add_argument(
        "--max-rounds",
        type=int,
        default=5,
        help="Maximum number of discussion rounds (default: 5).",
    )
    parser.add_argument(
        "--agents-dir",
        default="agents",
        help="Directory containing agent config files (default: agents/).",
    )
    parser.add_argument(
        "--output-dir",
        default="reports",
        help="Directory to save the meeting report (default: reports/).",
    )
    return parser.parse_args(argv)


def main() -> None:
    """CLI entry point for run_meeting."""
    args = parse_args()

    agent_names = [name.strip() for name in args.agents.split(",")]
    agents_dir = Path(args.agents_dir)
    output_dir = Path(args.output_dir)

    try:
        agent_configs = load_agents(agent_names, agents_dir=agents_dir)
    except FileNotFoundError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        sys.exit(1)

    client = create_client()

    report = run_meeting(
        topic=args.topic,
        agent_configs=agent_configs,
        client=client,
        max_rounds=args.max_rounds,
        output_dir=output_dir,
    )

    # Save report
    report_path = save_report(report, topic=args.topic, output_dir=output_dir)
    print(f"\nReport saved to: {report_path}")


if __name__ == "__main__":
    main()
```
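
One optional hardening not covered by the tests: a long meeting interrupted with Ctrl-C currently dumps a traceback. A small wrapper (a sketch; `run_cli` is a hypothetical helper, not part of the plan) keeps the exit clean:

```python
import sys
from typing import Callable


def run_cli(main_fn: Callable[[], None]) -> int:
    """Invoke a CLI entry point, converting Ctrl-C into a clean exit code."""
    try:
        main_fn()
    except KeyboardInterrupt:
        print("\nMeeting aborted by user.", file=sys.stderr)
        return 130  # conventional exit status for SIGINT
    return 0
```

The guard at the bottom of `run_meeting.py` would then become `sys.exit(run_cli(main))`.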

- [ ] **Step 4: Run tests to verify they pass**

Run: `python -m pytest tests/test_run_meeting_cli.py -v`
Expected: 3 passed

- [ ] **Step 5: Commit**

```bash
git add run_meeting.py tests/test_run_meeting_cli.py
git commit -m "feat: add run_meeting CLI for orchestrating meetings"
```

---

### Task 10: README and Example Agent Configs

**Files:**
- Create: `README.md`
- Create: `agents/example_architect.json`

- [ ] **Step 1: Create example agent config**

`agents/example_architect.json`:
```json
{
  "name": "architect",
  "role": "Software Architect",
  "system_prompt": "You are a senior software architect with 15 years of experience. You focus on system scalability, maintainability, and technical debt management. In meetings, you analyze proposals from an architectural perspective, identify potential bottlenecks, suggest design patterns, and evaluate trade-offs between different approaches. You communicate clearly and back your recommendations with concrete technical reasoning."
}
```
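`load_agents` will ultimately validate this file against the `AgentConfig` pydantic model in `src/models.py`. A stdlib-only stand-in (field names taken from the JSON above; the required-field check is illustrative) can confirm the example parses:

```python
import json
from dataclasses import dataclass

@dataclass
class AgentConfigSketch:
    """Stdlib stand-in for src/models.py's pydantic AgentConfig (sketch)."""
    name: str
    role: str
    system_prompt: str

raw = json.loads(
    '{"name": "architect", "role": "Software Architect",'
    ' "system_prompt": "You are a senior software architect..."}'
)
# Fail loudly if a hand-edited config drops a required field.
missing = {"name", "role", "system_prompt"} - raw.keys()
if missing:
    raise ValueError(f"config missing required fields: {missing}")
cfg = AgentConfigSketch(**raw)
```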

- [ ] **Step 2: Create README.md**

`README.md`:
````markdown
# Multi-Agent Meeting Framework

A lightweight, Unix-style multi-agent collaboration system in pure Python. No heavy frameworks — just `pydantic`, an LLM SDK, and clean code.

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Set your API key

```bash
# For Anthropic (default)
export ANTHROPIC_API_KEY="your-key-here"

# Or for OpenAI
export LLM_PROVIDER=openai
export OPENAI_API_KEY="your-key-here"
```

### 3. Create agents

```bash
python agent_builder.py --description "Senior software architect focused on system scalability and technical debt" --name architect
python agent_builder.py --description "Business analyst focused on ROI and market viability" --name business_analyst
python agent_builder.py --description "DevOps engineer focused on deployment complexity and operating costs" --name devops
```

### 4. Run a meeting

```bash
python run_meeting.py --topic "Evaluate a migration from PostgreSQL to MongoDB" --agents architect,business_analyst,devops
```

## CLI Reference

### agent_builder.py

Generate an Agent config from a natural language description.

```
python agent_builder.py --description "..." --name NAME [--output-dir agents/] [--format json|yaml]
```

### run_meeting.py

Run a multi-agent meeting.

```
python run_meeting.py --topic "..." --agents name1,name2 [--max-rounds 5] [--agents-dir agents/] [--output-dir reports/]
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_PROVIDER` | `anthropic` | LLM backend: `anthropic` or `openai` |
| `LLM_MODEL` | auto | Model name (defaults: `claude-sonnet-4-20250514` / `gpt-4o`) |
| `ANTHROPIC_API_KEY` | — | Anthropic API key |
| `OPENAI_API_KEY` | — | OpenAI API key |

## Architecture

- **Stateless Agents**: Each agent receives only the PM's question and a whiteboard summary
- **PM Router**: A state-machine PM agent outputs structured JSON decisions
- **Shared Whiteboard**: All context lives in a `MeetingState` Pydantic model
- **Hard Loop Limit**: `max_rounds` prevents infinite discussions

## Running Tests

```bash
python -m pytest tests/ -v
```
````
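The environment-variable table in the README is how `create_client` in `src/llm_client.py` is expected to pick a backend. A minimal sketch of that selection logic (illustrative only; the real factory also constructs the SDK clients):

```python
# Default models per provider, matching the README table.
DEFAULT_MODELS = {"anthropic": "claude-sonnet-4-20250514", "openai": "gpt-4o"}

def resolve_provider_and_model(env: dict[str, str]) -> tuple[str, str]:
    """Pick provider and model from an environment-style mapping (sketch)."""
    provider = env.get("LLM_PROVIDER", "anthropic")
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider!r}")
    model = env.get("LLM_MODEL") or DEFAULT_MODELS[provider]
    return provider, model

defaults = resolve_provider_and_model({})  # → ("anthropic", "claude-sonnet-4-20250514")
override = resolve_provider_and_model({"LLM_PROVIDER": "openai"})  # → ("openai", "gpt-4o")
```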

- [ ] **Step 3: Commit**

```bash
git add README.md agents/example_architect.json
git commit -m "docs: add README and example agent config"
```

---

### Task 11: Full Integration Test

**Files:**
- Modify: `tests/test_meeting.py`

- [ ] **Step 1: Run the full test suite**

Run: `python -m pytest tests/ -v`
Expected: All tests pass

- [ ] **Step 2: Fix any issues found**

If any tests fail, fix the root cause and re-run.

- [ ] **Step 3: Final commit (if fixes were needed)**

```bash
git add -A
git commit -m "fix: resolve integration test issues"
```
