# Multi-Agent Framework Implementation Plan

> **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents are available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Build a lightweight Python multi-agent collaboration framework with an agent builder CLI, a PM-led meeting orchestrator CLI, deterministic shared state, controlled context loading, and Markdown report output.

**Architecture:** Keep the system as two thin CLIs over a small `core/` runtime. Use Pydantic models as the contract boundary, a provider-agnostic LLM adapter for `openai` and `anthropic`, deterministic context and whiteboard merging, and a single-threaded event loop with explicit finish and fallback paths.

**Tech Stack:** Python standard library, `pydantic`, `PyYAML`, `openai`, `anthropic`, `pytest`

---

## File Structure

- Create: `core/__init__.py`
- Create: `core/models.py`
- Create: `core/serialization.py`
- Create: `core/agent_loader.py`
- Create: `core/meeting_loader.py`
- Create: `core/agent_builder.py`
- Create: `core/context_loader.py`
- Create: `core/llm.py`
- Create: `core/meeting.py`
- Create: `agent_builder.py`
- Create: `run_meeting.py`
- Create: `requirements.txt`
- Create: `README.md`
- Create: `agents/arch.yaml`
- Create: `agents/biz.yaml`
- Create: `agents/pm.yaml`
- Create: `meetings/demo.yaml`
- Create: `reports/.gitkeep`
- Create: `tests/test_models.py`
- Create: `tests/test_serialization.py`
- Create: `tests/test_agent_loader.py`
- Create: `tests/test_meeting_loader.py`
- Create: `tests/test_context_loader.py`
- Create: `tests/test_llm.py`
- Create: `tests/test_meeting.py`
- Create: `tests/test_agent_builder_cli.py`
- Create: `tests/test_run_meeting_cli.py`
- Create: `tests/test_demo_assets.py`

## Chunk 1: Models And Serialization

### Task 1: Define Pydantic models

**Files:**
- Create: `core/models.py`
- Test: `tests/test_models.py`

- [ ] **Step 1: Write the failing test for `AgentConfig` validation**

```python
import pytest

from core.models import AgentConfig


def test_agent_config_rejects_invalid_name() -> None:
    with pytest.raises(ValueError):
        AgentConfig(
            name="bad name",
            role="Architect",
            description="desc",
            system_prompt="prompt",
            input_schema_description="in",
            output_schema_description="out",
            input_schema={"type": "object"},
            output_schema={"type": "object", "properties": {"response": {"type": "string"}}},
        )
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_models.py::test_agent_config_rejects_invalid_name -v`
Expected: FAIL because `AgentConfig` does not exist yet

- [ ] **Step 3: Write minimal model implementation**

Implement `AgentConfig` and the supporting validators in `core/models.py`.
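A minimal sketch of what this step could look like, assuming Pydantic v2 and a lowercase-identifier naming rule (the exact rule is an assumption, not part of the spec):

```python
import re

from pydantic import BaseModel, field_validator


class AgentConfig(BaseModel):
    name: str
    role: str
    description: str
    system_prompt: str
    input_schema_description: str
    output_schema_description: str
    input_schema: dict
    output_schema: dict

    @field_validator("name")
    @classmethod
    def name_is_identifier_like(cls, value: str) -> str:
        # Assumed rule: lowercase identifier with no spaces, so names are safe as file stems.
        if not re.fullmatch(r"[a-z][a-z0-9_]*", value):
            raise ValueError(f"invalid agent name: {value!r}")
        return value
```

Pydantic wraps validator failures in `ValidationError`, which subclasses `ValueError`, so the `pytest.raises(ValueError)` assertion in Step 1 passes either way.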

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/test_models.py::test_agent_config_rejects_invalid_name -v`
Expected: PASS

- [ ] **Step 5: Add remaining model tests**

Add tests for:
- `PMDecision` conditional fields
- `MeetingInput` topic required
- `ContextSource` positive limits
- `DocumentCitation` consistency fields
- `MeetingState` loop count semantics

- [ ] **Step 6: Implement remaining models**

Implement:
- `MeetingBrief`
- `DecisionPacket`
- `ContextSource`
- `MeetingInput`
- `ContextDocument`
- `ContextBundle`
- `DocumentCitation`
- `AgentTurnResult`
- `PMDecision`
- `MeetingState`
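For the `PMDecision` conditional fields, a `model_validator` sketch along these lines may help; the field names beyond `next_action` are assumptions based on the fake payloads used later in the plan:

```python
from typing import Literal, Optional

from pydantic import BaseModel, model_validator


class PMDecision(BaseModel):
    next_action: Literal["CALL_AGENT", "FINISH"]
    analysis: str
    target_agent: Optional[str] = None
    final_report: Optional[str] = None

    @model_validator(mode="after")
    def check_conditional_fields(self) -> "PMDecision":
        # Conditional requirements: each action must carry its supporting field.
        if self.next_action == "CALL_AGENT" and not self.target_agent:
            raise ValueError("CALL_AGENT requires target_agent")
        if self.next_action == "FINISH" and not self.final_report:
            raise ValueError("FINISH requires final_report")
        return self
```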

- [ ] **Step 7: Run full model test file**

Run: `pytest tests/test_models.py -v`
Expected: PASS

### Task 2: Add JSON and YAML serialization helpers

**Files:**
- Create: `core/serialization.py`
- Test: `tests/test_serialization.py`

- [ ] **Step 1: Write the failing round-trip test**

```python
from pathlib import Path

from core.models import AgentConfig
from core.serialization import dump_data, load_data


def test_yaml_round_trip(tmp_path: Path) -> None:
    path = tmp_path / "agent.yaml"
    config = AgentConfig(
        name="arch",
        role="Architect",
        description="desc",
        system_prompt="prompt",
        input_schema_description="in",
        output_schema_description="out",
        input_schema={"type": "object"},
        output_schema={
            "type": "object",
            "properties": {"response": {"type": "string"}},
            "required": ["response"],
        },
    )

    dump_data(path, config.model_dump())

    loaded = load_data(path)

    assert loaded["name"] == "arch"
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_serialization.py::test_yaml_round_trip -v`
Expected: FAIL because helpers do not exist yet

- [ ] **Step 3: Implement minimal helpers**

Implement `load_data(path: Path) -> dict` and `dump_data(path: Path, payload: dict) -> None` with suffix-based JSON/YAML support.

- [ ] **Step 4: Run serialization test file**

Run: `pytest tests/test_serialization.py -v`
Expected: PASS

## Chunk 2: Config Loading And Context Collection

### Task 3: Implement agent config loader

**Files:**
- Create: `core/agent_loader.py`
- Test: `tests/test_agent_loader.py`

- [ ] **Step 1: Write the failing loader test**

```python
from pathlib import Path

from core.agent_loader import load_agent_config


def test_load_agent_config_from_yaml(tmp_path: Path) -> None:
    path = tmp_path / "arch.yaml"
    path.write_text(
        """
name: arch
role: Architect
description: desc
system_prompt: prompt
input_schema_description: in
output_schema_description: out
input_schema: {type: object}
output_schema:
  type: object
  properties:
    response: {type: string}
  required: [response]
""".strip(),
        encoding="utf-8",
    )

    config = load_agent_config(path)

    assert config.name == "arch"
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_agent_loader.py::test_load_agent_config_from_yaml -v`
Expected: FAIL because loader does not exist yet

- [ ] **Step 3: Add remaining failing loader tests before implementation**

Add tests for:
- unknown file extension
- missing file
- incompatible `output_schema`

- [ ] **Step 4: Run the loader tests to verify they fail**

Run: `pytest tests/test_agent_loader.py -v`
Expected: FAIL because loader and compatibility checks do not exist yet

- [ ] **Step 5: Implement loader and schema compatibility check**

Implement:
- `load_agent_config(path: Path) -> AgentConfig`
- `load_named_agents(agent_names: list[str], agents_dir: Path) -> dict[str, AgentConfig]`
- runtime compatibility validation for `AgentTurnResult`
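The compatibility check can be a small standalone function; the exact rule below (object type plus a `response` property) is an assumption inferred from the schemas used in the tests:

```python
def check_output_schema_compatibility(output_schema: dict) -> None:
    """Reject output schemas that AgentTurnResult cannot carry (assumed rule)."""
    if output_schema.get("type") != "object":
        raise ValueError("output_schema must describe a JSON object")
    properties = output_schema.get("properties") or {}
    if "response" not in properties:
        raise ValueError("output_schema must define a 'response' property")
```

`load_agent_config` would then parse the file via `core.serialization`, construct an `AgentConfig`, and call this check before returning.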

- [ ] **Step 6: Run full agent loader tests**

Run: `pytest tests/test_agent_loader.py -v`
Expected: PASS

### Task 4: Implement meeting-file loader

**Files:**
- Create: `core/meeting_loader.py`
- Test: `tests/test_meeting_loader.py`

- [ ] **Step 1: Write the failing YAML meeting-file test**

```python
from pathlib import Path

from core.meeting_loader import load_meeting_input


def test_load_meeting_input_from_yaml(tmp_path: Path) -> None:
    path = tmp_path / "meeting.yaml"
    path.write_text(
        """
topic: test topic
brief: {}
decision_packet:
  decision_to_make: decide
""".strip(),
        encoding="utf-8",
    )

    meeting_input = load_meeting_input(path)

    assert meeting_input.topic == "test topic"
```

- [ ] **Step 2: Write the remaining failing meeting-file tests**

Add tests for:
- valid JSON meeting file
- missing topic
- invalid `decision_packet`
- typed `MeetingInput` return value

- [ ] **Step 3: Run meeting-file tests to verify they fail**

Run: `pytest tests/test_meeting_loader.py -v`
Expected: FAIL because meeting loader does not exist yet

- [ ] **Step 4: Implement minimal meeting loader**

Implement `load_meeting_input(path: Path) -> MeetingInput` using `core.serialization`.

- [ ] **Step 5: Run meeting-file tests to verify they pass**

Run: `pytest tests/test_meeting_loader.py -v`
Expected: PASS

### Task 5: Implement deterministic context loader

**Files:**
- Create: `core/context_loader.py`
- Test: `tests/test_context_loader.py`

- [ ] **Step 1: Write the failing deterministic ordering test**

```python
from pathlib import Path

from core.context_loader import build_context_bundle
from core.models import ContextSource, DecisionPacket, MeetingBrief, MeetingInput


def test_directory_sources_are_sorted_before_truncation(tmp_path: Path) -> None:
    docs_dir = tmp_path / "docs"
    docs_dir.mkdir()
    (docs_dir / "b.md").write_text("B", encoding="utf-8")
    (docs_dir / "a.md").write_text("A", encoding="utf-8")

    meeting_input = MeetingInput(
        topic="topic",
        brief=MeetingBrief(),
        decision_packet=DecisionPacket(decision_to_make="topic"),
        context_sources=[
            ContextSource(
                type="directory",
                path="./docs",
                purpose="docs",
                include=["**/*.md"],
                max_files=1,
            )
        ],
    )

    bundle = build_context_bundle(meeting_input, tmp_path / "meeting.yaml")

    assert bundle.documents[0].source_path.endswith("a.md")
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_context_loader.py::test_directory_sources_are_sorted_before_truncation -v`
Expected: FAIL because context loader does not exist yet

- [ ] **Step 3: Write the remaining failing edge-case tests before implementation**

Add tests for:
- binary file skip
- unreadable file skip
- symlink skip
- UTF-8 replacement behavior
- skip reasons recorded

- [ ] **Step 4: Run full context loader tests to verify they fail**

Run: `pytest tests/test_context_loader.py -v`
Expected: FAIL because edge-case handling does not exist yet

- [ ] **Step 5: Implement minimal context loading**

Implement deterministic file and directory collection with:
- path resolution relative to the meeting file's directory
- sorted traversal before truncation
- default excluded directories
- char truncation
- stable `document_id`
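The ordering invariant is the core of this task: sort the full match set first, truncate second. A stdlib-only sketch (the excluded-directory set and id scheme are assumptions):

```python
import hashlib
from pathlib import Path

DEFAULT_EXCLUDED_DIRS = {".git", "__pycache__", "node_modules"}  # assumed defaults


def collect_files(root: Path, include: list[str], max_files: int) -> list[Path]:
    """Deterministic collection: sorted traversal before truncation."""
    matches: set[Path] = set()
    for pattern in include:
        for candidate in root.glob(pattern):
            if candidate.is_symlink() or not candidate.is_file():
                continue  # symlinks skipped per the edge-case tests
            if any(part in DEFAULT_EXCLUDED_DIRS for part in candidate.parts):
                continue
            matches.add(candidate)
    return sorted(matches)[:max_files]


def document_id(path: Path, root: Path) -> str:
    """Stable id derived from the relative path, not from traversal order."""
    rel = path.relative_to(root).as_posix()
    return hashlib.sha256(rel.encode("utf-8")).hexdigest()[:12]
```

Because truncation happens after sorting, `max_files=1` always keeps `a.md` over `b.md` regardless of filesystem enumeration order, matching the Step 1 test.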

- [ ] **Step 6: Run full context loader tests**

Run: `pytest tests/test_context_loader.py -v`
Expected: PASS

## Chunk 3: LLM Adapter And Meeting Runtime

### Task 6: Implement provider-agnostic LLM adapter

**Files:**
- Create: `core/llm.py`
- Modify: `core/models.py` if a request model helps typing
- Test: `tests/test_llm.py`

- [ ] **Step 1: Write the failing fake-client integration test**

```python
from core.llm import LLMClient


def test_fake_llm_client_returns_structured_payload() -> None:
    client = LLMClient.from_fake_responses([
        {"next_action": "FINISH", "analysis": "done", "final_report": "# Report"}
    ])

    result = client.generate_structured_response(
        provider="openai",
        model="fake",
        system_prompt="prompt",
        user_prompt="prompt",
    )

    assert result["next_action"] == "FINISH"
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_llm.py::test_fake_llm_client_returns_structured_payload -v`
Expected: FAIL because adapter does not exist yet

- [ ] **Step 3: Implement minimal adapter surface**

Implement:
- `LLMClient` protocol or class
- fake-response test double support
- `generate_structured_response`
- `generate_text_response`
- provider dispatch for `openai` and `anthropic`
- API key lookup from environment

- [ ] **Step 4: Run the focused test**

Run: `pytest tests/test_llm.py::test_fake_llm_client_returns_structured_payload -v`
Expected: PASS

### Task 7: Implement meeting runtime and fallback logic

**Files:**
- Create: `core/meeting.py`
- Test: `tests/test_meeting.py`

- [ ] **Step 1: Write the failing meeting completion test**

```python
from pathlib import Path

from core.llm import LLMClient
from core.meeting import run_meeting
from core.models import AgentConfig, DecisionPacket, MeetingBrief, MeetingInput


def test_run_meeting_finishes_with_report(tmp_path: Path) -> None:
    agents = {
        "pm": AgentConfig(
            name="pm",
            role="PM",
            description="desc",
            system_prompt="prompt",
            input_schema_description="in",
            output_schema_description="out",
            input_schema={"type": "object"},
            output_schema={
                "type": "object",
                "properties": {"response": {"type": "string"}},
                "required": ["response"],
            },
        )
    }
    meeting_input = MeetingInput(
        topic="topic",
        brief=MeetingBrief(),
        decision_packet=DecisionPacket(decision_to_make="topic"),
    )

    result = run_meeting(
        meeting_input=meeting_input,
        agents=agents,
        provider="openai",
        model="fake",
        max_loops=3,
        llm_client=LLMClient.from_fake_responses(
            [{"next_action": "FINISH", "analysis": "done", "final_report": "# Report"}]
        ),
    )

    assert result.final_report.startswith("#")
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_meeting.py::test_run_meeting_finishes_with_report -v`
Expected: FAIL because runtime does not exist yet

- [ ] **Step 3: Implement minimal event loop**

Implement:
- state initialization
- PM decision request
- normal `FINISH` path
- report return object
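A stripped-down skeleton of the loop's control flow; the PM decisions are passed in as a plain list here to keep the sketch self-contained (the real loop gets them from the LLM client), and `MeetingResult` stands in for the real result model:

```python
from dataclasses import dataclass


@dataclass
class MeetingResult:  # stand-in for the real MeetingState/result model
    final_report: str
    status: str = "completed"


def run_meeting(decisions: list[dict], max_loops: int) -> MeetingResult:
    """Single-threaded skeleton: ask the PM, act on the decision, count loops."""
    loop_count = 0
    while loop_count < max_loops:
        decision = decisions.pop(0)
        if decision["next_action"] == "FINISH":
            return MeetingResult(final_report=decision["final_report"])
        loop_count += 1  # only substantive CALL_AGENT turns increment loop_count
    # max_loops reached without a FINISH: forced-stop path with a process note.
    return MeetingResult(
        final_report="# Meeting Report\n\n## Process Note\nStopped at max_loops.",
        status="forced",
    )
```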

- [ ] **Step 4: Add failing tests for all control paths before further implementation**

Add tests for:
- normal `CALL_AGENT` then `FINISH`
- agent retry after invalid output
- PM invalid JSON retry
- PM routing error limit
- PM-only retries do not increment `loop_count`
- `max_loops` forced finish path
- PM forced-finish success path
- local fallback report generation
- fallback failure sets `failed`
- deterministic whiteboard merge
- required report sections in normal path
- required report sections in fallback path
- `## Process Note` in forced-stop output

- [ ] **Step 5: Run full meeting tests to verify they fail**

Run: `pytest tests/test_meeting.py -v`
Expected: FAIL because control-path behavior is not fully implemented yet

- [ ] **Step 6: Implement remaining runtime behavior**

Implement:
- PM retry limits
- target agent invocation and retry
- whiteboard merge rules
- forced finish mode
- fallback report builder
- final status semantics
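For the deterministic whiteboard merge, one workable rule (an assumption, since the plan does not pin one down) is to namespace entries by agent and apply updates in sorted key order, last write wins:

```python
def merge_whiteboard(whiteboard: dict, agent_name: str, update: dict) -> dict:
    """Deterministic merge: sorted keys, agent-namespaced entries, last write wins."""
    merged = dict(whiteboard)
    for key in sorted(update):
        merged[f"{agent_name}.{key}"] = update[key]
    return merged
```

Sorting the update keys makes the merged result independent of the order in which the agent emitted its fields, which is what the "deterministic whiteboard merge" test would pin down.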

- [ ] **Step 7: Run full meeting tests**

Run: `pytest tests/test_meeting.py -v`
Expected: PASS

## Chunk 4: CLI Entrypoints, Fixtures, And Docs

### Task 8: Implement `core/agent_builder.py` and `agent_builder.py`

**Files:**
- Create: `core/agent_builder.py`
- Create: `agent_builder.py`
- Test: `tests/test_agent_builder_cli.py`

- [ ] **Step 1: Write the failing CLI test**

```python
from pathlib import Path
import subprocess
import sys


def test_agent_builder_writes_yaml(tmp_path: Path) -> None:
    output_path = tmp_path / "arch.yaml"

    result = subprocess.run(
        [
            sys.executable,
            "agent_builder.py",
            "--description",
            "strict architect",
            "--name",
            "arch",
            "--role",
            "Architect",
            "--provider",
            "openai",
            "--model",
            "fake",
            "--format",
            "yaml",
            "--output",
            str(output_path),
        ],
        capture_output=True,
        text=True,
        cwd=Path(__file__).resolve().parents[1],
    )

    assert result.returncode == 0
    assert output_path.exists()
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_agent_builder_cli.py::test_agent_builder_writes_yaml -v`
Expected: FAIL because CLI does not exist yet

- [ ] **Step 3: Add a failing core builder test before implementation**

Add a unit test for `build_agent_config(...) -> AgentConfig` that uses a fake LLM response and validates the returned model.

- [ ] **Step 4: Run builder tests to verify they fail**

Run: `pytest tests/test_agent_builder_cli.py -v`
Expected: FAIL because core builder and CLI do not exist yet

- [ ] **Step 5: Implement core builder and thin CLI**

Implement:
- `core/agent_builder.py` for prompt construction, LLM call, structured parsing, validation, and optional retry
- `agent_builder.py` only for `argparse`, invoking the core builder, and writing JSON/YAML output
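A sketch of the core-builder split: the function takes the structured LLM payload as an argument so the sketch stays self-contained, and it returns a plain dict here where the real builder would validate into an `AgentConfig`. The required-field set is an assumption:

```python
def build_agent_config(description: str, name: str, role: str, llm_payload: dict) -> dict:
    """Merge user-supplied fields with the model-drafted fields, then validate."""
    config = {"name": name, "role": role, "description": description, **llm_payload}
    missing = {"system_prompt", "input_schema", "output_schema"} - config.keys()
    if missing:
        # A real implementation could retry the LLM call here instead of failing.
        raise ValueError(f"LLM response missing fields: {sorted(missing)}")
    return config
```

The CLI wrapper then only parses arguments, calls this function, and writes the result through `core.serialization`.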

- [ ] **Step 6: Run CLI tests**

Run: `pytest tests/test_agent_builder_cli.py -v`
Expected: PASS

### Task 9: Implement `run_meeting.py`

**Files:**
- Create: `run_meeting.py`
- Test: `tests/test_run_meeting_cli.py`

- [ ] **Step 1: Write the failing CLI test for topic mode**

```python
from pathlib import Path
import subprocess
import sys


def test_run_meeting_topic_mode_writes_report(tmp_path: Path) -> None:
    report_path = tmp_path / "report.md"

    result = subprocess.run(
        [
            sys.executable,
            "run_meeting.py",
            "--topic",
            "topic",
            "--agents",
            "pm",
            "--provider",
            "openai",
            "--model",
            "fake",
            "--max-loops",
            "2",
            "--report-file",
            str(report_path),
        ],
        capture_output=True,
        text=True,
        cwd=Path(__file__).resolve().parents[1],
    )

    assert result.returncode == 0
    assert report_path.exists()
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_run_meeting_cli.py::test_run_meeting_topic_mode_writes_report -v`
Expected: FAIL because CLI does not exist yet

- [ ] **Step 3: Add the failing meeting-file CLI test before implementation**

Add a second CLI test that writes a realistic meeting file to disk, passes it via `--meeting-file`, and validates the report output.

- [ ] **Step 4: Run CLI tests to verify they fail**

Run: `pytest tests/test_run_meeting_cli.py -v`
Expected: FAIL because topic mode and meeting-file mode are not implemented yet

- [ ] **Step 5: Implement minimal CLI**

Implement:
- `argparse` contract
- `--meeting-file` vs `--topic` validation
- topic-mode `MeetingInput` synthesis
- `meeting-file` loading through `core.meeting_loader`
- agent loading
- meeting runtime call
- report write
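The `--meeting-file` vs `--topic` constraint maps directly onto an `argparse` mutually exclusive group. A sketch of the parser (the `--max-loops` default is an assumption):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a PM-led meeting.")
    # Exactly one of --topic / --meeting-file must be provided.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--topic")
    group.add_argument("--meeting-file")
    parser.add_argument("--agents", required=True, help="comma-separated agent names")
    parser.add_argument("--provider", choices=["openai", "anthropic"], required=True)
    parser.add_argument("--model", required=True)
    parser.add_argument("--max-loops", type=int, default=10)
    parser.add_argument("--report-file", required=True)
    return parser
```

`argparse` rejects both-given and neither-given cases for free, so the CLI body only needs to branch on which mode was chosen.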

- [ ] **Step 6: Run CLI tests**

Run: `pytest tests/test_run_meeting_cli.py -v`
Expected: PASS

### Task 10: Add sample fixtures, requirements, and README

**Files:**
- Create: `agents/arch.yaml`
- Create: `agents/biz.yaml`
- Create: `agents/pm.yaml`
- Create: `meetings/demo.yaml`
- Create: `requirements.txt`
- Create: `README.md`
- Create: `reports/.gitkeep`
- Test: `tests/test_demo_assets.py`

- [ ] **Step 1: Write the failing smoke test for demo assets**

```python
from pathlib import Path


def test_demo_assets_exist() -> None:
    root = Path(__file__).resolve().parents[1]

    assert (root / "agents" / "arch.yaml").exists()
    assert (root / "agents" / "biz.yaml").exists()
    assert (root / "agents" / "pm.yaml").exists()
    assert (root / "meetings" / "demo.yaml").exists()
    assert (root / "requirements.txt").exists()
    assert (root / "README.md").exists()
```

- [ ] **Step 2: Run the smoke test**

Run: `pytest tests/test_demo_assets.py::test_demo_assets_exist -v`
Expected: FAIL because fixtures and docs do not exist yet

- [ ] **Step 3: Create the demo assets and documentation**

Add:
- three example agents
- one example meeting file
- minimal dependency file
- minimal README with install, builder, and meeting demo commands

- [ ] **Step 4: Run the smoke test**

Run: `pytest tests/test_demo_assets.py::test_demo_assets_exist -v`
Expected: PASS

- [ ] **Step 5: Run the full test suite**

Run: `pytest tests -v`
Expected: PASS

- [ ] **Step 6: Manual demo verification**

Run: `python run_meeting.py --meeting-file meetings/demo.yaml --agents arch,biz,pm --provider openai --model fake --max-loops 5 --report-file reports/demo.md`
Expected: exit code 0 and `reports/demo.md` exists

## Execution Notes

- Implement in the same order as the chunks.
- Keep each task small and test-first.
- Do not add dependencies beyond the approved stack.
- Prefer focused helpers over large utility modules.
- Keep the CLIs thin; push logic into `core/`.
- If any task reveals a spec mismatch, update the spec first, then continue.

