# Project Memory: ai_usecases_explorer

## Status
**v1.4 fully implemented** — 162 tests pass, ruff clean. Deep Dive module enhanced with citation, bias, credibility breakdown, and synthesis features.

## Project Layout
```
src/ai_usecases_explorer/
  models/usecase.py       — Pydantic UseCase, ScenarioType (StrEnum), Novelty (StrEnum), make_id()
  storage/database.py     — SQLite CRUD: save/exists/get_by_scenario/get_recent
  collectors/base.py      — RawItem dataclass + BaseCollector ABC
  collectors/hackernews.py — HN Algolia search_by_date, 12 keywords (one query each), dedup
  collectors/reddit.py    — Exa.ai SDK (exa_py), include_domains=["reddit.com"], past 24h, 4 queries
  collectors/github.py    — GitHub Search API, 8 topics (one query each), dedup
  processors/filter.py    — ValueFilter (Claude Step 1)
  processors/classifier.py — Classifier (Claude Step 2)
  processors/deduplicator.py — Deduplicator (Claude Step 3)
  processors/utils.py     — parse_llm_json() strips markdown fences before JSON parse
  reporters/obsidian.py   — Markdown report writer
  deep_dive/              — Deep Dive skill (on-demand analysis)
    models.py             — ArticleContent, StructuralAnalysis (+ LogicalQuality), ResearchResults (FactCheck + source_authority), MultiPerspectiveEvaluation (+ cited args, bias indicators, credibility breakdown, core disputes, synthesis, assessed_fact_checks), DeepDiveReport
    fetcher.py            — ContentFetcher: fetch_url(httpx+Exa fallback) + from_text()
    analyzer.py           — StructuralAnalyzer: Claude, extracts argument structure (all Chinese output)
    researcher.py         — WebResearcher: Exa.ai 4-category search (no autoprompt param)
    evaluator.py          — MultiPerspectiveEvaluator: Claude, pro/con/neutral + credibility (all Chinese)
    reporter.py           — DeepDiveReporter: Markdown report → Obsidian deep-dive subdir
    cli.py                — DeepDiveOrchestrator + argparse CLI; prints REPORT_PATH: line for parsing
  settings.py             — pydantic-settings; deep_dive_report_dir + deep_dive_model fields
  main.py                 — Scout orchestrator class + main() CLI
.claude/commands/deep-dive.md — /deep-dive slash command (saves to Obsidian by default)
.claude/commands/ai-scout.md  — /ai-scout slash command (daily scan)
SKILLS.md                — Full usage guide for all skills (Chinese)
tests/                   — TDD tests for all modules (pytest-httpx mocks + unittest.mock for Exa)
```
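The fence-stripping done by `parse_llm_json()` can be sketched roughly as below; this is an illustration of the idea, not the exact code in `processors/utils.py`:

```python
import json
import re

def parse_llm_json(raw: str):
    """Sketch: strip markdown code fences before JSON parsing.

    Haiku sometimes wraps JSON in ```json ... ``` fences despite instructions;
    the helper removes them (if present) and then parses normally.
    """
    text = raw.strip()
    # Drop a leading ``` or ```json fence (and the newline after it) ...
    text = re.sub(r"^```(?:json)?\s*", "", text)
    # ... and a trailing ``` fence.
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)
```

Bare `json.loads()` would raise on the fenced variant, which is why every processor routes LLM output through this helper.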

## CLI Entry Points (no venv activation needed)
- `ai-scout` — daily scan (registered in pyproject.toml + /usr/local/bin wrapper)
- `ai-deep-dive` — deep dive analysis (registered in pyproject.toml + /usr/local/bin wrapper)

## Key Architectural Decisions
- Each LLM-backed processor exposes `run()` and `_call_llm()` methods; tests mock `_call_llm` at the class level
- String → `Novelty`/`ScenarioType` enum mapping happens in `run()`, not `_call_llm()`
- Deduplicator skips LLM call when `existing=[]` (returns NEW directly)
- Algolia ANDs multi-word queries: use ONE query per keyword (_KEYWORDS list, 12 terms)
- GitHub Search API: `topic:X OR topic:Y` causes 422; use ONE query per topic (_TOPICS, 8 topics)
- Reddit collector migrated from OAuth2 to the Exa.ai SDK (exa_py v2.6.0+); tests use `unittest.mock.patch`
- Haiku wraps JSON in markdown fences despite instructions; parse_llm_json() strips them
- `max_items_per_run=80` caps LLM calls per run (configurable via MAX_ITEMS_PER_RUN env)
- `ruff` per-file-ignores: `tests/*` + `deep_dive/evaluator.py` + `deep_dive/analyzer.py` ignore E501 (long Chinese prompt strings)
- deep_dive: analyzer + evaluator prompts have an explicit 【语言要求】 ("language requirement") Chinese-output instruction
- deep_dive: fetcher falls back to Exa.ai get_contents() on 403/4xx (Cloudflare sites)
- deep_dive: Exa search_and_contents does NOT support `autoprompt` param — omit it
- deep_dive: cli prints `REPORT_PATH:<path>` for slash command to parse
- deep_dive model configured via `DEEP_DIVE_MODEL` env var (default: claude-sonnet-4-6)
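The `run()`/`_call_llm()` split and the "map strings to enums in `run()`" decision can be sketched as follows. Class and field names here are hypothetical stand-ins, not the project's actual code:

```python
from enum import Enum
from unittest.mock import patch

class Novelty(Enum):  # stand-in for the project's StrEnum
    NEW = "new"
    INCREMENTAL = "incremental"

class Classifier:
    def _call_llm(self, item: str) -> dict:
        # Real version builds a prompt and calls Claude; tests never reach this.
        raise NotImplementedError

    def run(self, item: str) -> Novelty:
        payload = self._call_llm(item)
        # String -> enum mapping lives here, not in _call_llm(),
        # so mocked tests still exercise the mapping logic.
        return Novelty(payload["novelty"])

# Tests patch _call_llm at the class level; no API key or network needed.
with patch.object(Classifier, "_call_llm", return_value={"novelty": "new"}):
    result = Classifier().run("some item")
```

Keeping the mapping in `run()` means a mocked `_call_llm` still surfaces bad enum values as `ValueError` in tests.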
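The fetcher's httpx → Exa fallback can be sketched as a pure decision function. The real `ContentFetcher` calls httpx and the Exa SDK directly; the injected callables here are stand-ins so the logic is testable offline:

```python
from typing import Callable

def fetch_with_fallback(
    url: str,
    http_get: Callable[[str], tuple[int, str]],  # returns (status_code, body)
    exa_get_text: Callable[[str], str],          # stand-in for Exa get_contents()
) -> str:
    """Try a direct HTTP fetch first; on 4xx, fall back to Exa."""
    status, body = http_get(url)
    if status < 400:
        return body
    # Cloudflare-protected sites commonly return 403 to plain HTTP clients.
    return exa_get_text(url)
```

The actual fallback trigger and Exa call signature live in `deep_dive/fetcher.py`.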

## Running
```bash
# Tests
source venv/bin/activate && pytest tests/ -v   # 162 tests

# Daily scout
ai-scout
ai-scout --dry-run

# Deep dive
ai-deep-dive --url https://...
ai-deep-dive --url https://... --dry-run
```

## Required .env keys
- `ANTHROPIC_API_KEY` + optionally `ANTHROPIC_BASE_URL` for proxy
- `EXA_API_KEY` — Exa.ai API key for Reddit search + Deep Dive research
- `GITHUB_TOKEN` optional (increases rate limit from 60 to 5000 req/hr)
- `DEEP_DIVE_MODEL` optional (default: claude-sonnet-4-6; set to haiku if proxy overloaded)
- `DEEP_DIVE_REPORT_DIR` optional (default: auto_report/deep-dive/)

## Obsidian Report Paths
- Daily: `/root/vault/obsidian_vault/obsidian/Documents/obsidian/auto_report/YYYY-MM-DD.md`
- Deep Dive: `/root/vault/obsidian_vault/obsidian/Documents/obsidian/auto_report/deep-dive/YYYY-MM-DD-HHMM-slug.md`

## Cron
```
0 7 * * * /root/projects/ai_usecases_explorer/venv/bin/python -m ai_usecases_explorer.main >> /var/log/ai_usecases_scout.log 2>&1
```
