# Butler 测试文档

本文档描述了 Butler 项目的所有测试工具、使用方式和最佳实践。

---

## 📋 目录

1. [测试工具概览](#测试工具概览)
2. [单元测试 (Pytest)](#单元测试-pytest)
3. [烟雾测试 (Smoke Test)](#烟雾测试-smoke-test)
4. [交互测试 (Quick Test)](#交互测试-quick-test)
5. [查询场景测试 (Query Scenarios)](#查询场景测试-query-scenarios)
6. [数据安全指南](#数据安全指南)
7. [最佳实践](#最佳实践)
8. [故障排查](#故障排查)

---

## 测试工具概览

| 测试文件 | 类型 | 数据安全 | 用途 | API调用 |
|---------|------|---------|------|---------|
| `tests/test_garmin_client_new_features.py` | 单元测试 | ✅ 安全 | Garmin客户端新功能测试 | ❌ Mock |
| `smoke_test.py` | 集成测试 | ⚠️ 需隔离 | 快速检查组件导入和工具注册 | ✅ 真实 |
| `quick_test.py` | 交互测试 | ⚠️ 需隔离 | 测试Gemini对话和工具调用 | ✅ 真实 |
| `tests/test_query_scenarios.py` | 场景测试 | ⚠️ 需隔离 | 18个真实用户查询场景 | ✅ 真实 |

---

## 单元测试 (Pytest)

### 📍 文件位置
```
tests/test_garmin_client_new_features.py
```

### 🎯 测试目标
测试 Garmin Health Client 的新功能实现：
- SpO2 血氧饱和度数据获取
- Respiration 呼吸频率数据
- Hydration 水分摄入数据
- Floors 楼层爬升数据
- Intensity Minutes 运动强度分钟数
- HRV 心率变异性数据
- 错误处理（未认证、API错误）

### ✅ 数据安全性
**完全安全** - 使用 `unittest.mock.Mock` 模拟所有 API 调用，不会：
- 调用真实 Garmin API
- 写入任何数据文件
- 产生任何副作用

### 🚀 运行方式

```bash
# 运行所有测试
pytest tests/test_garmin_client_new_features.py -v

# 运行特定测试类
pytest tests/test_garmin_client_new_features.py::TestFetchSpO2 -v

# 运行特定测试方法
pytest tests/test_garmin_client_new_features.py::TestFetchSpO2::test_fetch_spo2_success -v

# 显示详细输出
pytest tests/test_garmin_client_new_features.py -vv -s

# 生成覆盖率报告
pytest tests/test_garmin_client_new_features.py --cov=health.services.garmin_client
```

### 📊 测试用例列表

#### TestFetchSpO2 (3个测试)
- ✅ `test_fetch_spo2_success` - 成功获取SpO2数据
- ✅ `test_fetch_spo2_no_data` - 无数据情况
- ✅ `test_fetch_spo2_empty_response` - 空响应处理

#### TestFetchRespiration (2个测试)
- ✅ `test_fetch_respiration_success` - 成功获取呼吸数据
- ✅ `test_fetch_respiration_no_data` - 无数据情况

#### TestFetchHydration (2个测试)
- ✅ `test_fetch_hydration_success` - 成功获取水分数据
- ✅ `test_fetch_hydration_no_data` - 无数据情况

#### TestFetchFloors (2个测试)
- ✅ `test_fetch_floors_success` - 成功获取楼层数据
- ✅ `test_fetch_floors_no_data` - 无数据情况

#### TestFetchIntensityMinutes (2个测试)
- ✅ `test_fetch_intensity_minutes_success` - 成功获取强度数据
- ✅ `test_fetch_intensity_minutes_no_data` - 无数据情况

#### TestFetchHRV (3个测试)
- ✅ `test_fetch_hrv_success` - 成功获取HRV数据
- ✅ `test_fetch_hrv_no_baseline` - 无基线数据
- ✅ `test_fetch_hrv_no_data` - 无数据情况

#### TestErrorHandling (3个测试)
- ✅ `test_unauthenticated_client_spo2` - SpO2未认证错误
- ✅ `test_unauthenticated_client_hrv` - HRV未认证错误
- ✅ `test_api_error_handling_floors` - API错误处理

### 💡 使用场景
- ✅ 开发新功能后验证逻辑正确性
- ✅ 重构代码后回归测试
- ✅ CI/CD 流程中的自动化测试
- ✅ 代码审查前的验证

---

## 烟雾测试 (Smoke Test)

### 📍 文件位置
```
smoke_test.py
```

### 🎯 测试目标
快速验证系统关键组件是否正常工作：
1. 模块导入检查
2. 工具注册验证
3. 上下文存储功能
4. Gemini LLM 初始化
5. 工具执行（`get_daily_summary`）
6. Dispatcher 初始化

### ⚠️ 数据安全性
**需要隔离** - 会执行真实工具调用：
- ✅ 使用 `test_channel` 作为上下文存储
- ✅ 测试结束后会调用 `storage.clear()`
- ⚠️ **Line 60: 调用真实的 `get_daily_summary` 工具**
- ⚠️ 可能会读取/写入数据目录

### 🚀 运行方式

```bash
# 推荐：使用隔离的数据目录
export DATA_DIR="/tmp/butler_smoke_test"
python smoke_test.py

# 检查输出
# 预期看到所有 ✅ 标记
```

### 📊 测试步骤

```
1. Testing imports...
   ✅ All imports successful

2. Testing tool registry...
   Available tools: ['get_daily_detailed_stats', 'get_metric_history', ...]
   Tool schemas: 21 defined
   ✅ Tools registered correctly

3. Testing context storage...
   ✅ Context storage works

4. Testing Gemini initialization...
   Model: gemini-2.0-flash-exp
   ✅ Gemini initialized successfully

5. Testing tool execution...
   get_daily_summary result: ...
   ✅ Tool execution works

6. Testing dispatcher...
   ✅ Dispatcher initialized

🎉 All smoke tests passed!
```

### 💡 使用场景
- ✅ 环境配置检查（首次部署后）
- ✅ 依赖安装验证
- ✅ 快速健康检查（2-5秒完成）
- ❌ 不适合频繁运行（会调用真实工具）

---

## 交互测试 (Quick Test)

### 📍 文件位置
```
quick_test.py
```

### 🎯 测试目标
测试 Gemini LLM 的实际对话和工具调用能力：
1. 简单对话测试（"你好"）
2. 带工具的健康数据查询
3. 工具执行和结果展示

### ⚠️ 数据安全性
**需要隔离** - 会产生真实 API 调用和数据操作：
- ⚠️ 调用真实 Gemini API（产生费用）
- ⚠️ 执行真实工具函数
- ⚠️ 可能写入数据到 `DATA_DIR`

### 🚀 运行方式

```bash
# 推荐：设置隔离环境
export DATA_DIR="/tmp/butler_quick_test"
export GEMINI_MODEL="gemini-2.0-flash-exp"
export GEMINI_BASE_URL="http://localhost:8000/v1"  # 如果使用代理

python quick_test.py
```

### 📊 测试流程

```
================================================================
🤖 Gemini Bot Quick Test
================================================================
Model: gemini-2.0-flash-exp
Proxy: http://localhost:8000/v1
================================================================

📝 Test 1: Simple greeting
User: 你好
Gemini: 你好！有什么可以帮你的吗？

📝 Test 2: Query health data
User: 帮我查询2024-01-15的健康数据
🛠️ Tool calls: 1
   - get_daily_detailed_stats({'target_date': '2024-01-15'})
   → Result: {...}

Gemini: 根据数据显示...

================================================================
✅ Quick test complete!
================================================================
```

### 💡 使用场景
- ✅ 验证 Gemini API 连通性
- ✅ 测试新工具是否正确注册
- ✅ 快速验证对话逻辑
- ⚠️ 需要手动检查输出质量

---

## 查询场景测试 (Query Scenarios)

### 📍 文件位置
```
tests/test_query_scenarios.py
```

### 🎯 测试目标
基于真实用户问题的 18 个综合查询场景测试，覆盖：
- 单日查询（今天/昨天数据）
- 时间范围查询（2周/1月/1年趋势）
- 多指标查询（睡眠+运动）
- 因果分析（酒精影响、补剂效果）
- 高级分析（恢复指标、健身趋势）
- 手动日志查询（饮食、饮酒记录）
- 同步操作（sync + query）

### ⚠️ 数据安全性
**需要隔离** - 与 `quick_test.py` 相同：
- ⚠️ 调用真实 Gemini API
- ⚠️ 执行真实工具函数
- ⚠️ 18 个场景约产生 18-36 次 API 调用

### 🚀 运行方式

```bash
# 1. 列出所有场景（安全，不执行）
python tests/test_query_scenarios.py --list

# 2. 运行所有场景
export DATA_DIR="/tmp/butler_test_query"
python tests/test_query_scenarios.py

# 3. 运行特定场景
python tests/test_query_scenarios.py --scenario daily_summary

# 4. 保存测试结果
python tests/test_query_scenarios.py > test_results_$(date +%Y%m%d).log 2>&1

# 5. 只看摘要（过滤详细输出）
python tests/test_query_scenarios.py 2>&1 | grep -E "^(✅|⚠️|📊)"
```

### 📊 测试场景列表（18个）

#### Category 1: 单日查询 (3个)
```
1.  daily_summary              - 今天的健康数据怎么样？
2.  yesterday_sleep_analysis   - 分析昨晚睡眠数据
3.  todays_workout_analysis    - 今早椭圆机运动请深入分析
```

#### Category 2: 时间范围查询 (4个)
```
4.  two_week_sleep_trend       - 分析下最近两周的睡眠情况
5.  yearly_trend               - 过去一年的历史趋势呢
6.  monthly_rhr_trend          - 最近一个月的静息心率变化
7.  hrv_90_days                - 最近3个月的HRV趋势
```

#### Category 3: 多指标查询 (2个)
```
8.  sleep_and_workout          - 深入分析一下昨晚的睡眠和今早的椭圆机
9.  comprehensive_today        - 帮我查询今天的健康数据和运动记录
```

#### Category 4: 因果分析 (3个)
```
10. carb_impact_analysis       - 确实好像没有碳水容易胸闷，请分析底层原因
11. alcohol_sleep_correlation  - 喝酒对我的睡眠有什么影响？
12. supplement_effectiveness   - 镁补充剂对我的睡眠有帮助吗？
```

#### Category 5: 高级分析 (2个)
```
13. recovery_insights          - 最近30天的恢复指标怎么样？
14. fitness_trends             - 我的健身趋势如何，有进步吗？
```

#### Category 6: 手动日志查询 (3个)
```
15. diet_history               - 最近一周我都吃了什么？
16. alcohol_consumption        - 过去两周的饮酒记录
17. all_logs_week              - 上周的所有手动记录
```

#### Category 7: 同步操作 (1个)
```
18. sync_then_query            - 先同步Garmin数据，然后查询今天的数据
```

### 📈 输出示例

```bash
================================================================================
🧪 Scenario: sleep_and_workout
📝 Description: Combined sleep + workout analysis
💬 User Query: 深入分析一下昨晚的睡眠和今早的椭圆机
🔧 Expected Tools: get_daily_detailed_stats, get_activity_history
================================================================================

🛠️  Tools Called: ['get_daily_detailed_stats', 'get_activity_history']

  [1] Executing: get_daily_detailed_stats({'target_date': '2024-01-15'})
      ✅ Result: {...}

  [2] Executing: get_activity_history({'start_date': '2024-01-15', ...})
      ✅ Result: {...}

🤖 Gemini Response:
根据数据分析，昨晚您的睡眠质量为...

✅ Test PASSED - All expected tools called
```

**最终摘要：**
```
================================================================================
📊 Test Summary
================================================================================
✅ PASS  daily_summary               → get_daily_detailed_stats
✅ PASS  sleep_and_workout           → get_daily_detailed_stats, get_activity_history
⚠️  PARTIAL  carb_impact_analysis    → get_manual_history
================================================================================
✅ Passed: 16/18
⚠️  Partial: 2/18
================================================================================
```

### 💡 使用场景
- ✅ 验证工具调用逻辑是否正确
- ✅ 回归测试（更新工具后）
- ✅ 评估 LLM 工具选择准确性
- ✅ 记录用户常见查询模式

### 🔧 扩展测试场景

在 `SCENARIOS` 列表中添加新场景：

```python
SCENARIOS.append(
    QueryTestScenario(
        name="custom_scenario_name",
        user_query="用户的实际问题",
        expected_tools=["tool_name_1", "tool_name_2"],
        description="场景描述",
        context=[]  # 可选：提供对话历史上下文
    )
)
```

---

## 数据安全指南

### ✅ 安全级别分类

| 级别 | 说明 | 适用测试 |
|------|------|---------|
| 🟢 **完全安全** | 无数据读写，无API调用 | Pytest单元测试 |
| 🟡 **需要隔离** | 有数据读写或API调用 | Smoke/Quick/Scenarios |
| 🔴 **生产环境** | 使用真实数据目录 | ⚠️ 仅手动调试时 |

### 🛡️ 数据隔离最佳实践

#### 方法 1: 临时数据目录（推荐）

```bash
# 为每次测试创建独立目录
export DATA_DIR="/tmp/butler_test_$(date +%s)"
python tests/test_query_scenarios.py

# 测试完成后检查
ls -la $DATA_DIR

# 清理（可选）
rm -rf /tmp/butler_test_*
```

#### 方法 2: 使用 pytest fixture（单元测试）

```python
@pytest.fixture(scope="function")
def isolated_test_data(monkeypatch, tmp_path):
    """自动隔离测试数据"""
    test_dir = tmp_path / "butler_test"
    test_dir.mkdir()
    monkeypatch.setenv("DATA_DIR", str(test_dir))
    return test_dir

def test_something(isolated_test_data):
    # 自动使用临时目录
    assert os.getenv("DATA_DIR") == str(isolated_test_data)
```

#### 方法 3: Docker 容器隔离（高级）

```bash
# 创建测试容器
docker run -it --rm \
  -v $(pwd):/app \
  -e DATA_DIR=/tmp/test_data \
  -e GEMINI_API_KEY=$GEMINI_API_KEY \
  python:3.12 \
  bash -c "cd /app && python tests/test_query_scenarios.py"
```

### ⚠️ 危险操作清单

**永远不要这样做：**
```bash
# ❌ 在生产数据目录运行真实工具调用测试
unset DATA_DIR  # 使用默认配置目录
python quick_test.py  # 可能污染生产数据

# ❌ 未设置隔离环境就运行场景测试
python tests/test_query_scenarios.py  # 18个场景可能写入数据
```

**安全做法：**
```bash
# ✅ 总是设置临时目录
export DATA_DIR="/tmp/butler_test"
python tests/test_query_scenarios.py

# ✅ 使用只读工具（不执行写入操作的场景）
python tests/test_query_scenarios.py --scenario daily_summary
```

---

## 最佳实践

### 🎯 测试策略

#### 开发阶段
```bash
# 1. 快速验证（每次代码修改后）
pytest tests/test_garmin_client_new_features.py -v

# 2. 组件检查（新增依赖后）
export DATA_DIR="/tmp/test"
python smoke_test.py

# 3. 功能验证（开发新工具后）
python tests/test_query_scenarios.py --scenario daily_summary
```

#### 提交前
```bash
# 1. 运行所有单元测试
pytest tests/ -v

# 2. 运行关键场景测试
export DATA_DIR="/tmp/butler_test"
python tests/test_query_scenarios.py --scenario sleep_and_workout
python tests/test_query_scenarios.py --scenario yearly_trend
```

#### CI/CD 流程
```bash
#!/bin/bash
# .github/workflows/test.yml

# 只运行单元测试（快速、安全）
pytest tests/test_*.py -v --cov=health --cov-report=xml

# 烟雾测试（验证环境）
export DATA_DIR="/tmp/ci_test"
python smoke_test.py || exit 1
```

### 📝 添加新测试的检查清单

- [ ] 测试是否需要真实API调用？
  - 是 → 使用 `test_query_scenarios.py` 或创建新的集成测试
  - 否 → 使用 pytest + mock

- [ ] 测试是否会写入数据？
  - 是 → 确保文档中标注 ⚠️ 需要隔离
  - 否 → 标注 ✅ 安全

- [ ] 测试数据如何隔离？
  - 使用 `DATA_DIR` 环境变量
  - 使用 pytest fixture
  - 使用 unique channel ID

- [ ] 测试是否幂等？
  - 多次运行不应产生不同结果
  - 清理测试数据

- [ ] 是否添加到文档？
  - 更新本文档 `TESTING.md`
  - 添加使用示例

### 🔄 测试维护

**每月检查：**
```bash
# 运行所有测试，确保没有回归
pytest tests/ -v
export DATA_DIR="/tmp/monthly_test"
python tests/test_query_scenarios.py > monthly_test_$(date +%Y%m).log
```

**更新场景：**
```bash
# 从日志中提取新的用户问题
grep "Received message" logs/health.log | tail -20

# 添加到 SCENARIOS 列表
vim tests/test_query_scenarios.py
```

---

## 故障排查

### 问题 1: Pytest 导入失败

**症状：**
```
ModuleNotFoundError: No module named 'health'
```

**解决：**
```bash
# 确保在项目根目录
cd /Users/lili/workspace/butler

# 激活虚拟环境
source venv/bin/activate

# 检查 PYTHONPATH
export PYTHONPATH=$(pwd):$PYTHONPATH

# 重新运行
pytest tests/test_garmin_client_new_features.py -v
```

### 问题 2: Gemini API 调用失败

**症状：**
```
Error: API key not found
```

**解决：**
```bash
# 检查 .env 文件
cat .env | grep GEMINI

# 确保加载环境变量
export $(cat .env | grep -v '^#' | xargs)

# 或手动设置
export GEMINI_API_KEY="your_api_key_here"
export GEMINI_MODEL="gemini-2.0-flash-exp"
```

### 问题 3: 工具执行返回空结果

**症状：**
```
Tool execution (expected if no data): No data found for date 2024-01-15
```

**原因：**
测试数据库为空，没有历史数据。

**解决：**
```bash
# 选项 1: 使用包含测试数据的目录
export DATA_DIR="/path/to/test/data"

# 选项 2: 先同步数据（如果可以）
python -m health.sync.garmin_sync --date 2024-01-15

# 选项 3: 创建模拟数据（开发环境）
python tools/create_test_data.py
```

### 问题 4: 测试污染生产数据

**症状：**
```
# 测试后发现生产目录有测试数据
ls ~/Library/Application Support/Butler/contexts/
# context_test_channel.json  ← 测试文件
```

**解决：**
```bash
# 立即停止测试

# 清理测试数据
rm -f ~/Library/Application\ Support/Butler/contexts/context_test_*.json

# 下次必须设置隔离环境
export DATA_DIR="/tmp/butler_test"
```

**预防：**
```bash
# 在 .bashrc 或 .zshrc 添加别名
alias butler-test="export DATA_DIR=/tmp/butler_test_$(date +%s) && echo 'Test DATA_DIR: $DATA_DIR'"

# 使用时
butler-test
python tests/test_query_scenarios.py
```

### 问题 5: 场景测试 PARTIAL 失败

**症状：**
```
⚠️  PARTIAL  carb_impact_analysis    → get_manual_history
```

**原因：**
LLM 选择了不同的工具组合，但仍然可能完成任务。

**分析：**
```bash
# 查看详细输出
python tests/test_query_scenarios.py --scenario carb_impact_analysis

# 检查：
# 1. LLM 是否理解了问题？
# 2. 调用的工具是否合理？
# 3. 是否需要更新 expected_tools？
```

**修复（如果需要）：**
```python
# 如果新工具组合更合理，更新期望
QueryTestScenario(
    name="carb_impact_analysis",
    expected_tools=["get_manual_history", "get_metric_history"],  # 更新
    ...
)
```

---

## 附录

### A. 测试环境配置

**推荐的 .env.test 文件：**
```bash
# Gemini API
GEMINI_API_KEY=your_test_api_key_here
GEMINI_MODEL=gemini-2.0-flash-exp
GEMINI_BASE_URL=http://localhost:8000/v1  # 可选，本地代理

# 数据目录（测试专用）
DATA_DIR=/tmp/butler_test

# Slack（测试token）
SLACK_BOT_TOKEN=xoxb-test-token
SLACK_SIGNING_SECRET=test-secret

# Garmin（测试账号）
GARMIN_EMAIL=test@example.com
GARMIN_PASSWORD=test_password
```

**使用：**
```bash
# 加载测试环境
source .env.test
python tests/test_query_scenarios.py
```

### B. 快速命令参考

```bash
# ========== 单元测试 ==========
pytest tests/test_garmin_client_new_features.py -v          # 所有测试
pytest tests/test_garmin_client_new_features.py::TestFetchSpO2 -v  # 特定类
pytest tests/test_garmin_client_new_features.py -vv -s      # 详细输出

# ========== 烟雾测试 ==========
export DATA_DIR=/tmp/test && python smoke_test.py          # 隔离运行

# ========== 交互测试 ==========
export DATA_DIR=/tmp/test && python quick_test.py          # 隔离运行

# ========== 场景测试 ==========
python tests/test_query_scenarios.py --list                # 列出场景
python tests/test_query_scenarios.py --scenario daily_summary  # 单个场景
export DATA_DIR=/tmp/test && python tests/test_query_scenarios.py  # 全部场景
python tests/test_query_scenarios.py > results.log 2>&1    # 保存结果

# ========== 清理 ==========
rm -rf /tmp/butler_test*                                    # 清理测试数据
unset DATA_DIR                                              # 重置环境变量
```

### C. 测试覆盖率目标

| 模块 | 当前覆盖率 | 目标覆盖率 |
|------|-----------|-----------|
| `health.services.garmin_client` | ~80% | 90% |
| `slack_bot.tools.health_read` | ~60% | 80% |
| `slack_bot.tools.analytics` | ~50% | 75% |
| `slack_bot.dispatcher` | ~70% | 85% |

**查看覆盖率：**
```bash
pytest tests/ --cov=health --cov=slack_bot --cov-report=html
open htmlcov/index.html
```

---

## 总结

| 测试工具 | 运行时间 | API费用 | 使用频率 | 推荐场景 |
|---------|---------|---------|---------|---------|
| Pytest单元测试 | 5-10秒 | 免费 | 每次提交 | 开发、CI/CD |
| 烟雾测试 | 2-5秒 | ~$0.001 | 环境变化时 | 部署后检查 |
| 交互测试 | 10-20秒 | ~$0.002 | 调试时 | API验证 |
| 场景测试 | 2-5分钟 | ~$0.05 | 周/月度 | 回归测试 |

**核心原则：**
1. ✅ 优先使用单元测试（快速、安全、免费）
2. ⚠️ 隔离真实工具调用测试（使用临时 DATA_DIR）
3. 📊 定期运行场景测试（验证实际用户体验）
4. 🔄 持续更新测试用例（基于真实用户问题）

---

*Last Updated: 2026-02-05*
*Maintainer: Butler Team*