siemens_ragas/docs/superpowers/specs/2026-06-15-siemens-scenario-design.md
wangwei 75ae7927ad Add Siemens CT document evaluation scenario (three-step pipeline)
- scenarios/siemens_build/siemens-pdf-build.yaml: dataset build for all 17
  Siemens medical-imaging PDFs (aliyun_docmind parser, 10 questions/doc,
  failure_mode=skip, ~170 questions total)
- scenarios/offline/siemens-pdf-offline-smoke.yaml: offline evaluation using
  source chunks as contexts and ground_truth as answer (up to 30 samples)
- scenarios/online/siemens-pdf-question-bank-online.yaml: online evaluation
  calling siemens_pdf_qa adapter, batch_size=4, up to 50 samples
- apps/siemens_pdf_qa/adapter.py: Siemens-specific adapter with bilingual
  (zh/en) system prompt and strict evidence-grounding for CT domain
- scripts/build_siemens_offline_smoke.py: helper to derive offline smoke CSV
  from completed dataset build artifacts (run after dataset build step)
- docs/superpowers/specs/2026-06-15-siemens-scenario-design.md: design spec

All three scenarios are automatically discovered by the web console.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
2026-06-15 17:00:52 +08:00

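# Siemens PDF Scenario Design Spec
- Date: 2026-06-15
- Status: confirmed; implementation in progress.
## 1. Goal
Run the complete three-step pipeline end to end on `datasets/siemens-pdfs/` (17 Chinese-language Siemens medical CT PDFs):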
```
dataset_build (PDF → question bank) → offline smoke evaluation → online evaluation
```
This fully mirrors the existing `sample-pdf-*` pattern (Option A) and does not modify any existing file.
## 2. Parameter Decisions
| Item | Value |
|---|---|
| Input PDFs | `datasets/siemens-pdfs/*.pdf` (17 files) |
| failure_mode | `skip` (a parse failure in a single document does not abort the whole batch) |
| max_questions_per_document | 10 (~170 questions in total) |
| max_source_chunks_per_question | 3 |
| generation model | `DATASET_GENERATOR_MODEL` in `.env` (qwen3.6-plus) |
| judge model | `RAGAS_JUDGE_MODEL` in `.env` (deepseek-v4-flash) |
| embedding model | `RAGAS_EMBEDDING_MODEL` in `.env` (text-embedding-v3) |
| online answer model | `RAGAS_JUDGE_MODEL` in `.env` |
| metrics | faithfulness / answer_relevancy / context_recall / context_precision |
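
The interaction between `failure_mode=skip` and `max_questions_per_document` can be illustrated with a minimal sketch. It is not the project's build code: `build_question_bank` and `fake_generate` are hypothetical names, and the generator is a stand-in for the real parser and LLM call. It only shows why ~170 is an upper bound (17 documents × 10 questions), reached only if no document is skipped.

```python
from __future__ import annotations

from typing import Callable, Iterable


def build_question_bank(
    pdf_paths: Iterable[str],
    generate: Callable[[str], list[str]],
    max_questions_per_document: int = 10,
    failure_mode: str = "skip",
) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (document, question) pairs plus the list of skipped documents."""
    questions: list[tuple[str, str]] = []
    skipped: list[str] = []
    for path in pdf_paths:
        try:
            generated = generate(path)
        except Exception:
            if failure_mode != "skip":
                raise  # assumption: any other mode aborts the whole batch
            skipped.append(path)
            continue
        # Keep at most max_questions_per_document questions per document.
        for question in generated[:max_questions_per_document]:
            questions.append((path, question))
    return questions, skipped


def fake_generate(path: str) -> list[str]:
    # Hypothetical stand-in: one document fails to parse, the others yield 12 questions.
    if path.endswith("bad.pdf"):
        raise ValueError("parse failed")
    return [f"{path}#q{i}" for i in range(12)]


bank, skipped = build_question_bank(["a.pdf", "bad.pdf", "b.pdf"], fake_generate)
print(len(bank), skipped)  # 20 ['bad.pdf']
```

With one of three documents failing, the batch still completes and yields 20 questions instead of 30; the same logic applied to 17 PDFs gives at most 170.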
## 3. New Files (4)
```
scenarios/siemens_build/siemens-pdf-build.yaml
scenarios/offline/siemens-pdf-offline-smoke.yaml
scenarios/online/siemens-pdf-question-bank-online.yaml
apps/siemens_pdf_qa/__init__.py
apps/siemens_pdf_qa/adapter.py
```
Plus a helper script:
```
scripts/build_siemens_offline_smoke.py ← generates the offline smoke CSV from the build artifacts
```
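What the helper script has to do can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of `build_siemens_offline_smoke.py`: the record fields (`question`, `ground_truth`, `source_chunk_ids`), the CSV column names, and the JSON encoding of `contexts` are all hypothetical. Only the mapping itself (source chunks become contexts, `ground_truth` becomes the answer, at most 30 samples) comes from this spec and its commit message.

```python
import csv
import io
import json

# Hypothetical column layout for the offline smoke CSV.
FIELDNAMES = ["question", "contexts", "answer", "ground_truth"]


def build_offline_rows(questions, chunks_by_id, max_samples=30):
    """Map question-bank records to offline evaluation rows."""
    rows = []
    for record in questions[:max_samples]:
        # Resolve chunk ids to chunk text; silently drop ids that are missing.
        contexts = [
            chunks_by_id[chunk_id]
            for chunk_id in record["source_chunk_ids"]
            if chunk_id in chunks_by_id
        ]
        rows.append({
            "question": record["question"],
            "contexts": json.dumps(contexts, ensure_ascii=False),
            "answer": record["ground_truth"],  # ground truth doubles as the answer
            "ground_truth": record["ground_truth"],
        })
    return rows


def write_csv(rows, stream):
    """Write rows to any text stream (a file opened with newline='' or StringIO)."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory example standing in for the real build artifacts.
chunks = {"c1": "Slice thickness is 5 mm."}
bank = [{
    "question": "What is the slice thickness?",
    "ground_truth": "5 mm",
    "source_chunk_ids": ["c1", "missing-id"],
}]
rows = build_offline_rows(bank, chunks)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Because the answer is copied from `ground_truth`, the offline smoke run mainly checks that the evaluation pipeline and metrics work before the online step spends LLM calls.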
## 4. Run Order
```
# Step 1: dataset build (PDF → draft question bank + source_chunks.jsonl)
python main.py --dataset-build-config scenarios/siemens_build/siemens-pdf-build.yaml
# Step 2: generate the offline smoke dataset (one-off script; run after the build has finished)
python scripts/build_siemens_offline_smoke.py
# Step 3: offline evaluation (source chunks as contexts, ground_truth as answer)
python main.py --scenario scenarios/offline/siemens-pdf-offline-smoke.yaml
# Step 4: online evaluation (call the LLM live to generate the answer, then score it)
python main.py --scenario scenarios/online/siemens-pdf-question-bank-online.yaml
```
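Since each step depends on the artifacts of the previous one, the four commands could also be chained by a small driver that stops at the first failure. The sketch below is one possible wrapper, not part of the repository: `run_pipeline` and `STEPS` are hypothetical names, and it assumes the script is started from the repository root with the same interpreter used for `main.py`.

```python
import subprocess
import sys

# The four commands from the run order above, without the interpreter prefix.
STEPS = [
    ["main.py", "--dataset-build-config", "scenarios/siemens_build/siemens-pdf-build.yaml"],
    ["scripts/build_siemens_offline_smoke.py"],
    ["main.py", "--scenario", "scenarios/offline/siemens-pdf-offline-smoke.yaml"],
    ["main.py", "--scenario", "scenarios/online/siemens-pdf-question-bank-online.yaml"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order; return 0 on success or the number of the failed step."""
    for index, args in enumerate(steps, start=1):
        result = runner([sys.executable, *args])
        if result.returncode != 0:
            return index  # later steps need this step's artifacts, so stop here
    return 0


# To run for real from the repository root:
#     sys.exit(run_pipeline())
```

The `runner` parameter exists only so the sequencing can be exercised with a fake in place of `subprocess.run`, without invoking the LLM-backed steps.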