wangwei 754a30ad59 feat(session-async): add /api/score/session_async with incremental session report aggregation
- New POST /api/score/session_async endpoint: same session_id calls append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: deterministic run_id from session_id, per-session mutex for CSV append, advisor regenerated on every call
- SessionScoreRequest (extends ScoreRequest + session_id), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing

chore(weighted-score): comment out the 综合加权得分 (composite weighted score) display and computation

- report.js: hide the 综合加权得分 (composite weighted score) card on the report detail page
- score_jobs.js: hide the 综合 (composite) chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-26 16:09:33 +08:00


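<h2>Optimization Advisor Module: Implementation Options Compared</h2>
<p class="subtitle">The three options differ mainly in where the LLM call boundary sits and how invasive the code changes are</p>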
<div class="options">
<div class="option" data-choice="a" onclick="toggleSelect(this)">
<div class="letter">A</div>
<div class="content">
<h3>Standalone post-processor (lightweight integration)</h3>
<p>Add a new <code>rag_eval/advisor/</code> package; <code>run_scenario()</code> calls <code>maybe_run_advisor(result, scenario)</code> in a single line at its end.</p>
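<p>A minimal sketch of the option A hook, assuming hypothetical <code>Scenario</code> and <code>RunResult</code> types and a boolean <code>optimization_advisor</code> flag on the scenario; the real rag_eval types are not shown in this plan. It illustrates why the runner change stays small: the call is a no-op unless the scenario opts in.</p>

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scenario:
    """Hypothetical stand-in for the real scenario config object."""
    name: str
    optimization_advisor: bool = False  # assumed opt-in switch


@dataclass
class RunResult:
    """Hypothetical stand-in for what run_scenario() produces."""
    score_rows: list = field(default_factory=list)


def maybe_run_advisor(result: RunResult, scenario: Scenario) -> Optional[str]:
    """Return advice text when the scenario opts in; otherwise do nothing."""
    if not scenario.optimization_advisor:
        return None  # advisor disabled: the run behaves exactly as before
    # Placeholder for the real advisor package (rules, then LLM, then writer).
    return f"{scenario.name}: analysed {len(result.score_rows)} score rows"


def run_scenario(scenario: Scenario) -> RunResult:
    # Evaluation itself is elided; one fake row keeps the example runnable.
    result = RunResult(score_rows=[{"faithfulness": 0.9}])
    maybe_run_advisor(result, scenario)  # the single added call at the end
    return result
```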
<p><strong>File layout:</strong></p>
<ul>
<li><code>rag_eval/advisor/__init__.py</code></li>
<li><code>rag_eval/advisor/rules.py</code>: rule engine that takes score_rows and returns a list of diagnoses</li>
<li><code>rag_eval/advisor/llm_analyzer.py</code>: passes the rule diagnoses plus low-scoring samples to judge_model</li>
<li><code>rag_eval/advisor/writer.py</code>: writes optimization_advice.md and logs a summary</li>
</ul>
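<p>A sketch of rules.py as a pure function, assuming score_rows is a list of per-sample dicts mapping metric name to score, with NaN marking a failed computation. The two rules shown (all-NaN metric, mean below a threshold) and the 0.6 default are illustrative placeholders, not the planned rule set.</p>

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Diagnosis:
    metric: str
    severity: str  # "error" or "warn" in this sketch
    message: str


def _valid(values: list) -> list[float]:
    """Drop None/NaN entries, i.e. samples where the metric failed to compute."""
    return [v for v in values if v is not None and not math.isnan(v)]


def diagnose(score_rows: list[dict[str, float]], low_threshold: float = 0.6) -> list[Diagnosis]:
    """Pure rule engine: score_rows in, diagnoses out (no I/O, no LLM)."""
    diagnoses = []
    for metric in sorted({k for row in score_rows for k in row}):
        values = _valid([row.get(metric) for row in score_rows])
        if not values:
            diagnoses.append(Diagnosis(metric, "error", "no valid scores (all NaN)"))
            continue
        avg = mean(values)
        if avg < low_threshold:
            diagnoses.append(Diagnosis(metric, "warn", f"mean {avg:.2f} below {low_threshold}"))
    return diagnoses
```

<p>Because it takes plain data and returns plain data, this step can be unit-tested without a model or a filesystem.</p>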
<div class="pros-cons">
<div class="pros"><h4>优点</h4><ul>
<li>改动最小runner.py 只加 3 行</li>
<li>advisor 完全独立,可单独测试</li>
<li>与现有分层架构完全吻合</li>
</ul></div>
<div class="cons"><h4>缺点</h4><ul>
<li>无法拿到 per-metric 的原始 NaN 率(需从 score_rows 重新算)</li>
</ul></div>
</div>
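<p>To gauge the cost of that recomputation, here is a sketch that assumes score_rows is a list of per-sample dicts (metric name to score) in which a NaN or missing value marks a failed metric; that row shape is an assumption, not something the plan confirms.</p>

```python
import math


def _is_missing(value) -> bool:
    """True for a missing entry or a float NaN (failed metric computation)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nan_rates(score_rows: list[dict]) -> dict[str, float]:
    """Per-metric fraction of rows whose value is NaN or absent."""
    metrics = sorted({k for row in score_rows for k in row})
    rates = {}
    for metric in metrics:
        bad = sum(1 for row in score_rows if _is_missing(row.get(metric)))
        rates[metric] = bad / len(score_rows)  # metrics is empty when score_rows is
    return rates
```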
</div>
</div>
<div class="option" data-choice="b" onclick="toggleSelect(this)">
<div class="letter">B</div>
<div class="content">
<h3>Embedded in the reporting layer (reuses the artifact-writing infrastructure)</h3>
<p>Make the advisor part of <code>rag_eval/reporting/</code>; <code>write_run_artifacts()</code> decides internally whether to write the advice.</p>
<p><strong>File layout:</strong></p>
<ul>
<li><code>rag_eval/reporting/advisor.py</code>: rules, LLM call, and writing combined in one module</li>
<li>Add <code>if scenario.optimization_advisor: write_advice(...)</code> inside <code>write_run_artifacts()</code></li>
</ul>
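<p>A rough sketch of the option B shape, with hypothetical signatures for <code>write_run_artifacts()</code> and <code>write_advice()</code> (the plan elides their arguments) and an assumed <code>ask_llm</code> callable standing in for the judge model. scores.csv is written here only as an example artifact.</p>

```python
import csv
from pathlib import Path
from typing import Callable


def write_advice(run_dir: Path, score_rows: list[dict], ask_llm: Callable[[str], str]) -> Path:
    """Option B shape: rules, LLM call and file writing in one reporting function."""
    prompt = f"Suggest improvements for {len(score_rows)} scored samples."
    path = run_dir / "optimization_advice.md"
    path.write_text(ask_llm(prompt), encoding="utf-8")  # LLM call inside the reporting layer
    return path


def write_run_artifacts(run_dir: Path, scenario, score_rows: list[dict],
                        ask_llm: Callable[[str], str]) -> list[Path]:
    """Write the run's artifacts; the advice branch is the only new part."""
    run_dir.mkdir(parents=True, exist_ok=True)
    scores_path = run_dir / "scores.csv"
    fieldnames = sorted({k for row in score_rows for k in row})
    with scores_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(score_rows)
    written = [scores_path]
    if scenario.optimization_advisor:  # the one added branch
        written.append(write_advice(run_dir, score_rows, ask_llm))
    return written
```

<p>Even the no-advice path now takes an LLM argument, and testing the file writing requires an LLM stub; this is the coupling the cons list points to.</p>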
<div class="pros-cons">
<div class="pros"><h4>优点</h4><ul>
<li>artifacts 路径管理统一advice 自然进 run 目录</li>
<li>文件更少</li>
</ul></div>
<div class="cons"><h4>缺点</h4><ul>
<li>reporting 层本是"无副作用写文件",混入 LLM 调用破坏这一约定</li>
<li>advisor 逻辑和写出逻辑耦合,难以单独测试规则引擎</li>
</ul></div>
</div>
</div>
</div>
<div class="option" data-choice="c" onclick="toggleSelect(this)">
<div class="letter">C</div>
<div class="content">
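<h3>Variant of option A: advisor with its own settings (recommended)</h3>
<p>Same file layout as option A, but the LLM call uses the scenario's <strong>existing judge_model</strong> and adds no new model configuration: the advisor reuses the llm instance that <code>build_models()</code> has already built.</p>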
<ul>
<li><code>rag_eval/advisor/rules.py</code>: pure functions implementing 7 metric diagnostic rules</li>
<li><code>rag_eval/advisor/llm_analyzer.py</code>: receives the existing llm instance instead of creating a new client</li>
<li><code>rag_eval/advisor/writer.py</code>: writes the Markdown file and the log</li>
<li><code>rag_eval/advisor/__init__.py</code>: exposes <code>run_advisor()</code></li>
</ul>
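<p>A sketch of <code>run_advisor()</code> under option C, where the judge llm is injected rather than constructed. It assumes a hypothetical <code>complete(prompt)</code> method returning a string; the object returned by <code>build_models()</code> may expose a different method and need a thin adapter. The single mean-below-threshold rule stands in for the 7 planned rules.</p>

```python
import logging
import math
from pathlib import Path
from typing import Protocol

log = logging.getLogger("rag_eval.advisor")


class CompletionModel(Protocol):
    """Hypothetical minimal interface for the judge llm built by build_models()."""

    def complete(self, prompt: str) -> str: ...


def _low_metrics(score_rows: list[dict], threshold: float) -> list[str]:
    """Pure rule step: metrics whose mean (NaN ignored) is below threshold."""
    findings = []
    for metric in sorted({k for row in score_rows for k in row}):
        vals = [row[metric] for row in score_rows
                if metric in row and not math.isnan(row[metric])]
        if vals and sum(vals) / len(vals) < threshold:
            findings.append(f"{metric}: mean {sum(vals) / len(vals):.2f}")
    return findings


def run_advisor(score_rows: list[dict], llm: CompletionModel, run_dir: Path,
                threshold: float = 0.6) -> Path:
    findings = _low_metrics(score_rows, threshold)
    # Reuse the injected judge instance: no client is constructed here.
    prompt = "Low-scoring metrics:\n" + "\n".join(findings or ["none"])
    advice = llm.complete(prompt)
    # Writer step: Markdown file plus a one-line log summary.
    path = run_dir / "optimization_advice.md"
    path.write_text(advice, encoding="utf-8")
    log.info("advisor: %d finding(s), advice written to %s", len(findings), path)
    return path
```

<p>Because the llm is a parameter, tests can pass a fake object and never touch the network; the cost is the extra argument noted under Cons.</p>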
<div class="pros-cons">
<div class="pros"><h4>优点</h4><ul>
<li>不重复创建 LLM client节省资源</li>
<li>advisor 阈值可通过 YAML 的 optimization_advisor 块扩展配置</li>
<li>独立包边界清晰,易于单测</li>
<li>runner.py 改动最小</li>
</ul></div>
<div class="cons"><h4>缺点</h4><ul>
<li>需把 llm 实例从 runner 传入 advisor多传一个参数</li>
</ul></div>
</div>
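<p>One way the optimization_advisor YAML block could map onto typed settings, as a sketch: the field names and defaults are invented for illustration, and the block is assumed to be already parsed into a dict (or a bare boolean) upstream. Unknown keys are rejected so that typos in the YAML fail loudly.</p>

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Union


@dataclass(frozen=True)
class AdvisorSettings:
    """Hypothetical thresholds read from the YAML optimization_advisor block."""
    enabled: bool = False
    low_score_threshold: float = 0.6
    max_nan_rate: float = 0.2
    max_llm_samples: int = 5

    @classmethod
    def from_block(cls, block: Union[Mapping[str, Any], bool, None]) -> "AdvisorSettings":
        """Accept `optimization_advisor: true`, a mapping of overrides, or nothing."""
        if block is None or isinstance(block, bool):
            return cls(enabled=bool(block))
        unknown = set(block) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"unknown optimization_advisor keys: {sorted(unknown)}")
        # A present mapping implies the advisor is on unless it says otherwise.
        return cls(**{"enabled": True, **block})
```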
</div>
</div>
</div>