- New POST /api/score/session_async endpoint: same session_id calls append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: deterministic run_id from session_id, per-session mutex for CSV append, advisor regenerated on every call
- SessionScoreRequest (extends ScoreRequest + session_id), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing
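
A minimal sketch of two mechanisms named above: a deterministic run_id derived from session_id, and a per-session mutex guarding CSV appends. The names derive_run_id, SessionLocks, and append_score_row are hypothetical, and the SHA-256 prefix scheme is an assumption; the actual SessionScoreJobManager may work differently.

```python
import csv
import hashlib
import threading
from pathlib import Path


def derive_run_id(session_id: str) -> str:
    """Map a session_id to a stable run_id (assumed scheme: SHA-256 prefix)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return f"session-{digest[:12]}"


class SessionLocks:
    """Hands out one lock per session_id so appends to a shared CSV serialize."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}

    def get(self, session_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())


def append_score_row(path: Path, session_id: str, row: dict, locks: SessionLocks) -> None:
    """Append one row to the session's CSV, writing the header on first use."""
    with locks.get(session_id):
        is_new = not path.exists()
        with path.open("a", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(row))
            if is_new:
                writer.writeheader()
            writer.writerow(row)
```

Because the run_id depends only on session_id, repeated calls with the same session resolve to the same report location, and the per-session lock keeps concurrent calls from interleaving rows.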
chore(weighted-score): comment out overall weighted score (综合加权得分) display and computation
- report.js: hide the overall weighted score (综合加权得分) card on the report detail page
- score_jobs.js: hide the overall (综合) chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
78 lines
3.8 KiB
HTML
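<h2>Optimization Advisor Module: Implementation Options Compared</h2>
<p class="subtitle">The three options differ mainly in where the LLM call boundary sits and how invasive the code changes are</p>
<div class="options">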
<div class="option" data-choice="a" onclick="toggleSelect(this)">
<div class="letter">A</div>
<div class="content">
<h3>Standalone post-processor (lightweight integration)</h3>
<p>Add a new <code>rag_eval/advisor/</code> package; <code>run_scenario()</code> ends with a one-line call to <code>maybe_run_advisor(result, scenario)</code>.</p>
<p><strong>File structure:</strong></p>
<ul>
<li><code>rag_eval/advisor/__init__.py</code></li>
<li><code>rag_eval/advisor/rules.py</code>: rule engine; takes score_rows and returns a list of diagnoses</li>
<li><code>rag_eval/advisor/llm_analyzer.py</code>: passes the rule diagnoses plus low-scoring samples to judge_model</li>
<li><code>rag_eval/advisor/writer.py</code>: writes optimization_advice.md and logs a summary</li>
</ul>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul>
<li>Smallest change: runner.py gains only 3 lines</li>
<li>The advisor is fully independent and can be tested on its own</li>
<li>Fits the existing layered architecture exactly</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>No direct access to the raw per-metric NaN rate (it has to be recomputed from score_rows)</li>
</ul></div>
</div>
</div>
</div>
<div class="option" data-choice="b" onclick="toggleSelect(this)">
<div class="letter">B</div>
<div class="content">
<h3>Embedded in the reporting layer (reuses the output-writing infrastructure)</h3>
<p>Make the advisor part of <code>rag_eval/reporting/</code>; <code>write_run_artifacts()</code> decides internally whether to write the advice.</p>
<p><strong>File structure:</strong></p>
<ul>
<li><code>rag_eval/reporting/advisor.py</code>: rules, LLM call, and output writing combined in one module</li>
<li>Inside <code>write_run_artifacts()</code>, append <code>if scenario.optimization_advisor: write_advice(...)</code></li>
</ul>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul>
<li>Artifact path management stays unified; the advice lands in the run directory naturally</li>
<li>Fewer files</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>The reporting layer is meant to write files and have no other side effects; mixing in LLM calls breaks that convention</li>
<li>Advisor logic and output-writing logic are coupled, so the rule engine is hard to test in isolation</li>
</ul></div>
</div>
</div>
</div>
<div class="option" data-choice="c" onclick="toggleSelect(this)">
<div class="letter">C</div>
<div class="content">
<h3>Variant of Option A: advisor has its own settings (recommended)</h3>
<p>Same file structure as Option A, but the LLM call uses <strong>the scenario's existing judge_model</strong> and adds no new model configuration: the advisor reuses the llm instance already built by <code>build_models()</code>.</p>
<ul>
<li><code>rag_eval/advisor/rules.py</code>: pure functions, 7 metric diagnosis rules</li>
<li><code>rag_eval/advisor/llm_analyzer.py</code>: receives the existing llm instance and does not create a new client</li>
<li><code>rag_eval/advisor/writer.py</code>: writes the Markdown file and logs</li>
<li><code>rag_eval/advisor/__init__.py</code>: exposes <code>run_advisor()</code></li>
</ul>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul>
<li>No duplicate LLM client is created (saves resources)</li>
<li>Advisor thresholds can be configured and extended through the optimization_advisor block in the YAML</li>
<li>Standalone package with clear boundaries, easy to unit-test</li>
<li>Minimal changes to runner.py</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>The llm instance has to be passed from the runner into the advisor (one extra parameter)</li>
</ul></div>
</div>
</div>
</div>
</div>
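<p>A minimal sketch of how Option C could be wired, assuming score_rows is a list of dicts keyed by metric name and the llm instance can be modeled as a callable from prompt string to reply string. The <code>Diagnosis</code> type, the single mean-below-threshold rule, the <code>faithfulness</code> metric name in the usage note, and the signature of <code>run_advisor()</code> are illustrative assumptions, not the project's actual rules or API. It also shows the per-metric NaN rate being recomputed from score_rows.</p>

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    metric: str
    mean: float
    nan_rate: float
    message: str


def diagnose(score_rows, thresholds):
    """Pure rule function: flag metrics whose mean falls below a threshold."""
    findings = []
    for metric, threshold in thresholds.items():
        values = [row.get(metric) for row in score_rows]
        # Missing or NaN entries are excluded from the mean but counted in nan_rate.
        valid = [v for v in values
                 if isinstance(v, (int, float)) and not math.isnan(v)]
        nan_rate = 1.0 - len(valid) / len(values) if values else 0.0
        if not valid:
            continue
        mean = sum(valid) / len(valid)
        if threshold > mean:
            message = (f"{metric}: mean {mean:.2f} is below threshold "
                       f"{threshold:.2f} (NaN rate {nan_rate:.0%})")
            findings.append(Diagnosis(metric, mean, nan_rate, message))
    return findings


def run_advisor(score_rows, thresholds, llm):
    """Run the rules, then hand findings to the llm passed in by the runner.

    No client is built here; `llm` is assumed to be the already-built
    judge model, modeled as a str -> str callable for this sketch.
    """
    findings = diagnose(score_rows, thresholds)
    if not findings:
        return "No issues detected."
    prompt = "\n".join(f.message for f in findings)
    return llm(prompt)
```

<p>For example, <code>run_advisor(rows, {"faithfulness": 0.5}, llm)</code> would call the llm only when the rule fires. Keeping <code>diagnose()</code> free of I/O is what lets the rules be unit-tested without a model.</p>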