siemens_ragas/.superpowers/brainstorm/1625-1781595805/content/analysis-approach.html
wangwei 754a30ad59 feat(session-async): add /api/score/session_async with incremental session report aggregation
- New POST /api/score/session_async endpoint: same session_id calls append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: deterministic run_id from session_id, per-session mutex for CSV append, advisor regenerated on every call
- SessionScoreRequest (extends ScoreRequest + session_id), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing

chore(weighted-score): comment out overall weighted score (综合加权得分) display and computation

- report.js: hide the overall weighted score card in the report detail page
- score_jobs.js: hide the "overall" chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-26 16:09:33 +08:00

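<h2>How should optimization recommendations be generated?</h2>
<p class="subtitle">This choice determines the module's core mechanism and its maintainability</p>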
<div class="options">
<div class="option" data-choice="a" onclick="toggleSelect(this)">
<div class="letter">A</div>
<div class="content">
<h3>Pure rule engine</h3>
<p>Set a threshold for each metric (e.g. faithfulness &lt; 0.6); when a threshold is crossed, emit a predefined recommendation text.</p>
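<p>A minimal Python sketch of this option, assuming the evaluation run exposes per-metric means as a plain dict. The <code>Rule</code> class, the <code>apply_rules</code> function, every metric other than faithfulness, and all thresholds and advice strings are hypothetical illustrations, not existing code.</p>

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str       # metric name as it appears in the per-metric means
    threshold: float  # the rule fires when the mean is strictly below this value
    advice: str       # predefined recommendation text returned when the rule fires


# Hypothetical rule table: only the faithfulness < 0.6 threshold comes from the design note.
RULES = [
    Rule("faithfulness", 0.6, "Answers contain claims not supported by the retrieved context."),
    Rule("context_recall", 0.6, "Retrieval is missing relevant passages; review top_k and chunking."),
    Rule("answer_relevancy", 0.7, "Answers drift away from the question; review the system prompt."),
]


def apply_rules(metric_means: dict[str, float], rules: list[Rule] = RULES) -> list[str]:
    """Return the predefined advice of every rule whose metric mean is below its threshold."""
    advice = []
    for rule in rules:
        value = metric_means.get(rule.metric)
        # Metrics absent from this run, or NaN, are skipped instead of counted as failures.
        if value is None or math.isnan(value):
            continue
        if value < rule.threshold:
            advice.append(rule.advice)
    return advice
```

<p>For example, <code>apply_rules({"faithfulness": 0.52, "context_recall": 0.81})</code> returns only the faithfulness advice. The output depends only on the rule table and the input, so the same run always yields the same advice, which is what makes this option predictable and auditable.</p>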
<div class="pros-cons">
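<div class="pros"><h4>Pros</h4><ul>
<li>No LLM calls and no extra cost</li>
<li>Results are predictable and auditable</li>
<li>Very fast response</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>Recommendations are fixed and cannot reflect the specific samples</li>
<li>Cannot explain "why this metric is low for this batch of data"</li>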
</ul></div>
</div>
</div>
</div>
<div class="option" data-choice="b" onclick="toggleSelect(this)">
<div class="letter">B</div>
<div class="content">
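<h3>LLM analysis (fully automatic)</h3>
<p>Pass the evaluation results (per-metric means plus low-scoring samples) to the LLM, which generates a context-aware analysis report in Chinese.</p>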
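<p>A minimal sketch of how this option could assemble its single LLM request, assuming each sample is a dict holding the question, the answer and per-metric scores. The function names, the prompt wording and the <code>llm</code> callable (a stand-in for whatever client wraps judge_model) are hypothetical; no real LLM client API is implied.</p>

```python
from typing import Callable


def select_low_samples(samples: list[dict], metric: str, k: int = 3) -> list[dict]:
    """Return the k samples with the lowest `metric` score; samples without that score are ignored."""
    scored = [s for s in samples if s.get(metric) is not None]
    return sorted(scored, key=lambda s: s[metric])[:k]


def build_analysis_prompt(metric_means: dict[str, float], samples: list[dict],
                          metric: str, k: int = 3) -> str:
    """Combine per-metric means and the lowest-scoring samples into one prompt (hypothetical wording)."""
    lines = ["You are reviewing RAG evaluation results. Write the analysis report in Chinese.",
             "", "Per-metric means:"]
    for name, value in sorted(metric_means.items()):
        lines.append(f"- {name}: {value:.3f}")
    lines += ["", f"Lowest-scoring samples by {metric}:"]
    for i, s in enumerate(select_low_samples(samples, metric, k), start=1):
        lines.append(f"{i}. question: {s['question']}")
        lines.append(f"   answer: {s['answer']}")
        lines.append(f"   {metric}: {s[metric]:.3f}")
    lines += ["", "Explain why these metrics are low for this batch and give concrete recommendations."]
    return "\n".join(lines)


def analyze(metric_means: dict[str, float], samples: list[dict], metric: str,
            llm: Callable[[str], str], k: int = 3) -> str:
    """Exactly one LLM call per evaluation run; `llm` stands in for the judge_model client."""
    return llm(build_analysis_prompt(metric_means, samples, metric, k))
```

<p>Passing the model call in as a plain callable keeps the prompt-building logic testable without network access. Note that the prompt carries no diagnosis, so deciding what counts as "abnormal" is left entirely to the LLM.</p>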
<div class="pros-cons">
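<div class="pros"><h4>Pros</h4><ul>
<li>Can give targeted recommendations based on specific low-scoring samples</li>
<li>Can explain, in Chinese, problems specific to Siemens use cases</li>
<li>Recommendations are higher quality and more detailed</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>One extra LLM call per evaluation run</li>
<li>Output quality depends on the judge_model</li>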
</ul></div>
</div>
</div>
</div>
<div class="option" data-choice="c" onclick="toggleSelect(this)">
<div class="letter">C</div>
<div class="content">
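<h3>Rule-based diagnosis + LLM interpretation (recommended)</h3>
<p>The rule engine first identifies which metrics are abnormal and which optimization direction each one triggers; the rule diagnosis and the low-scoring samples are then passed together to the LLM for a second-pass interpretation that produces recommendations in Chinese.</p>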
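<p>A minimal sketch of the two layers, under stated assumptions: <code>Finding</code>, <code>THRESHOLDS</code>, <code>diagnose</code>, <code>interpret</code> and the direction labels are hypothetical names, only the faithfulness &lt; 0.6 threshold comes from this note, and <code>llm</code> is any callable that sends a prompt to judge_model and returns its text.</p>

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    metric: str
    value: float
    threshold: float
    direction: str  # optimization direction chosen by the rule, e.g. "retrieval" or "generation"


# Hypothetical thresholds and directions; only faithfulness < 0.6 appears in the design note.
THRESHOLDS = {
    "faithfulness": (0.6, "generation"),
    "context_recall": (0.6, "retrieval"),
}


def diagnose(metric_means: dict[str, float]) -> list[Finding]:
    """Layer 1: deterministic rules decide which metrics are abnormal and where they point."""
    findings = []
    for metric, (threshold, direction) in THRESHOLDS.items():
        value = metric_means.get(metric)
        if value is not None and value < threshold:
            findings.append(Finding(metric, value, threshold, direction))
    return findings


def interpret(findings: list[Finding], low_samples: list[dict], llm: Callable[[str], str]) -> str:
    """Layer 2: the LLM only explains the rule output; it does not decide what is abnormal."""
    if not findings:
        return ""  # this sketch skips the LLM call when no rule fired
    payload = {"diagnosis": [asdict(f) for f in findings], "low_samples": low_samples}
    prompt = (
        "The rule-based diagnosis below is fixed; do not add or remove findings.\n"
        "For each finding, explain the likely cause from the samples and give recommendations in Chinese.\n"
        + json.dumps(payload, ensure_ascii=False, indent=2)
    )
    return llm(prompt)
```

<p><code>diagnose</code> can be unit-tested with plain dicts and no LLM, and <code>interpret</code> with a stub callable, which is one way to realize the "two layers can be tested independently" point. Skipping the LLM call when no rule fires is a choice made in this sketch, not a stated requirement.</p>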
<div class="pros-cons">
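<div class="pros"><h4>Pros</h4><ul>
<li>Rules keep the diagnosis stable instead of relying on free-form LLM output</li>
<li>The LLM produces more accurate output when given structured input</li>
<li>The two layers can be tested independently</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>Slightly more complex to implement (two submodules)</li>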
</ul></div>
</div>
</div>
</div>
</div>