feat(session-async): add /api/score/session_async with incremental session report aggregation
- New POST /api/score/session_async endpoint: calls with the same session_id append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, and all job records
- New GET /api/score/session/jobs/{job_id}: returns the status of an individual call
- SessionScoreJobManager: deterministic run_id derived from session_id, per-session mutex for CSV appends, advisor regenerated on every call
- SessionScoreRequest (ScoreRequest plus session_id), SessionScoreJobResponse, and SessionStatus models added
- 24 new tests, all passing
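The session bookkeeping in the `SessionScoreJobManager` bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the class `SessionCsvStore`, the `run_id_for` helper, the UUIDv5 namespace, the in-memory call counter, and the one-CSV-per-run file layout are all hypothetical. It shows how hashing `session_id` with `uuid.uuid5` yields the same run_id on every call, and how a per-session `threading.Lock` keeps concurrent appends to the same CSV from interleaving while different sessions still proceed in parallel.

```python
import csv
import threading
import uuid
from pathlib import Path

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
RUN_NAMESPACE = uuid.UUID("6f1c2a3e-0000-4000-8000-000000000001")


def run_id_for(session_id: str) -> str:
    """Derive a stable run_id: the same session_id always maps to the same run."""
    return uuid.uuid5(RUN_NAMESPACE, session_id).hex


class SessionCsvStore:
    """Hypothetical sketch of the CSV-append side of SessionScoreJobManager."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self._guard = threading.Lock()  # protects the two dicts below
        self._locks: dict[str, threading.Lock] = {}
        self._calls: dict[str, int] = {}  # assumption: call_count kept in memory

    def _lock_for(self, session_id: str) -> threading.Lock:
        # Create at most one lock per session, even if first calls race.
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())

    def _path(self, session_id: str) -> Path:
        return self.root / f"{run_id_for(session_id)}.csv"

    def append(self, session_id: str, rows: list[dict[str, float]]) -> None:
        """Append one call's per-sample scores (assumes a fixed metric set per session)."""
        if not rows:
            raise ValueError("rows must not be empty")
        path = self._path(session_id)
        with self._lock_for(session_id):  # serialize writers within one session
            write_header = not path.exists()
            with path.open("a", newline="", encoding="utf-8") as f:
                writer = csv.DictWriter(f, fieldnames=sorted(rows[0]))
                if write_header:
                    writer.writeheader()
                writer.writerows(rows)
            with self._guard:
                self._calls[session_id] = self._calls.get(session_id, 0) + 1

    def summary(self, session_id: str) -> dict:
        """Return call_count and metric_means over every row appended so far."""
        with self._lock_for(session_id):
            with self._path(session_id).open(newline="", encoding="utf-8") as f:
                rows = list(csv.DictReader(f))
            with self._guard:
                calls = self._calls.get(session_id, 0)
        metrics = list(rows[0]) if rows else []
        means = {m: sum(float(r[m]) for r in rows) / len(rows) for m in metrics}
        return {"call_count": calls, "metric_means": means}
```

A `threading.Lock` only serializes writers inside one process; a deployment with several worker processes would need a file lock or a database instead.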
chore(weighted-score): comment out overall weighted score (综合加权得分) display and computation
- report.js: hide the overall weighted score card on the report detail page
- score_jobs.js: hide the "overall" (综合) chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score is always null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -0,0 +1,60 @@
<h2>How should optimization suggestions be generated?</h2>
<p class="subtitle">This choice determines the module's core mechanism and its maintainability.</p>

<div class="options">
|
||||
<div class="option" data-choice="a" onclick="toggleSelect(this)">
|
||||
<div class="letter">A</div>
|
||||
<div class="content">
|
||||
<h3>纯规则引擎</h3>
|
||||
<p>每个指标设阈值(如 faithfulness < 0.6),触发时给出预设建议文本。</p>
|
||||
<div class="pros-cons">
|
||||
<div class="pros"><h4>优点</h4><ul>
|
||||
<li>零 LLM 调用,零额外成本</li>
|
||||
<li>结果可预测、可审计</li>
|
||||
<li>响应极快</li>
|
||||
</ul></div>
|
||||
<div class="cons"><h4>缺点</h4><ul>
|
||||
<li>建议固定,无法结合具体样本</li>
|
||||
<li>不能解释"为什么这批数据这个指标低"</li>
|
||||
</ul></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
<div class="option" data-choice="b" onclick="toggleSelect(this)">
<div class="letter">B</div>
<div class="content">
<h3>LLM analysis (fully automatic)</h3>
<p>Pass the evaluation results (per-metric means plus low-scoring samples) to the LLM together and have it generate a context-aware analysis report in Chinese.</p>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul>
<li>Can give targeted suggestions based on specific low-scoring samples</li>
<li>Can explain problems specific to the Siemens use case in Chinese</li>
<li>High-quality, detailed suggestions</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>One extra LLM call per evaluation</li>
<li>Depends on the quality of judge_model</li>
</ul></div>
</div>
</div>
</div>
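<div class="option" data-choice="c" onclick="toggleSelect(this)">
<div class="letter">C</div>
<div class="content">
<h3>Rule-based diagnosis + LLM interpretation (recommended)</h3>
<p>The rule engine first identifies which metrics are abnormal and which optimization direction each one triggers; the rule diagnosis and the low-scoring samples are then passed together to the LLM for a second-pass interpretation that produces suggestions in Chinese.</p>
<div class="pros-cons">
<div class="pros"><h4>Pros</h4><ul>
<li>Rules keep the diagnosis stable rather than leaving it to the LLM's free-form output</li>
<li>The LLM produces more accurate output when given structured input</li>
<li>Each layer can be tested independently</li>
</ul></div>
<div class="cons"><h4>Cons</h4><ul>
<li>Slightly more complex to implement (two submodules)</li>
</ul></div>
</div>
</div>
</div>
</div>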
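The two-layer flow of option C (with option A as its LLM-free fallback) can be sketched roughly as below. This is a hypothetical Python sketch, not the module's actual code: `Rule`, `RULES`, `diagnose`, `build_prompt`, `advise`, the `context_recall` rule, and every suggestion string are illustrative assumptions; only the faithfulness &lt; 0.6 threshold appears in the options above. The LLM is passed in as a plain callable so a real judge_model client could be plugged in, and omitting it degrades to option A's fixed-text behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    metric: str
    threshold: float  # rule fires when the metric mean falls below this value
    direction: str    # preset optimization direction (option A's fixed advice text)


# Only the faithfulness < 0.6 threshold comes from the design note; the rest is illustrative.
RULES = [
    Rule("faithfulness", 0.6, "Answers drift from the retrieved context; tighten grounding in the prompt."),
    Rule("context_recall", 0.5, "Retrieval misses relevant passages; revisit chunking or top-k."),
]


def diagnose(metric_means: dict[str, float]) -> list[Rule]:
    """Layer 1 (rules): deterministic, makes no LLM call, unit-testable on its own."""
    return [r for r in RULES if r.metric in metric_means and metric_means[r.metric] < r.threshold]


def build_prompt(fired: list[Rule], metric_means: dict[str, float], low_samples: list[str]) -> str:
    """Layer 2 input: structured rule diagnosis plus low-scoring samples."""
    lines = ["Rule diagnosis:"]
    lines += [f"- {r.metric} = {metric_means[r.metric]:.2f} (< {r.threshold}): {r.direction}" for r in fired]
    lines.append("Low-scoring samples:")
    lines += [f"- {s}" for s in low_samples]
    lines.append("Explain the likely causes and give concrete suggestions in Chinese.")
    return "\n".join(lines)


def advise(
    metric_means: dict[str, float],
    low_samples: list[str],
    llm: Optional[Callable[[str], str]] = None,
) -> str:
    fired = diagnose(metric_means)
    if not fired:
        return "No metric below threshold."
    if llm is None:  # option A: fall back to the preset rule text
        return "\n".join(r.direction for r in fired)
    return llm(build_prompt(fired, metric_means, low_samples))  # option C
```

Because `diagnose` never touches the LLM, the rule layer can be checked with plain assertions and the LLM layer with a stub callable, which is what "each layer can be tested independently" amounts to in practice.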