feat(session-async): add /api/score/session_async with incremental session report aggregation

- New POST /api/score/session_async endpoint: calls that share a session_id append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: derives a deterministic run_id from session_id, serializes CSV appends with a per-session mutex, and regenerates the advisor on every call
- SessionScoreRequest (extends ScoreRequest with session_id), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing
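
A minimal sketch of the three mechanisms listed above (deterministic run_id, per-session mutex around the CSV append, and metric_means aggregation). The hashing scheme, the `SessionLocks` helper, the function names, and the CSV layout are assumptions made for illustration; the actual SessionScoreJobManager may differ.

```python
import csv
import hashlib
import threading
from pathlib import Path
from typing import Dict, List, Optional


def run_id_for_session(session_id: str) -> str:
    """Map a session_id to a stable run_id (hypothetical scheme: SHA-256 prefix)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return "session-" + digest[:12]


class SessionLocks:
    """Hand out one lock per session_id; lock creation is itself guarded."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks = {}

    def get(self, session_id: str) -> threading.Lock:
        with self._guard:
            if session_id not in self._locks:
                self._locks[session_id] = threading.Lock()
            return self._locks[session_id]


def append_row(csv_path: Path, row: Dict[str, float], lock: threading.Lock) -> None:
    """Append one score row; the lock keeps concurrent calls in a session from interleaving."""
    with lock:
        is_new = not csv_path.exists() or csv_path.stat().st_size == 0
        with csv_path.open("a", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=sorted(row))
            if is_new:
                writer.writeheader()
            writer.writerow(row)


def metric_means(rows: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Average each metric over the rows where it has a value (None is skipped)."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue
            totals[name] = totals.get(name, 0.0) + value
            counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}
```

Because the run_id depends only on session_id, every call in a session resolves to the same report location, and the per-session lock means unrelated sessions do not block each other.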

chore(weighted-score): comment out overall weighted score (综合加权得分) display and computation

- report.js: hide the overall weighted score card on the report detail page
- score_jobs.js: hide the overall-score (综合) chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)
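
The resulting /api/score response shape can be sketched as follows. `ScoreResponseSketch` and `build_response` are hypothetical stand-ins for the real models (only the four field names come from the diff), and the metric names are made up for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ScoreResponseSketch:
    """Stand-in for ScoreResponse; field names follow the diff, types are assumed."""

    scores: Dict[str, Optional[float]]
    weighted_score: Optional[float] = None
    latency_ms: int = 0
    skipped_metrics: List[str] = field(default_factory=list)


def build_response(
    raw_scores: Dict[str, Optional[float]],
    requested: List[str],
    latency_ms: int,
    skipped: List[str],
) -> ScoreResponseSketch:
    # Every requested metric appears in the output, defaulting to None.
    all_scores: Dict[str, Optional[float]] = {name: None for name in requested}
    all_scores.update(raw_scores)
    # With the computation disabled, weighted_score is always None.
    return ScoreResponseSketch(
        scores=all_scores,
        weighted_score=None,
        latency_ms=latency_ms,
        skipped_metrics=skipped,
    )


response = build_response({"faithfulness": 0.9}, ["faithfulness", "relevance"], 12, ["relevance"])
payload = json.dumps(asdict(response))  # None serializes to JSON null
```

Clients that previously read weighted_score should treat a null value as "not computed" rather than as a score of zero.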

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-26 16:09:33 +08:00
parent e1751447df
commit 754a30ad59
36 changed files with 2004 additions and 51 deletions


@@ -156,10 +156,11 @@ def score_sample(
     all_scores: dict[str, float | None] = {metric_name: None for metric_name in request.metrics}
     all_scores.update(raw_scores)
-    weighted = compute_weighted_score(
-        {key: value for key, value in raw_scores.items() if value is not None},
-        {},
-    )
+    # Overall weighted score computation (temporarily disabled)
+    # weighted = compute_weighted_score(
+    #     {key: value for key, value in raw_scores.items() if value is not None},
+    #     {},
+    # )
     logger.info(
         "[score] done latency=%dms skipped=%s scores=%s",
@@ -169,7 +170,7 @@ def score_sample(
     )
     return ScoreResponse(
         scores=all_scores,
-        weighted_score=round(weighted, 4) if weighted is not None else None,
+        weighted_score=None,  # overall weighted score temporarily disabled
         latency_ms=latency_ms,
         skipped_metrics=skipped,
     )