- New POST /api/score/session_async endpoint: calls that share a session_id append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: deterministic run_id from session_id, per-session mutex for CSV append, advisor regenerated on every call
- SessionScoreRequest (extends ScoreRequest with a session_id field), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing
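
The deterministic run_id and the per-session mutex described above could look roughly like the sketch below. It is an illustration only: `run_id_for`, `append_score_row`, the hash-based id format, and the directory layout are assumptions, not the actual SessionScoreJobManager code.

```python
import csv
import hashlib
import threading
from pathlib import Path

# Hypothetical module-level state; the real manager may keep this on an instance.
_locks_guard = threading.Lock()
_session_locks = {}


def run_id_for(session_id: str) -> str:
    """Derive a stable run_id so repeated calls map to the same report."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()[:12]
    return f"session-{digest}"


def _lock_for(session_id: str) -> threading.Lock:
    # Guard the dict itself so two threads cannot create two locks for one session.
    with _locks_guard:
        return _session_locks.setdefault(session_id, threading.Lock())


def append_score_row(base_dir: Path, session_id: str, row: dict) -> Path:
    """Append one row to the session's scores.csv while holding its lock."""
    run_dir = base_dir / run_id_for(session_id)
    csv_path = run_dir / "scores.csv"
    with _lock_for(session_id):
        run_dir.mkdir(parents=True, exist_ok=True)
        write_header = not csv_path.exists()
        with csv_path.open("a", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(row))
            if write_header:
                writer.writeheader()
            writer.writerow(row)
    return csv_path
```

Because the lock is keyed by session_id, appends to different sessions do not block each other, while concurrent calls for one session are serialized.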
chore(weighted-score): comment out composite weighted score (综合加权得分) display and computation
- report.js: hide the composite weighted score (综合加权得分) card on the report detail page
- score_jobs.js: hide the composite (综合) chip in the async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each async score job:
- Runs InlineScorer.score() in thread pool
- Writes standard run artifacts (metadata.json, scores.csv, summary.md)
- Runs optimization_advisor => optimization_advice.md
- Result appears in the run list (运行列表) and report detail (报告详情) pages with the full report
New endpoints:
- POST /api/score/async (202, job_id immediate)
- GET /api/score/jobs (list all jobs)
- GET /api/score/jobs/{id} (single job status)
Frontend:
- Scoring records (评分记录) nav page with card list
- 5s auto-polling for queued/running jobs
- View report (查看报告) button navigates to the existing report detail (报告详情) page
Dify: change /api/score -> /api/score/async, no response parsing needed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each API module now logs:
- evaluations: trigger (scenario path, task_id), status polls, list
- runs: list (count), detail (run_id, metrics, sample counts)
- scenarios: list (total, valid, error counts)
- pipeline: submit (docs_path, models, max_docs), status polls, list
- llm_profiles: CRUD ops (name, model, id), probe/test (model, ok, latency), apply (patched fields)
- score: already had per-request logging
Global middleware (webapp.access logger):
- Every API request: METHOD path -> status (latency_ms) at INFO
- Static file requests demoted to DEBUG to reduce noise
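
The level selection in the access middleware might be sketched as follows. The `/api` prefix test and the helper names are assumptions made for illustration; only the `webapp.access` logger name and the "METHOD path -> status (latency_ms)" shape come from the notes above.

```python
import logging

access_logger = logging.getLogger("webapp.access")

# Assumption: API routes live under /api; everything else is treated as static.
API_PREFIX = "/api"


def access_log_entry(method, path, status, latency_ms):
    """Return (level, message) for one request."""
    level = logging.INFO if path.startswith(API_PREFIX) else logging.DEBUG
    message = f"{method} {path} -> {status} ({latency_ms:.1f}ms)"
    return level, message


def log_request(method, path, status, latency_ms):
    """Emit the access line at the level chosen for this path."""
    level, message = access_log_entry(method, path, status, latency_ms)
    access_logger.log(level, message)
```

Keeping the level decision in a pure function makes it easy to test without starting the web server.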
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Log incoming request (client, content-type, metrics, has_gt) on each /api/score call
- Log scoring result (latency, skipped metrics, scores) on success
- Register global RequestValidationError handler: logs url/content-type/errors
  so 422 causes are visible in server log without checking HTTP response body
- Fix jsonable_encoder for exc.errors() to handle non-serializable ctx objects
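
The serialization problem in the last bullet can be illustrated with a standard-library stand-in. This is not the project's fix and does not use jsonable_encoder; `serializable_errors` is a hypothetical helper that shows one way to make an error list loggable when a `ctx` entry holds an object (for example an exception instance) that JSON cannot encode.

```python
import json


def serializable_errors(errors):
    """Round-trip errors through JSON, stringifying anything json cannot encode."""
    # default=str is called only for objects the encoder does not support,
    # so plain strings, numbers, lists, and dicts pass through unchanged.
    return json.loads(json.dumps(errors, default=str))
```

The result contains only plain JSON types, so it can be logged or returned in a 422 body without raising.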
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add detailed Chinese route docstring covering all 7 metrics, contexts format,
  ground_truth optional behavior, and Bearer auth instructions
- Add 200 response content example for Swagger UI Try-it-out
- Bump app version to 0.3.0
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>