- New POST /api/score/session_async endpoint: same session_id calls append to one shared report
- New GET /api/score/sessions/{session_id}: returns call_count, metric_means, all job records
- New GET /api/score/session/jobs/{job_id}: individual call status
- SessionScoreJobManager: deterministic run_id from session_id, per-session mutex for CSV append, advisor regenerated on every call
- SessionScoreRequest (extends ScoreRequest + session_id), SessionScoreJobResponse, SessionStatus models added
- 24 new tests, all passing
chore(weighted-score): comment out 综合加权得分 display and computation
- report.js: hide 综合加权得分 card in report detail page
- score_jobs.js: hide 综合 chip in async job list
- report_builder.py: overall_ws=None (computation disabled)
- summary.py: weighted_score summary line disabled
- evaluator.py: weighted_score/sample_weight columns no longer written to scores.csv
- score.py /api/score: weighted_score always returns null
- score_job_manager.py + session_score_manager.py: weighted=None
- Updated 3 tests to match new behaviour (6 pre-existing failures unchanged)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add advisory_threshold=0.85 field to MetricRule (higher-is-better metrics)
- diagnose() now emits severity='low' for scores in (warning_threshold, 0.85)
- noise_sensitivity (lower-is-better) keeps its existing two-tier thresholds
- writer.py: severity labels mapped to Chinese (严重/警告/待优化)
- llm_analyzer.py: prompt explains low/warning/critical tiers in Chinese
- Tests: 5 new cases for 'low' severity, updated log summary assertions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each async score job:
- Runs InlineScorer.score() in thread pool
- Writes standard run artifacts (metadata.json, scores.csv, summary.md)
- Runs optimization_advisor => optimization_advice.md
- Result appears in 运行列表 and 报告详情 with full report
New endpoints:
- POST /api/score/async (202, job_id immediate)
- GET /api/score/jobs (list all jobs)
- GET /api/score/jobs/{id} (single job status)
Frontend:
- 评分记录 nav page with card list
- 5s auto-polling for queued/running jobs
- 查看报告 button navigates to existing 报告详情 page
Dify: change /api/score -> /api/score/async, no response parsing needed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When contexts is absent, metrics that require retrieved_contexts
(faithfulness, context_recall, context_precision, noise_sensitivity)
are automatically skipped and appear in skipped_metrics.
Only answer_relevancy, factual_correctness, semantic_similarity
remain computable without contexts.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gpt-5.4/5.5/5.2/5.4-mini/5.4-nano are incompatible with RAGAS 0.4.3
because they require max_completion_tokens instead of max_tokens.
gpt-5 / gpt-4.1 support max_tokens and json_object mode required by RAGAS.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
max_tokens=1 triggers 'min-output limit' errors on gpt-5.x models.
Using 8 tokens is still cheap but satisfies all known model minimums.
Falls back to max_completion_tokens=8 if max_tokens is not supported.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
text-embedding-* and other embedding models must call /embeddings not
/chat/completions. Added _is_embedding_model() heuristic that checks model
name keywords to route to the correct endpoint automatically.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Frontend test functionality was implemented but never committed to git.
Re-adds:
- profiles.js: testCard(), testForm(), _showTestResult(), test btn in renderCard
- api.js: testProfile(id) and probeConnectivity(body) methods
- index.html: 测试连通性 button + result div in profile form
- app.css: .btn-test and .profile-test-result styles
Backend /probe and /{id}/test endpoints were already present in llm_profiles.py.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#view-apidocs has 'display: flex' in CSS which overrides the browser's
default '[hidden] { display: none }' user-agent style, causing the API
docs iframe to remain visible and bleed into the LLM config page.
Fix: add explicit '#view-apidocs[hidden] { display: none }' rule.
Also exclude apidocs from @media print to prevent iframe printing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Newer setuptools (Linux) raises error when multiple top-level dirs are found.
Explicitly include only rag_eval/apps/webapp and exclude runtime data dirs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Prevents CRLF line endings on .sh files which cause '/usr/bin/env: bash\r'
errors when running on Linux.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each API module now logs:
- evaluations: trigger (scenario path, task_id), status polls, list
- runs: list (count), detail (run_id, metrics, sample counts)
- scenarios: list (total, valid, error counts)
- pipeline: submit (docs_path, models, max_docs), status polls, list
- llm_profiles: CRUD ops (name, model, id), probe/test (model, ok, latency), apply (patched fields)
- score: already had per-request logging
Global middleware (webapp.access logger):
- Every API request: METHOD path -> status (latency_ms) at INFO
- Static file requests demoted to DEBUG to reduce noise
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Swagger UI Try it out was sending the {summary, value} wrapper as request body
instead of just the value contents, causing 422 errors. The 'example' (singular)
key is correctly used as the schema-level example by Swagger UI.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Log incoming request (client, content-type, metrics, has_gt) on each /api/score call
- Log scoring result (latency, skipped metrics, scores) on success
- Register global RequestValidationError handler: logs url/content-type/errors
so 422 causes are visible in server log without checking HTTP response body
- Fix jsonable_encoder for exc.errors() to handle non-serializable ctx objects
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add detailed Chinese route docstring covering all 7 metrics, contexts format,
ground_truth optional behavior, and Bearer auth instructions
- Add 200 response content example for Swagger UI Try-it-out
- Bump app version to 0.3.0
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- app.js: hash-based router (#runs / #new / #profiles / #report/{runId})
- navigate() pushes history entries for back/forward support
- _restoreSession() reads hash on load and popstate
- sessionStorage fallback for same-tab refreshes
- run-card highlights selected run (.run-card.selected)
- runner.js: use App.navigate() for report redirect; persist lastRunId to sessionStorage
- index.html: report nav button starts disabled (enabled on run select/restore)
- app.css: .run-card.selected with petrol border + ring
Co-Authored-By: Claude <noreply@anthropic.com>