Commit Graph

13 Commits

Author SHA1 Message Date
wangwei
b870ed8730 feat: make contexts optional in /api/score
When contexts is absent, metrics that require retrieved_contexts
(faithfulness, context_recall, context_precision, noise_sensitivity)
are automatically skipped and appear in skipped_metrics.
Only answer_relevancy, factual_correctness, semantic_similarity
remain computable without contexts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-24 14:42:03 +08:00
wangwei
a781ba1e4a config: set default judge_model=gpt-5, embedding_model=text-embedding-3-small
gpt-5.4/5.5/5.2/5.4-mini/5.4-nano are incompatible with RAGAS 0.4.3
because they require max_completion_tokens instead of max_tokens.
gpt-5 / gpt-4.1 support max_tokens and json_object mode required by RAGAS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:29:01 +08:00
wangwei
2ad2c1ea9d docs: update /api/score example to use gpt-5.4 and text-embedding-3-small
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:11:34 +08:00
wangwei
f8e308b7dc fix: use max_tokens=8 for chat model connectivity test
max_tokens=1 triggers 'min-output limit' errors on gpt-5.x models.
Using 8 tokens is still cheap but satisfies all known model minimums.
Falls back to max_completion_tokens=8 if max_tokens is not supported.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:03:27 +08:00
wangwei
fb420656ec fix: use /embeddings endpoint for embedding models in connectivity test
text-embedding-* and other embedding models must call /embeddings not
/chat/completions. Added _is_embedding_model() heuristic that checks model
name keywords to route to the correct endpoint automatically.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 14:53:32 +08:00
wangwei
05419db1f9 fix: support max_completion_tokens for newer models (gpt-5.x) in connectivity test
Newer OpenAI models (gpt-5.4 etc.) reject max_tokens and require
max_completion_tokens. Try max_completion_tokens first, fall back to
max_tokens for older models / compatible APIs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 14:51:28 +08:00
wangwei
ac410e7a5d feat: add detailed logging to all API routes and global access log middleware
Each API module now logs:
- evaluations: trigger (scenario path, task_id), status polls, list
- runs: list (count), detail (run_id, metrics, sample counts)
- scenarios: list (total, valid, error counts)
- pipeline: submit (docs_path, models, max_docs), status polls, list
- llm_profiles: CRUD ops (name, model, id), probe/test (model, ok, latency), apply (patched fields)
- score: already had per-request logging

Global middleware (webapp.access logger):
- Every API request: METHOD path -> status (latency_ms) at INFO
- Static file requests demoted to DEBUG to reduce noise

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 10:35:00 +08:00
wangwei
5ced129ff7 feat: add detailed request logging to /api/score and global 422 handler
- Log incoming request (client, content-type, metrics, has_gt) on each /api/score call
- Log scoring result (latency, skipped metrics, scores) on success
- Register global RequestValidationError handler: logs url/content-type/errors
  so 422 causes are visible in server log without checking HTTP response body
- Fix jsonable_encoder for exc.errors() to handle non-serializable ctx objects

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 18:14:01 +08:00
wangwei
ebf1fc7be8 docs: enhance /api/score OpenAPI docs with full Chinese docstring and response example
- Add detailed Chinese route docstring covering all 7 metrics, contexts format,
  ground_truth optional behavior, and Bearer auth instructions
- Add 200 response content example for Swagger UI Try-it-out
- Bump app version to 0.3.0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 15:52:30 +08:00
wangwei
a03a24be4e feat: add POST /api/score endpoint for Dify real-time scoring
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 15:14:19 +08:00
wangwei
ce0d2291b0 feat: yaml_patcher and ProfileApplyRequest support metric_weights and doc_weights
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-18 17:02:21 +08:00
wangwei
b19054bd66 feat: add /api/llm-profiles CRUD router 2026-06-16 16:18:40 +08:00
wangwei
e89695e490 Add RAGAS evaluation web console (FastAPI + vanilla JS)
- webapp/: FastAPI backend with runs/scenarios/evaluations API routers;
  services for run_reader, report_builder, scenario_scanner, task_manager
  (lazy ragas import — server boots even without ragas); Pydantic models
- webapp/static/: single-page console (layout A: left-nav + main area);
  report detail with metric cards, Chart.js distribution histogram,
  grouping table, lowest-score sample review; trigger evaluation + log polling
- webmain.py: uvicorn entry point (alongside existing main.py CLI)
- start.bat: Windows one-click launcher with env checks and auto-browser open
- rag_eval/datasets/: implement missing loader + normalizer modules
  (load_dataset_records, normalize_records) required by evaluator
- scripts/seed_sample_run.py: generate realistic demo run artifacts
- .gitignore: exclude datasets/ data files but keep rag_eval/datasets/ source

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
2026-06-15 15:53:57 +08:00