max_tokens=1 triggers 'min-output limit' errors on gpt-5.x models.
Using 8 tokens is still cheap but satisfies all known model minimums.
Falls back to max_completion_tokens=8 if max_tokens is not supported.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
text-embedding-* and other embedding models must call /embeddings not
/chat/completions. Added _is_embedding_model() heuristic that checks model
name keywords to route to the correct endpoint automatically.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each API module now logs:
- evaluations: trigger (scenario path, task_id), status polls, list
- runs: list (count), detail (run_id, metrics, sample counts)
- scenarios: list (total, valid, error counts)
- pipeline: submit (docs_path, models, max_docs), status polls, list
- llm_profiles: CRUD ops (name, model, id), probe/test (model, ok, latency), apply (patched fields)
- score: already had per-request logging
Global middleware (webapp.access logger):
- Every API request: METHOD path -> status (latency_ms) at INFO
- Static file requests demoted to DEBUG to reduce noise
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Log incoming request (client, content-type, metrics, has_gt) on each /api/score call
- Log scoring result (latency, skipped metrics, scores) on success
- Register global RequestValidationError handler: logs url/content-type/errors
so 422 causes are visible in server log without checking HTTP response body
- Fix jsonable_encoder for exc.errors() to handle non-serializable ctx objects
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add detailed Chinese route docstring covering all 7 metrics, contexts format,
ground_truth optional behavior, and Bearer auth instructions
- Add 200 response content example for Swagger UI Try-it-out
- Bump app version to 0.3.0
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>