Compare commits

...

14 Commits

Author SHA1 Message Date
wangwei
a781ba1e4a config: set default judge_model=gpt-5, embedding_model=text-embedding-3-small
gpt-5.4/5.5/5.2/5.4-mini/5.4-nano are incompatible with RAGAS 0.4.3
because they require max_completion_tokens instead of max_tokens.
gpt-5 / gpt-4.1 support max_tokens and json_object mode required by RAGAS.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:29:01 +08:00
wangwei
2ad2c1ea9d docs: update /api/score example to use gpt-5.4 and text-embedding-3-small
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:11:34 +08:00
wangwei
f8e308b7dc fix: use max_tokens=8 for chat model connectivity test
max_tokens=1 triggers 'min-output limit' errors on gpt-5.x models.
Using 8 tokens is still cheap but satisfies all known model minimums.
Falls back to max_completion_tokens=8 if max_tokens is not supported.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 15:03:27 +08:00
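
The retry order described in this commit can be sketched as follows. This is a minimal illustration, not the repository's code: `probe_chat`, the injected `create` callable, and `fake_newer_model` are hypothetical stand-ins for the OpenAI client call, so the logic runs without network access. The fallback fires only when the error text names both parameters, mirroring the check in `_do_connectivity_test`.

```python
from typing import Any, Callable, Tuple


def probe_chat(create: Callable[..., Any], model: str) -> Tuple[bool, str]:
    """Return (ok, detail); detail is the accepted parameter name or the error text."""
    base = {"model": model, "messages": [{"role": "user", "content": "hi"}]}
    try:
        # First attempt: max_tokens=8, small but above the reported minimum-output limit.
        create(**base, max_tokens=8)
        return True, "max_tokens"
    except Exception as exc:  # noqa: BLE001
        err = str(exc)
        # Retry only when the error points at the token-limit parameter itself.
        if not ("max_tokens" in err and "max_completion_tokens" in err):
            return False, err
    try:
        create(**base, max_completion_tokens=8)
        return True, "max_completion_tokens"
    except Exception as exc2:  # noqa: BLE001
        return False, str(exc2)


def fake_newer_model(**kwargs: Any) -> dict:
    """Hypothetical backend that rejects max_tokens, as the commit reports for gpt-5.x."""
    if "max_tokens" in kwargs:
        raise ValueError(
            "Unsupported parameter: 'max_tokens'. Use 'max_completion_tokens' instead."
        )
    return {"ok": True}
```

Authentication or timeout errors do not mention either parameter, so they are returned after the first attempt without a second request.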
wangwei
fb420656ec fix: use /embeddings endpoint for embedding models in connectivity test
text-embedding-* and other embedding models must call /embeddings not
/chat/completions. Added _is_embedding_model() heuristic that checks model
name keywords to route to the correct endpoint automatically.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 14:53:32 +08:00
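
A small sketch of the keyword routing this commit describes. The keyword tuple is copied from the diff; `endpoint_for` is a hypothetical helper added here only to make the routing decision visible.

```python
# Keywords copied from the diff of webapp/api/llm_profiles.py.
_EMBEDDING_MODEL_KEYWORDS = (
    "embedding", "embed", "text-search", "text-similarity",
    "code-search", "ada-002",
)


def is_embedding_model(model: str) -> bool:
    """Heuristic: True if the model name contains a known embedding keyword."""
    return any(kw in model.lower() for kw in _EMBEDDING_MODEL_KEYWORDS)


def endpoint_for(model: str) -> str:
    """Hypothetical helper: pick the API path the connectivity test would call."""
    return "/embeddings" if is_embedding_model(model) else "/chat/completions"


for name in ("text-embedding-3-small", "gpt-5", "bge-m3"):
    print(name, "->", endpoint_for(name))
```

Because the check is purely name-based, an embedding model whose name contains none of the keywords (for example `bge-m3`) would be routed to `/chat/completions`.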
wangwei
05419db1f9 fix: support max_completion_tokens for newer models (gpt-5.x) in connectivity test
Newer OpenAI models (gpt-5.4 etc.) reject max_tokens and require
max_completion_tokens. Try max_completion_tokens first, fall back to
max_tokens for older models / compatible APIs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 14:51:28 +08:00
wangwei
1dc7ab9727 fix: restore LLM profile test connectivity buttons (lost from git)
Frontend test functionality was implemented but never committed to git.
Re-adds:
- profiles.js: testCard(), testForm(), _showTestResult(), test btn in renderCard
- api.js: testProfile(id) and probeConnectivity(body) methods
- index.html: 测试连通性 (Test connectivity) button + result div in profile form
- app.css: .btn-test and .profile-test-result styles

Backend /probe and /{id}/test endpoints were already present in llm_profiles.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 13:58:43 +08:00
wangwei
7cc3aff95a fix: hide #view-apidocs when [hidden] attribute is set
#view-apidocs has 'display: flex' in CSS which overrides the browser's
default '[hidden] { display: none }' user-agent style, causing the API
docs iframe to remain visible and bleed into the LLM config page.

Fix: add explicit '#view-apidocs[hidden] { display: none }' rule.
Also exclude apidocs from @media print to prevent iframe printing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 13:34:24 +08:00
wangwei
ad2651ce27 feat: configure full logging in webmain.py — all API logs to file + console
- RotatingFileHandler: logs/server_YYYY-MM-DD.log (50MB, keep 7 files)
- Console handler: colored timestamp + level + logger name + message
- webapp.* and rag_eval.* loggers captured at configured level
- uvicorn access/error logs also routed to same handlers
- File always captures DEBUG; console level controlled by --log-level arg
- Added --log-level and --log-file CLI arguments to webmain.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 11:46:34 +08:00
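
The handler layout listed in this commit might look roughly like the sketch below. It uses only the standard library; `configure_logging` is a hypothetical name, console coloring is omitted, and `backupCount=7` is an assumption about what "keep 7 files" means.

```python
import logging
import sys
from datetime import date
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: Path, console_level: str = "INFO") -> Path:
    """Attach a DEBUG file handler and a level-controlled console handler to the root logger."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"server_{date.today():%Y-%m-%d}.log"
    fmt = logging.Formatter("%(asctime)s %(levelname)-8s %(name)s: %(message)s")

    # The file always captures DEBUG; rotate at 50 MB and keep 7 backups (assumed).
    file_handler = RotatingFileHandler(
        log_file, maxBytes=50 * 1024 * 1024, backupCount=7, encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(fmt)

    # The console level would come from a --log-level style argument.
    console_handler = logging.StreamHandler(sys.stderr)
    console_handler.setLevel(console_level.upper())
    console_handler.setFormatter(fmt)

    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
    root.setLevel(logging.DEBUG)  # let the handlers do the filtering
    root.addHandler(file_handler)
    root.addHandler(console_handler)

    # Route uvicorn's loggers through the same root handlers.
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        uvicorn_logger = logging.getLogger(name)
        uvicorn_logger.handlers.clear()
        uvicorn_logger.propagate = True
    return log_file
```

Setting the root logger to DEBUG and filtering per handler is what lets the file keep full detail while the console stays at the requested level.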
wangwei
fb42116616 fix: add setuptools package discovery config to pyproject.toml
Newer setuptools (Linux) raises error when multiple top-level dirs are found.
Explicitly include only rag_eval/apps/webapp and exclude runtime data dirs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 11:29:35 +08:00
wangwei
a629bd516c chore: add .gitattributes to enforce LF for shell scripts and Python files
Prevents CRLF line endings on .sh files which cause '/usr/bin/env: bash\r'
errors when running on Linux.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 11:22:24 +08:00
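
To see why a CRLF checkout produces the `bash\r` error, the sketch below models how the shebang line is read. It is a simplified illustration: `shebang_interpreter` is a hypothetical helper, and it assumes the line is terminated by LF only, so a preceding CR stays attached to the last word.

```python
def shebang_interpreter(script: bytes) -> str:
    """Return the last word of the shebang line, keeping any stray carriage return."""
    first_line = script.split(b"\n", 1)[0]  # split on LF only, as assumed above
    if not first_line.startswith(b"#!"):
        return ""
    return first_line[2:].decode("utf-8").split(" ")[-1]


print(repr(shebang_interpreter(b"#!/usr/bin/env bash\necho hi\n")))
print(repr(shebang_interpreter(b"#!/usr/bin/env bash\r\necho hi\r\n")))
```

With LF endings the interpreter name is `bash`; with CRLF it becomes `bash\r`, which `env` cannot find.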
wangwei
ac410e7a5d feat: add detailed logging to all API routes and global access log middleware
Each API module now logs:
- evaluations: trigger (scenario path, task_id), status polls, list
- runs: list (count), detail (run_id, metrics, sample counts)
- scenarios: list (total, valid, error counts)
- pipeline: submit (docs_path, models, max_docs), status polls, list
- llm_profiles: CRUD ops (name, model, id), probe/test (model, ok, latency), apply (patched fields)
- score: already had per-request logging

Global middleware (webapp.access logger):
- Every API request: METHOD path -> status (latency_ms) at INFO
- Static file requests demoted to DEBUG to reduce noise

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 10:35:00 +08:00
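
The level selection used by the access-log middleware can be sketched without FastAPI. The static-path rule is taken from the diff of the server module; `access_log_level` and `log_access` are hypothetical helper names.

```python
import logging

access_logger = logging.getLogger("webapp.access")


def access_log_level(path: str) -> int:
    """Static assets and the index page log at DEBUG; everything else at INFO."""
    is_static = path.startswith("/static/") or path in ("/", "/favicon.ico")
    return logging.DEBUG if is_static else logging.INFO


def log_access(method: str, path: str, status: int, latency_ms: int) -> None:
    """Emit one line in the 'METHOD path -> status (latency_ms)' shape."""
    access_logger.log(
        access_log_level(path), "%s %s -> %d (%dms)", method, path, status, latency_ms
    )


log_access("GET", "/api/runs", 200, 12)
log_access("GET", "/static/app.css", 200, 1)
```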
wangwei
1304fec1c4 fix: change ScoreRequest json_schema_extra from examples list to example dict
Swagger UI Try it out was sending the {summary, value} wrapper as request body
instead of just the value contents, causing 422 errors. The 'example' (singular)
key is correctly used as the schema-level example by Swagger UI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-23 10:03:46 +08:00
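
The difference between the two shapes is easiest to see with plain dictionaries. This is a sketch under assumptions: in the real model these dictionaries are passed to `ConfigDict(json_schema_extra=...)`, and `request_body_sent` is a hypothetical function that only models the Swagger UI behavior reported in the commit message.

```python
from typing import Any, Dict

payload: Dict[str, Any] = {"question": "q", "answer": "a", "contexts": "c1 |||| c2"}

# Before: a list of {summary, value} wrappers under "examples".
schema_extra_before = {
    "examples": [{"summary": "Basic scoring request", "value": payload}]
}

# After: one schema-level example holding the payload itself.
schema_extra_after = {"example": payload}


def request_body_sent(schema_extra: Dict[str, Any]) -> Dict[str, Any]:
    """Model of the reported behavior: the example object is sent as the body verbatim."""
    if "example" in schema_extra:
        return schema_extra["example"]
    return schema_extra["examples"][0]


print(sorted(request_body_sent(schema_extra_before)))
print(sorted(request_body_sent(schema_extra_after)))
```

In the first case the body carries `summary` and `value` instead of the required fields, which is consistent with the 422 responses described above.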
wangwei
5ced129ff7 feat: add detailed request logging to /api/score and global 422 handler
- Log incoming request (client, content-type, metrics, has_gt) on each /api/score call
- Log scoring result (latency, skipped metrics, scores) on success
- Register global RequestValidationError handler: logs url/content-type/errors
  so 422 causes are visible in server log without checking HTTP response body
- Fix jsonable_encoder for exc.errors() to handle non-serializable ctx objects

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 18:14:01 +08:00
wangwei
ebf1fc7be8 docs: enhance /api/score OpenAPI docs with full Chinese docstring and response example
- Add detailed Chinese route docstring covering all 7 metrics, contexts format,
  ground_truth optional behavior, and Bearer auth instructions
- Add 200 response content example for Swagger UI Try-it-out
- Bump app version to 0.3.0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-06-22 15:52:30 +08:00
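
The contexts format and the optional `ground_truth` behavior documented for `/api/score` can be sketched as below. The separator default and the list of metrics that need a reference answer come from the route docstring in this diff; `split_contexts` and `partition_metrics` are hypothetical helper names, not the service's actual functions.

```python
from typing import List, Optional, Tuple

# Metrics the docstring lists as requiring ground_truth.
NEEDS_GROUND_TRUTH = {
    "context_recall", "factual_correctness",
    "semantic_similarity", "noise_sensitivity",
}


def split_contexts(contexts: str, separator: str = " |||| ") -> List[str]:
    """Split the joined contexts string back into individual chunks."""
    return [part.strip() for part in contexts.split(separator) if part.strip()]


def partition_metrics(
    metrics: List[str], ground_truth: Optional[str]
) -> Tuple[List[str], List[str]]:
    """Return (runnable, skipped) depending on whether a reference answer exists."""
    if ground_truth is not None:
        return list(metrics), []
    runnable = [m for m in metrics if m not in NEEDS_GROUND_TRUTH]
    skipped = [m for m in metrics if m in NEEDS_GROUND_TRUTH]
    return runnable, skipped


print(split_contexts("chunk one |||| chunk two"))
print(partition_metrics(["faithfulness", "context_recall"], None))
```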
17 changed files with 602 additions and 41 deletions


@@ -8,8 +8,10 @@ OPENAI_BASE_URL=http://6.86.80.4:30080/v1
OPENAI_TIMEOUT_SECONDS=180
# Default evaluation models (can be overridden in the scenario YAML or in the web console's LLM settings)
RAGAS_JUDGE_MODEL=deepseek-v4-flash
RAGAS_EMBEDDING_MODEL=text-embedding-v3
# RAGAS_JUDGE_MODEL must support max_tokens + json_object (gpt-5, gpt-4.1, gpt-4o, etc.)
# Note: the gpt-5.4/5.5/5.2 series does not support max_tokens and is incompatible with RAGAS 0.4.3
RAGAS_JUDGE_MODEL=gpt-5
RAGAS_EMBEDDING_MODEL=text-embedding-3-small
# Evaluation concurrency control (with all 7 metrics enabled, RAGAS_METRIC_TIMEOUT_SECONDS=300 is recommended)
BATCH_SIZE=8

.gitattributes (new file, 26 lines)

@@ -0,0 +1,26 @@
# Default: text files use LF (Linux/macOS style)
* text=auto eol=lf
# Force LF for shell scripts (regardless of the checkout platform)
*.sh text eol=lf
# Python and YAML also use LF
*.py text eol=lf
*.yaml text eol=lf
*.yml text eol=lf
*.md text eol=lf
*.json text eol=lf
*.toml text eol=lf
*.txt text eol=lf
*.env text eol=lf
*.env.example text eol=lf
# Windows scripts keep CRLF
*.ps1 text eol=crlf
*.bat text eol=crlf
# Binary files are not converted
*.pdf binary
*.png binary
*.jpg binary
*.csv binary


@@ -17,3 +17,8 @@ dependencies = [
"pydantic-settings>=2.14.1",
"ragas==0.4.3",
]
[tool.setuptools.packages.find]
# Package only the source directories; exclude data directories generated at runtime
include = ["rag_eval*", "apps*", "webapp*"]
exclude = ["logs*", "outputs*", "datasets*", "configs*", "scenarios*", "scripts*", "tests*"]


@@ -21,9 +21,9 @@ class EvaluationSettings(BaseSettings):
openai_api_key: str | None = Field(default=None, alias="OPENAI_API_KEY")
openai_base_url: str = Field(default="http://6.86.80.4:30080/v1", alias="OPENAI_BASE_URL")
ragas_judge_model: str = Field(default="deepseek-v4-flash", alias="RAGAS_JUDGE_MODEL")
ragas_judge_model: str = Field(default="gpt-5", alias="RAGAS_JUDGE_MODEL")
ragas_embedding_model: str = Field(
default="text-embedding-v3",
default="text-embedding-3-small",
alias="RAGAS_EMBEDDING_MODEL",
)
openai_timeout_seconds: float = Field(default=30.0, alias="OPENAI_TIMEOUT_SECONDS")


@@ -2,6 +2,8 @@
from __future__ import annotations
import logging
from fastapi import APIRouter, HTTPException
from webapp.models import (
@@ -13,19 +15,23 @@ from webapp.services import scenario_scanner
from webapp.services.task_manager import task_manager
router = APIRouter(prefix="/api/evaluations", tags=["evaluations"])
logger = logging.getLogger("webapp.api.evaluations")
@router.post("", response_model=TriggerEvaluationResponse)
def trigger_evaluation(request: TriggerEvaluationRequest) -> TriggerEvaluationResponse:
"""Validate the scenario path and queue a background evaluation task."""
logger.info("[trigger] scenario=%s", request.scenario_path)
resolved = scenario_scanner.resolve_scenario_path(request.scenario_path)
if resolved is None:
logger.warning("[trigger] invalid scenario path: %s", request.scenario_path)
raise HTTPException(
status_code=400,
detail=f"Invalid or disallowed scenario path: {request.scenario_path}",
)
task_id = task_manager.submit(request.scenario_path)
logger.info("[trigger] queued task_id=%s scenario=%s", task_id, request.scenario_path)
return TriggerEvaluationResponse(task_id=task_id)
@@ -34,11 +40,15 @@ def get_task_status(task_id: str) -> TaskStatus:
"""Return the current status and logs for one evaluation task."""
status = task_manager.get(task_id)
if status is None:
logger.warning("[task_status] not found task_id=%s", task_id)
raise HTTPException(status_code=404, detail=f"Task not found: {task_id}")
logger.debug("[task_status] task_id=%s status=%s", task_id, status.status)
return status
@router.get("", response_model=dict)
def list_tasks() -> dict[str, list]:
"""Return all known evaluation tasks for this server session."""
return {"tasks": [task.model_dump() for task in task_manager.list_tasks()]}
tasks = task_manager.list_tasks()
logger.info("[list_tasks] count=%d", len(tasks))
return {"tasks": [task.model_dump() for task in tasks]}


@@ -2,6 +2,7 @@
from __future__ import annotations
import logging
import time
from fastapi import APIRouter, HTTPException
@@ -19,6 +20,19 @@ from webapp.services.profile_manager import profile_manager
from webapp.services.yaml_patcher import apply_profiles_to_scenario
router = APIRouter(prefix="/api/llm-profiles", tags=["llm-profiles"])
logger = logging.getLogger("webapp.api.llm_profiles")
# Common embedding-model name keywords, used to route automatically to the /embeddings endpoint
_EMBEDDING_MODEL_KEYWORDS = (
"embedding", "embed", "text-search", "text-similarity",
"code-search", "ada-002",
)
def _is_embedding_model(model: str) -> bool:
"""Heuristic: return True if the model name looks like an embedding model."""
return any(kw in model.lower() for kw in _EMBEDDING_MODEL_KEYWORDS)
def _do_connectivity_test(
@@ -27,58 +41,102 @@ def _do_connectivity_test(
api_key: str,
timeout_seconds: int,
) -> ProfileTestResponse:
"""Send a minimal chat completion request and return the test result."""
"""Send a minimal request and return the connectivity test result.
- Embedding models → POST /embeddings with a short text
- Chat models → POST /chat/completions with max_tokens=8; retries with
max_completion_tokens=8 (required by newer models like gpt-5.x) if max_tokens is rejected.
"""
client = OpenAI(
api_key=api_key,
base_url=base_url.rstrip("/"),
timeout=float(timeout_seconds),
)
t0 = time.monotonic()
if _is_embedding_model(model):
# Embedding models use the /embeddings endpoint
try:
client.embeddings.create(model=model, input="test")
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=True, message="Connection succeeded (embedding)", latency_ms=latency_ms)
except Exception as exc: # noqa: BLE001
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=False, message=str(exc), latency_ms=latency_ms)
# Chat models: try max_tokens first (most widely supported); timeout/auth errors are returned directly
# Avoid max_tokens=1, which triggers the min-output limit on some models (gpt-5.x)
try:
client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "hi"}],
max_tokens=1,
max_tokens=8, # small enough to keep costs low while meeting every model's minimum output requirement
)
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=True, message="Connection succeeded", latency_ms=latency_ms)
except Exception as exc: # noqa: BLE001
err_str = str(exc)
# If max_tokens is not supported, retry once with max_completion_tokens
if "max_tokens" in err_str and "max_completion_tokens" in err_str:
try:
client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "hi"}],
max_completion_tokens=8,
)
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=False, message=str(exc), latency_ms=latency_ms)
return ProfileTestResponse(ok=True, message="Connection succeeded", latency_ms=latency_ms)
except Exception as exc2: # noqa: BLE001
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=False, message=str(exc2), latency_ms=latency_ms)
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=False, message=err_str, latency_ms=latency_ms)
latency_ms = int((time.monotonic() - t0) * 1000)
return ProfileTestResponse(ok=False, message="Connectivity test failed", latency_ms=latency_ms)
@router.post("/probe", response_model=ProfileTestResponse, tags=["llm-profiles"])
def probe_connectivity(request: ProfileProbeRequest) -> ProfileTestResponse:
"""Test LLM connectivity with inline credentials (no saved profile required)."""
return _do_connectivity_test(
logger.info("[probe] model=%s base_url=%s", request.model, request.base_url)
result = _do_connectivity_test(
model=request.model,
base_url=request.base_url,
api_key=request.api_key,
timeout_seconds=request.timeout_seconds,
)
logger.info("[probe] ok=%s latency=%sms msg=%s", result.ok, result.latency_ms, result.message)
return result
@router.get("", response_model=dict)
def list_profiles() -> dict:
"""Return all saved LLM profiles."""
return {"profiles": [p.model_dump() for p in profile_manager.list_all()]}
profiles = profile_manager.list_all()
logger.info("[list_profiles] count=%d", len(profiles))
return {"profiles": [p.model_dump() for p in profiles]}
@router.post("", status_code=201, response_model=LLMProfile)
def create_profile(request: CreateProfileRequest) -> LLMProfile:
"""Create a new LLM profile."""
return profile_manager.create(
logger.info("[create_profile] name=%r model=%s base_url=%s", request.name, request.model, request.base_url)
profile = profile_manager.create(
name=request.name,
model=request.model,
base_url=request.base_url,
api_key=request.api_key,
timeout_seconds=request.timeout_seconds,
)
logger.info("[create_profile] created id=%s", profile.profile_id)
return profile
@router.put("/{profile_id}", response_model=LLMProfile)
def update_profile(profile_id: str, request: CreateProfileRequest) -> LLMProfile:
"""Update an existing LLM profile by id."""
logger.info("[update_profile] id=%s name=%r model=%s", profile_id, request.name, request.model)
updated = profile_manager.update(
profile_id=profile_id,
name=request.name,
@@ -88,16 +146,21 @@ def update_profile(profile_id: str, request: CreateProfileRequest) -> LLMProfile
timeout_seconds=request.timeout_seconds,
)
if updated is None:
logger.warning("[update_profile] not found id=%s", profile_id)
raise HTTPException(status_code=404, detail=f"Profile not found: {profile_id}")
logger.info("[update_profile] updated id=%s", profile_id)
return updated
@router.delete("/{profile_id}", response_model=dict)
def delete_profile(profile_id: str) -> dict:
"""Delete an LLM profile by id."""
logger.info("[delete_profile] id=%s", profile_id)
deleted = profile_manager.delete(profile_id)
if not deleted:
logger.warning("[delete_profile] not found id=%s", profile_id)
raise HTTPException(status_code=404, detail=f"Profile not found: {profile_id}")
logger.info("[delete_profile] deleted id=%s", profile_id)
return {"deleted": True}
@@ -106,18 +169,31 @@ def test_profile(profile_id: str) -> ProfileTestResponse:
"""Test LLM connectivity for a saved profile."""
profile = profile_manager.get(profile_id)
if profile is None:
logger.warning("[test_profile] not found id=%s", profile_id)
raise HTTPException(status_code=404, detail=f"Profile not found: {profile_id}")
return _do_connectivity_test(
logger.info("[test_profile] id=%s model=%s base_url=%s", profile_id, profile.model, profile.base_url)
result = _do_connectivity_test(
model=profile.model,
base_url=profile.base_url,
api_key=profile.api_key,
timeout_seconds=profile.timeout_seconds,
)
logger.info("[test_profile] ok=%s latency=%sms", result.ok, result.latency_ms)
return result
@router.post("/apply", response_model=ProfileApplyResponse)
def apply_profiles(request: ProfileApplyRequest) -> ProfileApplyResponse:
"""Patch selected LLM profiles into the target scenario YAML file."""
logger.info(
"[apply_profiles] scenario=%s judge=%s answer=%s dataset=%s metric_weights=%s doc_weights=%s",
request.scenario_path,
request.judge_profile_id,
request.answer_profile_id,
request.dataset_profile_id,
bool(request.metric_weights),
bool(request.doc_weights),
)
role_profiles: dict[str, LLMProfile | None] = {
"judge": profile_manager.get(request.judge_profile_id) if request.judge_profile_id else None,
"answer": profile_manager.get(request.answer_profile_id) if request.answer_profile_id else None,
@@ -135,6 +211,7 @@ def apply_profiles(request: ProfileApplyRequest) -> ProfileApplyResponse:
]
if missing:
logger.warning("[apply_profiles] missing profiles for roles: %s", missing)
raise HTTPException(
status_code=400,
detail=f"Profile(s) not found for roles: {', '.join(missing)}",
@@ -148,6 +225,7 @@ def apply_profiles(request: ProfileApplyRequest) -> ProfileApplyResponse:
metric_weights=request.metric_weights,
doc_weights=request.doc_weights,
)
logger.info("[apply_profiles] patched fields: %s", patched)
return ProfileApplyResponse(
scenario_path=request.scenario_path,
patched_fields=patched,

webapp/api/pipeline.py (new file, 131 lines)

@@ -0,0 +1,131 @@
"""Routes for the end-to-end pipeline API (document parse → build → eval)."""
from __future__ import annotations
import logging
from fastapi import APIRouter, HTTPException
from webapp.models import (
PipelineJobRequest,
PipelineJobResponse,
PipelineJobStatus,
)
from webapp.services.pipeline_task_manager import pipeline_task_manager
router = APIRouter(prefix="/api/pipeline", tags=["pipeline"])
logger = logging.getLogger("webapp.api.pipeline")
@router.post(
"/jobs",
status_code=202,
response_model=PipelineJobResponse,
summary="Submit an end-to-end evaluation job",
responses={
202: {
"description": "The job was queued successfully; job_id is returned immediately.",
"content": {
"application/json": {
"example": {
"job_id": "a1b2c3d4e5f6",
"job_name": "siemens-ct-eval-2026",
"status": "queued",
}
}
},
},
422: {"description": "Request validation failed (required fields such as docs_path are missing or malformed)."},
},
)
def submit_pipeline_job(request: PipelineJobRequest) -> PipelineJobResponse:
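"""Submit an end-to-end job: parse documents → generate question set → RAGAS evaluation → output report.
The job runs asynchronously in a background thread, and `job_id` is returned immediately.
Poll `status` / `phase` / `logs` via `GET /api/pipeline/jobs/{job_id}`.
**Pipeline phases**
1. `parsing_documents`: calls Alibaba Cloud DocMind to parse each PDF
2. `generating_questions`: an LLM generates a draft question set from document chunks
3. `evaluating`: RAGAS online evaluation and scoring (answer_model answers, judge_model scores)
4. `done`: all artifacts are written to disk and `status` becomes `completed`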
"""
logger.info(
"[submit_pipeline] docs_path=%s job_name=%r gen_model=%s judge=%s max_docs=%s",
request.docs_path, request.job_name, request.generation_model,
request.judge_model, request.max_documents,
)
task = pipeline_task_manager.submit(request)
logger.info("[submit_pipeline] queued job_id=%s job_name=%s", task.job_id, task.job_name)
return PipelineJobResponse(
job_id=task.job_id,
job_name=task.job_name,
status=task.status,
)
@router.get(
"/jobs/{job_id}",
response_model=PipelineJobStatus,
summary="Get job status",
responses={
200: {"description": "Returns the job's current status, phase, logs and, once complete, the artifact paths."},
404: {"description": "No job exists with the given job_id."},
},
)
def get_pipeline_job(job_id: str) -> PipelineJobStatus:
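"""Get the current status, phase, live logs and result of a pipeline job.
**Polling advice**: query every 3 to 5 seconds until `status` is `completed` or `failed`.
The `result` field is populated once the job completes and contains:
- `scores_csv`: per-question, per-metric scores
- `summary_md`: evaluation summary in Markdown
- `dataset_csv`: the generated question-set CSV
- `source_chunks_jsonl`: the document-chunk index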
"""
status = pipeline_task_manager.get(job_id)
if status is None:
logger.warning("[get_pipeline_job] not found job_id=%s", job_id)
raise HTTPException(status_code=404, detail=f"Pipeline job not found: {job_id}")
logger.debug("[get_pipeline_job] job_id=%s status=%s phase=%s", job_id, status.status, status.phase)
return status
@router.get(
"/jobs",
response_model=dict,
summary="List all jobs",
responses={
200: {
"description": "Returns all pipeline jobs in this server session, newest first.",
"content": {
"application/json": {
"example": {
"jobs": [
{
"job_id": "a1b2c3d4e5f6",
"job_name": "siemens-ct-eval",
"status": "completed",
"phase": "done",
"logs": ["[build] 17 documents parsed", "..."],
"result": {
"total_questions": 19,
"eval_run_id": "2026-06-18T...",
"scores_csv": "outputs/pipeline/.../scores.csv",
"summary_md": "outputs/pipeline/.../summary.md",
},
"error": None,
}
]
}
}
},
}
},
)
def list_pipeline_jobs() -> dict:
"""Return all pipeline jobs submitted in this server session, ordered newest first."""
jobs = pipeline_task_manager.list_jobs()
logger.info("[list_pipeline_jobs] count=%d", len(jobs))
return {"jobs": [s.model_dump() for s in jobs]}


@@ -2,31 +2,42 @@
from __future__ import annotations
import logging
from fastapi import APIRouter, HTTPException
from webapp.models import RunDetail
from webapp.services import report_builder, run_reader
router = APIRouter(prefix="/api/runs", tags=["runs"])
logger = logging.getLogger("webapp.api.runs")
@router.get("")
def get_runs() -> dict[str, list]:
"""Return summaries for every discoverable evaluation run."""
summaries = run_reader.list_run_summaries()
logger.info("[get_runs] found %d runs", len(summaries))
return {"runs": [summary.model_dump() for summary in summaries]}
@router.get("/{run_id}")
def get_run_detail(run_id: str) -> RunDetail:
"""Return the full summary and aggregated report for one run."""
logger.info("[get_run_detail] run_id=%s", run_id)
run_dir = run_reader.find_run_dir(run_id)
if run_dir is None:
logger.warning("[get_run_detail] not found run_id=%s", run_id)
raise HTTPException(status_code=404, detail=f"Run not found: {run_id}")
summary = run_reader.build_run_summary(run_dir)
if summary is None:
logger.warning("[get_run_detail] missing metadata run_id=%s", run_id)
raise HTTPException(status_code=404, detail=f"Run metadata missing: {run_id}")
report = report_builder.build_report(run_dir, summary.metrics)
logger.info(
"[get_run_detail] ok run_id=%s metrics=%s valid=%d invalid=%d",
run_id, summary.metrics, summary.valid_samples, summary.invalid_samples,
)
return RunDetail(summary=summary, report=report)


@@ -2,15 +2,20 @@
from __future__ import annotations
import logging
from fastapi import APIRouter
from webapp.services import scenario_scanner
router = APIRouter(prefix="/api/scenarios", tags=["scenarios"])
logger = logging.getLogger("webapp.api.scenarios")
@router.get("")
def get_scenarios() -> dict[str, list]:
"""Return every scenario file found under the scenarios/ directory."""
scenarios = scenario_scanner.list_scenarios()
valid = sum(1 for s in scenarios if not s.error)
logger.info("[get_scenarios] total=%d valid=%d errors=%d", len(scenarios), valid, len(scenarios) - valid)
return {"scenarios": [item.model_dump() for item in scenarios]}


@@ -2,10 +2,13 @@
from __future__ import annotations
import logging
import time
from typing import Annotated
from fastapi import APIRouter, Header, HTTPException
from fastapi import APIRouter, Header, HTTPException, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
from rag_eval.metrics.weights import compute_weighted_score
from rag_eval.settings import EvaluationSettings
@@ -13,6 +16,7 @@ from webapp.models import ScoreRequest, ScoreResponse
from webapp.services.inline_scorer import inline_scorer
router = APIRouter(prefix="/api/score", tags=["score"])
logger = logging.getLogger("webapp.api.score")
def _get_settings() -> EvaluationSettings:
@@ -34,16 +38,74 @@ def _check_auth(authorization: str | None, token: str) -> None:
response_model=ScoreResponse,
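summary="Real-time single-sample scoring (Dify external tool)",
responses={
200: {"description": "Per-metric scores and the weighted overall score."},
200: {
"description": "Per-metric scores, the weighted overall score and latency.",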
"content": {
"application/json": {
"example": {
"scores": {
"faithfulness": 0.875,
"answer_relevancy": 0.920,
"context_recall": 0.810,
"context_precision": 0.850,
},
"weighted_score": 0.8638,
"latency_ms": 3420,
"skipped_metrics": [],
"error": None,
}
}
},
},
401: {"description": "SCORE_API_TOKEN is configured but no valid Bearer token was provided."},
422: {"description": "Request validation failed."},
422: {"description": "Request validation failed (a required field is missing or a metrics name is invalid)"},
},
)
def score_sample(
raw_request: Request,
request: ScoreRequest,
authorization: Annotated[str | None, Header()] = None,
) -> ScoreResponse:
"""Accept one QA sample, run RAGAS metrics synchronously, and return scores."""
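"""Accept a single QA record, run the RAGAS metrics synchronously, and return per-metric scores in real time.
**Primary use**: called as a Dify external tool. After generating an answer, the Dify agent sends
`(question, answer, contexts)` to this endpoint to obtain RAGAS quality scores,
which can be used for logging, quality monitoring, or triggering the agent's self-improvement flow.
**contexts format**: multiple retrieved chunks are joined into one string with `context_separator` (default `" |||| "`);
the server splits the string automatically before passing it to the RAGAS pipeline.
**ground_truth is optional**
- When provided: all requested metrics are computed.
- When missing: metrics that depend on a reference answer are skipped automatically (`context_recall`,
`factual_correctness`, `semantic_similarity`, `noise_sensitivity`).
Skipped metrics are listed in the response's `skipped_metrics`, and their `scores` values are `null`.
**Supported RAGAS metrics**
- `faithfulness`: factual consistency between the answer and the retrieved chunks
- `answer_relevancy`: relevance of the answer to the question
- `context_recall`: proportion of the reference answer covered by the retrieved content (requires ground_truth)
- `context_precision`: proportion of the retrieved chunks that is relevant to the answer
- `noise_sensitivity`: sensitivity to irrelevant noise chunks (requires ground_truth)
- `factual_correctness`: factual accuracy of the answer against the reference answer (requires ground_truth)
- `semantic_similarity`: semantic similarity between the answer and the reference answer (requires ground_truth)
**Recommended model configuration**
- `judge_model`: `gpt-5`
- `embedding_model`: `text-embedding-3-small`
**Authentication**: if `SCORE_API_TOKEN` is set in `.env`, send
`Authorization: Bearer <token>` in the request header; leave it empty to disable authentication (suitable for intranet deployments).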
"""
client = f"{raw_request.client.host}:{raw_request.client.port}" if raw_request.client else "unknown"
logger.info(
"[score] incoming client=%s method=%s content_type=%s metrics=%s has_gt=%s",
client,
raw_request.method,
raw_request.headers.get("content-type", ""),
request.metrics,
request.ground_truth is not None,
)
settings = _get_settings()
# Require Bearer auth only when the deployment configured a shared token.
@@ -97,6 +159,12 @@ def score_sample(
{},
)
logger.info(
"[score] done latency=%dms skipped=%s scores=%s",
latency_ms,
skipped,
{k: (round(v, 4) if v is not None else None) for k, v in all_scores.items()},
)
return ScoreResponse(
scores=all_scores,
weighted_score=round(weighted, 4) if weighted is not None else None,


@@ -408,10 +408,7 @@ class ScoreRequest(BaseModel):
model_config = ConfigDict(
json_schema_extra={
"examples": [
{
"summary": "Basic scoring request",
"value": {
"example": {
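"question": "What is the temporal resolution of dual-source CT?",
"answer": "The single-segment temporal resolution of dual-source CT is 75 ms.",
"contexts": "Dual-source CT uses two tube-detector systems |||| Single-segment acquisition rotates through 135 degrees",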
@@ -423,11 +420,9 @@ class ScoreRequest(BaseModel):
"context_recall",
"context_precision",
],
"judge_model": "deepseek-v4-flash",
"embedding_model": "text-embedding-v3",
},
"judge_model": "gpt-5",
"embedding_model": "text-embedding-3-small",
}
]
}
)


@@ -7,15 +7,21 @@ the server starts even when the evaluation dependencies are not yet installed.
from __future__ import annotations
import logging
import time
from pathlib import Path
from fastapi import FastAPI
from fastapi.responses import FileResponse
from fastapi import FastAPI, Request
from fastapi.encoders import jsonable_encoder
from fastapi.exceptions import RequestValidationError
from fastapi.responses import FileResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
from webapp.api import evaluations, llm_profiles, pipeline, runs, scenarios, score
STATIC_DIR = Path(__file__).resolve().parent / "static"
logger = logging.getLogger("webapp.server")
access_logger = logging.getLogger("webapp.access")
# OpenAPI tag metadata — controls the grouping and descriptions in /docs.
OPENAPI_TAGS = [
@@ -92,7 +98,7 @@ def create_app() -> FastAPI:
"- **Report API**: query historical runs and evaluation reports\n\n"
"> **Quick start**: call `POST /api/pipeline/jobs` with a PDF folder path to start the full evaluation flow."
),
version="0.2.0",
version="0.3.0",
openapi_tags=OPENAPI_TAGS,
)
@@ -103,6 +109,39 @@ def create_app() -> FastAPI:
app.include_router(pipeline.router)
app.include_router(score.router)
@app.middleware("http")
async def access_log_middleware(request: Request, call_next):
"""Log every API request with method, path, status code and latency.
Static file requests are logged at DEBUG level to keep the console clean.
"""
t0 = time.monotonic()
response = await call_next(request)
latency_ms = int((time.monotonic() - t0) * 1000)
path = request.url.path
is_static = path.startswith("/static/") or path in ("/", "/favicon.ico")
msg = "%s %s -> %d (%dms)", request.method, path, response.status_code, latency_ms
if is_static:
access_logger.debug(*msg)
else:
access_logger.info(*msg)
return response
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request: Request, exc: RequestValidationError) -> JSONResponse:
"""Log full validation error detail to help diagnose 422 responses."""
errors = jsonable_encoder(exc.errors())
logger.warning(
"[422] validation error url=%s content_type=%s errors=%s",
request.url.path,
request.headers.get("content-type", ""),
errors,
)
return JSONResponse(
status_code=422,
content={"detail": errors},
)
@app.get("/api/health", tags=["meta"])
def health() -> dict[str, str]:
"""Report basic liveness so the UI can confirm the server is reachable."""


@@ -294,6 +294,21 @@ table.group-table td { border-bottom: 1px solid #f1f5f9; font-variant-numeric: t
.btn-sm { padding: 4px 10px; font-size: 12px; }
.btn-danger { color: var(--bad); border-color: var(--bad); }
.btn-danger:hover { background: #fee2e2; }
.btn-test { color: #0369a1; border-color: #0369a1; }
.btn-test:hover { background: #e0f2fe; }
/* LLM connectivity test result */
.profile-test-result {
margin-top: 8px;
padding: 6px 10px;
border-radius: 6px;
font-size: 12px;
font-weight: 500;
display: none;
}
.profile-test-result:not([hidden]) { display: block; }
.profile-test-result.ok { background: #dcfce7; color: #166534; border: 1px solid #bbf7d0; }
.profile-test-result.fail { background: #fee2e2; color: #991b1b; border: 1px solid #fecaca; word-break: break-all; }
/* Selected run card */
.run-card.selected {
@@ -310,6 +325,7 @@ table.group-table td { border-bottom: 1px solid #f1f5f9; font-variant-numeric: t
/* ---------- API docs iframe ---------- */
#view-apidocs { padding: 0; display: flex; flex-direction: column; flex: 1; }
#view-apidocs[hidden] { display: none; }
.apidocs-frame {
flex: 1;
width: 100%;
@@ -404,6 +420,7 @@ table.group-table td { border-bottom: 1px solid #f1f5f9; font-variant-numeric: t
.app { display: block; }
.main { display: block; width: 100%; }
.view { padding: 0; display: block !important; }
#view-apidocs { display: none !important; } /* never print the API docs iframe */
#view-report { display: block !important; }
/* ── Report content ── */


@@ -219,9 +219,11 @@
</div>
<div class="form-actions">
<button class="btn btn-primary" id="save-profile-btn">Save</button>
<button class="btn btn-test" id="test-profile-btn">Test connectivity</button>
<button class="btn" id="cancel-profile-btn">Cancel</button>
<span class="form-error muted" id="profile-form-error"></span>
</div>
<div class="profile-test-result" id="profile-form-test-result" hidden></div>
</div>
</div>


@@ -65,4 +65,16 @@ const API = {
});
},
applyProfiles(body) { return API.post("/api/llm-profiles/apply", body); },
// Test connectivity for a saved profile
testProfile(id) {
return fetch(`/api/llm-profiles/${encodeURIComponent(id)}/test`, { method: "POST" })
.then(async r => {
if (!r.ok) { const d = await API._extractError(r); throw new Error(d); }
return r.json();
});
},
// Test the inline parameters entered in the form (can be tested before saving)
probeConnectivity(body) { return API.post("/api/llm-profiles/probe", body); },
};


@@ -8,6 +8,7 @@ const Profiles = {
document.getElementById("add-profile-btn").addEventListener("click", () => Profiles.showForm());
document.getElementById("save-profile-btn").addEventListener("click", () => Profiles.save());
document.getElementById("cancel-profile-btn").addEventListener("click", () => Profiles.hideForm());
document.getElementById("test-profile-btn").addEventListener("click", () => Profiles.testForm());
},
// Load and render the Profile list
@@ -39,6 +40,7 @@ const Profiles = {
<div class="profile-card-head">
<div class="profile-card-name">${App.escape(p.name)}</div>
<div class="profile-card-actions">
<button class="btn btn-sm btn-test" data-action="test">测试</button>
<button class="btn btn-sm" data-action="edit">编辑</button>
<button class="btn btn-sm btn-danger" data-action="delete">删除</button>
</div>
@@ -46,12 +48,72 @@ const Profiles = {
<div class="profile-card-field"><span class="field-label">模型</span> <code>${App.escape(p.model)}</code></div>
<div class="profile-card-field"><span class="field-label">Base URL</span> <code>${App.escape(p.base_url)}</code></div>
<div class="profile-card-field"><span class="field-label">超时</span> ${p.timeout_seconds}s</div>
<div class="profile-test-result" data-result hidden></div>
`;
card.querySelector("[data-action=test]").addEventListener("click", () => Profiles.testCard(p, card));
card.querySelector("[data-action=edit]").addEventListener("click", () => Profiles.showForm(p));
card.querySelector("[data-action=delete]").addEventListener("click", () => Profiles.remove(p.profile_id, p.name));
return card;
},
// Test a saved profile (the Test button on its card)
async testCard(p, card) {
const btn = card.querySelector("[data-action=test]");
const resultEl = card.querySelector("[data-result]");
btn.disabled = true;
btn.textContent = "测试中…";
resultEl.hidden = true;
resultEl.className = "profile-test-result";
try {
const res = await API.testProfile(p.profile_id);
Profiles._showTestResult(resultEl, res);
} catch (err) {
Profiles._showTestResult(resultEl, { ok: false, message: err.message });
} finally {
btn.disabled = false;
btn.textContent = "测试";
}
},
// Test the parameters currently entered in the form (works before saving)
async testForm() {
const body = {
model: document.getElementById("pf-model").value.trim(),
base_url: document.getElementById("pf-base-url").value.trim(),
api_key: document.getElementById("pf-api-key").value.trim(),
timeout_seconds: parseInt(document.getElementById("pf-timeout").value, 10) || 30,
};
const errEl = document.getElementById("profile-form-error");
if (!body.model || !body.base_url || !body.api_key) {
errEl.textContent = "请先填写模型名称、Base URL 和 API Key";
return;
}
errEl.textContent = "";
const testBtn = document.getElementById("test-profile-btn");
const resultEl = document.getElementById("profile-form-test-result");
testBtn.disabled = true;
testBtn.textContent = "测试中…";
resultEl.hidden = true;
resultEl.className = "profile-test-result";
try {
const res = await API.probeConnectivity(body);
Profiles._showTestResult(resultEl, res);
} catch (err) {
Profiles._showTestResult(resultEl, { ok: false, message: err.message });
} finally {
testBtn.disabled = false;
testBtn.textContent = "测试连通性";
}
},
// Render the test result into the given element
_showTestResult(el, res) {
el.hidden = false;
el.classList.add(res.ok ? "ok" : "fail");
const latency = res.latency_ms != null ? ` (${res.latency_ms}ms)` : "";
el.textContent = res.ok ? `✓ 连接成功${latency}` : `${res.message}`;
},
// Show the create or edit form
showForm(profile = null) {
const panel = document.getElementById("profile-form-panel");
@@ -65,6 +127,9 @@ const Profiles = {
document.getElementById("pf-api-key").value = profile ? profile.api_key : "";
document.getElementById("pf-timeout").value = profile ? profile.timeout_seconds : 30;
document.getElementById("profile-form-error").textContent = "";
const resultEl = document.getElementById("profile-form-test-result");
resultEl.hidden = true;
resultEl.className = "profile-test-result";
panel.scrollIntoView({ behavior: "smooth", block: "start" });
},

View File

@@ -5,13 +5,73 @@ and the same runs/ artifacts. Example:
python webmain.py
python webmain.py --host 0.0.0.0 --port 8800
python webmain.py --host 0.0.0.0 --port 8800 --log-level debug
"""
from __future__ import annotations
import argparse
import logging
import logging.config
from datetime import datetime
from pathlib import Path
import uvicorn
REPO_ROOT = Path(__file__).resolve().parent
def _build_log_config(log_file: Path, level: str) -> dict:
    """Build a uvicorn-compatible logging config dict.

    Writes to both stderr (console) and a size-rotated log file
    (50 MB per file, 7 backups). The console handler filters at the
    requested level; the file handler accepts DEBUG.
    Loggers under webapp.* and rag_eval.* propagate to the "webapp" and
    "rag_eval" loggers configured below, so every logger.info() call
    in the API routes is captured.
    """
    level_upper = level.upper()
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "detailed": {
                "format": "%(asctime)s %(levelname)-8s %(name)s %(message)s",
                "datefmt": "%Y-%m-%d %H:%M:%S",
            },
            "console": {
                "format": "%(asctime)s %(levelname)-8s %(name)-30s %(message)s",
                "datefmt": "%H:%M:%S",
            },
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stderr",
                "formatter": "console",
                "level": level_upper,
            },
            "file": {
                "class": "logging.handlers.RotatingFileHandler",
                "filename": str(log_file),
                "maxBytes": 50 * 1024 * 1024,  # 50 MB per file
                "backupCount": 7,  # keep 7 rotated files
                "encoding": "utf-8",
                "formatter": "detailed",
                "level": "DEBUG",  # file handler accepts everything it is handed
            },
        },
        "loggers": {
            # Our application loggers: pass DEBUG through so the file handler
            # really receives it; the console handler filters at level_upper.
            # (A logger-level threshold is applied before any handler runs.)
            "webapp": {"handlers": ["console", "file"], "level": "DEBUG", "propagate": False},
            "rag_eval": {"handlers": ["console", "file"], "level": "DEBUG", "propagate": False},
            # uvicorn loggers: written to both the file and the console at INFO
            "uvicorn.access": {"handlers": ["console", "file"], "level": "INFO", "propagate": False},
            "uvicorn.error": {"handlers": ["console", "file"], "level": "INFO", "propagate": False},
            "uvicorn": {"handlers": ["console", "file"], "level": "INFO", "propagate": False},
        },
        "root": {
            "handlers": ["console", "file"],
            "level": "WARNING",  # third-party libs are logged at WARNING and above only
        },
    }
def parse_args() -> argparse.Namespace:
@@ -24,17 +84,52 @@ def parse_args() -> argparse.Namespace:
        action="store_true",
        help="Enable auto-reload for local development.",
    )
    parser.add_argument(
        "--log-level",
        default="info",
        choices=["debug", "info", "warning", "error"],
        help="Console log level (default: info). File always captures DEBUG.",
    )
    parser.add_argument(
        "--log-file",
        default=None,
        help="Log file path (default: logs/server_YYYY-MM-DD.log).",
    )
    return parser.parse_args()
def main() -> None:
    """Start uvicorn with the configured application and logging."""
    import uvicorn
    args = parse_args()
    # Resolve log file path
    logs_dir = REPO_ROOT / "logs"
    logs_dir.mkdir(parents=True, exist_ok=True)
    if args.log_file:
        log_file = Path(args.log_file)
    else:
        date_str = datetime.now().strftime("%Y-%m-%d")
        log_file = logs_dir / f"server_{date_str}.log"
    log_config = _build_log_config(log_file, args.log_level)
    # Apply config before uvicorn starts so our loggers are ready immediately
    logging.config.dictConfig(log_config)
    logger = logging.getLogger("webapp.server")
    logger.info(
        "Starting RAGAS Console host=%s port=%d log_level=%s log_file=%s",
        args.host, args.port, args.log_level, log_file,
    )
    uvicorn.run(
        "webapp.server:app",
        host=args.host,
        port=args.port,
        reload=args.reload,
        log_config=log_config,  # hand our config to uvicorn so it uses same handlers
    )
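
Note on the logging config in this file: each named logger has its own level, and each handler has its own level as well. The sketch below is a minimal standard-library illustration of how the two interact. It is not code from this change; the `capture` helper and the `demo.*` logger names are hypothetical. A record has to pass the logger's level first, so a handler set to DEBUG only sees DEBUG records when the logger lets them through.

```python
import io
import logging


def capture(logger_level: str) -> str:
    """Log one DEBUG and one INFO record through a DEBUG-level handler.

    `capture` and the `demo.*` logger names are illustrative only.
    """
    buf = io.StringIO()
    handler = logging.StreamHandler(buf)
    handler.setLevel(logging.DEBUG)  # the handler itself accepts everything
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger(f"demo.{logger_level.lower()}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logger_level)  # the logger-level gate runs before handlers
    logger.propagate = False

    logger.debug("debug detail")
    logger.info("info line")
    return buf.getvalue()


info_out = capture("INFO")    # logger at INFO drops the DEBUG record
debug_out = capture("DEBUG")  # logger at DEBUG passes both records
```

With the logger at INFO, the DEBUG record is dropped before it reaches the handler. For a file handler to capture DEBUG regardless of the console level, the logger has to sit at DEBUG and the console handler has to do the filtering.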