LLM Profile Manager Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Add a visual LLM configuration management feature to the siemens_ragas web console, allowing users to create/save named LLM profiles (model, base_url, api_key, timeout) and assign them to different task roles (judge, answer, dataset-build) when running evaluations, with selections written back to the scenario YAML before execution.
Architecture: Backend adds a ProfileManager service (memory + JSON file persistence, mirroring the existing TaskManager pattern) plus a llm_profiles FastAPI router. A new apply endpoint patches the selected profile fields into the target scenario YAML file. Frontend adds a new "LLM配置" sidebar view (profiles.js) and extends the existing "新建评估" view (runner.js) with a role-assignment panel that appears after selecting a scenario.
Tech Stack: Python 3.11+, FastAPI, Pydantic v2, PyYAML (already installed), vanilla JS (ES2022), existing CSS design tokens
File Map
New files
- webapp/api/llm_profiles.py: FastAPI router (CRUD + apply endpoint)
- webapp/services/profile_manager.py: in-memory + JSON persistence service
- webapp/services/yaml_patcher.py: patches selected profiles into a scenario YAML (created in Task 4)
- webapp/static/js/profiles.js: frontend profile management view
- configs/llm_profiles.json: persistent storage (auto-created on first write)
- tests/webapp/test_profile_manager.py: unit tests for ProfileManager
- tests/webapp/test_llm_profiles_api.py: integration tests for the API router
Modified files
- webapp/models.py: add LLMProfile, CreateProfileRequest, ProfileApplyRequest, ProfileApplyResponse Pydantic models
- webapp/server.py: register llm_profiles router
- webapp/static/index.html: add "LLM配置" nav item; load profiles.js
- webapp/static/js/api.js: add profile + apply API calls
- webapp/static/js/app.js: register the "profiles" view (Task 5, Step 6)
- webapp/static/js/runner.js: add LLM role-assignment panel after scenario selection
- webapp/static/css/app.css: add styles for profile cards, role-assignment panel, modal form
Task 1: Pydantic Models
Files:
- Modify: webapp/models.py
- Step 1: Write failing test
# tests/webapp/test_profile_manager.py
import pytest

from webapp.models import LLMProfile, ProfileApplyRequest, ProfileApplyResponse


def test_llm_profile_defaults():
    p = LLMProfile(
        profile_id="abc",
        name="Test",
        model="gpt-4",
        base_url="http://localhost/v1",
        api_key="sk-test",
    )
    assert p.timeout_seconds == 30
    # The bare model defaults both timestamps to ""; ProfileManager.create()
    # is what fills them in (see Task 2).
    assert p.created_at == ""
    assert p.updated_at == ""


def test_profile_apply_request_fields():
    req = ProfileApplyRequest(
        scenario_path="scenarios/offline/sample.yaml",
        judge_profile_id="id1",
        answer_profile_id="id2",
        dataset_profile_id=None,
    )
    assert req.judge_profile_id == "id1"
    assert req.dataset_profile_id is None


def test_profile_apply_response():
    resp = ProfileApplyResponse(
        scenario_path="scenarios/offline/sample.yaml",
        patched_fields=["judge_model"],
    )
    assert "judge_model" in resp.patched_fields
- Step 2: Run test to verify it fails
cd /c/Projects/AIProjects/Siemens-AIPOC/siemens_ragas
python -m pytest tests/webapp/test_profile_manager.py -v 2>&1 | head -30
Expected: ImportError or AttributeError (models not defined yet)
- Step 3: Add models to webapp/models.py
Append after the existing TriggerEvaluationResponse class (before jsonable):
class LLMProfile(BaseModel):
    """A named LLM connection configuration that can be reused across tasks."""
    profile_id: str
    name: str
    model: str
    base_url: str
    api_key: str
    timeout_seconds: int = 30
    created_at: str = ""
    updated_at: str = ""


class CreateProfileRequest(BaseModel):
    """Request body for creating or updating an LLM profile."""
    name: str
    model: str
    base_url: str
    api_key: str
    timeout_seconds: int = 30


class ProfileApplyRequest(BaseModel):
    """Request body to patch LLM profile selections into a scenario YAML."""
    scenario_path: str
    judge_profile_id: str | None = None
    answer_profile_id: str | None = None
    dataset_profile_id: str | None = None


class ProfileApplyResponse(BaseModel):
    """Response after patching a scenario YAML with profile settings."""
    scenario_path: str
    patched_fields: list[str] = Field(default_factory=list)
- Step 4: Run tests to verify they pass
python -m pytest tests/webapp/test_profile_manager.py -v
Expected: 3 tests pass
- Step 5: Commit
git add webapp/models.py tests/webapp/test_profile_manager.py
git commit -m "feat: add LLMProfile pydantic models"
Task 2: ProfileManager Service
Files:
- Create: webapp/services/profile_manager.py
- Create: configs/ directory (auto-created)
- Step 1: Write failing tests (append to tests/webapp/test_profile_manager.py)
from webapp.services.profile_manager import ProfileManager


def _make_manager(tmp_path):
    store = tmp_path / "profiles.json"
    return ProfileManager(store_path=store)


def test_create_profile(tmp_path):
    mgr = _make_manager(tmp_path)
    p = mgr.create(name="Local", model="deepseek-v4-flash",
                   base_url="http://localhost/v1", api_key="sk-x")
    assert p.profile_id != ""
    assert p.name == "Local"


def test_list_profiles(tmp_path):
    mgr = _make_manager(tmp_path)
    mgr.create(name="A", model="m1", base_url="http://a/v1", api_key="k1")
    mgr.create(name="B", model="m2", base_url="http://b/v1", api_key="k2")
    profiles = mgr.list_all()
    assert len(profiles) == 2


def test_get_profile(tmp_path):
    mgr = _make_manager(tmp_path)
    created = mgr.create(name="X", model="m", base_url="http://x/v1", api_key="k")
    fetched = mgr.get(created.profile_id)
    assert fetched is not None
    assert fetched.name == "X"


def test_update_profile(tmp_path):
    mgr = _make_manager(tmp_path)
    p = mgr.create(name="Old", model="m", base_url="http://x/v1", api_key="k")
    updated = mgr.update(p.profile_id, name="New", model="m2",
                         base_url="http://x/v1", api_key="k", timeout_seconds=60)
    assert updated is not None
    assert updated.name == "New"
    assert updated.model == "m2"
    assert updated.timeout_seconds == 60


def test_delete_profile(tmp_path):
    mgr = _make_manager(tmp_path)
    p = mgr.create(name="Del", model="m", base_url="http://x/v1", api_key="k")
    assert mgr.delete(p.profile_id) is True
    assert mgr.get(p.profile_id) is None


def test_persistence(tmp_path):
    store = tmp_path / "profiles.json"
    mgr1 = ProfileManager(store_path=store)
    p = mgr1.create(name="Persist", model="m", base_url="http://x/v1", api_key="k")
    # A second manager pointed at the same file must see the saved profile.
    mgr2 = ProfileManager(store_path=store)
    assert mgr2.get(p.profile_id) is not None


def test_get_nonexistent(tmp_path):
    mgr = _make_manager(tmp_path)
    assert mgr.get("does-not-exist") is None


def test_delete_nonexistent(tmp_path):
    mgr = _make_manager(tmp_path)
    assert mgr.delete("does-not-exist") is False
- Step 2: Run test to verify it fails
python -m pytest tests/webapp/test_profile_manager.py -v -k "test_create or test_list or test_get or test_update or test_delete or test_persistence" 2>&1 | head -20
Expected: ImportError (module not found)
- Step 3: Create webapp/services/profile_manager.py
"""In-memory + JSON-file LLM profile manager.

Profiles are kept in a dict keyed by profile_id and written to a JSON file
on every mutation, so they survive server restarts. The pattern mirrors
TaskManager but starts no threads of its own. A lock still guards the dict
and the store file, because FastAPI runs synchronous route handlers in a
thread pool and two requests can therefore touch profiles concurrently.
"""
from __future__ import annotations

import json
import threading
import uuid
from datetime import datetime, timezone
from pathlib import Path

from webapp.models import LLMProfile

_DEFAULT_STORE = Path(__file__).resolve().parents[2] / "configs" / "llm_profiles.json"


def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


class ProfileManager:
    """Manages LLM profiles with in-memory cache and JSON file persistence."""

    def __init__(self, store_path: Path = _DEFAULT_STORE) -> None:
        self._store_path = store_path
        self._lock = threading.Lock()
        self._profiles: dict[str, LLMProfile] = {}
        self._load()

    # ------------------------------------------------------------------ #
    # Public API
    # ------------------------------------------------------------------ #
    def list_all(self) -> list[LLMProfile]:
        """Return all profiles sorted by creation time."""
        with self._lock:
            return sorted(self._profiles.values(), key=lambda p: p.created_at)

    def get(self, profile_id: str) -> LLMProfile | None:
        """Return one profile by id, or None if not found."""
        with self._lock:
            return self._profiles.get(profile_id)

    def create(
        self,
        name: str,
        model: str,
        base_url: str,
        api_key: str,
        timeout_seconds: int = 30,
    ) -> LLMProfile:
        """Create and persist a new profile, returning it."""
        now = _now_iso()
        profile = LLMProfile(
            profile_id=uuid.uuid4().hex[:12],
            name=name,
            model=model,
            base_url=base_url,
            api_key=api_key,
            timeout_seconds=timeout_seconds,
            created_at=now,
            updated_at=now,
        )
        with self._lock:
            self._profiles[profile.profile_id] = profile
            self._persist()
        return profile

    def update(
        self,
        profile_id: str,
        name: str,
        model: str,
        base_url: str,
        api_key: str,
        timeout_seconds: int = 30,
    ) -> LLMProfile | None:
        """Replace the stored profile with an updated copy; returns None if not found."""
        with self._lock:
            existing = self._profiles.get(profile_id)
            if existing is None:
                return None
            updated = existing.model_copy(update={
                "name": name,
                "model": model,
                "base_url": base_url,
                "api_key": api_key,
                "timeout_seconds": timeout_seconds,
                "updated_at": _now_iso(),
            })
            self._profiles[profile_id] = updated
            self._persist()
        return updated

    def delete(self, profile_id: str) -> bool:
        """Remove a profile; returns True if deleted, False if not found."""
        with self._lock:
            if profile_id not in self._profiles:
                return False
            del self._profiles[profile_id]
            self._persist()
        return True

    # ------------------------------------------------------------------ #
    # Persistence helpers
    # ------------------------------------------------------------------ #
    def _load(self) -> None:
        """Load profiles from the JSON store file, ignoring missing/corrupt files."""
        if not self._store_path.exists():
            return
        try:
            data = json.loads(self._store_path.read_text(encoding="utf-8"))
            for raw in data.get("profiles", []):
                p = LLMProfile.model_validate(raw)
                self._profiles[p.profile_id] = p
        except Exception:  # noqa: BLE001
            pass  # Corrupt store: start fresh

    def _persist(self) -> None:
        """Write current profiles to the JSON store file (must be called under lock)."""
        self._store_path.parent.mkdir(parents=True, exist_ok=True)
        payload = {"profiles": [p.model_dump() for p in self._profiles.values()]}
        self._store_path.write_text(
            json.dumps(payload, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )


# Module-level singleton shared by FastAPI routes.
profile_manager = ProfileManager()
- Step 4: Run tests to verify they pass
python -m pytest tests/webapp/test_profile_manager.py -v
Expected: All 11 tests pass
- Step 5: Commit
git add webapp/services/profile_manager.py tests/webapp/test_profile_manager.py
git commit -m "feat: add ProfileManager service with JSON persistence"
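A possible hardening of `_persist`, offered as a sketch rather than as part of the plan: `write_text` truncates the store before writing, so a crash mid-write could leave a corrupt file that `_load` then silently discards. Writing to a temporary file in the same directory and swapping it in with `os.replace` avoids that window. The helper name `atomic_write_json` is hypothetical and not used elsewhere in this plan.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload as JSON so readers see either the old or the new file, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic swap of the finished file
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If adopted, `_persist` would call `atomic_write_json(self._store_path, payload)` in place of `write_text`, still under the lock.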
Task 3: LLM Profiles API Router
Files:
- Create: webapp/api/llm_profiles.py
- Create: tests/webapp/test_llm_profiles_api.py
- Modify: webapp/server.py
- Step 1: Write failing tests
# tests/webapp/test_llm_profiles_api.py
"""Integration tests for /api/llm-profiles endpoints."""
import pytest
from fastapi.testclient import TestClient


@pytest.fixture()
def client(tmp_path, monkeypatch):
    """TestClient with a fresh ProfileManager backed by a temp file."""
    store = tmp_path / "profiles.json"
    # Patch the singleton before importing server
    import webapp.services.profile_manager as pm_mod
    from webapp.services.profile_manager import ProfileManager
    fresh_mgr = ProfileManager(store_path=store)
    monkeypatch.setattr(pm_mod, "profile_manager", fresh_mgr)
    # Also patch inside the api module if already imported
    import webapp.api.llm_profiles as api_mod
    monkeypatch.setattr(api_mod, "profile_manager", fresh_mgr)
    from webapp.server import create_app
    return TestClient(create_app())


def test_list_empty(client):
    resp = client.get("/api/llm-profiles")
    assert resp.status_code == 200
    assert resp.json()["profiles"] == []


def test_create_and_list(client):
    body = {"name": "Test", "model": "m1", "base_url": "http://x/v1", "api_key": "k"}
    resp = client.post("/api/llm-profiles", json=body)
    assert resp.status_code == 201
    data = resp.json()
    assert data["name"] == "Test"
    assert data["profile_id"] != ""
    resp2 = client.get("/api/llm-profiles")
    assert len(resp2.json()["profiles"]) == 1


def test_update_profile(client):
    body = {"name": "Old", "model": "m1", "base_url": "http://x/v1", "api_key": "k"}
    pid = client.post("/api/llm-profiles", json=body).json()["profile_id"]
    upd = {"name": "New", "model": "m2", "base_url": "http://x/v1", "api_key": "k", "timeout_seconds": 60}
    resp = client.put(f"/api/llm-profiles/{pid}", json=upd)
    assert resp.status_code == 200
    assert resp.json()["name"] == "New"
    assert resp.json()["timeout_seconds"] == 60


def test_delete_profile(client):
    body = {"name": "Del", "model": "m", "base_url": "http://x/v1", "api_key": "k"}
    pid = client.post("/api/llm-profiles", json=body).json()["profile_id"]
    resp = client.delete(f"/api/llm-profiles/{pid}")
    assert resp.status_code == 200
    assert resp.json()["deleted"] is True
    assert len(client.get("/api/llm-profiles").json()["profiles"]) == 0


def test_update_nonexistent(client):
    resp = client.put("/api/llm-profiles/nope",
                      json={"name": "X", "model": "m", "base_url": "http://x/v1", "api_key": "k"})
    assert resp.status_code == 404


def test_delete_nonexistent(client):
    resp = client.delete("/api/llm-profiles/nope")
    assert resp.status_code == 404
- Step 2: Run test to verify it fails
python -m pytest tests/webapp/test_llm_profiles_api.py -v 2>&1 | head -20
Expected: ImportError (webapp.api.llm_profiles does not exist yet)
- Step 3: Create webapp/api/llm_profiles.py
"""CRUD routes for LLM profiles plus the scenario-patching apply endpoint."""
from __future__ import annotations

from fastapi import APIRouter, HTTPException

from webapp.models import (
    CreateProfileRequest,
    LLMProfile,
    ProfileApplyRequest,
    ProfileApplyResponse,
)
from webapp.services.profile_manager import profile_manager

router = APIRouter(prefix="/api/llm-profiles", tags=["llm-profiles"])


@router.get("", response_model=dict)
def list_profiles() -> dict:
    """Return all saved LLM profiles."""
    return {"profiles": [p.model_dump() for p in profile_manager.list_all()]}


@router.post("", status_code=201, response_model=LLMProfile)
def create_profile(request: CreateProfileRequest) -> LLMProfile:
    """Create a new LLM profile."""
    return profile_manager.create(
        name=request.name,
        model=request.model,
        base_url=request.base_url,
        api_key=request.api_key,
        timeout_seconds=request.timeout_seconds,
    )


@router.put("/{profile_id}", response_model=LLMProfile)
def update_profile(profile_id: str, request: CreateProfileRequest) -> LLMProfile:
    """Update an existing LLM profile by id."""
    updated = profile_manager.update(
        profile_id=profile_id,
        name=request.name,
        model=request.model,
        base_url=request.base_url,
        api_key=request.api_key,
        timeout_seconds=request.timeout_seconds,
    )
    if updated is None:
        raise HTTPException(status_code=404, detail=f"Profile not found: {profile_id}")
    return updated


@router.delete("/{profile_id}", response_model=dict)
def delete_profile(profile_id: str) -> dict:
    """Delete an LLM profile by id."""
    deleted = profile_manager.delete(profile_id)
    if not deleted:
        raise HTTPException(status_code=404, detail=f"Profile not found: {profile_id}")
    return {"deleted": True}


@router.post("/apply", response_model=ProfileApplyResponse)
def apply_profiles(request: ProfileApplyRequest) -> ProfileApplyResponse:
    """Patch selected LLM profiles into the target scenario YAML file."""
    # Imported lazily: yaml_patcher is only created in Task 4, and a module-level
    # import would make this router (and the Task 3 tests) fail until then.
    from webapp.services.yaml_patcher import apply_profiles_to_scenario

    profiles: dict[str, LLMProfile | None] = {
        "judge": profile_manager.get(request.judge_profile_id) if request.judge_profile_id else None,
        "answer": profile_manager.get(request.answer_profile_id) if request.answer_profile_id else None,
        "dataset": profile_manager.get(request.dataset_profile_id) if request.dataset_profile_id else None,
    }
    # A role is "missing" when an id was supplied but no such profile exists.
    missing = [role for role, pid in [
        ("judge", request.judge_profile_id),
        ("answer", request.answer_profile_id),
        ("dataset", request.dataset_profile_id),
    ] if pid and profiles[role] is None]
    if missing:
        raise HTTPException(
            status_code=400,
            detail=f"Profile(s) not found for roles: {', '.join(missing)}",
        )
    try:
        patched = apply_profiles_to_scenario(
            scenario_path=request.scenario_path,
            judge_profile=profiles["judge"],
            answer_profile=profiles["answer"],
            dataset_profile=profiles["dataset"],
        )
    except FileNotFoundError as exc:
        # Surface a missing scenario file as 404 instead of an unhandled 500.
        raise HTTPException(status_code=404, detail=str(exc)) from exc
    return ProfileApplyResponse(
        scenario_path=request.scenario_path,
        patched_fields=patched,
    )
- Step 4: Register router in webapp/server.py
Replace the import line:
from webapp.api import evaluations, runs, scenarios
with:
from webapp.api import evaluations, llm_profiles, runs, scenarios
And add inside create_app() after the existing app.include_router calls:
app.include_router(llm_profiles.router)
- Step 5: Run tests to verify they pass
python -m pytest tests/webapp/test_llm_profiles_api.py -v
Expected: 6 tests pass (apply test comes in Task 4)
- Step 6: Commit
git add webapp/api/llm_profiles.py webapp/server.py tests/webapp/test_llm_profiles_api.py
git commit -m "feat: add /api/llm-profiles CRUD router"
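The client fixture in Step 1 patches `profile_manager` in two modules. The reason is that `from webapp.services.profile_manager import profile_manager` copies the reference into the router module's own namespace at import time, so rebinding the name on the service module alone never reaches the route handlers. The sketch below reproduces that effect with two throwaway modules; `service_mod` and `api_mod` are hypothetical stand-ins, not real project modules.

```python
import types


def make_modules():
    """Build two throwaway modules that mimic the service/router import relationship."""
    service_mod = types.ModuleType("service_mod")
    service_mod.profile_manager = "real-manager"
    api_mod = types.ModuleType("api_mod")
    # Equivalent of `from service_mod import profile_manager` running at import time:
    api_mod.profile_manager = service_mod.profile_manager
    return service_mod, api_mod


service_mod, api_mod = make_modules()

# Patching only the defining module does not reach the router's copy of the name.
service_mod.profile_manager = "test-manager"
print(api_mod.profile_manager)  # real-manager

# Patching the consuming module is what its handlers will actually see.
api_mod.profile_manager = "test-manager"
print(api_mod.profile_manager)  # test-manager
```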
Task 4: YAML Patcher Service
Files:
- Create: webapp/services/yaml_patcher.py
- Modify: tests/webapp/test_llm_profiles_api.py (add apply tests)
This service reads a scenario YAML, patches the relevant LLM fields, and writes it back.
YAML field mapping:
- judge_profile → patches judge_model (string); embedding_model stays unchanged (same profile reused)
- answer_profile → patches app_adapter.static_kwargs.model (only if app_adapter exists and type=python)
- dataset_profile → patches generation.model (for dataset build configs)
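To make the mapping concrete, here is a minimal sketch that applies it to an already-parsed scenario dict, leaving out file I/O and YAML entirely. The function name `patch_models` and its keyword arguments are illustrative assumptions; the actual service is defined in Step 3 of this task.

```python
from typing import Any, Optional


def patch_models(
    data: dict[str, Any],
    judge_model: Optional[str] = None,
    answer_model: Optional[str] = None,
    dataset_model: Optional[str] = None,
) -> list[str]:
    """Apply the role-to-field mapping in place and return the dotted names that changed."""
    patched: list[str] = []
    if judge_model is not None:
        data["judge_model"] = judge_model  # embedding_model is deliberately left alone
        patched.append("judge_model")
    adapter = data.get("app_adapter")
    if answer_model is not None and isinstance(adapter, dict) and adapter.get("type") == "python":
        adapter.setdefault("static_kwargs", {})["model"] = answer_model
        patched.append("app_adapter.static_kwargs.model")
    generation = data.get("generation")
    if dataset_model is not None and isinstance(generation, dict):
        generation["model"] = dataset_model
        patched.append("generation.model")
    return patched
```

A role whose target section is absent (for example a dataset profile on a scenario without a `generation` block) is skipped without error, which is why the caller gets back the list of fields that were actually patched.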
- Step 1: Write failing tests (append to tests/webapp/test_llm_profiles_api.py)
import yaml as yaml_lib


def test_apply_judge_profile(tmp_path):
    """Applying a judge profile patches judge_model in the YAML."""
    # The patcher is called directly with an absolute temp path, so this test
    # needs neither the HTTP client nor any repo-root monkeypatching.
    from webapp.services.yaml_patcher import apply_profiles_to_scenario
    from webapp.models import LLMProfile

    # Create a minimal scenario YAML
    scenario_file = tmp_path / "test-scenario.yaml"
    scenario_file.write_text(
        "scenario_name: test\nmode: offline\njudge_model: old-model\nembedding_model: emb\n"
        "dataset: data.csv\nmetrics: [faithfulness]\noutput_dir: outputs/test\n",
        encoding="utf-8",
    )
    judge_p = LLMProfile(profile_id="x", name="J", model="new-model",
                         base_url="http://x/v1", api_key="k", created_at="", updated_at="")
    patched = apply_profiles_to_scenario(
        scenario_path=str(scenario_file),
        judge_profile=judge_p,
        answer_profile=None,
        dataset_profile=None,
        _resolve_absolute=True,
    )
    assert "judge_model" in patched
    data = yaml_lib.safe_load(scenario_file.read_text(encoding="utf-8"))
    assert data["judge_model"] == "new-model"


def test_apply_answer_profile(tmp_path):
    """Applying an answer profile patches app_adapter.static_kwargs.model."""
    from webapp.services.yaml_patcher import apply_profiles_to_scenario
    from webapp.models import LLMProfile

    scenario_file = tmp_path / "online.yaml"
    scenario_file.write_text(
        "scenario_name: online\nmode: online\njudge_model: j\nembedding_model: emb\n"
        "dataset: d.csv\nmetrics: [faithfulness]\noutput_dir: out\n"
        "app_adapter:\n  type: python\n  callable: apps.foo:run\n"
        "  static_kwargs:\n    model: old\n    source_chunks_path: chunks.jsonl\n",
        encoding="utf-8",
    )
    answer_p = LLMProfile(profile_id="y", name="A", model="new-answer-model",
                          base_url="http://x/v1", api_key="k", created_at="", updated_at="")
    patched = apply_profiles_to_scenario(
        scenario_path=str(scenario_file),
        judge_profile=None,
        answer_profile=answer_p,
        dataset_profile=None,
        _resolve_absolute=True,
    )
    assert "app_adapter.static_kwargs.model" in patched
    data = yaml_lib.safe_load(scenario_file.read_text(encoding="utf-8"))
    assert data["app_adapter"]["static_kwargs"]["model"] == "new-answer-model"
- Step 2: Run test to verify it fails
python -m pytest tests/webapp/test_llm_profiles_api.py::test_apply_judge_profile tests/webapp/test_llm_profiles_api.py::test_apply_answer_profile -v 2>&1 | head -20
Expected: ImportError (yaml_patcher not found)
- Step 3: Create webapp/services/yaml_patcher.py
"""Patch LLM profile settings into scenario YAML files in-place.

Only the fields that correspond to a provided (non-None) profile are touched.
The file is round-tripped through PyYAML, which keeps keys and their order
(sort_keys=False) but drops comments and original formatting. Swap in a
round-trip loader such as ruamel.yaml if comments must survive.
"""
from __future__ import annotations

from pathlib import Path
from typing import Any

import yaml

from webapp.models import LLMProfile


def _repo_root() -> Path:
    return Path(__file__).resolve().parents[2]


def _resolve_scenario_path(path_str: str) -> Path:
    """Resolve a scenario path; absolute paths are used as-is."""
    candidate = Path(path_str)
    if candidate.is_absolute():
        return candidate
    return (_repo_root() / candidate).resolve()


def apply_profiles_to_scenario(
    scenario_path: str,
    judge_profile: LLMProfile | None,
    answer_profile: LLMProfile | None,
    dataset_profile: LLMProfile | None,
    _resolve_absolute: bool = False,
) -> list[str]:
    """Patch the YAML file at *scenario_path* with the supplied profiles.

    Returns a list of dotted field names that were actually patched.
    """
    if _resolve_absolute:
        resolved = Path(scenario_path)
    else:
        resolved = _resolve_scenario_path(scenario_path)
    if not resolved.exists():
        raise FileNotFoundError(f"Scenario file not found: {resolved}")
    data: dict[str, Any] = yaml.safe_load(resolved.read_text(encoding="utf-8")) or {}
    patched: list[str] = []
    if judge_profile is not None:
        data["judge_model"] = judge_profile.model
        patched.append("judge_model")
    if answer_profile is not None:
        adapter = data.get("app_adapter")
        # Per the field mapping above: only patch when app_adapter exists and type=python.
        if isinstance(adapter, dict) and adapter.get("type") == "python":
            static_kwargs = adapter.setdefault("static_kwargs", {})
            static_kwargs["model"] = answer_profile.model
            patched.append("app_adapter.static_kwargs.model")
    if dataset_profile is not None:
        generation = data.get("generation")
        if isinstance(generation, dict):
            generation["model"] = dataset_profile.model
            patched.append("generation.model")
    # Skip the rewrite when nothing changed, so an untouched file keeps its comments.
    if patched:
        resolved.write_text(
            yaml.dump(data, allow_unicode=True, default_flow_style=False, sort_keys=False),
            encoding="utf-8",
        )
    return patched
- Step 4: Run tests to verify they pass
python -m pytest tests/webapp/test_llm_profiles_api.py -v
Expected: All tests pass
- Step 5: Commit
git add webapp/services/yaml_patcher.py tests/webapp/test_llm_profiles_api.py
git commit -m "feat: add yaml_patcher service to apply LLM profiles to scenario YAML"
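One thing worth weighing before shipping: `_resolve_scenario_path` uses absolute paths as-is, and the apply endpoint writes to whatever `scenario_path` the request supplies. If the console is reachable by anyone other than trusted operators, a containment check may be appropriate. The sketch below is an assumption-laden illustration, not part of the plan; the helper name `resolve_inside` is hypothetical, and it relies on `Path.is_relative_to`, available from Python 3.9.

```python
from pathlib import Path


def resolve_inside(root: Path, path_str: str) -> Path:
    """Resolve path_str against root and reject anything that escapes root."""
    root = root.resolve()
    # Joining with an absolute path_str discards root, so the check below also
    # covers absolute inputs, not just "../" traversal.
    candidate = (root / path_str).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Scenario path escapes the repository root: {path_str}")
    return candidate
```

In the patcher this would stand in for `_resolve_scenario_path`, with `root` set to `_repo_root()` (or to the scenarios directory, if writes should be limited further). The tests in Step 1 pass temp-directory paths, so they would keep using `_resolve_absolute=True`.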
Task 5: Frontend — profiles.js (LLM配置管理页)
Files:
- Create: webapp/static/js/profiles.js
- Modify: webapp/static/js/api.js
- Modify: webapp/static/js/app.js
- Modify: webapp/static/index.html
- Modify: webapp/static/css/app.css
This task adds the "LLM配置" sidebar page: list all profiles as cards, create new profile via inline form, edit/delete existing profiles.
- Step 1: Add profile API calls to api.js
Append to the API object (before the closing };):
  profiles() { return API.get("/api/llm-profiles"); },
  createProfile(body) { return API.post("/api/llm-profiles", body); },
  updateProfile(id, body) {
    return fetch(`/api/llm-profiles/${encodeURIComponent(id)}`, {
      method: "PUT",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    }).then(async r => {
      if (!r.ok) { const d = await API._extractError(r); throw new Error(d); }
      return r.json();
    });
  },
  deleteProfile(id) {
    return fetch(`/api/llm-profiles/${encodeURIComponent(id)}`, { method: "DELETE" })
      .then(async r => {
        if (!r.ok) { const d = await API._extractError(r); throw new Error(d); }
        return r.json();
      });
  },
  applyProfiles(body) { return API.post("/api/llm-profiles/apply", body); },
- Step 2: Add nav item to index.html
In the <nav class="nav"> section, add after the existing nav buttons (before </nav>):
<button class="nav-item" data-view="profiles">
<span class="nav-ico">⚙</span><span>LLM 配置</span>
</button>
Then add a new section at the bottom of <main> (before </main>). The "profiles" view name itself is registered in app.js in Step 6:
<!-- LLM profiles view -->
<section class="view" id="view-profiles" hidden>
<div class="panel">
<div class="panel-head">
<h2>LLM 配置管理</h2>
<button class="btn btn-primary" id="add-profile-btn">+ 新建配置</button>
</div>
<p class="muted">保存常用 LLM 连接参数,在运行评估时按角色选择。</p>
</div>
<!-- Create / edit form (hidden by default) -->
<div class="panel" id="profile-form-panel" hidden>
<h2 id="profile-form-title">新建 LLM 配置</h2>
<div class="profile-form">
<input type="hidden" id="edit-profile-id" />
<div class="form-row">
<label class="form-label">配置名称 <span class="req">*</span></label>
<input class="form-input" id="pf-name" placeholder="例:DeepSeek Flash(内网)" />
</div>
<div class="form-row">
<label class="form-label">模型名称 <span class="req">*</span></label>
<input class="form-input" id="pf-model" placeholder="例:deepseek-v4-flash" />
</div>
<div class="form-row">
<label class="form-label">Base URL <span class="req">*</span></label>
<input class="form-input" id="pf-base-url" placeholder="例:http://6.86.80.4:30080/v1" />
</div>
<div class="form-row">
<label class="form-label">API Key <span class="req">*</span></label>
<input class="form-input" id="pf-api-key" type="password" placeholder="sk-…" />
</div>
<div class="form-row">
<label class="form-label">超时(秒)</label>
<input class="form-input form-input-sm" id="pf-timeout" type="number" value="30" min="5" max="300" />
</div>
<div class="form-actions">
<button class="btn btn-primary" id="save-profile-btn">保存</button>
<button class="btn" id="cancel-profile-btn">取消</button>
<span class="form-error muted" id="profile-form-error"></span>
</div>
</div>
</div>
<div id="profile-cards" class="profile-grid"></div>
<div class="empty" id="profiles-empty" hidden>
<p>尚未添加任何 LLM 配置。</p>
<p class="muted">点击「新建配置」添加第一个。</p>
</div>
</section>
- Step 3: Create webapp/static/js/profiles.js
// profiles.js: LLM profile management page logic
const Profiles = {
  _data: [],

  // Initialise: bind button events
  init() {
    document.getElementById("add-profile-btn").addEventListener("click", () => Profiles.showForm());
    document.getElementById("save-profile-btn").addEventListener("click", () => Profiles.save());
    document.getElementById("cancel-profile-btn").addEventListener("click", () => Profiles.hideForm());
  },

  // Load and render the profile list
  async load() {
    const grid = document.getElementById("profile-cards");
    const empty = document.getElementById("profiles-empty");
    grid.innerHTML = '<p class="muted">加载中…</p>';
    try {
      const data = await API.profiles();
      Profiles._data = data.profiles || [];
      grid.innerHTML = "";
      if (Profiles._data.length === 0) {
        empty.hidden = false;
      } else {
        empty.hidden = true;
        Profiles._data.forEach(p => grid.appendChild(Profiles.renderCard(p)));
      }
    } catch (err) {
      grid.innerHTML = `<p class="muted">加载失败:${App.escape(err.message)}</p>`;
    }
  },

  // Render a single profile card
  renderCard(p) {
    const card = document.createElement("div");
    card.className = "profile-card";
    card.dataset.id = p.profile_id;
    card.innerHTML = `
      <div class="profile-card-head">
        <div class="profile-card-name">${App.escape(p.name)}</div>
        <div class="profile-card-actions">
          <button class="btn btn-sm" data-action="edit">编辑</button>
          <button class="btn btn-sm btn-danger" data-action="delete">删除</button>
        </div>
      </div>
      <div class="profile-card-field"><span class="field-label">模型</span> <code>${App.escape(p.model)}</code></div>
      <div class="profile-card-field"><span class="field-label">Base URL</span> <code>${App.escape(p.base_url)}</code></div>
      <div class="profile-card-field"><span class="field-label">超时</span> ${p.timeout_seconds}s</div>
    `;
    card.querySelector("[data-action=edit]").addEventListener("click", () => Profiles.showForm(p));
    card.querySelector("[data-action=delete]").addEventListener("click", () => Profiles.remove(p.profile_id, p.name));
    return card;
  },

  // Show the create form, or the edit form when a profile is passed
  showForm(profile = null) {
    const panel = document.getElementById("profile-form-panel");
    const title = document.getElementById("profile-form-title");
    panel.hidden = false;
    title.textContent = profile ? "编辑 LLM 配置" : "新建 LLM 配置";
    document.getElementById("edit-profile-id").value = profile ? profile.profile_id : "";
    document.getElementById("pf-name").value = profile ? profile.name : "";
    document.getElementById("pf-model").value = profile ? profile.model : "";
    document.getElementById("pf-base-url").value = profile ? profile.base_url : "";
    document.getElementById("pf-api-key").value = profile ? profile.api_key : "";
    document.getElementById("pf-timeout").value = profile ? profile.timeout_seconds : 30;
    document.getElementById("profile-form-error").textContent = "";
    panel.scrollIntoView({ behavior: "smooth", block: "start" });
  },

  hideForm() {
    document.getElementById("profile-form-panel").hidden = true;
  },

  // Save: create when no id is set, otherwise update
  async save() {
    const id = document.getElementById("edit-profile-id").value;
    const body = {
      name: document.getElementById("pf-name").value.trim(),
      model: document.getElementById("pf-model").value.trim(),
      base_url: document.getElementById("pf-base-url").value.trim(),
      api_key: document.getElementById("pf-api-key").value.trim(),
      timeout_seconds: parseInt(document.getElementById("pf-timeout").value, 10) || 30,
    };
    const errEl = document.getElementById("profile-form-error");
    if (!body.name || !body.model || !body.base_url || !body.api_key) {
      errEl.textContent = "请填写所有必填字段(名称、模型、Base URL、API Key)";
      return;
    }
    try {
      if (id) {
        await API.updateProfile(id, body);
      } else {
        await API.createProfile(body);
      }
      Profiles.hideForm();
      await Profiles.load();
    } catch (err) {
      errEl.textContent = `保存失败:${err.message}`;
    }
  },

  // Delete a profile after confirmation
  async remove(profileId, name) {
    if (!confirm(`确认删除配置「${name}」?`)) return;
    try {
      await API.deleteProfile(profileId);
      await Profiles.load();
    } catch (err) {
      alert(`删除失败:${err.message}`);
    }
  },

  // Return the profiles loaded so far (used by runner.js)
  getAll() {
    return Profiles._data;
  },
};
- Step 4: Add CSS for profiles page to app.css
Append to webapp/static/css/app.css:
/* ---------- LLM 配置管理页 ---------- */
.profile-grid { display: grid; grid-template-columns: repeat(auto-fill, minmax(300px, 1fr)); gap: 16px; }
.profile-card {
background: var(--surface); border: 1px solid var(--line); border-radius: var(--radius);
padding: 16px; box-shadow: var(--shadow);
}
.profile-card-head { display: flex; justify-content: space-between; align-items: center; margin-bottom: 10px; }
.profile-card-name { font-size: 15px; font-weight: 600; }
.profile-card-actions { display: flex; gap: 6px; }
.profile-card-field { font-size: 12px; color: var(--slate); margin-top: 4px; }
.field-label { font-weight: 600; color: var(--ink); }
/* Form */
.profile-form { display: flex; flex-direction: column; gap: 12px; margin-top: 14px; max-width: 560px; }
.form-row { display: flex; flex-direction: column; gap: 4px; }
.form-label { font-size: 13px; font-weight: 600; }
.req { color: var(--bad); }
.form-input {
border: 1px solid var(--line); border-radius: 6px; padding: 8px 10px;
font-size: 13px; font-family: inherit; width: 100%;
}
.form-input:focus { outline: none; border-color: var(--petrol); }
.form-input-sm { max-width: 120px; }
.form-actions { display: flex; gap: 10px; align-items: center; margin-top: 4px; }
.form-error { font-size: 12px; color: var(--bad); }
.btn-sm { padding: 4px 10px; font-size: 12px; }
.btn-danger { color: var(--bad); border-color: var(--bad); }
.btn-danger:hover { background: #fee2e2; }
- Step 5: Update index.html to load profiles.js
Add before the closing </body>:
<script src="/static/js/profiles.js"></script>
(place it before <script src="/static/js/app.js"></script>)
- Step 6: Update app.js to handle the new view
In App.views, add "profiles":
views: ["runs", "new", "report", "profiles"],
In App.titles, add:
profiles: "LLM 配置",
In App.switchView, add the following after the line if (view === "report") Report.render(App.currentRunId);
if (view === "profiles") { Profiles.load(); }
Also call Profiles.init() inside App.init():
Profiles.init();
- Step 7: Smoke-test that the server starts
cd /c/Projects/AIProjects/Siemens-AIPOC/siemens_ragas
python -c "from webapp.server import create_app; app = create_app(); print('OK')"
Expected: OK
- Step 8: Commit
git add webapp/static/js/profiles.js webapp/static/js/api.js webapp/static/js/app.js webapp/static/index.html webapp/static/css/app.css
git commit -m "feat: add LLM配置 management page (profiles view)"
Task 6: Frontend — LLM Role-Assignment Panel in runner.js
Files:
- Modify: webapp/static/js/runner.js
- Modify: webapp/static/css/app.css
- Modify: webapp/static/index.html
After the user selects a scenario, reveal an LLM assignment panel (hidden until then) with dropdowns for the Judge, Answer, and Dataset roles. When the user clicks "运行评估", first call applyProfiles, then trigger the evaluation.
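The apply-then-run rule can be pinned down independently of the browser. The sketch below mirrors, in Python, how the request body for the apply call is derived from the three dropdown values: an empty selection becomes null, and if all three are empty the apply call is skipped. The function name build_apply_payload, the scenario path, and the profile id are hypothetical; the field names are the ones used by the runner.js code in Step 3.

```python
from typing import Optional


def build_apply_payload(
    scenario_path: str,
    judge: str = "",
    answer: str = "",
    dataset: str = "",
) -> Optional[dict]:
    """Return the apply request body, or None when no role has a profile selected."""
    if not (judge or answer or dataset):
        return None  # nothing selected: skip the apply call entirely
    return {
        "scenario_path": scenario_path,
        # An empty dropdown value means "keep the scenario's own setting".
        "judge_profile_id": judge or None,
        "answer_profile_id": answer or None,
        "dataset_profile_id": dataset or None,
    }


# Hypothetical scenario path and profile id, for illustration only.
print(build_apply_payload("scenarios/demo.yaml"))
print(build_apply_payload("scenarios/demo.yaml", judge="p-1"))
```

A helper of this shape may be convenient when writing integration tests against the apply endpoint, since it produces the same bodies the frontend would send.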
- Step 1: Add HTML for role-assignment panel to index.html
Inside <section class="view" id="view-new">, after the .run-actions div and before the task-panel:
<!-- LLM role-assignment panel (shown once a scenario is selected) -->
<div class="panel llm-assignment-panel" id="llm-assignment-panel" hidden>
<h2>LLM 角色配置 <span class="muted" style="font-size:13px;font-weight:400">(可选)</span></h2>
<p class="muted" style="margin-bottom:14px">为不同任务角色选择已保存的 LLM 配置,留空则使用场景文件中的原始配置。</p>
<div class="llm-role-rows">
<div class="llm-role-row">
<label class="llm-role-label">评测打分 Judge LLM</label>
<select class="select llm-role-select" id="role-judge">
<option value="">— 使用场景原始配置 —</option>
</select>
</div>
<div class="llm-role-row">
<label class="llm-role-label">生成答案 Answer LLM</label>
<select class="select llm-role-select" id="role-answer">
<option value="">— 使用场景原始配置 —</option>
</select>
</div>
<div class="llm-role-row">
<label class="llm-role-label">生成题库 Dataset LLM</label>
<select class="select llm-role-select" id="role-dataset">
<option value="">— 使用场景原始配置 —</option>
</select>
</div>
</div>
</div>
- Step 2: Add CSS for role-assignment panel to app.css
/* ---------- LLM role-assignment panel ---------- */
.llm-assignment-panel { border-left: 3px solid var(--petrol); }
.llm-role-rows { display: flex; flex-direction: column; gap: 10px; }
.llm-role-row { display: flex; align-items: center; gap: 14px; }
.llm-role-label { font-size: 13px; font-weight: 600; min-width: 180px; color: var(--ink); }
.llm-role-select { min-width: 240px; }
- Step 3: Extend runner.js: add profile loading and apply logic
Replace the entire contents of runner.js with:
// runner.js: "新建评估" view. Lists scenarios, assigns LLM roles, triggers evaluations, and polls task status and logs.
const Runner = {
  selectedScenario: null,
  pollTimer: null,
  lastRunId: null,
  // Bind the run and view-report buttons.
  init() {
    document.getElementById("run-btn").addEventListener("click", () => Runner.trigger());
    document.getElementById("view-report-btn").addEventListener("click", () => {
      if (Runner.lastRunId) {
        App.currentRunId = Runner.lastRunId;
        App.enableReportNav();
        App.switchView("report");
      }
    });
  },
  // Load and render the list of scenarios that can be run.
  async loadScenarios() {
    const list = document.getElementById("scenario-list");
    list.innerHTML = '<p class="muted">加载中…</p>';
    try {
      const data = await API.scenarios();
      const scenarios = data.scenarios || [];
      if (scenarios.length === 0) {
        list.innerHTML = '<p class="muted">未在 scenarios/ 下找到场景文件。</p>';
        return;
      }
      list.innerHTML = "";
      scenarios.forEach((sc) => list.appendChild(Runner.renderScenarioItem(sc)));
    } catch (err) {
      list.innerHTML = `<p class="muted">加载失败:${App.escape(err.message)}</p>`;
    }
    // Also load the profiles so the role dropdowns can be filled.
    Runner._populateProfileSelects();
  },
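  // Populate the three role dropdowns with the saved profiles.
  async _populateProfileSelects() {
    const cached = Profiles.getAll() || [];
    const profiles = cached.length > 0
      ? cached
      : ((await API.profiles().catch(() => ({ profiles: [] }))).profiles || []);
    ["role-judge", "role-answer", "role-dataset"].forEach((id) => {
      const sel = document.getElementById(id);
      const previous = sel.value; // remember the current choice across reloads
      // Keep only the first (placeholder) option.
      sel.length = 1;
      profiles.forEach((p) => {
        const opt = document.createElement("option");
        opt.value = p.profile_id;
        opt.textContent = `${p.name} (${p.model})`;
        sel.appendChild(opt);
      });
      // Restore the previous choice if that profile still exists.
      if (profiles.some((p) => String(p.profile_id) === previous)) sel.value = previous;
    });
  },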
  // Build the DOM element for a single scenario entry.
  renderScenarioItem(sc) {
    const item = document.createElement("div");
    const invalid = !!sc.error;
    item.className = "scenario-item" + (invalid ? " invalid" : "");
    const modeTag = sc.mode
      ? `<span class="tag mode-${App.escape(sc.mode)}">${App.escape(sc.mode)}</span>`
      : "";
    const metricCount = (sc.metrics || []).length;
    item.innerHTML = `
      <div>
        <div class="scenario-name">${App.escape(sc.scenario_name || sc.path)}</div>
        <div class="scenario-path">${App.escape(sc.path)}</div>
        ${sc.error ? `<div class="scenario-path" style="color:#dc2626">${App.escape(sc.error)}</div>` : ""}
      </div>
      <div class="scenario-tags">
        ${modeTag}
        <span class="tag">${metricCount} 指标</span>
      </div>
    `;
    if (!invalid) {
      item.addEventListener("click", () => {
        document.querySelectorAll(".scenario-item").forEach((el) => el.classList.remove("selected"));
        item.classList.add("selected");
        Runner.selectedScenario = sc.path;
        document.getElementById("selected-scenario").textContent = sc.path;
        document.getElementById("run-btn").disabled = false;
        // Reveal the LLM role-assignment panel.
        document.getElementById("llm-assignment-panel").hidden = false;
      });
    }
    return item;
  },
  // Trigger an evaluation: apply the selected profiles first (if any), then start the task.
  async trigger() {
    if (!Runner.selectedScenario) return;
    const runBtn = document.getElementById("run-btn");
    runBtn.disabled = true;
    const panel = document.getElementById("task-panel");
    const logBox = document.getElementById("task-log");
    const statusBadge = document.getElementById("task-status");
    const reportBtn = document.getElementById("view-report-btn");
    panel.hidden = false;
    reportBtn.hidden = true;
    logBox.textContent = "";
    Runner._setStatus(statusBadge, "queued");
    try {
      // Step 1: apply LLM profiles to the scenario YAML if any are selected.
      await Runner._applyProfilesIfNeeded(logBox);
      // Step 2: trigger the evaluation.
      const resp = await API.triggerEvaluation(Runner.selectedScenario);
      Runner.poll(resp.task_id);
    } catch (err) {
      Runner._setStatus(statusBadge, "failed");
      logBox.textContent = (logBox.textContent ? logBox.textContent + "\n" : "") + `触发失败:${err.message}`;
      runBtn.disabled = false;
    }
  },
  // If the user selected any profile, apply it to the scenario YAML before running.
  async _applyProfilesIfNeeded(logBox) {
    const judgeId = document.getElementById("role-judge").value;
    const answerId = document.getElementById("role-answer").value;
    const datasetId = document.getElementById("role-dataset").value;
    if (!judgeId && !answerId && !datasetId) return; // nothing selected, skip
    logBox.textContent = "正在将 LLM 配置写入场景文件…\n";
    const body = {
      scenario_path: Runner.selectedScenario,
      judge_profile_id: judgeId || null,
      answer_profile_id: answerId || null,
      dataset_profile_id: datasetId || null,
    };
    const result = await API.applyProfiles(body);
    const fields = (result.patched_fields || []).join(", ");
    logBox.textContent += fields
      ? `✓ 已更新字段:${fields}\n`
      : "(未找到可更新的字段,继续运行)\n";
  },
  // Poll the task status periodically, refreshing the log and the status badge.
  poll(taskId) {
    const logBox = document.getElementById("task-log");
    const statusBadge = document.getElementById("task-status");
    const reportBtn = document.getElementById("view-report-btn");
    const runBtn = document.getElementById("run-btn");
    // Keep any lines written before polling began (e.g. the apply-profiles result);
    // otherwise the first status refresh would overwrite them.
    const preamble = logBox.textContent;
    if (Runner.pollTimer) clearInterval(Runner.pollTimer);
    Runner.pollTimer = setInterval(async () => {
      try {
        const status = await API.taskStatus(taskId);
        logBox.textContent = preamble + (status.logs || []).join("\n");
        logBox.scrollTop = logBox.scrollHeight;
        Runner._setStatus(statusBadge, status.status);
        if (status.status === "completed" || status.status === "failed") {
          clearInterval(Runner.pollTimer);
          runBtn.disabled = false;
          if (status.status === "completed" && status.run_id) {
            Runner.lastRunId = status.run_id;
            reportBtn.hidden = false;
          }
        }
      } catch (err) {
        clearInterval(Runner.pollTimer);
        logBox.textContent += `\n轮询失败:${err.message}`;
        runBtn.disabled = false;
      }
    }, 1200);
  },
  // Update the status badge text and its colour class.
  _setStatus(badge, status) {
    badge.textContent = status;
    badge.className = "badge " + status;
  },
};
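The polling behaviour in Runner.poll (fetch the status, stop on "completed" or "failed") can also be reproduced outside the browser, for example in an end-to-end script. A minimal sketch in Python, assuming a hypothetical fetch_status callable that returns the same status dictionary the frontend receives; the "running" state and the max_polls cap are assumptions that are not part of the JavaScript above, which polls until a terminal state or an error:

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval: float = 1.2,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until it reports a terminal state, then return that status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(interval)  # same 1.2 s cadence as the frontend by default
    raise TimeoutError(f"task did not finish within {max_polls} polls")


# A fake status source stands in for the real task-status endpoint.
responses = iter([
    {"status": "queued", "logs": []},
    {"status": "running", "logs": ["step 1"]},  # "running" is an assumed state name
    {"status": "completed", "logs": ["step 1", "done"], "run_id": "run-001"},
])
final = poll_until_done(lambda: next(responses), sleep=lambda _: None)
print(final["status"], final["run_id"])
```

Injecting sleep keeps the sketch fast to run in tests; a real script would leave the default in place.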
- Step 4: Smoke-test server import
python -c "from webapp.server import create_app; app = create_app(); print('OK')"
Expected: OK
- Step 5: Commit
git add webapp/static/js/runner.js webapp/static/index.html webapp/static/css/app.css
git commit -m "feat: add LLM role-assignment panel to 新建评估 view"
Task 7: End-to-End Smoke Test & Init Files
Files:
- Create: tests/webapp/__init__.py (if missing)
- Verify all tests pass
- Step 1: Ensure test package init files exist
# Check what init files exist
ls tests/ && ls tests/webapp/ 2>/dev/null || echo "no webapp dir"
Create missing init files:
mkdir -p tests/webapp && touch tests/__init__.py tests/webapp/__init__.py && echo done
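Where a POSIX shell is not available, the same step can be done from Python. The sketch below is one possible equivalent using pathlib; the function name ensure_package_inits is hypothetical, and the demonstration runs in a temporary directory rather than the repository:

```python
import tempfile
from pathlib import Path


def ensure_package_inits(root: Path, packages: list[str]) -> list[Path]:
    """Create each package directory and its __init__.py if missing; return the files created."""
    created = []
    for package in packages:
        init_file = root / package / "__init__.py"
        init_file.parent.mkdir(parents=True, exist_ok=True)
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created


# Demonstration in a throwaway directory; in the project, root would be the repository root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = ensure_package_inits(root, ["tests", "tests/webapp"])
    second = ensure_package_inits(root, ["tests", "tests/webapp"])
    print(len(first), len(second))
```

The second call creates nothing, so the helper is safe to run repeatedly.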
- Step 2: Run full test suite
python -m pytest tests/webapp/ -v
Expected: All tests pass (≥ 17 tests)
- Step 3: Verify server starts and routes are registered
python -c "
from webapp.server import create_app
app = create_app()
routes = [getattr(r, 'path', '') for r in app.routes]
assert any('llm-profiles' in r for r in routes), 'Route missing'
print('Routes OK:', [r for r in routes if 'llm' in r or 'profile' in r])
"
Expected: prints routes including /api/llm-profiles
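The inline check above could also live in the test suite as a small helper. A sketch under stated assumptions: registered_paths is a hypothetical name, the helper only relies on the app exposing a routes list whose entries may carry a path attribute, and the fake app below (including the /api/scenarios and {profile_id} paths) is illustrative rather than taken from the real router:

```python
from types import SimpleNamespace


def registered_paths(app, needle: str) -> list[str]:
    """Return the paths of all routes on `app` whose path contains `needle`."""
    paths = []
    for route in app.routes:
        path = getattr(route, "path", None)  # tolerate route objects without .path
        if isinstance(path, str) and needle in path:
            paths.append(path)
    return paths


# Stand-in for the object returned by create_app(); paths are illustrative.
fake_app = SimpleNamespace(routes=[
    SimpleNamespace(path="/api/scenarios"),
    SimpleNamespace(path="/api/llm-profiles"),
    SimpleNamespace(path="/api/llm-profiles/{profile_id}"),
    SimpleNamespace(name="route-without-path"),
])
print(registered_paths(fake_app, "llm-profiles"))
```

In a real test, fake_app would be replaced by the application built with create_app() from webapp.server.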
- Step 4: Final commit
git add tests/
git commit -m "test: ensure test package structure and all webapp tests pass"