System Status Module Optimization Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Upgrade the System Status (系统状态) page from a static one-shot snapshot into a reliable observability dashboard with loading states, service health checks, auto-refresh, BM25/Reranker visibility, and config display fixes.
Architecture: Backend adds a unified /status/health aggregate endpoint (Milvus + MinIO + BM25 + Reranker + Sessions) and a TTL cache on /status/stats. Frontend adds loading/error states, a refresh button, auto-polling while documents are processing, a health services panel, and config value truncation fixes.
Tech Stack: FastAPI, Python 3.11, React 18, TypeScript, CSS-in-JS (inline styles with theme context)
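The aggregate-endpoint idea in the Architecture line can be sketched in isolation. This is a minimal illustration, not the project's code: probe, aggregate_health, fake_milvus, and fake_minio are hypothetical names, and the real probes are the bootstrap getters used in Task 1. The point it shows is that each service check is isolated, so one failing dependency is reported as an error entry instead of failing the whole response.

```python
from typing import Any, Callable


def probe(check: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run one health check and never let its exception escape."""
    try:
        return {"status": "ok", **check()}
    except Exception as exc:  # a failing dependency must not fail the endpoint
        return {"status": "error", "error": str(exc)}


def aggregate_health(checks: dict[str, Callable[[], dict[str, Any]]]) -> dict[str, Any]:
    """Collect every service's result under its own key."""
    return {name: probe(check) for name, check in checks.items()}


# Hypothetical stand-ins for the real Milvus and MinIO probes.
def fake_milvus() -> dict[str, Any]:
    return {"connected": True, "num_entities": 42}


def fake_minio() -> dict[str, Any]:
    raise ConnectionError("minio unreachable")


report = aggregate_health({"milvus": fake_milvus, "minio": fake_minio})
print(report)
```

With this shape the frontend can always render one badge per service, because every key is present whether or not its probe succeeded.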
File Map
| File | Action | Purpose |
|---|---|---|
| backend/app/api/routes/status.py | Modify | Add /health endpoint, TTL cache on /stats, import session store |
| frontend/src/api/status.ts | Modify | Add SystemHealth type + getSystemHealth() |
| frontend/src/api/index.ts | Modify | Export SystemHealth interface |
| frontend/src/pages/Status/StatusPage.tsx | Modify | Loading/error states, refresh, auto-poll, health panel, config fix |
Task 1: Backend — Add /status/health Endpoint + TTL Cache on /status/stats
Files:
- Modify: backend/app/api/routes/status.py
- Step 1: Read the current status.py
File: backend/app/api/routes/status.py
Note existing imports and route signatures before editing.
- Step 2: Replace status.py with the new version
Replace the entire content of backend/app/api/routes/status.py with:
"""Define API routes for status."""

import time
from typing import Any

from fastapi import APIRouter

from app.config.settings import settings
from app.shared.bootstrap import (
    get_bm25_retriever,
    get_binary_store,
    get_conversation_store,
    get_document_query_service,
    get_vector_index,
)

router = APIRouter(prefix="/status", tags=["系统状态"])

# ---------------------------------------------------------------------------
# Simple TTL cache for /stats (avoids O(N) doc scan on every request)
# ---------------------------------------------------------------------------
_stats_cache: dict[str, Any] = {}
_stats_cache_time: float = 0.0
_STATS_TTL_SECONDS: float = 10.0


def _invalidate_stats_cache() -> None:
    """Force the next /stats request to recompute instead of serving the cache."""
    global _stats_cache_time
    _stats_cache_time = 0.0


@router.get("/stats")
async def get_stats():
    """Return document statistics (cached for 10 s)."""
    global _stats_cache, _stats_cache_time
    now = time.time()
    # Serve the cached payload while it is younger than the TTL.
    if _stats_cache and (now - _stats_cache_time) < _STATS_TTL_SECONDS:
        return _stats_cache

    documents = get_document_query_service().list_documents()
    indexed = sum(1 for d in documents if d.status.value == "indexed")
    failed = sum(1 for d in documents if d.status.value == "failed")
    _stats_cache = {
        "documents_total": len(documents),
        "documents_indexed": indexed,
        "documents_failed": failed,
        "chunks_total": sum(d.chunk_count for d in documents),
    }
    _stats_cache_time = now
    return _stats_cache


@router.get("/config")
async def get_config():
    """Return system configuration."""
    return {
        "embedding_model": settings.embedding_model,
        "embedding_dim": settings.embedding_dim,
        "embedding_base_url": settings.embedding_base_url,
        "milvus_collection": settings.milvus_collection,
        "parser_backend": settings.parser_backend,
        "chunk_backend": settings.chunk_backend,
        "artifact_prefix": settings.document_parse_artifact_prefix,
        "parser_failure_mode": settings.parser_failure_mode,
        "llm_provider": settings.llm_provider,
        "llm_model": settings.llm_model,
        "document_metadata_path": settings.document_metadata_path,
    }


@router.get("/milvus/health")
async def milvus_health():
    """Return Milvus health (kept for backwards compat)."""
    return get_vector_index().health()


@router.get("/health")
async def get_health():
    """Return aggregate health of all backend services."""
    # --- Milvus ---
    try:
        milvus_info = get_vector_index().health()
        milvus_status = "ok" if milvus_info.get("connected") else "error"
    except Exception as exc:  # noqa: BLE001
        milvus_info = {"error": str(exc)}
        milvus_status = "error"

    # --- MinIO ---
    try:
        minio_connected = get_binary_store().client.connected
        minio_status = "ok" if minio_connected else "error"
    except Exception:  # noqa: BLE001
        minio_status = "error"
        minio_connected = False

    # --- BM25 ---
    # Guarded like the other probes so a retriever failure cannot fail the endpoint.
    try:
        bm25_available = get_bm25_retriever() is not None
    except Exception:  # noqa: BLE001
        bm25_available = False

    # --- Sessions ---
    try:
        session_count = len(get_conversation_store().list_sessions())
    except Exception:  # noqa: BLE001
        session_count = 0

    return {
        "milvus": {"status": milvus_status, **milvus_info},
        "minio": {"status": minio_status, "connected": minio_connected},
        "bm25": {"available": bm25_available},
        "reranker": {
            "enabled": settings.reranker_enabled,
            "model": settings.reranker_model if settings.reranker_enabled else None,
        },
        "sessions": {
            "active": session_count,
            "max": settings.session_max_sessions,
        },
    }
- Step 3: Verify Python syntax
cd backend && python -c "from app.api.routes.status import router; print('OK')"
Expected output: OK
- Step 4: Commit
git add backend/app/api/routes/status.py
git commit -m "feat(status): add /health aggregate endpoint and 10s TTL cache on /stats"
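The TTL behavior of /status/stats is easiest to reason about with the clock injected, so expiry can be exercised without sleeping. The sketch below is an illustration under assumptions, not the module's code: TTLCache, compute_stats, and the fake clock are hypothetical names, and the real route keeps its state in the module-level variables shown in Step 2.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache a single computed value for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None  # None means "nothing cached"

    def get(self, compute: Callable[[], Any]) -> Any:
        """Return the cached value, recomputing it once the TTL has elapsed."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = compute()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next get() recomputes."""
        self._expires_at = None


# Fake clock and call counter so the expiry logic can be checked deterministically.
fake_now = [0.0]
calls = [0]


def compute_stats() -> dict[str, int]:
    calls[0] += 1
    return {"documents_total": calls[0]}


cache = TTLCache(10.0, clock=lambda: fake_now[0])
first = cache.get(compute_stats)    # computed
fake_now[0] = 9.0
second = cache.get(compute_stats)   # still inside the 10 s window: cached
fake_now[0] = 10.0
third = cache.get(compute_stats)    # TTL elapsed: recomputed
cache.invalidate()
fourth = cache.get(compute_stats)   # invalidated: recomputed
```

The invalidate() call mirrors _invalidate_stats_cache() from Step 2; the plan defines that helper but does not call it anywhere, so wiring it into document upload or delete paths would be a separate decision.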
Task 2: Frontend API — Add SystemHealth Type + getSystemHealth()
Files:
- Modify: frontend/src/api/index.ts
- Modify: frontend/src/api/status.ts
- Step 1: Add SystemHealth interface to frontend/src/api/index.ts
Open frontend/src/api/index.ts. After the SystemConfig interface (around line 240), add:
export interface ServiceHealth {
  status: 'ok' | 'error' | 'unknown';
  error?: string;
}

export interface SystemHealth {
  milvus: ServiceHealth & {
    connected?: boolean;
    collection_name?: string;
    num_entities?: number;
  };
  minio: ServiceHealth & { connected?: boolean };
  bm25: { available: boolean };
  reranker: { enabled: boolean; model?: string | null };
  sessions: { active: number; max: number };
}
- Step 2: Add getSystemHealth() to frontend/src/api/status.ts
Open frontend/src/api/status.ts. The current content is:
import { fetchAPI, type SystemConfig, type SystemStats } from './index';

export async function getSystemStats(): Promise<SystemStats> {
  return fetchAPI<SystemStats>('/status/stats');
}

export async function getSystemConfig(): Promise<SystemConfig> {
  return fetchAPI<SystemConfig>('/status/config');
}

export async function getMilvusHealth(): Promise<{ connected: boolean; collections: string[] }> {
  return fetchAPI('/status/milvus/health');
}

export type { SystemConfig, SystemStats };
Replace with:
import { fetchAPI, type SystemConfig, type SystemHealth, type SystemStats } from './index';

export async function getSystemStats(): Promise<SystemStats> {
  return fetchAPI<SystemStats>('/status/stats');
}

export async function getSystemConfig(): Promise<SystemConfig> {
  return fetchAPI<SystemConfig>('/status/config');
}

export async function getSystemHealth(): Promise<SystemHealth> {
  return fetchAPI<SystemHealth>('/status/health');
}

export type { SystemConfig, SystemHealth, SystemStats };
- Step 3: Commit
git add frontend/src/api/index.ts frontend/src/api/status.ts
git commit -m "feat(status): add SystemHealth type and getSystemHealth() API function"
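Because the SystemHealth interface is written by hand, it can drift from what /status/health actually returns. One way to catch that on the backend side is a small shape check over the response payload. This is a sketch under assumptions: REQUIRED_FIELDS and missing_health_fields are hypothetical names, and the field list simply mirrors the non-optional members of the interface added in Step 1.

```python
from typing import Any

# Non-optional fields per service, mirroring the SystemHealth interface.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "milvus": {"status"},
    "minio": {"status"},
    "bm25": {"available"},
    "reranker": {"enabled"},
    "sessions": {"active", "max"},
}


def missing_health_fields(payload: dict[str, Any]) -> list[str]:
    """Return sorted dotted paths of required fields absent from payload."""
    missing: list[str] = []
    for service, fields in REQUIRED_FIELDS.items():
        section = payload.get(service)
        if not isinstance(section, dict):
            # The whole service entry is missing or malformed.
            missing.append(service)
            continue
        missing.extend(f"{service}.{name}" for name in fields if name not in section)
    return sorted(missing)


# Example payload in the shape Task 1 returns (values are made up).
sample = {
    "milvus": {"status": "ok", "connected": True},
    "minio": {"status": "ok", "connected": True},
    "bm25": {"available": True},
    "reranker": {"enabled": False, "model": None},
    "sessions": {"active": 3, "max": 100},
}

# A deliberately broken payload: no bm25 entry, sessions without "max".
broken = {key: value for key, value in sample.items() if key != "bm25"}
broken["sessions"] = {"active": 1}

print(missing_health_fields(sample))
print(missing_health_fields(broken))
```

A check like this could run in a backend test against the real endpoint response; it only covers presence of keys, not value types.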
Task 3: Frontend — Loading/Error States + Refresh Button + Auto-Poll
Files:
- Modify: frontend/src/pages/Status/StatusPage.tsx
This task replaces the useEffect + loadData pattern and the top of the component with loading/error/refresh support. Auto-poll fires every 5 s while any document is parsing or pending.
- Step 1: Read current StatusPage.tsx
File: frontend/src/pages/Status/StatusPage.tsx
Identify: the useState block, loadData function, and the single useEffect.
- Step 2: Replace the import block and state/effect section
Find and replace the imports at the top of the file:
Old:
import React, { useEffect, useState } from 'react';
New:
import React, { useCallback, useEffect, useState } from 'react';
- Step 3: Add SystemHealth to the API import
Old:
import { getSystemStats, getSystemConfig, type SystemStats, type SystemConfig } from '../../api/status';
New:
import { getSystemStats, getSystemConfig, getSystemHealth, type SystemStats, type SystemConfig, type SystemHealth } from '../../api/status';
- Step 4: Replace the state declarations and loadData + useEffect block inside the component
Find the block starting with const [stats, setStats] and ending after the closing }, []); of the first useEffect. Replace it entirely with:
  const [stats, setStats] = useState<SystemStats>({
    documents_total: 0,
    documents_indexed: 0,
    documents_failed: 0,
    chunks_total: 0,
  });
  const [config, setConfig] = useState<SystemConfig | null>(null);
  const [docs, setDocs] = useState<DocInfo[]>([]);
  const [health, setHealth] = useState<SystemHealth | null>(null);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<string | null>(null);

  const loadData = useCallback(async () => {
    setLoading(true);
    setError(null);
    try {
      const [statsRes, configRes, docsRes, healthRes] = await Promise.all([
        getSystemStats(),
        getSystemConfig(),
        getDocumentList(),
        getSystemHealth(),
      ]);
      setStats(statsRes);
      setConfig(configRes);
      setDocs(docsRes.docs);
      setHealth(healthRes);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to load status data');
    } finally {
      setLoading(false);
    }
  }, []);

  // Initial load
  useEffect(() => {
    void loadData();
  }, [loadData]);

  // Auto-poll every 5 s while any document is still processing
  useEffect(() => {
    const hasProcessing = docs.some(d => d.status === 'parsing' || d.status === 'pending');
    if (!hasProcessing) return;
    const id = window.setInterval(() => void loadData(), 5000);
    return () => window.clearInterval(id);
  }, [docs, loadData]);
- Step 5: Add loading banner and error banner at the top of the returned JSX
Find the return ( statement in the component. Inside <Content>, right after <TPattern />, add:
{/* Loading banner */}
{loading && (
  <div style={{ display: 'flex', alignItems: 'center', gap: 8, marginBottom: 16, color: theme.text3, fontSize: 13 }}>
    <span style={{ display: 'inline-block', width: 12, height: 12, borderRadius: '50%', border: `2px solid ${theme.accent}`, borderTopColor: 'transparent', animation: 'spin 0.8s linear infinite' }} />
    <span className="mono">LOADING...</span>
  </div>
)}

{/* Error banner */}
{error && (
  <div style={{
    marginBottom: 16,
    padding: '12px 16px',
    background: '#d6454520',
    border: '1px solid #d64545',
    borderRadius: 8,
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'space-between',
  }}>
    <span style={{ fontSize: 13, color: '#d64545' }}>{error}</span>
    <button
      onClick={() => void loadData()}
      style={{ background: 'none', border: '1px solid #d64545', borderRadius: 6, color: '#d64545', cursor: 'pointer', padding: '4px 10px', fontSize: 12 }}
    >
      重试
    </button>
  </div>
)}
- Step 6: Add a refresh button to the stats section header
Find the <section style={{ marginBottom: 48 }}> that wraps the 4 stats cards. Add a header row with a refresh button just before the grid <div>:
<section style={{ marginBottom: 48 }}>
  <div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: 16 }}>
    <h2 style={{ fontSize: 14, fontWeight: 600, color: theme.accent, letterSpacing: '1px', margin: 0 }}>
      DOCUMENT STATISTICS
    </h2>
    <button
      onClick={() => void loadData()}
      disabled={loading}
      style={{
        background: 'none',
        border: `1px solid ${theme.border}`,
        borderRadius: 6,
        color: theme.text3,
        cursor: loading ? 'not-allowed' : 'pointer',
        padding: '4px 12px',
        fontSize: 11,
        opacity: loading ? 0.5 : 1,
      }}
    >
      ↻ 刷新
    </button>
  </div>
  {/* existing 4-card grid stays here unchanged */}
- Step 7: Add CSS keyframe for the spinner
Find the <Content> wrapping element. Add a <style> tag as the very first child inside <Content>:
<style>{`@keyframes spin { to { transform: rotate(360deg); } }`}</style>
- Step 8: Commit
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "feat(status): add loading/error states, refresh button, auto-poll for processing docs"
Task 4: Frontend — Service Health Panel
Files:
- Modify: frontend/src/pages/Status/StatusPage.tsx
Adds a new section after the stats cards that shows a colored status badge for each service (Milvus, MinIO, BM25, Reranker) plus the active session count.
- Step 1: Add the ServiceBadge helper component (add just before StatsCard)
const ServiceBadge = ({
  label,
  status,
  detail,
}: {
  label: string;
  status: 'ok' | 'error' | 'unknown' | boolean;
  detail?: string;
}) => {
  const { theme } = useTheme();
  const isOk = status === 'ok' || status === true;
  // 'unknown' (health not loaded yet) gets a neutral color instead of the error red.
  const color = isOk ? theme.green : status === 'unknown' ? theme.text3 : '#d64545';
  return (
    <div style={{
      display: 'flex',
      alignItems: 'center',
      justifyContent: 'space-between',
      padding: '12px 16px',
      background: theme.bgCard,
      borderRadius: 10,
      border: `1px solid ${theme.border}`,
    }}>
      <div style={{ display: 'flex', alignItems: 'center', gap: 8 }}>
        <span style={{ color, fontSize: 10 }}>●</span>
        <span className="mono" style={{ fontSize: 12, color: theme.text2 }}>{label}</span>
      </div>
      <span style={{ fontSize: 12, color, fontWeight: 600 }}>
        {detail ?? (isOk ? 'OK' : status === 'unknown' ? 'UNKNOWN' : 'ERROR')}
      </span>
    </div>
  );
};
- Step 2: Insert the health section between the stats section and the SYSTEM CONFIGURATION section
Find <section style={{ marginBottom: 48 }}> for SYSTEM CONFIGURATION. Just before it, insert:
<section style={{ marginBottom: 48 }}>
  <h2 style={{ fontSize: 14, fontWeight: 600, color: theme.accent, marginBottom: 20, letterSpacing: '1px' }}>
    SERVICE HEALTH
  </h2>
  <div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 12, marginBottom: 12 }}>
    <ServiceBadge
      label="MILVUS"
      status={health?.milvus.status ?? 'unknown'}
      detail={health ? (health.milvus.status === 'ok' ? `${health.milvus.num_entities ?? 0} entities` : 'disconnected') : '—'}
    />
    <ServiceBadge
      label="MINIO"
      status={health?.minio.status ?? 'unknown'}
      detail={health ? (health.minio.status === 'ok' ? 'connected' : 'disconnected') : '—'}
    />
    <ServiceBadge
      label="BM25 HYBRID"
      status={health?.bm25.available ?? false}
      detail={health ? (health.bm25.available ? 'enabled' : 'unavailable') : '—'}
    />
  </div>
  <div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 12 }}>
    <ServiceBadge
      label="RERANKER"
      status={health?.reranker.enabled ?? false}
      detail={health ? (health.reranker.enabled ? (health.reranker.model ?? 'enabled') : 'disabled') : '—'}
    />
    <ServiceBadge
      label="SESSIONS"
      status="ok"
      detail={health ? `${health.sessions.active} / ${health.sessions.max}` : '—'}
    />
    <ServiceBadge
      label="LLM"
      status="ok"
      detail={config ? `${config.llm_provider} · ${config.llm_model}` : '—'}
    />
  </div>
</section>
- Step 3: Commit
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "feat(status): add SERVICE HEALTH panel with Milvus/MinIO/BM25/Reranker/Session badges"
Task 5: Frontend — Config Value Truncation Fix + Failed Docs Highlight
Files:
- Modify: frontend/src/pages/Status/StatusPage.tsx
- Step 1: Fix config value truncation
In the SYSTEM CONFIGURATION section, find the <span> that renders the config value v — it currently renders as:
<span style={{ fontSize: 14, fontWeight: 500 }}>{v}</span>
Replace it with:
<span
  title={String(v)}
  style={{
    fontSize: 13,
    fontWeight: 500,
    maxWidth: 240,
    overflow: 'hidden',
    textOverflow: 'ellipsis',
    whiteSpace: 'nowrap',
    cursor: 'help',
  }}
>
  {v}
</span>
This applies to both the MODELS and STORAGE AND PATHS sub-sections since they share the same .map(([k, v]) => ...) render pattern.
- Step 2: Highlight failed documents in the document list
In the DOCUMENT INDEX section, find the docs.map(d => ...) container <div>. Change the border to highlight failed docs:
Old:
border: `1px solid ${theme.border}`,
New:
border: `1px solid ${d.status === 'failed' ? '#d64545' : d.status === 'parsing' || d.status === 'pending' ? theme.accent + '80' : theme.border}`,
- Step 3: Add an in-progress indicator for parsing and pending documents
In the same docs.map block, the status badge currently distinguishes only failed documents (red) from everything else (green). Update it so parsing and pending documents get the accent color, slightly reduced opacity, and a ⟳ prefix while parsing:
Old:
<div style={{
  padding: '4px 12px',
  background: d.status === 'failed' ? '#d64545' : theme.green,
  borderRadius: 6,
}}>
  <span className="mono" style={{ fontSize: 10, fontWeight: 600, color: '#fff' }}>
    {d.status.toUpperCase()}
  </span>
</div>
New:
<div style={{
  padding: '4px 12px',
  background:
    d.status === 'failed' ? '#d64545' :
    d.status === 'parsing' || d.status === 'pending' ? theme.accent :
    theme.green,
  borderRadius: 6,
  opacity: d.status === 'parsing' || d.status === 'pending' ? 0.85 : 1,
}}>
  <span className="mono" style={{ fontSize: 10, fontWeight: 600, color: '#fff' }}>
    {d.status === 'parsing' ? '⟳ ' : ''}{d.status.toUpperCase()}
  </span>
</div>
- Step 4: Commit
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "fix(status): config value ellipsis truncation, failed doc highlight, parsing doc pulse"
Self-Review Checklist
- Task 1 covers P2 (TTL cache) + P1 (service health backend)
- Task 2 adds the SystemHealth type that Tasks 3–4 depend on
- Task 3 covers P0 (loading/error/refresh/auto-poll)
- Task 4 covers P1 (BM25/reranker/sessions visibility)
- Task 5 covers P1 (config truncation) + P1 (failed doc visual)
- All getSystemHealth references match the function name defined in Task 2
- health?.milvus.status type matches ServiceHealth.status: 'ok' | 'error' | 'unknown'
- Backend /health endpoint uses only already-imported bootstrap functions
- No TBD or TODO left in plan