AIRegulation-DocAnalysis/docs/superpowers/plans/2026-05-21-system-status-optimizations.md
wangwei dcda7e0423 @
chore: delete old layout/common/tabs components before redesign
@
2026-06-03 16:58:35 +08:00

# System Status Module Optimization Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Upgrade the System Status (系统状态) page from a static one-shot snapshot into a reliable observability dashboard with loading states, service health checks, auto-refresh, BM25/Reranker visibility, and config display fixes.

**Architecture:** Backend adds a unified `/status/health` aggregate endpoint (Milvus + MinIO + BM25 + Reranker + Sessions) and a TTL cache on `/status/stats`. Frontend adds loading/error states, a refresh button, auto-polling while documents are processing, a health services panel, and config value truncation fixes.

**Tech Stack:** FastAPI, Python 3.11, React 18, TypeScript, CSS-in-JS (inline styles with theme context)

---
## File Map
| File | Action | Purpose |
|------|--------|---------|
| `backend/app/api/routes/status.py` | Modify | Add `/health` endpoint, TTL cache on `/stats`, import session store |
| `frontend/src/api/status.ts` | Modify | Add `SystemHealth` type + `getSystemHealth()` |
| `frontend/src/api/index.ts` | Modify | Export `SystemHealth` interface |
| `frontend/src/pages/Status/StatusPage.tsx` | Modify | Loading/error states, refresh, auto-poll, health panel, config fix |
---
## Task 1: Backend — Add `/status/health` Endpoint + TTL Cache on `/status/stats`
**Files:**
- Modify: `backend/app/api/routes/status.py`
- [ ] **Step 1: Read the current `status.py`**
```
File: backend/app/api/routes/status.py
```
Note existing imports and route signatures before editing.
- [ ] **Step 2: Replace `status.py` with the new version**
Replace the entire content of `backend/app/api/routes/status.py` with:
```python
"""Define API routes for status."""
import time
from typing import Any

from fastapi import APIRouter

from app.config.settings import settings
from app.shared.bootstrap import (
    get_bm25_retriever,
    get_binary_store,
    get_conversation_store,
    get_document_query_service,
    get_vector_index,
)

router = APIRouter(prefix="/status", tags=["系统状态"])

# ---------------------------------------------------------------------------
# Simple TTL cache for /stats (avoids O(N) doc scan on every request)
# ---------------------------------------------------------------------------
_stats_cache: dict[str, Any] = {}
_stats_cache_time: float = 0.0
_STATS_TTL_SECONDS: float = 10.0


def _invalidate_stats_cache() -> None:
    """Force the next /stats request to recompute."""
    global _stats_cache_time
    _stats_cache_time = 0.0


@router.get("/stats")
async def get_stats():
    """Return document statistics (cached for 10 s)."""
    global _stats_cache, _stats_cache_time
    now = time.time()
    if _stats_cache and (now - _stats_cache_time) < _STATS_TTL_SECONDS:
        return _stats_cache
    documents = get_document_query_service().list_documents()
    indexed = sum(1 for d in documents if d.status.value == "indexed")
    failed = sum(1 for d in documents if d.status.value == "failed")
    _stats_cache = {
        "documents_total": len(documents),
        "documents_indexed": indexed,
        "documents_failed": failed,
        "chunks_total": sum(d.chunk_count for d in documents),
    }
    _stats_cache_time = now
    return _stats_cache


@router.get("/config")
async def get_config():
    """Return system configuration."""
    return {
        "embedding_model": settings.embedding_model,
        "embedding_dim": settings.embedding_dim,
        "embedding_base_url": settings.embedding_base_url,
        "milvus_collection": settings.milvus_collection,
        "parser_backend": settings.parser_backend,
        "chunk_backend": settings.chunk_backend,
        "artifact_prefix": settings.document_parse_artifact_prefix,
        "parser_failure_mode": settings.parser_failure_mode,
        "llm_provider": settings.llm_provider,
        "llm_model": settings.llm_model,
        "document_metadata_path": settings.document_metadata_path,
    }


@router.get("/milvus/health")
async def milvus_health():
    """Return Milvus health (kept for backwards compat)."""
    return get_vector_index().health()


@router.get("/health")
async def get_health():
    """Return aggregate health of all backend services."""
    # --- Milvus ---
    try:
        milvus_info = get_vector_index().health()
        milvus_status = "ok" if milvus_info.get("connected") else "error"
    except Exception as exc:  # noqa: BLE001
        milvus_info = {"error": str(exc)}
        milvus_status = "error"

    # --- MinIO ---
    try:
        minio_connected = get_binary_store().client.connected
        minio_status = "ok" if minio_connected else "error"
    except Exception:  # noqa: BLE001
        minio_status = "error"
        minio_connected = False

    # --- BM25 ---
    # Guarded like the other probes so a failing retriever cannot take down /health.
    try:
        bm25_available = get_bm25_retriever() is not None
    except Exception:  # noqa: BLE001
        bm25_available = False

    # --- Sessions ---
    try:
        session_count = len(get_conversation_store().list_sessions())
    except Exception:  # noqa: BLE001
        session_count = 0

    return {
        # "status" goes last so a "status" key inside milvus_info cannot override it.
        "milvus": {**milvus_info, "status": milvus_status},
        "minio": {"status": minio_status, "connected": minio_connected},
        "bm25": {"available": bm25_available},
        "reranker": {
            "enabled": settings.reranker_enabled,
            "model": settings.reranker_model if settings.reranker_enabled else None,
        },
        "sessions": {
            "active": session_count,
            "max": settings.session_max_sessions,
        },
    }
```
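
The module-level TTL cache in `get_stats()` reads the wall clock directly, which makes its expiry hard to exercise in a test. As a minimal sketch of the same technique, the class below keeps one cached value and takes the clock as a parameter. `TTLCache`, `get_or_compute`, and `expensive_scan` are hypothetical names used only for illustration and are not part of this plan's `status.py`.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hold one computed value and reuse it until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Any = None
        self._stored_at: Optional[float] = None  # None means nothing is cached

    def get_or_compute(self, compute: Callable[[], Any]) -> Any:
        """Return the cached value if still fresh, otherwise call compute() and store it."""
        now = self._clock()
        if self._stored_at is not None and (now - self._stored_at) < self._ttl:
            return self._value
        self._value = compute()
        self._stored_at = now
        return self._value

    def invalidate(self) -> None:
        """Drop the cached value so the next call recomputes."""
        self._stored_at = None


# Usage with a fake clock standing in for real time.
fake_now = [100.0]
calls = []


def expensive_scan() -> dict:
    """Stand-in for the O(N) document scan (hypothetical)."""
    calls.append(1)
    return {"documents_total": len(calls)}


cache = TTLCache(ttl_seconds=10.0, clock=lambda: fake_now[0])
first = cache.get_or_compute(expensive_scan)
fake_now[0] += 5.0
second = cache.get_or_compute(expensive_scan)  # 5 s old: served from cache
fake_now[0] += 6.0
third = cache.get_or_compute(expensive_scan)  # 11 s old: recomputed
```

Using an explicit "nothing cached" marker instead of a timestamp of `0.0` keeps invalidation correct even when the clock is monotonic and starts near zero.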
- [ ] **Step 3: Verify Python syntax**
```bash
cd backend && python -c "from app.api.routes.status import router; print('OK')"
```
Expected output: `OK`
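
The import check only shows that the module loads. As an optional extra, the shape of a `/status/health` response can be checked against the keys returned by `get_health()`. The sketch below assumes the payload has already been fetched and decoded into a dict by whatever client is at hand; `find_health_payload_problems` and `EXPECTED_HEALTH_FIELDS` are hypothetical helper names, not part of the codebase.

```python
from typing import Any

# Sections and required fields, taken from the return value of get_health().
EXPECTED_HEALTH_FIELDS: dict[str, tuple[str, ...]] = {
    "milvus": ("status",),
    "minio": ("status", "connected"),
    "bm25": ("available",),
    "reranker": ("enabled", "model"),
    "sessions": ("active", "max"),
}


def find_health_payload_problems(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    problems: list[str] = []
    for service, fields in EXPECTED_HEALTH_FIELDS.items():
        section = payload.get(service)
        if not isinstance(section, dict):
            problems.append(f"missing or non-object section: {service}")
            continue
        for field in fields:
            if field not in section:
                problems.append(f"missing field: {service}.{field}")
        # The backend only ever emits "ok" or "error" for status.
        if "status" in section and section["status"] not in ("ok", "error"):
            problems.append(f"unexpected status for {service}: {section['status']!r}")
    return problems


# Example payloads (values are made up for illustration).
good_payload = {
    "milvus": {"status": "ok", "connected": True},
    "minio": {"status": "ok", "connected": True},
    "bm25": {"available": True},
    "reranker": {"enabled": False, "model": None},
    "sessions": {"active": 0, "max": 100},
}
bad_payload = {"milvus": {"status": "up"}, "bm25": {}}

good_problems = find_health_payload_problems(good_payload)
bad_problems = find_health_payload_problems(bad_payload)
```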
- [ ] **Step 4: Commit**
```bash
git add backend/app/api/routes/status.py
git commit -m "feat(status): add /health aggregate endpoint and 10s TTL cache on /stats"
```
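
The per-service `try`/`except` blocks in `get_health()` all follow one pattern: run a check, and turn any exception into an error entry instead of failing the whole response. If more services are added later, that pattern could be factored into a helper along these lines. This is only a sketch: `probe`, `healthy_backend`, and `broken_backend` are hypothetical names, and the `connected` key is assumed to be how a check reports success, as the Milvus branch does.

```python
from typing import Any, Callable


def probe(check: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Run one health check without ever raising; failures become an error entry."""
    try:
        info = dict(check())
    except Exception as exc:  # noqa: BLE001 - a probe must not propagate failures
        return {"status": "error", "error": str(exc)}
    # "status" is written last so a "status" key in the check's own info cannot override it.
    return {**info, "status": "ok" if info.get("connected") else "error"}


def healthy_backend() -> dict[str, Any]:
    """Stand-in for a reachable service (hypothetical)."""
    return {"connected": True, "num_entities": 42}


def broken_backend() -> dict[str, Any]:
    """Stand-in for an unreachable service (hypothetical)."""
    raise ConnectionError("connection refused")


# One failing probe does not affect the other entries in the report.
report = {
    "milvus": probe(healthy_backend),
    "minio": probe(broken_backend),
}
```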
---
## Task 2: Frontend API — Add `SystemHealth` Type + `getSystemHealth()`
**Files:**
- Modify: `frontend/src/api/index.ts`
- Modify: `frontend/src/api/status.ts`
- [ ] **Step 1: Add `SystemHealth` interface to `frontend/src/api/index.ts`**
Open `frontend/src/api/index.ts`. After the `SystemConfig` interface (around line 240), add:
```typescript
export interface ServiceHealth {
  status: 'ok' | 'error' | 'unknown';
  error?: string;
}

export interface SystemHealth {
  milvus: ServiceHealth & {
    connected?: boolean;
    collection_name?: string;
    num_entities?: number;
  };
  minio: ServiceHealth & { connected?: boolean };
  bm25: { available: boolean };
  reranker: { enabled: boolean; model?: string | null };
  sessions: { active: number; max: number };
}
```
- [ ] **Step 2: Add `getSystemHealth()` to `frontend/src/api/status.ts`**
Open `frontend/src/api/status.ts`. The current content is:
```typescript
import { fetchAPI, type SystemConfig, type SystemStats } from './index';

export async function getSystemStats(): Promise<SystemStats> {
  return fetchAPI<SystemStats>('/status/stats');
}

export async function getSystemConfig(): Promise<SystemConfig> {
  return fetchAPI<SystemConfig>('/status/config');
}

export async function getMilvusHealth(): Promise<{ connected: boolean; collections: string[] }> {
  return fetchAPI('/status/milvus/health');
}

export type { SystemConfig, SystemStats };
```
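Replace with the following. Note that this drops `getMilvusHealth()`, so check that nothing else in the frontend still imports it before removing it: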
```typescript
import { fetchAPI, type SystemConfig, type SystemHealth, type SystemStats } from './index';

export async function getSystemStats(): Promise<SystemStats> {
  return fetchAPI<SystemStats>('/status/stats');
}

export async function getSystemConfig(): Promise<SystemConfig> {
  return fetchAPI<SystemConfig>('/status/config');
}

export async function getSystemHealth(): Promise<SystemHealth> {
  return fetchAPI<SystemHealth>('/status/health');
}

export type { SystemConfig, SystemHealth, SystemStats };
```
- [ ] **Step 3: Commit**
```bash
git add frontend/src/api/index.ts frontend/src/api/status.ts
git commit -m "feat(status): add SystemHealth type and getSystemHealth() API function"
```
---
## Task 3: Frontend — Loading/Error States + Refresh Button + Auto-Poll
**Files:**
- Modify: `frontend/src/pages/Status/StatusPage.tsx`
This task replaces the `useEffect` + `loadData` pattern and the top of the component with loading/error/refresh support. Auto-poll fires every 5 s while any document is `parsing` or `pending`.
- [ ] **Step 1: Read current `StatusPage.tsx`**
```
File: frontend/src/pages/Status/StatusPage.tsx
```
Identify: the `useState` block, `loadData` function, and the single `useEffect`.
- [ ] **Step 2: Update the React import**
At the top of the file, add `useCallback` to the React import:
**Old:**
```typescript
import React, { useEffect, useState } from 'react';
```
**New:**
```typescript
import React, { useCallback, useEffect, useState } from 'react';
```
- [ ] **Step 3: Add `SystemHealth` to the API import**
**Old:**
```typescript
import { getSystemStats, getSystemConfig, type SystemStats, type SystemConfig } from '../../api/status';
```
**New:**
```typescript
import { getSystemStats, getSystemConfig, getSystemHealth, type SystemStats, type SystemConfig, type SystemHealth } from '../../api/status';
```
- [ ] **Step 4: Replace the state declarations and loadData + useEffect block inside the component**
Find the block starting with `const [stats, setStats]` and ending after the closing `}, []);` of the first `useEffect`. Replace it entirely with:
```typescript
  const [stats, setStats] = useState<SystemStats>({
    documents_total: 0,
    documents_indexed: 0,
    documents_failed: 0,
    chunks_total: 0,
  });
  const [config, setConfig] = useState<SystemConfig | null>(null);
  const [docs, setDocs] = useState<DocInfo[]>([]);
  const [health, setHealth] = useState<SystemHealth | null>(null);
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState<string | null>(null);

  const loadData = useCallback(async () => {
    setLoading(true);
    setError(null);
    try {
      const [statsRes, configRes, docsRes, healthRes] = await Promise.all([
        getSystemStats(),
        getSystemConfig(),
        getDocumentList(),
        getSystemHealth(),
      ]);
      setStats(statsRes);
      setConfig(configRes);
      setDocs(docsRes.docs);
      setHealth(healthRes);
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Failed to load status data');
    } finally {
      setLoading(false);
    }
  }, []);

  // Initial load
  useEffect(() => {
    void loadData();
  }, [loadData]);

  // Auto-poll every 5 s while any document is still processing
  useEffect(() => {
    const hasProcessing = docs.some(d => d.status === 'parsing' || d.status === 'pending');
    if (!hasProcessing) return;
    const id = window.setInterval(() => void loadData(), 5000);
    return () => window.clearInterval(id);
  }, [docs, loadData]);
```
- [ ] **Step 5: Add loading banner and error banner at the top of the returned JSX**
Find the `return (` statement in the component. Inside `<Content>`, right after `<TPattern />`, add:
```typescript
{/* Loading indicator */}
{loading && (
  <div style={{ display: 'flex', alignItems: 'center', gap: 8, marginBottom: 16, color: theme.text3, fontSize: 13 }}>
    <span style={{ display: 'inline-block', width: 12, height: 12, borderRadius: '50%', border: `2px solid ${theme.accent}`, borderTopColor: 'transparent', animation: 'spin 0.8s linear infinite' }} />
    <span className="mono">LOADING...</span>
  </div>
)}
{/* Error banner */}
{error && (
  <div style={{
    marginBottom: 16,
    padding: '12px 16px',
    background: '#d6454520',
    border: '1px solid #d64545',
    borderRadius: 8,
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'space-between',
  }}>
    <span style={{ fontSize: 13, color: '#d64545' }}>{error}</span>
    <button
      onClick={() => void loadData()}
      style={{ background: 'none', border: '1px solid #d64545', borderRadius: 6, color: '#d64545', cursor: 'pointer', padding: '4px 10px', fontSize: 12 }}
    >
      {/* Button label: "Retry" */}
      重试
    </button>
  </div>
)}
```
- [ ] **Step 6: Add a refresh button to the stats section header**
Find the `<section style={{ marginBottom: 48 }}>` that wraps the 4 stats cards. Add a header row with a refresh button just before the grid `<div>`:
```typescript
<section style={{ marginBottom: 48 }}>
  <div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: 16 }}>
    <h2 style={{ fontSize: 14, fontWeight: 600, color: theme.accent, letterSpacing: '1px', margin: 0 }}>
      DOCUMENT STATISTICS
    </h2>
    <button
      onClick={() => void loadData()}
      disabled={loading}
      style={{
        background: 'none',
        border: `1px solid ${theme.border}`,
        borderRadius: 6,
        color: theme.text3,
        cursor: loading ? 'not-allowed' : 'pointer',
        padding: '4px 12px',
        fontSize: 11,
        opacity: loading ? 0.5 : 1,
      }}
    >
      {/* Button label: "Refresh" */}
      刷新
    </button>
  </div>
  {/* existing 4-card grid stays here unchanged */}
```
- [ ] **Step 7: Add CSS keyframe for the spinner**
Find the `<Content>` wrapping element. Add a `<style>` tag as the very first child inside `<Content>`:
```typescript
<style>{`@keyframes spin { to { transform: rotate(360deg); } }`}</style>
```
- [ ] **Step 8: Commit**
```bash
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "feat(status): add loading/error states, refresh button, auto-poll for processing docs"
```
---
## Task 4: Frontend — Service Health Panel
**Files:**
- Modify: `frontend/src/pages/Status/StatusPage.tsx`
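Adds a new section after the stats cards that shows a colored status badge for each service (Milvus, MinIO, BM25, Reranker), the active session count, and the configured LLM. The Sessions and LLM badges are informational only: they always render as `ok` because no health check backs them.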
- [ ] **Step 1: Add the `ServiceBadge` helper component** (add just before `StatsCard`)
```typescript
const ServiceBadge = ({
  label,
  status,
  detail,
}: {
  label: string;
  status: 'ok' | 'error' | 'unknown' | boolean;
  detail?: string;
}) => {
  const { theme } = useTheme();
  const isOk = status === 'ok' || status === true;
  const isUnknown = status === 'unknown';
  // Green when healthy, muted while the state is not known yet, red otherwise.
  const color = isOk ? theme.green : isUnknown ? theme.text3 : '#d64545';
  return (
    <div style={{
      display: 'flex',
      alignItems: 'center',
      justifyContent: 'space-between',
      padding: '12px 16px',
      background: theme.bgCard,
      borderRadius: 10,
      border: `1px solid ${theme.border}`,
    }}>
      <div style={{ display: 'flex', alignItems: 'center', gap: 8 }}>
        <span style={{ color, fontSize: 10 }}>●</span>
        <span className="mono" style={{ fontSize: 12, color: theme.text2 }}>{label}</span>
      </div>
      <span style={{ fontSize: 12, color, fontWeight: 600 }}>
        {detail ?? (isOk ? 'OK' : isUnknown ? 'UNKNOWN' : 'ERROR')}
      </span>
    </div>
  );
};
```
- [ ] **Step 2: Insert the health section between the stats section and the SYSTEM CONFIGURATION section**
Find `<section style={{ marginBottom: 48 }}>` for SYSTEM CONFIGURATION. Just before it, insert:
```typescript
<section style={{ marginBottom: 48 }}>
  <h2 style={{ fontSize: 14, fontWeight: 600, color: theme.accent, marginBottom: 20, letterSpacing: '1px' }}>
    SERVICE HEALTH
  </h2>
  <div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 12, marginBottom: 12 }}>
    <ServiceBadge
      label="MILVUS"
      status={health?.milvus.status ?? 'unknown'}
      detail={health ? (health.milvus.status === 'ok' ? `${health.milvus.num_entities ?? 0} entities` : 'disconnected') : '—'}
    />
    <ServiceBadge
      label="MINIO"
      status={health?.minio.status ?? 'unknown'}
      detail={health ? (health.minio.status === 'ok' ? 'connected' : 'disconnected') : '—'}
    />
    {/* Before the first response arrives, report 'unknown' like Milvus and MinIO do */}
    <ServiceBadge
      label="BM25 HYBRID"
      status={health ? health.bm25.available : 'unknown'}
      detail={health ? (health.bm25.available ? 'enabled' : 'unavailable') : '—'}
    />
  </div>
  <div style={{ display: 'grid', gridTemplateColumns: 'repeat(3, 1fr)', gap: 12 }}>
    <ServiceBadge
      label="RERANKER"
      status={health ? health.reranker.enabled : 'unknown'}
      detail={health ? (health.reranker.enabled ? (health.reranker.model ?? 'enabled') : 'disabled') : '—'}
    />
    <ServiceBadge
      label="SESSIONS"
      status="ok"
      detail={health ? `${health.sessions.active} / ${health.sessions.max}` : '—'}
    />
    <ServiceBadge
      label="LLM"
      status="ok"
      detail={config ? `${config.llm_provider} · ${config.llm_model}` : '—'}
    />
  </div>
</section>
```
- [ ] **Step 3: Commit**
```bash
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "feat(status): add SERVICE HEALTH panel with Milvus/MinIO/BM25/Reranker/Session badges"
```
---
## Task 5: Frontend — Config Value Truncation Fix + Failed Docs Highlight
**Files:**
- Modify: `frontend/src/pages/Status/StatusPage.tsx`
- [ ] **Step 1: Fix config value truncation**
In the SYSTEM CONFIGURATION section, find the `<span>` that renders the config value `v` — it currently renders as:
```typescript
<span style={{ fontSize: 14, fontWeight: 500 }}>{v}</span>
```
Replace it with:
```typescript
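<span
  title={String(v)}
  style={{
    // inline-block so maxWidth and the ellipsis take effect even if the parent is not a flex container
    display: 'inline-block',
    fontSize: 13,
    fontWeight: 500,
    maxWidth: 240,
    overflow: 'hidden',
    textOverflow: 'ellipsis',
    whiteSpace: 'nowrap',
    cursor: 'help',
  }}
>
  {v}
</span>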
```
This applies to both the MODELS and STORAGE AND PATHS sub-sections since they share the same `.map(([k, v]) => ...)` render pattern.
- [ ] **Step 2: Highlight failed documents in the document list**
In the DOCUMENT INDEX section, find the `docs.map(d => ...)` container `<div>`. Change the border to highlight failed docs:
**Old:**
```typescript
border: `1px solid ${theme.border}`,
```
**New:**
```typescript
border: `1px solid ${d.status === 'failed' ? '#d64545' : d.status === 'parsing' || d.status === 'pending' ? theme.accent + '80' : theme.border}`,
```
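- [ ] **Step 3: Add an in-progress indicator for processing documents**
In the same `docs.map` block, the status badge currently distinguishes only failed documents (red) from everything else (green). Update it so that `parsing` and `pending` documents get the accent color, slightly reduced opacity, and, for `parsing`, a `⟳` prefix. Note that this is a static indicator, not an animation: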
**Old:**
```typescript
<div style={{
  padding: '4px 12px',
  background: d.status === 'failed' ? '#d64545' : theme.green,
  borderRadius: 6,
}}>
  <span className="mono" style={{ fontSize: 10, fontWeight: 600, color: '#fff' }}>
    {d.status.toUpperCase()}
  </span>
</div>
```
**New:**
```typescript
<div style={{
  padding: '4px 12px',
  background:
    d.status === 'failed' ? '#d64545' :
    d.status === 'parsing' || d.status === 'pending' ? theme.accent :
    theme.green,
  borderRadius: 6,
  opacity: d.status === 'parsing' || d.status === 'pending' ? 0.85 : 1,
}}>
  <span className="mono" style={{ fontSize: 10, fontWeight: 600, color: '#fff' }}>
    {d.status === 'parsing' ? '⟳ ' : ''}{d.status.toUpperCase()}
  </span>
</div>
```
- [ ] **Step 4: Commit**
```bash
git add frontend/src/pages/Status/StatusPage.tsx
git commit -m "fix(status): config value ellipsis truncation, failed doc highlight, in-progress doc indicator"
```
---
## Self-Review Checklist
- [x] Task 1 covers P2 (TTL cache) + P1 (service health backend)
- [x] Task 2 adds the `SystemHealth` type that Tasks 3 and 4 depend on
- [x] Task 3 covers P0 (loading/error/refresh/auto-poll)
- [x] Task 4 covers P1 (BM25/reranker/sessions visibility)
- [x] Task 5 covers P1 (config truncation) + P1 (failed doc visual)
- [x] All `getSystemHealth` references match the function name defined in Task 2
- [x] `health?.milvus.status` type matches `ServiceHealth.status: 'ok' | 'error' | 'unknown'`
- [x] Backend `/health` endpoint uses only already-imported bootstrap functions
- [x] No TBD or TODO left in plan