first commit
536
03_业务闭环说明.md
Normal file
@@ -0,0 +1,536 @@
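# AI Compliance Intelligence Hub: The Three Business Closed Loops

> This document describes the data flows, interface specifications, and verification methods for the three core business closed loops.

---

## 1. Loop ①: Regulation Ingestion → Retrieval Q&A

### 1.1 Business Scenario

**Trigger scenarios:**
- Legal or R&D staff upload a new regulation PDF (e.g. GB 18384-2020, UN-ECE R155)
- The system automatically parses, chunks, and vectorizes the file to build a searchable knowledge base
- Users ask questions in natural language, and the system returns a precise answer with its sources cited

**User roles:** automaker R&D engineers, legal staff, compliance administrators

### 1.2 Data Flow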

```
[User] uploads PDF
    │
    ▼
POST /api/kb/files/upload
    {workspace_id, file}
    │
    ▼
[kbmp-service]
    - Store file → data/uploads/{file_id}.pdf
    - Insert into files table (status: uploaded)
    - Enqueue Celery task → parse-queue
    - Return {task_id, file_id}
    │
    ▼ async
[celery: parse-worker]
    - Call POST http://mcp-server:8011/mineru-parse
    - Receive Markdown text
    - Update files table (status: parsed)
    - Enqueue to vectorize-queue
    │
    ▼ async
[celery: vectorize-worker]
    - Chunk text (chunk_size=512, overlap=64)
    - Call POST http://embedding-service:8010/embed
    - Receive 1024-dim dense + sparse vectors
    - Write to Milvus regulation_chunks
    - Write to PostgreSQL (chunk metadata)
    - Update files table (status: vectorized)
    - Update tasks table (status: completed)

[User] asks a question
    │
    ▼
POST /api/kb/qa
    {query, workspace_id, top_k=5}
    │
    ▼
[rag-service]
    1. Embed the query with BGE-M3
    2. Milvus dense vector search (cosine, top-20)
    3. Milvus sparse vector search (BM25-equivalent, top-20)
    4. RRF fusion (Reciprocal Rank Fusion)
    5. Cross-Encoder reranker (top-5)
    6. Build the RAG prompt (with retrieved chunks)
    7. Generate the answer via DeepSeek API (citation-anchored)
    │
    ▼
Returns: {answer, sources: [{content, file, page, score}], tokens_used}
```
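
Step 4 of the rag-service flow merges the dense and sparse result lists with Reciprocal Rank Fusion. The following is a minimal sketch of that technique, not the project's actual code: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions, since the document does not specify them.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse several ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every ID it contains,
    where rank starts at 1 for the best hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    fused = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical hits from the dense and sparse Milvus searches.
dense_hits = ["c12", "c07", "c31"]
sparse_hits = ["c07", "c44", "c12"]
fused_top3 = rrf_fuse([dense_hits, sparse_hits], top_n=3)
print(fused_top3)  # ['c07', 'c12', 'c44']
```

RRF only uses rank positions, so cosine scores and sparse scores never need to be normalized against each other before the reranker sees the merged list.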

### 1.3 Key Interfaces

```http
### Create a workspace
POST /api/kb/workspaces
Content-Type: application/json

{
  "name": "Vehicle Safety Regulation Library",
  "description": "GB and UN-ECE series regulations",
  "domain": "vehicle_safety"
}

### Response
{
  "id": "uuid-xxx",
  "name": "Vehicle Safety Regulation Library",
  "created_at": "2026-04-22T10:00:00Z"
}
```

```http
### Upload a file
POST /api/kb/files/upload
Content-Type: multipart/form-data

file: <binary>
workspace_id: uuid-xxx

### Response
{
  "file_id": "uuid-yyy",
  "task_id": "uuid-zzz",
  "filename": "GB18384-2020.pdf",
  "status": "processing"
}
```

```http
### Query task status
GET /api/kb/tasks/{task_id}

### Response
{
  "task_id": "uuid-zzz",
  "status": "completed",  // pending / running / completed / failed
  "progress": 100,
  "file_id": "uuid-yyy",
  "completed_at": "2026-04-22T10:05:00Z"
}
```
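
Because ingestion is asynchronous, a client has to poll the task endpoint until the status becomes `completed` or `failed`. The sketch below shows one way to structure that loop. The helper name `wait_for_task` and the injected `fetch_status` callable are hypothetical. In real use the callable would issue `GET /api/kb/tasks/{task_id}` and return the `status` field, but here it is stubbed so the example runs offline.

```python
import time

# Statuses after which polling should stop (from the task status response).
TERMINAL_STATES = {"completed", "failed"}


def wait_for_task(fetch_status, task_id, interval=5.0, max_attempts=30,
                  sleep=time.sleep):
    """Poll fetch_status(task_id) until a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"task {task_id} not finished after {max_attempts} polls")


# Stand-in for the HTTP call: replies change from pending to completed.
_replies = iter(["pending", "running", "completed"])
result = wait_for_task(lambda _task_id: next(_replies), "uuid-zzz",
                       sleep=lambda _seconds: None)
print(result)
```

Returning `failed` instead of looping until the timeout lets the caller report a broken parse or vectorize step immediately.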

```http
### Intelligent Q&A
POST /api/kb/qa
Content-Type: application/json

{
  "query": "What is the power-off time requirement for an electric vehicle's high-voltage system after a collision?",
  "workspace_id": "uuid-xxx",
  "top_k": 5,
  "return_sources": true
}

### Response
{
  "answer": "According to GB 18384-2020, clause 2.2, the high-voltage system voltage must drop below 60 V within 5 seconds after a collision. [Source: GB18384-2020.pdf, page 3]",
  "sources": [
    {
      "content": "Within 5 seconds after a collision, the high-voltage system voltage shall drop below 60 V.",
      "file": "GB18384-2020.pdf",
      "page": 3,
      "chunk_idx": 12,
      "score": 0.94
    }
  ],
  "tokens_used": 1250
}
```
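
### 1.4 Chunking Strategy

```python
# Recommended chunking configuration (research phase)
CHUNK_SIZE = 512      # Maximum number of tokens per chunk
CHUNK_OVERLAP = 64    # Overlap between adjacent chunks (preserves context)
SEPARATOR = "\n\n"    # Prefer splitting on paragraph boundaries

# Special handling for regulation documents
# - Detect clause numbers (1.1, 2.3.1, etc.) and keep each clause intact
# - Process tables separately (do not mix them with body text)
# - Extract alt text for images
```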
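
To make the size and overlap settings concrete, here is a minimal sliding-window sketch over an already tokenized document. It is an illustration only: the function name `chunk_tokens` is hypothetical, the tokenizer is left out, and the paragraph-first splitting and clause-aware handling described above are not covered.

```python
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64


def chunk_tokens(tokens, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# A synthetic 1000-token document.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])  # [512, 512, 104]
```

With these defaults each window advances by 448 tokens, so the last 64 tokens of one chunk reappear at the start of the next.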
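
---

## 2. Loop ②: Document Upload → Compliance Review

### 2.1 Business Scenario

**Trigger scenarios:**
- Procurement or supply-chain staff upload supplier documents (technical specifications, compliance declarations, etc.)
- R&D staff upload design documents to check them against the latest regulations
- EHS engineers upload safety operating procedures to verify ISO 45001 compliance

**User roles:** procurement, supply chain, R&D, EHS engineers

### 2.2 Data Flow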

```
[User] uploads supplier document
    │
    ▼
POST /api/compliance/upload
    {file, regulation_domains}
    │
    ▼
[compliance-backend]
    - Parse the document with MinerU
    - Clause-level splitting (detect clause structure)
    - Regulation domain matching (auto-detected from content: vehicle_safety / data_security / ehs)
    - Enqueue to compliance-queue
    │
    ▼ async
[celery: compliance-worker]
    1. For each clause, retrieve the relevant regulatory requirements from Milvus
    2. Assess compliance via DeepSeek API
       Prompt: "Compare the following supplier clause with the regulatory requirements and assess compliance..."
    3. Generate a risk score (0-100)
    4. Aggregate the results into a Markdown report
    5. Store in compliance_reports table
    │
    ▼
[User] retrieves the report
GET /api/compliance/report/{id}
```
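
Step 4 of the compliance-worker aggregates per-clause findings into a Markdown report. A rough sketch of that step follows. The function name `render_report` and the report layout are assumptions. Only the field names (`clause`, `requirement`, `actual`, `gap`, `severity`) and the sample values come from the interface examples in this document.

```python
COLUMNS = ("clause", "requirement", "actual", "gap", "severity")


def render_report(risk_level, risk_score, findings, recommendations):
    """Render one review result as a Markdown report string."""
    lines = [
        "# Compliance Review Report",
        "",
        "## Overall Assessment",
        "",
        f"Risk level: **{risk_level}** (score {risk_score}/100)",
        "",
        "## Findings",
        "",
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for finding in findings:
        # Escape pipes so a value cannot break the table row.
        cells = [str(finding.get(col, "")).replace("|", "\\|") for col in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    lines += ["", "## Recommendations", ""]
    lines += [f"- {item}" for item in recommendations]
    return "\n".join(lines)


report = render_report(
    "high",
    78,
    [{
        "clause": "GB 18384-2020, clause 2.1",
        "requirement": "at least 100 Ω/V",
        "actual": "50 Ω/V",
        "gap": "50 Ω/V below the requirement",
        "severity": "critical",
    }],
    ["Provide a test report certified by a third party"],
)
print(report)
```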

### 2.3 Key Interfaces

```http
### Upload and review a document
POST /api/compliance/upload
Content-Type: multipart/form-data

file: <binary>
regulation_domains: ["vehicle_safety", "data_security"]  # multiple values allowed

### Response
{
  "report_id": "uuid-aaa",
  "file_id": "uuid-bbb",
  "status": "analyzing",
  "estimated_time_seconds": 60
}
```

```http
### Direct compliance check (text input)
POST /api/compliance/check
Content-Type: application/json

{
  "query": "Supplier declaration: product insulation resistance is 50 Ω/V, meeting industry standards",
  "regulation_domains": ["vehicle_safety"],
  "top_k": 3
}

### Response
{
  "risk_level": "high",
  "risk_score": 78,
  "findings": [
    {
      "clause": "GB 18384-2020, clause 2.1",
      "requirement": "DC circuit insulation resistance shall be no lower than 100 Ω/V",
      "actual": "Supplier declares 50 Ω/V",
      "gap": "Non-compliant; 50 Ω/V below the requirement",
      "severity": "critical"
    }
  ],
  "recommendations": [
    "Require the supplier to raise insulation resistance to 100 Ω/V or higher",
    "Provide a test report certified by a third party"
  ]
}
```

```http
### Get the full review report
GET /api/compliance/report/{report_id}

### Response
{
  "report_id": "uuid-aaa",
  "overall_risk_level": "high",
  "risk_score": 78,
  "findings": [...],
  "recommendations": [...],
  "report_markdown": "# Compliance Review Report\n\n## Overall Assessment\n...",
  "regulation_domains": ["vehicle_safety"],
  "llm_model": "deepseek-chat",
  "created_at": "2026-04-22T11:00:00Z"
}
```
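
### 2.4 Risk Level Definitions

| Risk level | Score | Description | Recommended action |
|---------|------|------|---------|
| low | 0-30 | Largely compliant; minor improvements possible | Record and monitor |
| medium | 31-60 | Partially non-compliant; remediation required | Draw up a remediation plan |
| high | 61-80 | Major non-compliance; immediate handling required | Suspend cooperation / urgent remediation |
| critical | 81-100 | Serious violation that may create legal risk | Stop immediately / escalate to management |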
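
The score bands in the table translate directly into a lookup. The sketch below assumes scores are numbers in the range 0 to 100 and that each band includes its upper bound, as the table implies. The function name `risk_level` is hypothetical.

```python
import bisect

# Inclusive upper bound of each band, in ascending order (from the table).
RISK_UPPER_BOUNDS = [30, 60, 80, 100]
RISK_LEVELS = ["low", "medium", "high", "critical"]


def risk_level(score):
    """Map a 0-100 risk score to its level name."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    # bisect_left returns the first band whose upper bound is >= score.
    return RISK_LEVELS[bisect.bisect_left(RISK_UPPER_BOUNDS, score)]


print(risk_level(78))  # high
```

A score of 78, as in the `/api/compliance/check` example, falls in the 61-80 band and maps to `high`.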
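
---

## 3. Loop ③: Regulation Monitoring → Change Notification

### 3.1 Business Scenario

**Trigger scenarios:**
- The state issues a new data security regulation for new energy vehicles
- An existing regulation (e.g. GB 7258) is revised
- A carbon emission regulation adds new obligations for enterprises

The system automatically detects the change, analyzes its impact, and pushes a notification to the relevant roles.

**User roles:** compliance administrators, legal specialists, EHS engineers (each subscribed to the relevant domains)

### 3.2 Data Flow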

```
[Celery Beat] triggers daily at 2:00 AM
    │
    ▼
[celery: monitor-worker]
    - Read regulation_sources table (all sources with is_active=True)
    - For each source:
        a. Fetch the page content over HTTP
        b. Compute its MD5 hash
        c. Compare with last_hash
        d. If changed → enqueue a change-analysis task
    │
    ▼ [on change]
[celery: compliance-worker]
    - Analyze the change via DeepSeek API
    - Extract added / revised / repealed clauses
    - Generate a change summary
    - Write to regulation_updates table
    - Trigger incremental ingestion (re-vectorize changed clauses)
    - Update the Neo4j knowledge graph
    │
    ▼
[celery: push-worker]
    - Read subscriptions table
    - Filter by domain and importance
    - Send notifications (Email / Webhook / Feishu)
    - Mark is_notified=True
```
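
Steps b to d of the monitor-worker amount to a hash comparison. The sketch below illustrates it with the standard library. The function names are hypothetical, and fetching the page and persisting `last_hash` are left out.

```python
import hashlib


def content_hash(page_bytes):
    """Return the hex MD5 digest of the fetched page content."""
    return hashlib.md5(page_bytes).hexdigest()


def has_changed(page_bytes, last_hash):
    """Compare the page against the stored hash.

    Returns (changed, new_hash) so the caller can store new_hash as
    last_hash and enqueue a change-analysis task when changed is True.
    """
    new_hash = content_hash(page_bytes)
    return new_hash != last_hash, new_hash


# First fetch: no stored hash yet, so the source counts as changed.
changed, stored = has_changed(b"<html>rev 1</html>", None)
print(changed)  # True

# Same content on the next run: no change detected.
print(has_changed(b"<html>rev 1</html>", stored)[0])  # False
```

MD5 serves here only as a cheap fingerprint, not as a security measure. Hashing a whole page also flags cosmetic changes such as timestamps or banners, so hashing only the elements picked by the source's selectors may reduce false positives.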

### 3.3 Key Interfaces

```http
### Configure a monitoring source
POST /api/regulation/sources
Content-Type: application/json

{
  "name": "National Standards Full-Text Disclosure System",
  "url": "https://std.samr.gov.cn",
  "domain": "vehicle_safety",
  "fetch_interval": 86400,
  "fetch_config": {
    "css_selector": ".standard-list .item",
    "title_selector": ".title",
    "date_selector": ".date"
  }
}

### Response
{
  "id": "uuid-src1",
  "name": "National Standards Full-Text Disclosure System",
  "status": "active",
  "next_fetch_at": "2026-04-23T02:00:00Z"
}
```

```http
### List regulation change records
GET /api/regulation/updates?domain=vehicle_safety&limit=10&offset=0

### Response
{
  "total": 25,
  "updates": [
    {
      "id": "uuid-upd1",
      "title": "GB 18384-2022 Electric Vehicle Safety Requirements (revised edition)",
      "url": "https://std.samr.gov.cn/xxxx",
      "change_type": "revised",
      "summary": "Main changes: post-collision power-off time shortened from 5 seconds to 3 seconds; new wading safety requirements added",
      "importance": "high",
      "fetched_at": "2026-04-22T02:00:00Z"
    }
  ]
}
```

```http
### Manually trigger a source fetch (for testing)
POST /api/regulation/sources/{source_id}/fetch

### Response
{
  "task_id": "uuid-task1",
  "status": "queued",
  "source_id": "uuid-src1"
}
```

```http
### Subscribe to change notifications
POST /api/regulation/subscribe
Content-Type: application/json

{
  "name": "EHS engineer notifications",
  "channel": "webhook",
  "target": "https://open.feishu.cn/open-apis/bot/v2/hook/xxxx",
  "domains": ["ehs", "carbon"],
  "importance_min": "normal"
}
```
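
The push-worker filters subscriptions by domain and minimum importance before sending. A sketch of that filter follows, with two loudly stated assumptions: the ordering `low < normal < high` is hypothetical (the document only shows the values `normal` and `high`), and each update is assumed to carry a `domain` field, which the sample response does not show explicitly.

```python
# Assumed importance scale; only "normal" and "high" appear in the document.
IMPORTANCE_ORDER = {"low": 0, "normal": 1, "high": 2}


def matching_subscriptions(update, subscriptions):
    """Return the subscriptions that should receive this update."""
    level = IMPORTANCE_ORDER[update["importance"]]
    return [
        sub for sub in subscriptions
        if update["domain"] in sub["domains"]
        and level >= IMPORTANCE_ORDER[sub["importance_min"]]
    ]


subscriptions = [
    {"name": "EHS engineer notifications", "domains": ["ehs", "carbon"],
     "importance_min": "normal"},
    {"name": "Safety leads", "domains": ["vehicle_safety"],
     "importance_min": "high"},
]
update = {"domain": "vehicle_safety", "importance": "high"}
recipients = [sub["name"] for sub in matching_subscriptions(update, subscriptions)]
print(recipients)  # ['Safety leads']
```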

### 3.4 Built-in Monitoring Sources

| Name | URL | Domain |
|------|-----|-----|
| National Standards Full-Text Disclosure System | https://std.samr.gov.cn | vehicle_safety |
| MIIT Policies and Regulations | https://www.miit.gov.cn/jgsj/fgs/zcfg | vehicle_safety |
| Ministry of Emergency Management Regulations | https://www.mem.gov.cn/gk/zcfg | ehs |
| Ministry of Ecology and Environment Regulations | https://www.mee.gov.cn/ywgz/fgbz/fl | carbon |
| Cyberspace Administration of China Regulations | https://www.cac.gov.cn/zcfg/index.htm | data_security |

---

## 4. API Authentication (Research Version)

The research version uses simple API key authentication, with the key passed in the `Authorization` header:

```http
# Every request must carry the API key
Authorization: Bearer <API_SECRET_KEY>
```

> `API_SECRET_KEY` is configured in `.env`. The default value is for local research use only and must be replaced in production.
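
On the server side, checking this header could look roughly like the sketch below. The function name `is_authorized` is hypothetical, and the document does not say whether the check runs in Nginx or in the backend services. The constant-time comparison from the standard library avoids leaking the key through timing differences.

```python
import hmac


def is_authorized(authorization_header, api_secret_key):
    """Check an `Authorization: Bearer <key>` header against the configured key."""
    scheme, _, token = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Constant-time comparison of the presented and the configured key.
    return hmac.compare_digest(token.encode("utf-8"),
                               api_secret_key.encode("utf-8"))


print(is_authorized("Bearer s3cret", "s3cret"))  # True
print(is_authorized("Bearer wrong", "s3cret"))   # False
```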

---

## 5. Complete Smoke Test Script

```bash
#!/usr/bin/env bash
# End-to-end verification of all three loops
API="http://localhost"
KEY="your_api_secret_key"

# ── Loop ① test ────────────────────────────────
echo "=== Testing loop ①: regulation ingestion → Q&A ==="

# 1. Create a workspace
WS=$(curl -sf -X POST "$API/api/kb/workspaces" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"name":"Test Regulation Library","domain":"vehicle_safety"}')
WS_ID=$(echo "$WS" | python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
echo "Workspace: $WS_ID"

# 2. Upload the test file
UPLOAD=$(curl -sf -X POST "$API/api/kb/files/upload" \
  -H "Authorization: Bearer $KEY" \
  -F "file=@data/uploads/test_regulation.txt" \
  -F "workspace_id=$WS_ID")
TASK_ID=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['task_id'])")
echo "Task ID: $TASK_ID"

# 3. Wait for processing (up to 30 polls, 5 seconds apart)
for i in {1..30}; do
  STATUS=$(curl -sf "$API/api/kb/tasks/$TASK_ID" -H "Authorization: Bearer $KEY" | \
    python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
  [[ "$STATUS" == "completed" ]] && echo "Processing complete" && break
  [[ "$STATUS" == "failed" ]] && echo "Processing failed" && exit 1
  sleep 5
done

# 4. Q&A test
QA=$(curl -sf -X POST "$API/api/kb/qa" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d "{\"query\":\"Post-collision high-voltage system requirements\",\"workspace_id\":\"$WS_ID\"}")
echo "Answer: $(echo "$QA" | python3 -c "import sys,json; print(json.load(sys.stdin).get('answer','')[:100])")"

# ── Loop ② test ────────────────────────────────
echo ""
echo "=== Testing loop ②: compliance review ==="
CHECK=$(curl -sf -X POST "$API/api/compliance/check" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"query":"Insulation resistance 50 Ω/V","regulation_domains":["vehicle_safety"]}')
echo "Risk level: $(echo "$CHECK" | python3 -c "import sys,json; print(json.load(sys.stdin).get('risk_level','unknown'))")"

# ── Loop ③ test ────────────────────────────────
echo ""
echo "=== Testing loop ③: regulation monitoring ==="
SRC=$(curl -sf -X POST "$API/api/regulation/sources" \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"name":"Test Source","url":"https://std.samr.gov.cn","domain":"vehicle_safety"}')
echo "Monitoring source: $(echo "$SRC" | python3 -c "import sys,json; print(json.load(sys.stdin).get('id','failed'))")"
```

---

## 6. Data Flow Diagram (Full Version)

```
                 ┌─────────────────────────────────┐
                 │          User Requests          │
                 │    Web / API / Mobile / Bot     │
                 └────────────────┬────────────────┘
                                  │
                                  ▼
                 ┌─────────────────────────────────┐
                 │        Nginx API Gateway        │
                 │ Routing / Rate Limiting / Auth  │
                 └────────────────┬────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          │                       │                       │
          ▼                       ▼                       ▼
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│  Knowledge Base   │   │ Compliance Review │   │  Reg. Monitoring  │
│     /api/kb/*     │   │ /api/compliance/* │   │ /api/regulation/* │
└─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘
          │                       │                       │
          └───────────┬───────────┘                       │
                      │                                   │
                      ▼                                   ▼
            ┌───────────────────┐               ┌───────────────────┐
            │    compliance-    │               │    Celery Beat    │
            │      backend      │               │     Scheduler     │
            └─────────┬─────────┘               └─────────┬─────────┘
                      │                                   │
        ┌─────────────┼─────────────┐       ┌─────────────┼─────────────┐
        │             │             │       │             │             │
        ▼             ▼             ▼       ▼             ▼             ▼
     parse-w     vectorize-w      compliance-w        monitor-w      push-w
        │             │                 │                 │             │
        ▼             ▼                 │                 │             ▼
   mcp-server     embedding          LLM API        Web Scraping  Notifications
    (MinerU)      (BGE-M3)         (DeepSeek)        (requests)    (Email/Bot)
                                        │                 │
                                        └────────┬────────┘
                                                 │
                             ┌───────────────────┼───────────────────┐
                             ▼                   ▼                   ▼
                        PostgreSQL            Milvus               Neo4j
                    (metadata/reports)    (vector search)    (knowledge graph)
```