Refactor document handling and update Milvus collection settings

- Removed multiple failed document entries from `documents.json`.
- Added a new document entry with updated metadata and changed the index name to `regulations_dense_1024_v2`.
- Updated architecture documentation to reflect changes in the Milvus collection name.
- Adjusted requirements by removing the sqlalchemy dependency.
- Modified test cases to align with new document structure and naming conventions.
- Introduced a new test file for Milvus vector index runtime recovery and error handling.
- Updated assertions in various test files to ensure compatibility with the new schema.
ash66
2026-05-26 20:21:31 +08:00
parent fec22a3a2c
commit 30c7bda389
42 changed files with 7482 additions and 569 deletions

@@ -8,7 +8,7 @@
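 - ✅ PDF/DOC/DOCX document parsing (Alibaba Cloud Document Intelligence)
 - ✅ Unified chunking based on Alibaba Cloud `vector_chunks`
-- ✅ OpenAI-compatible embedding (`text-embedding-v3`, 1536 dimensions)
+- ✅ OpenAI-compatible embedding (`text-embedding-v3`, 1024 dimensions)
 - ✅ Milvus vector database storage with dense-only retrieval
 - ✅ FastAPI interface wrapper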
@@ -97,7 +97,7 @@ curl -X POST http://localhost:8000/api/v1/knowledge/search \
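 |------|------|
 | Document parsing | Alibaba Cloud Document Intelligence + python-docx |
 | Chunking strategy | Alibaba Cloud `vector_chunks` |
-| Embedding model | `text-embedding-v3`, 1536-dimensional dense |
+| Embedding model | `text-embedding-v3`, 1024-dimensional dense |
 | Vector database | Milvus 2.4 (local Docker deployment) |
 | Retrieval method | Dense-only retrieval |
 | API framework | FastAPI |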
@@ -119,7 +119,7 @@ CHUNK_BACKEND=aliyun
 # Embedding configuration
 EMBEDDING_MODEL=text-embedding-v3
-EMBEDDING_DIM=1536
+EMBEDDING_DIM=1024
 EMBEDDING_API_KEY=your_embedding_api_key_here
 # Chunking configuration
@@ -142,7 +142,7 @@ CHUNK_SIZE=512
 - `artifacts/{doc_id}/semantic_blocks.json`
 - `artifacts/{doc_id}/vector_chunks.json`
-The current default Milvus collection is `regulations_dense_1536_v2`
+The current default Milvus collection is `regulations_dense_1024_v2`
 ## License
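
Reviewer note: the diff changes `EMBEDDING_DIM` and the default collection name together, so a startup check that the two agree may be worth having. The following is a minimal sketch, not code from this repository. It assumes the collection name follows a `<prefix>_<dim>_v<n>` pattern (inferred only from the two names in the diff), and the helper names `collection_name_for`, `dim_from_collection_name`, and `check_embedding_config` are hypothetical.

```python
import re


def collection_name_for(dim: int, prefix: str = "regulations_dense", version: str = "v2") -> str:
    """Build a collection name from the embedding dimension (assumed naming scheme)."""
    return f"{prefix}_{dim}_{version}"


def dim_from_collection_name(name: str) -> int:
    """Extract the dimension embedded in a name such as 'regulations_dense_1024_v2'."""
    match = re.fullmatch(r".+_(\d+)_v\d+", name)
    if match is None:
        raise ValueError(f"collection name {name!r} does not encode a dimension")
    return int(match.group(1))


def check_embedding_config(embedding_dim: int, collection: str) -> None:
    """Raise if the configured dimension disagrees with the one in the collection name."""
    expected = dim_from_collection_name(collection)
    if expected != embedding_dim:
        raise ValueError(
            f"EMBEDDING_DIM={embedding_dim} does not match collection {collection!r} "
            f"(name encodes {expected})"
        )
```

Calling `check_embedding_config(1024, "regulations_dense_1024_v2")` passes silently, while pairing the new dimension with the old `regulations_dense_1536_v2` name raises a `ValueError` before any vectors are written.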