Refactor document handling and update Milvus collection settings
- Removed multiple failed document entries from `documents.json`.
- Added a new document entry with updated metadata and changed the index name to `regulations_dense_1024_v2`.
- Updated architecture documentation to reflect changes in the Milvus collection name.
- Adjusted requirements by removing the sqlalchemy dependency.
- Modified test cases to align with new document structure and naming conventions.
- Introduced a new test file for Milvus vector index runtime recovery and error handling.
- Updated assertions in various test files to ensure compatibility with the new schema.
@@ -8,7 +8,7 @@

 - ✅ PDF/DOC/DOCX document parsing (Alibaba Cloud Document Intelligence)
 - ✅ Unified chunking based on Alibaba Cloud `vector_chunks`
-- ✅ OpenAI-compatible embedding (`text-embedding-v3`, 1536 dimensions)
+- ✅ OpenAI-compatible embedding (`text-embedding-v3`, 1024 dimensions)
 - ✅ Milvus vector database storage and dense-only retrieval
 - ✅ FastAPI interface wrapper

@@ -97,7 +97,7 @@ curl -X POST http://localhost:8000/api/v1/knowledge/search \
 |------|------|
 | Document parsing | Alibaba Cloud Document Intelligence + python-docx |
 | Chunking strategy | Alibaba Cloud `vector_chunks` |
-| Embedding model | `text-embedding-v3` (1536-dimensional dense) |
+| Embedding model | `text-embedding-v3` (1024-dimensional dense) |
 | Vector database | Milvus 2.4 (local Docker deployment) |
 | Retrieval method | Dense-only retrieval |
 | API framework | FastAPI |
@@ -119,7 +119,7 @@ CHUNK_BACKEND=aliyun

 # Embedding configuration
 EMBEDDING_MODEL=text-embedding-v3
-EMBEDDING_DIM=1536
+EMBEDDING_DIM=1024
 EMBEDDING_API_KEY=your_embedding_api_key_here

 # Chunking configuration
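To illustrate how the `EMBEDDING_DIM` change in this hunk might be consumed, here is a minimal Python sketch. Only the variable names `EMBEDDING_MODEL` and `EMBEDDING_DIM` and their values come from the diff; the `EmbeddingSettings` class and the `check_vector` helper are hypothetical names and are not taken from this repository.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical settings holder; not part of this repository."""

    model: str
    dim: int

    @classmethod
    def from_env(cls, env=None):
        """Read settings from a mapping (defaults to os.environ)."""
        env = os.environ if env is None else env
        # Fallback values mirror the new side of the diff.
        model = env.get("EMBEDDING_MODEL", "text-embedding-v3")
        dim = int(env.get("EMBEDDING_DIM", "1024"))
        if dim <= 0:
            raise ValueError(f"EMBEDDING_DIM must be positive, got {dim}")
        return cls(model=model, dim=dim)


def check_vector(vector, settings):
    """Raise ValueError if the vector length differs from the configured dimension."""
    if len(vector) != settings.dim:
        raise ValueError(
            f"expected a {settings.dim}-dimensional vector from "
            f"{settings.model}, got {len(vector)}"
        )
    return vector
```

A check of this kind would reject vectors produced under the old 1536-dimensional setting before they reach the vector store, which is one plausible way to surface a stale configuration early.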
@@ -142,7 +142,7 @@ CHUNK_SIZE=512
 - `artifacts/{doc_id}/semantic_blocks.json`
 - `artifacts/{doc_id}/vector_chunks.json`

-The current default Milvus collection is `regulations_dense_1536_v2`.
+The current default Milvus collection is `regulations_dense_1024_v2`.

 ## License

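The collection name in this hunk appears to encode the embedding dimension (`1536` before, `1024` after). Assuming that naming convention holds, a small check like the one below could flag a mismatch between the configured dimension and the collection name. The function names and the regular expression are illustrative assumptions, not code from the repository.

```python
import re

# Assumed convention: <prefix>_dense_<dim>_v<version>,
# e.g. "regulations_dense_1024_v2". This pattern is a guess based on the diff.
_COLLECTION_RE = re.compile(r"^(?P<prefix>\w+?)_dense_(?P<dim>\d+)_v(?P<version>\d+)$")


def dim_from_collection(name):
    """Extract the dimension encoded in a collection name."""
    match = _COLLECTION_RE.match(name)
    if match is None:
        raise ValueError(f"collection name {name!r} does not follow the assumed pattern")
    return int(match.group("dim"))


def assert_dim_matches(name, embedding_dim):
    """Raise ValueError if the name's dimension differs from embedding_dim."""
    encoded = dim_from_collection(name)
    if encoded != embedding_dim:
        raise ValueError(
            f"collection {name!r} encodes {encoded} dimensions, "
            f"but EMBEDDING_DIM is {embedding_dim}"
        )
```

For example, `assert_dim_matches("regulations_dense_1024_v2", 1024)` passes silently, while pairing the old name `regulations_dense_1536_v2` with a dimension of 1024 raises a `ValueError`.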