Refactor document handling and update Milvus collection settings
- Removed multiple failed document entries from `documents.json`.
- Added a new document entry with updated metadata and changed the index name to `regulations_dense_1024_v2`.
- Updated architecture documentation to reflect changes in the Milvus collection name.
- Adjusted requirements by removing the sqlalchemy dependency.
- Modified test cases to align with new document structure and naming conventions.
- Introduced a new test file for Milvus vector index runtime recovery and error handling.
- Updated assertions in various test files to ensure compatibility with the new schema.
@@ -7,7 +7,7 @@
- The upload endpoint remains `/api/v1/documents/upload`
- Default `PARSER_BACKEND=aliyun`
- Default `CHUNK_BACKEND=aliyun`
-- The default Milvus collection is `regulations_dense_1536_v2`
+- The default Milvus collection is `regulations_dense_1024_v2`
- Parsing artifacts are written to MinIO under `artifacts/{doc_id}/`

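The defaults listed above could be gathered in one settings object. The following is a minimal sketch, not the repository's actual configuration code: `PipelineSettings`, `load_settings`, `artifact_prefix`, and the `MILVUS_COLLECTION` variable name are all hypothetical, and only the default values come from the documentation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    parser_backend: str
    chunk_backend: str
    milvus_collection: str


def load_settings(env=None):
    """Read settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return PipelineSettings(
        parser_backend=env.get("PARSER_BACKEND", "aliyun"),
        chunk_backend=env.get("CHUNK_BACKEND", "aliyun"),
        # MILVUS_COLLECTION is a hypothetical variable name, not taken from the repo.
        milvus_collection=env.get("MILVUS_COLLECTION", "regulations_dense_1024_v2"),
    )


def artifact_prefix(doc_id):
    """Build the MinIO key prefix for one document's parsing artifacts."""
    return f"artifacts/{doc_id}/"
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the defaults easy to check in a test without touching the process environment.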
The complete main pipeline is as follows:
@@ -19,7 +19,7 @@
5. Convert to `structure_nodes / semantic_blocks / vector_chunks`
6. Write the three-layer structure JSON back to MinIO
7. Call the embedding API with `vector_chunks[*].embedding_text`
-8. Write to `regulations_dense_1536_v2`
+8. Write to `regulations_dense_1024_v2`
9. Update the document status to `indexed`

The runtime conversion logic lives in `backend/app/infrastructure/parser/aliyun_layout_normalizer.py`.
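Steps 7 to 9 of the pipeline can be sketched with the embedding API and the Milvus write replaced by injected callables. This is an illustration under stated assumptions rather than the project's implementation: `VectorChunk`, `index_chunks`, `fake_embed`, and `fake_insert` are hypothetical names, the row layout is invented, and the 1024-dimension check assumes that the number in the collection name is the embedding size, which the source does not state.

```python
from dataclasses import dataclass

COLLECTION = "regulations_dense_1024_v2"
EXPECTED_DIM = 1024  # assumption: inferred from the collection name only


@dataclass
class VectorChunk:
    chunk_id: str
    embedding_text: str


def index_chunks(doc, chunks, embed, insert):
    """Embed each chunk's embedding_text, insert the rows, then mark the doc indexed."""
    vectors = embed([c.embedding_text for c in chunks])  # step 7
    if len(vectors) != len(chunks):
        raise ValueError("embedding API returned a different number of vectors")
    rows = []
    for chunk, vector in zip(chunks, vectors):
        # Reject vectors that would not fit the target collection.
        if len(vector) != EXPECTED_DIM:
            raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vector)}")
        rows.append({"chunk_id": chunk.chunk_id, "embedding": list(vector)})
    insert(COLLECTION, rows)  # step 8
    doc["status"] = "indexed"  # step 9, only reached if the insert did not raise
    return doc


# In-memory stand-ins for the embedding API and the Milvus client (hypothetical).
stored = {}


def fake_embed(texts):
    return [[0.0] * EXPECTED_DIM for _ in texts]


def fake_insert(collection, rows):
    stored.setdefault(collection, []).extend(rows)


doc = index_chunks(
    {"doc_id": "demo", "status": "pending"},  # hypothetical document record
    [VectorChunk("c1", "sample text")],
    fake_embed,
    fake_insert,
)
```

Checking the vector length before the insert is one way to surface a mismatch between the embedding model and the renamed collection early; the status is updated last, so a failed write leaves the document unmarked.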