feat: Migrate document parsing to Aliyun and update embedding configurations

- Updated LocalDocumentParser to include raw_layouts and artifact_prefix from settings.
- Added new documents with failure reasons and metadata to documents.json for better error tracking.
- Created a new documentation file detailing the Aliyun ingest implementation process.
- Updated RFC to reflect changes in the parsing backend and embedding dimensions.
- Modified tests to accommodate the new embedding dimension of 1024 and updated parser and chunk builder assertions.
- Verified migration configurations to ensure correct settings for embedding model and backend.
This commit is contained in:
ash66
2026-05-18 22:30:28 +08:00
parent 3f69cad404
commit c22b03dc07
26 changed files with 1092 additions and 6500 deletions

View File

@@ -32,6 +32,10 @@ async def get_config():
"embedding_dim": settings.embedding_dim,
"embedding_base_url": settings.embedding_base_url,
"milvus_collection": settings.milvus_collection,
"parser_backend": settings.parser_backend,
"chunk_backend": settings.chunk_backend,
"artifact_prefix": settings.document_parse_artifact_prefix,
"parser_failure_mode": settings.parser_failure_mode,
"llm_provider": settings.llm_provider,
"llm_model": settings.llm_model,
"document_metadata_path": settings.document_metadata_path,