Document detail
Trace parse artifacts from raw upload to vector index.
This view is for operators diagnosing why one document is delayed or degraded. It surfaces parser settings, semantic structure, chunk generation, and Milvus insertion as separate observable stages.
Current run
Battery density addendum review
Uploaded 09:14 by Battery Safety Team · parser backend `aliyun`
Pipeline progression
Live state
1. Object storage ingestion
Stored in bucket `upload-files` with artifact prefix `artifacts`
2. Aliyun parse layout extraction
Parsed pages, recovered tables, and OCR confidence are summarized here once the run completes.
3. Semantic blocks
Semantic block persistence is tracked here after parse artifact storage completes.
4. Vector chunk build
Using overlap chunking with header-prefixed embedding text
5. Embedding generation
Target model `text-embedding-v3` · dimension 1024
6. Milvus insertion
Waiting for chunk vectors before collection sync
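Because stage 6 waits on vectors from stage 5, one check an operator might run before collection sync is that every vector matches the configured dimension of 1024. The following is a minimal sketch under that assumption; `split_by_dimension` and the chunk-id-to-vector mapping are hypothetical names for illustration, not part of the pipeline's actual code.

```python
from typing import Dict, List, Sequence, Tuple

# Dimension shown for the embedding target on this page.
EXPECTED_DIM = 1024


def split_by_dimension(
    vectors: Dict[str, Sequence[float]],
    expected_dim: int = EXPECTED_DIM,
) -> Tuple[List[str], List[str]]:
    """Return (ok_ids, bad_ids): chunk ids whose vectors do or do not
    have the expected length. Hypothetical helper for illustration."""
    ok_ids: List[str] = []
    bad_ids: List[str] = []
    for chunk_id, vector in vectors.items():
        if len(vector) == expected_dim:
            ok_ids.append(chunk_id)
        else:
            bad_ids.append(chunk_id)
    return ok_ids, bad_ids
```

Chunks in the second list would be held back from insertion and re-embedded, rather than failing the whole batch.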
Run log
Recent events
09:18:11
Semantic block serialization completed
Stored block tree and section hierarchy in Postgres parse artifact store.
09:20:44
Chunk builder emitted overlap windows
Header context is prepended to vector chunks for downstream retrieval quality.
09:22:08
Embedding worker rate-limited temporarily
Retry budget still healthy. No manual action required unless latency exceeds 15 minutes.
Artifacts generated
Output layers
Layout JSON
Page, table, and text-span counts populate from the parser artifact output.
Semantic blocks
Semantic nodes are mapped into chapter and clause hierarchy here.
Vector chunks
Overlap windows and embedding text populate after chunk generation.
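The overlap windows and header-prefixed embedding text mentioned above can be sketched roughly as follows. This is an illustrative example only: the character-based windowing, the `window` and `overlap` sizes, and the `Chunk` and `build_chunks` names are all assumptions, since this page does not state how the chunk builder measures or sizes its windows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    header: str          # section heading the chunk belongs to
    text: str            # raw window of body text
    embedding_text: str  # header + window, the text sent for embedding


def build_chunks(header: str, body: str,
                 window: int = 200, overlap: int = 50) -> List[Chunk]:
    """Slice body into fixed-size windows that share `overlap` characters
    with the previous window, prefixing each with the section header."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(body), step):
        piece = body[start:start + window]
        chunks.append(Chunk(header, piece, f"{header}\n{piece}"))
        if start + window >= len(body):
            break  # the last window already reaches the end of the body
    return chunks
```

Each chunk keeps the raw window separately from the embedding text, so the stored chunk can be displayed without the repeated header while retrieval still benefits from the heading context.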
Chunk profile
Top segments
4.2 Energy density threshold
Critical requirement clause
chunk count pending
character count pending
Linked
5.1 Thermal event test method
Supplier evidence cross-reference
chunk count pending
character count pending
Review
Appendix A formulas and tables
Dense table extraction from scan
chunk count pending
character count pending
Noisy
Configuration snapshot
Runtime values
Parser backend · Document extraction engine
aliyun
5 s poll
900 s timeout
Embedding target · Vector generation
text-embedding-v3
1024 dim
top_k 10
Collection · Milvus destination
regulations_dense_1024_v2
dense-only
ready
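The parser values above (5 s poll, 900 s timeout) suggest a polling loop along these lines. This is a hedged sketch, not the actual worker code: `poll_until_done`, the `fetch_status` callback, and the status strings `"succeeded"` and `"failed"` are hypothetical, and the real parser backend may report job state differently.

```python
import time
from typing import Callable

# Assumed terminal states; the real backend's status names are not shown here.
TERMINAL_STATES = ("succeeded", "failed")


def poll_until_done(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,    # "5 s poll" from the runtime values
    timeout_s: float = 900.0,   # "900 s timeout" from the runtime values
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch_status every interval_s seconds until it returns a
    terminal state, or raise TimeoutError once timeout_s would be exceeded."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() + interval_s > deadline:
            raise TimeoutError(f"parse job still '{status}' after {timeout_s} s")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting in real time; in production the defaults apply.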