Document parse details
Battery safety · doc_id `GBT-31484-2015-r2` · Embedding in progress
Document detail

Trace parse artifacts from raw upload to vector index.

This view helps operators diagnose why a single document is delayed or degraded. It exposes parser settings, semantic structure, chunk generation, embedding generation, and Milvus insertion as separate, observable stages.

Current run

Battery density addendum review

Uploaded at 09:14 by Battery Safety Team · parser backend `aliyun`

Pipeline progression

Live state
1. Object storage ingestion
Stored in bucket `upload-files` with artifact prefix `artifacts`
2. Layout extraction (Aliyun parser)
Parsed page count, recovered tables, and OCR confidence are summarized here once the run completes.
3. Semantic blocks
Semantic block persistence is tracked here after parse artifact storage completes.
4. Vector chunk build
Using overlap chunking with header-prefixed embedding text
5. Embedding generation
Target model `text-embedding-v3` · dimension 1024
6. Milvus insertion
Waiting for chunk vectors before collection sync
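The gating implied by the stages above (for example, Milvus insertion waiting for chunk vectors) can be modeled as a simple linear state machine. This is a minimal sketch, assuming each stage may start only after its predecessor finishes; the stage identifiers, `PipelineRun`, and `StageState` are hypothetical names, not the service's actual code.

```python
from dataclasses import dataclass, field
from enum import Enum


class StageState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"


# Hypothetical identifiers for the six stages listed above, in order.
STAGES = [
    "object_storage_ingestion",
    "layout_extraction",
    "semantic_blocks",
    "vector_chunk_build",
    "embedding_generation",
    "milvus_insertion",
]


@dataclass
class PipelineRun:
    doc_id: str
    states: dict = field(
        default_factory=lambda: {s: StageState.PENDING for s in STAGES}
    )

    def can_start(self, stage: str) -> bool:
        # Assumed rule: a stage may start only once its predecessor is DONE.
        idx = STAGES.index(stage)
        return idx == 0 or self.states[STAGES[idx - 1]] is StageState.DONE

    def start(self, stage: str) -> None:
        if not self.can_start(stage):
            blocker = STAGES[STAGES.index(stage) - 1]
            raise RuntimeError(f"{stage} is waiting on {blocker}")
        self.states[stage] = StageState.RUNNING

    def finish(self, stage: str) -> None:
        if self.states[stage] is not StageState.RUNNING:
            raise RuntimeError(f"{stage} was never started")
        self.states[stage] = StageState.DONE

    def current_stage(self):
        # First stage that is not DONE, or None once the run is complete.
        return next(
            (s for s in STAGES if self.states[s] is not StageState.DONE), None
        )
```

For the run shown on this page, the first four stages would be DONE, `embedding_generation` would be RUNNING, and `can_start("milvus_insertion")` would return False until the embedding stage finishes.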

Run log

Recent events
09:18:11
Semantic block serialization completed
Stored the block tree and section hierarchy in the Postgres parse artifact store.
09:20:44
Chunk builder emitted overlap windows
Header context is prepended to each vector chunk to improve downstream retrieval quality.
09:22:08
Embedding worker temporarily rate-limited
Retry budget is still healthy. No manual action is required unless latency exceeds 15 minutes.
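The rate-limit entry above implies a bounded retry policy plus an escalation threshold. This is a minimal sketch, assuming the embedding worker retries with full-jitter exponential backoff; `RateLimited`, the budget of 6 retries, and the 1 s base / 60 s cap are hypothetical values. Only the 15-minute escalation threshold comes from the log.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by the embedding client when throttled."""


ESCALATION_THRESHOLD_S = 15 * 60  # from the run log: act only past 15 minutes


def call_with_retry_budget(fn, budget=6, base_s=1.0, cap_s=60.0,
                           sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimited with full-jitter exponential backoff.

    Returns (result, retries_used). Re-raises once the retry budget is spent.
    """
    for attempt in range(budget + 1):
        try:
            return fn(), attempt
        except RateLimited:
            if attempt == budget:
                raise
            # Full jitter: sleep a random fraction of the capped exponential delay.
            sleep(rng() * min(cap_s, base_s * 2 ** attempt))


def needs_manual_action(elapsed_s: float) -> bool:
    # Operators only need to intervene once latency passes the threshold.
    return elapsed_s > ESCALATION_THRESHOLD_S
```

Injecting `sleep` and `rng` keeps the policy testable without real delays; jitter spreads retries from concurrent workers so they do not hit the rate limit in lockstep.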

Artifacts generated

Output layers
Layout JSON: page, table, and text-span counts are populated from the parser artifact output.
Semantic blocks: semantic nodes are mapped into the chapter and clause hierarchy here.
Vector chunks: overlap windows and embedding text are populated after chunk generation.
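Overlap windows with header-prefixed embedding text can be sketched as a sliding window over a section body. This is a minimal character-based version, assuming a 400-character window, 80 characters of overlap, and a ` > `-joined header path; none of these parameters appear on this page, and `overlap_chunks` and `Chunk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str            # raw window, as shown in the chunk profile
    embedding_text: str  # header path + window, sent to the embedding model


def overlap_chunks(body: str, header_path: list[str],
                   size: int = 400, overlap: int = 80) -> list[Chunk]:
    """Split body into windows of `size` chars sharing `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    prefix = " > ".join(header_path)
    step = size - overlap
    chunks = []
    start = 0
    while start < len(body):
        window = body[start:start + size]
        embedding_text = f"{prefix}\n{window}" if prefix else window
        chunks.append(Chunk(window, embedding_text))
        if start + size >= len(body):
            break  # last window already reaches the end of the body
        start += step
    return chunks
```

Keeping the raw window separate from the embedding text is one way to let header context (for example, `4.2 Energy density threshold`) influence retrieval without repeating it in every displayed chunk.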

Chunk profile

Top segments
4.2 Energy density threshold
Critical requirement clause
Chunk count: pending · Character count: pending · Linked
5.1 Thermal event test method
Supplier evidence cross-reference
Chunk count: pending · Character count: pending · Review
Appendix A formulas and tables
Dense table extraction from scan
Chunk count: pending · Character count: pending · Noisy

Configuration snapshot

Runtime values
Parser backend
Document extraction engine
`aliyun` · 5 s poll interval · 900 s timeout
Embedding target
Vector generation
`text-embedding-v3` · 1024 dimensions · top_k 10
Collection
Milvus destination
`regulations_dense_1024_v2` · dense-only · ready
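The parser settings (5 s poll, 900 s timeout) describe a polling loop against an asynchronous parse job. This is a minimal sketch, assuming a hypothetical `get_status` callable that returns `"running"`, `"succeeded"`, or `"failed"`; these status strings are placeholders, not Aliyun's actual API. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


class ParseTimeout(Exception):
    """Raised when the parse job has not finished within the timeout."""


def poll_until_done(get_status, interval_s=5.0, timeout_s=900.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() every interval_s until it returns a terminal state."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        # Stop if the next poll would land past the deadline.
        if clock() + interval_s > deadline:
            raise ParseTimeout(f"parse job not finished after {timeout_s:.0f} s")
        sleep(interval_s)
```

With these values the loop issues at most 181 status checks (one at t = 0 and one every 5 s through t = 900 s) before raising `ParseTimeout`.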