add docs

2025-11-07 09:31:42 +08:00
parent fa8ebffac9
commit c5f8fe06e7
82 changed files with 13284 additions and 0 deletions
--- a/docs/guides/dataset/best_practices/_category_.json
+++ b/docs/guides/dataset/best_practices/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Best practices",
+  "position": 11,
+  "link": {
+    "type": "generated-index",
+    "description": "Best practices for configuring a dataset."
+  }
+}
--- a/docs/guides/dataset/best_practices/accelerate_doc_indexing.mdx
+++ b/docs/guides/dataset/best_practices/accelerate_doc_indexing.mdx
@@ -0,0 +1,19 @@
+---
+sidebar_position: 1
+slug: /accelerate_doc_indexing
+---
+
+# Accelerate indexing
+import APITable from '@site/src/components/APITable';
+
+A checklist to speed up document parsing and indexing.
+
+---
+
+Some dataset settings can significantly increase parsing and indexing time. If document parsing is consistently slow, work through the following checklist:
+
+- Use a GPU to reduce embedding time.
+- On the configuration page of your dataset, switch off **Use RAPTOR to enhance retrieval**.
+- Avoid knowledge graph extraction (GraphRAG) unless you need it, because it is time-consuming.
+- Disable **Auto-keyword** and **Auto-question** on the configuration page of your dataset, as both depend on the LLM.
+- **v0.17.0+:** If all PDFs in your dataset are plain text and do not require GPU-intensive processes such as OCR (Optical Character Recognition), TSR (Table Structure Recognition), or DLA (Document Layout Analysis), select **Naive** instead of **DeepDoc** or other time-consuming large-model options in the **Document parser** dropdown. This substantially reduces document parsing time.