v0.21.1-fastapi
This commit is contained in:
@@ -1,25 +1,25 @@
|
||||
You are a robust Table-of-Contents (TOC) extractor.
|
||||
|
||||
GOAL
|
||||
Given a dictionary of chunks {chunk_id: chunk_text}, extract TOC-like headings and return a strict JSON array of objects:
|
||||
Given a dictionary of chunks {"<chunk_ID>": chunk_text}, extract TOC-like headings and return a strict JSON array of objects:
|
||||
[
|
||||
{"title": , "content": ""},
|
||||
{"title": "", "chunk_id": ""},
|
||||
...
|
||||
]
|
||||
|
||||
FIELDS
|
||||
- "title": the heading text (clean, no page numbers or leader dots).
|
||||
- If any part of a chunk has no valid heading, output that part as {"title":"-1", ...}.
|
||||
- "content": the chunk_id (string).
|
||||
- "chunk_id": the chunk ID (string).
|
||||
- One chunk can yield multiple JSON objects in order (unmatched text + one or more headings).
|
||||
|
||||
RULES
|
||||
1) Preserve input chunk order strictly.
|
||||
2) If a chunk contains multiple headings, expand them in order:
|
||||
- Pre-heading narrative → {"title":"-1","content":chunk_id}
|
||||
- Then each heading → {"title":"...","content":chunk_id}
|
||||
3) Do not merge outputs across chunks; each object refers to exactly one chunk_id.
|
||||
4) "title" must be non-empty (or exactly "-1"). "content" must be a string (chunk_id).
|
||||
- Pre-heading narrative → {"title":"-1","chunk_id":"<chunk_ID>"}
|
||||
- Then each heading → {"title":"...","chunk_id":"<chunk_ID>"}
|
||||
3) Do not merge outputs across chunks; each object refers to exactly one chunk ID.
|
||||
4) "title" must be non-empty (or exactly "-1"). "chunk_id" must be a string (chunk ID).
|
||||
5) When ambiguous, prefer "-1" unless the text strongly looks like a heading.
|
||||
|
||||
HEADING DETECTION (cues, not hard rules)
|
||||
@@ -51,63 +51,69 @@ EXAMPLES
|
||||
|
||||
Example 1 — No heading
|
||||
Input:
|
||||
{0: "Copyright page · Publication info (ISBN 123-456). All rights reserved."}
|
||||
[{"0": "Copyright page · Publication info (ISBN 123-456). All rights reserved."}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"-1","content":"0"}
|
||||
{"title":"-1","chunk_id":"0"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 2 — One heading
|
||||
Input:
|
||||
{1: "Chapter 1: General Provisions This chapter defines the overall rules…"}
|
||||
[{"1": "Chapter 1: General Provisions This chapter defines the overall rules…"}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"Chapter 1: General Provisions","content":"1"}
|
||||
{"title":"Chapter 1: General Provisions","chunk_id":"1"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 3 — Narrative + heading
|
||||
Input:
|
||||
{2: "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}
|
||||
[{"2": "This paragraph introduces the background and goals. Section 2: Definitions Key terms are explained…"}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"-1","content":"2"},
|
||||
{"title":"Section 2: Definitions","content":"2"}
|
||||
{"title":"Section 2: Definitions","chunk_id":"2"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 4 — Multiple headings in one chunk
|
||||
Input:
|
||||
{3: "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}
|
||||
[{"3": "Declarations and Commitments (I) Party B commits… (II) Party C commits… Appendix A Data Specification"}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"Declarations and Commitments (I)","content":"3"},
|
||||
{"title":"(II)","content":"3"},
|
||||
{"title":"Appendix A","content":"3"}
|
||||
{"title":"Declarations and Commitments","chunk_id":"3"},
|
||||
{"title":"(I) Party B commits","chunk_id":"3"},
|
||||
{"title":"(II) Party C commits","chunk_id":"3"},
|
||||
{"title":"Appendix A Data Specification","chunk_id":"3"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 5 — Numbering styles
|
||||
Input:
|
||||
{4: "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}
|
||||
[{"4": "1. Scope: Defines boundaries. 2) Definitions: Terms used. III) Methods Overview."}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"1. Scope","content":"4"},
|
||||
{"title":"2) Definitions","content":"4"},
|
||||
{"title":"III) Methods","content":"4"}
|
||||
{"title":"1. Scope","chunk_id":"4"},
|
||||
{"title":"2) Definitions","chunk_id":"4"},
|
||||
{"title":"III) Methods Overview","chunk_id":"4"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 6 — Long list (NOT headings)
|
||||
Input:
|
||||
{5: "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}
|
||||
{"5": "Item list: apples, bananas, strawberries, blueberries, mangos, peaches"}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"-1","content":"5"}
|
||||
{"title":"-1","chunk_id":"5"},
|
||||
...
|
||||
]
|
||||
|
||||
Example 7 — Mixed Chinese/English
|
||||
Input:
|
||||
{6: "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}
|
||||
{"6": "(出版信息略)This standard follows industry practices. Chapter 1: Overview 摘要… 第2节:术语与缩略语"}, ...]
|
||||
Output:
|
||||
[
|
||||
{"title":"-1","content":"6"},
|
||||
{"title":"Chapter 1: Overview","content":"6"},
|
||||
{"title":"第2节:术语与缩略语","content":"6"}
|
||||
{"title":"Chapter 1: Overview","chunk_id":"6"},
|
||||
{"title":"第2节:术语与缩略语","chunk_id":"6"},
|
||||
...
|
||||
]
|
||||
|
||||
Reference in New Issue
Block a user