2 Commits

Author SHA1 Message Date
wangwei
1ff4a3943a feat(dataset-builder): add retry logic and ASCII-safe logging for Siemens PDF pipeline
- question_generator.py: add max_retries=3/retry_delay=5s loop with
  exponential backoff on LLM timeout or server errors; encode filenames
  with ascii/replace before printing to avoid UnicodeEncodeError on
  Windows cp1252 consoles
- runner.py: encode PDF filenames ASCII-safe for progress messages;
  catch generation failures per-document and skip (or re-raise) based
  on failure_mode, preventing one bad doc from aborting the whole build

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
2026-06-15 23:06:33 +08:00
Guangfei.Zhao
9cbdc1d95d first commit 2026-06-12 14:02:15 +08:00