- pipeline.py: log each metric score/timeout/error with sample_id,
elapsed time, and score value; log the list of NaN metrics per sample;
log an N/total progress counter after each sample completes
- evaluator.py: log eval start, dataset counts, adapter enrichment
progress (per-sample OK/FAIL with elapsed), metric scoring summary,
and per-metric NaN rate at end of run
- runner.py: _setup_logging() helper writes to stderr + optional file;
ragas/httpx/openai noisy loggers throttled to WARNING
- main.py: add --log-file and --log-level CLI flags

Usage:
    python main.py --scenario scenarios/online/... --log-file logs/eval.log --log-level DEBUG
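
A rough sketch of what a logging helper along these lines could look like,
for reviewers who want the shape without opening runner.py. This is not the
committed _setup_logging(); the function name setup_logging_sketch, its
parameters, and the format string are assumptions made for illustration.
Only the behaviour (stderr handler, optional file handler, ragas/httpx/openai
held at WARNING) comes from the notes above.

```python
import logging
import sys
from typing import Optional

# Third-party loggers named in the commit notes as noisy.
NOISY_LOGGERS = ("ragas", "httpx", "openai")


def setup_logging_sketch(level: str = "INFO", log_file: Optional[str] = None) -> None:
    """Hypothetical helper: log to stderr and, optionally, to a file."""
    root = logging.getLogger()
    # setLevel accepts a level name and raises ValueError for unknown names.
    root.setLevel(level.upper())

    # Drop handlers from earlier calls so repeated setup does not duplicate output.
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()

    # Assumed format; the real helper may use a different one.
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setFormatter(formatter)
    root.addHandler(stderr_handler)

    if log_file:
        file_handler = logging.FileHandler(log_file, encoding="utf-8")
        file_handler.setFormatter(formatter)
        root.addHandler(file_handler)

    # Keep chatty libraries at WARNING even when the root level is DEBUG.
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)
```

With a helper like this, main.py would presumably pass the values of
--log-level and --log-file straight through, e.g.
setup_logging_sketch("DEBUG", "logs/eval.log").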
Co-Authored-By: Claude <noreply@anthropic.com>
- question_generator.py: add a retry loop (max_retries=3, retry_delay=5s)
with exponential backoff on LLM timeouts or server errors; encode
filenames with ascii/replace before printing to avoid UnicodeEncodeError
on Windows cp1252 consoles
- runner.py: encode PDF filenames as ASCII-safe strings for progress
messages; catch generation failures per document and skip (or re-raise)
based on failure_mode, preventing one bad doc from aborting the whole build
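
For illustration only, the three ideas above (ASCII-safe filenames, retry
with exponential backoff, per-document failure handling) might be sketched
as follows. None of this is the committed code. The names ascii_safe,
call_with_retries, and build_all, the exception types retried on, the
failure_mode values "skip" and "raise", and the reading of max_retries as
"retries after the first attempt" are all assumptions.

```python
import sys
import time
from typing import Callable, Dict, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")


def ascii_safe(name: str) -> str:
    """Replace non-ASCII characters with '?' so printing cannot raise
    UnicodeEncodeError on consoles with a narrow code page such as cp1252."""
    return name.encode("ascii", "replace").decode("ascii")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a retryable error wait retry_delay * 2**attempt and retry.

    Assumption: max_retries counts retries after the first attempt, so the
    waits are 5s, 10s, 20s with the defaults.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(retry_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises


def build_all(
    docs: Iterable[str],
    generate: Callable[[str], T],
    failure_mode: str = "skip",  # hypothetical values: "skip" or "raise"
) -> Dict[str, T]:
    """Generate per document; a failing document is skipped or re-raised."""
    results: Dict[str, T] = {}
    for doc in docs:
        try:
            results[doc] = generate(doc)
        except Exception as exc:
            if failure_mode == "raise":
                raise
            print(f"skipping {ascii_safe(doc)}: {exc}", file=sys.stderr)
    return results
```

Passing sleep as a parameter is a choice made for this sketch so the
backoff schedule can be checked without actually waiting.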
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>