- pipeline.py: log each metric score/timeout/error with sample_id,
elapsed time, and score value; log the NaN list per sample; log a
progress counter (N/total) after each sample completes
- evaluator.py: log eval start, dataset counts, adapter enrichment
progress (per-sample OK/FAIL with elapsed), metric scoring summary,
and per-metric NaN rate at end of run
- runner.py: _setup_logging() helper writes to stderr and, optionally,
a file; noisy ragas/httpx/openai loggers are throttled to WARNING
- main.py: add --log-file and --log-level CLI flags

Usage:
  python main.py --scenario scenarios/online/... --log-file logs/eval.log --log-level DEBUG

Co-Authored-By: Claude <noreply@anthropic.com>