Retrieval Regression Gate Audit
- Audit date: 2026-03-18 (Asia/Tbilisi)
- Repository root: /tmp/repo-retrieval-gate-eFWeTo
- Verification branch: retrieval-gate-20260318130809
Scope
This audit captures the change that turns repository retrieval evaluation from an informational
report into a shared regression gate. The same threshold-aware retrieval-eval surface now backs:
- make quality
- the managed pre-push hook
- the GitHub Actions CI workflow
The enforced defaults are intentionally simple and strict for the checked-in benchmark corpus:
- minimum_pass_rate=1.0
- minimum_source_recall=1.0
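
The comparison behind these two defaults can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the BenchmarkResult type and the threshold_failures function are hypothetical names, and the assumption is that pass rate is the fraction of passing benchmarks and source recall is averaged across benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """Hypothetical per-benchmark outcome; not the repository's actual type."""

    passed: bool
    source_recall: float  # fraction of expected sources retrieved, in [0, 1]


def threshold_failures(results, minimum_pass_rate=1.0, minimum_source_recall=1.0):
    """Return a list of failure messages; an empty list means the gate passes."""
    if not results:
        # Assumption: an empty benchmark run should never pass the gate silently.
        return ["no benchmark results to evaluate"]

    pass_rate = sum(r.passed for r in results) / len(results)
    average_source_recall = sum(r.source_recall for r in results) / len(results)

    failures = []
    if pass_rate < minimum_pass_rate:
        failures.append(
            f"pass_rate {pass_rate:.3f} is below minimum {minimum_pass_rate:.3f}"
        )
    if average_source_recall < minimum_source_recall:
        failures.append(
            f"average_source_recall {average_source_recall:.3f} "
            f"is below minimum {minimum_source_recall:.3f}"
        )
    return failures
```

Because both metrics are bounded by 1.0, a threshold of 1.0 tolerates no regression at all, and any threshold above 1.0 can never be met. That second property is what makes a deliberately failing verification run possible without altering the benchmark corpus.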
Implemented Surfaces
- src/repo_rag_lab/benchmarks.py now exposes shared threshold helpers for pass rate and average source recall.
- src/repo_rag_lab/utilities.py serializes threshold configuration, failures, and overall status in run_retrieval_evaluation(...).
- src/repo_rag_lab/cli.py now accepts --minimum-pass-rate and --minimum-source-recall, and exits nonzero when threshold failures are present.
- Makefile now sets repository defaults for those thresholds and makes retrieval-eval part of make quality.
- .pre-commit-config.yaml now adds a dedicated pre-push retrieval benchmark gate.
- .github/workflows/ci.yml now runs make retrieval-eval as an explicit CI step.
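
One way to picture how the serialized report and the exit status fit together is the sketch below. The function names build_report and exit_code, the "thresholds" key, and the "fail" status string are assumptions made for illustration; only the field names benchmark_count, pass_rate, average_source_recall, threshold_failures, and status, plus the "pass" value, appear in the recorded output of this audit.

```python
import json


def build_report(metrics, thresholds, threshold_failures):
    """Merge metrics, threshold configuration, and failures into one dict.

    Hypothetical helper: the real run_retrieval_evaluation(...) may lay out
    its payload differently.
    """
    report = dict(metrics)
    report["thresholds"] = dict(thresholds)  # assumed key name
    report["threshold_failures"] = list(threshold_failures)
    report["status"] = "fail" if threshold_failures else "pass"
    return report


def exit_code(report):
    """Map a report to a process exit status: nonzero on any threshold failure."""
    return 1 if report["threshold_failures"] else 0


if __name__ == "__main__":
    report = build_report(
        metrics={"benchmark_count": 8, "pass_rate": 1.0, "average_source_recall": 1.0},
        thresholds={"minimum_pass_rate": 1.0, "minimum_source_recall": 1.0},
        threshold_failures=[],
    )
    print(json.dumps(report, indent=2))
    raise SystemExit(exit_code(report))
```

Keeping the failure list inside the report, and deriving both the status field and the exit code from it, means the hook, the Makefile target, and the CI step can all rely on the process exit status alone while the JSON stays useful for diagnosis.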
Executed Commands
Executed in this turn, each with its expected outcome:
- make hooks-install
- uv run python -m compileall src tests
- uv run pytest tests/test_benchmarks_and_notebook_scaffolding.py tests/test_utilities.py tests/test_cli_and_dspy.py tests/test_project_surfaces.py
- uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8 --minimum-pass-rate 1.0 --minimum-source-recall 1.0
- uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8 --minimum-pass-rate 1.1 --minimum-source-recall 1.1
- make quality
- cargo build --manifest-path rust-cli/Cargo.toml
Results
- threshold-aware focused pytest slice: passed, 54 passed
- threshold-aware passing retrieval evaluation: passed and reported
  - benchmark_count: 8
  - pass_rate: 1.0
  - average_source_recall: 1.0
  - threshold_failures: []
  - status: "pass"
- threshold-aware failing retrieval evaluation: failed intentionally with exit status 1 and reported both threshold failures when configured above the current benchmark metrics
- make quality: passed with the retrieval gate included, 124 passed, 88.02% total coverage
- cargo build --manifest-path rust-cli/Cargo.toml: passed
Notes
- The gate is strict because the current checked-in benchmark suite already achieves full pass rate and full source recall at the repository default top_k=4.
- The CLI still supports ad hoc informational runs when invoked directly without the threshold flags, while the repo-local Makefile, hook, and CI surfaces keep the stricter contract.
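
The split between informational and gated runs could be expressed with optional flags, as in the following sketch. The flag names --minimum-pass-rate, --minimum-source-recall, and --top-k come from the commands recorded above; the None defaults, the parser name, and the is_gated helper are assumptions, and the real cli.py may be structured differently.

```python
import argparse


def build_parser():
    """Build a hypothetical parser mirroring the threshold flags in this audit."""
    parser = argparse.ArgumentParser(prog="retrieval-eval-sketch")
    parser.add_argument("--top-k", type=int, default=4)
    # Assumption: leaving a flag unset (None) disables that threshold entirely.
    parser.add_argument("--minimum-pass-rate", type=float, default=None)
    parser.add_argument("--minimum-source-recall", type=float, default=None)
    return parser


def is_gated(args):
    """Return True when at least one threshold was supplied on the command line."""
    return args.minimum_pass_rate is not None or args.minimum_source_recall is not None


if __name__ == "__main__":
    args = build_parser().parse_args()
    mode = "gated" if is_gated(args) else "informational"
    print(f"running in {mode} mode with top_k={args.top_k}")
```

Under this design the strict contract lives in the callers (Makefile, hook, CI), which always pass both flags, so a developer exploring retrieval quality by hand is not blocked by the gate.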