Retrieval Ranking Refresh Audit
- Audit date: 2026-03-18 (Asia/Tbilisi)
- Repository root: /home/standard/dspy_rag_in_repo_docs_and_impl1_retrieval_clean
- Git HEAD during verification: da10ad70b3fc1e1971a2baf2c2d23ae1542dc112
Scope
This audit captures a retrieval-quality refresh of the baseline repository RAG path. The final implementation keeps the existing path-aware ranking and source-diversity behavior, but changes the chunker to preserve paragraph boundaries before falling back to fixed-width slices. The final scorer also adds a definition bonus for "what is ..." questions and a penalty for chunks that only echo the full question text. The refresh was driven by a real regression caught in the hushwheel fixture: the retriever needed to keep definition sections coherent enough to surface the "heat-memory score" and "lantern vowel" evidence for the question "What is the ember index?". During final push validation, this turn also hardened the pytest and fixture-doc build paths to use cache-backed temp directories instead of the host's /tmp tmpfs, which was full. The final branch state was then rebased onto the latest origin/master, which already carried upstream fixes that made the utility-sync tests free of side effects under git-hook execution.
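The chunking change described above can be illustrated with a minimal sketch. It splits on blank lines, keeps any paragraph that fits within a width limit as one chunk, and slices only oversized paragraphs into fixed-width pieces. The function name chunk_paragraphs and the max_chars default are hypothetical. The audit does not state the width or the exact splitting rules used in src/repo_rag_lab/retrieval.py.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 800) -> list[str]:
    """Paragraph-aware chunking with a fixed-width fallback (illustrative sketch).

    The name and the max_chars default are hypothetical; the width used by
    src/repo_rag_lab/retrieval.py is not stated in this audit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    # Paragraphs are separated by one or more blank (or whitespace-only) lines.
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) <= max_chars:
            # Fits within the width: keep the whole paragraph (e.g. a definition) together.
            chunks.append(paragraph)
        else:
            # Too long: fall back to consecutive fixed-width slices.
            chunks.extend(
                paragraph[start : start + max_chars]
                for start in range(0, len(paragraph), max_chars)
            )
    return chunks
```

With this ordering, a short definition paragraph is retrieved as a single unit and is not cut mid-sentence by a width boundary. Only genuinely long prose pays the fixed-width penalty.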
Executed Commands
Executed successfully in this turn:
- make hooks-install
- uv run python -m compileall src tests
- uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py
- uv run repo-rag smoke-test
- cargo build --manifest-path rust-cli/Cargo.toml
- uv run pytest tests/test_retrieval.py tests/test_hushwheel_fixture.py tests/test_benchmarks_and_notebook_scaffolding.py tests/test_project_surfaces.py tests/test_cli_and_dspy.py tests/test_verification.py
- uv run repo-rag verify-surfaces
- uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8
- PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v
- make coverage
- make quality
Notable Results
- uv run python -m compileall src tests: passed
- uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py: passed, 11 tests
- uv run repo-rag smoke-test: passed with answer_contains_repository: true, mcp_candidate_count: 1, and manifest_path: artifacts/azure/repo-rag-smoke.json
- cargo build --manifest-path rust-cli/Cargo.toml: passed
- Focused retrieval, hushwheel fixture, notebook, project-surface, CLI, and verification pytest slice: passed, 46 tests
- uv run repo-rag verify-surfaces: passed with issue_count: 0
- uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8: passed and reported:
  - default top_k: 4
  - pass_rate: 1.0
  - fully_covered_rate: 1.0
  - average_source_recall: 1.0
  - average_source_precision: 0.5833333333333334
  - average_reciprocal_rank: 1.0
  - best_pass_rate_top_k: 4
- PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v: passed; mypy, basedpyright, coverage, and repository-surface verification all completed cleanly
- make coverage: passed with 119 tests and 87.98% total coverage
- make quality: passed with 119 tests and 87.98% total coverage
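The retrieval-eval figures above make more sense once the metrics are defined. The audit does not give repo-rag's exact definitions, so the sketch below assumes a common reading:

- Per-question source recall is the share of expected source files found in the top-k.
- Source precision is the share of top-k slots that point at an expected source.
- Reciprocal rank is 1 divided by the rank of the first expected source.

Each metric is then macro-averaged over the benchmark. QuestionResult, source_metrics, and average_metrics are hypothetical names, not the repo-rag schema.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """One benchmark question (hypothetical shape, not the repo-rag schema)."""

    expected_sources: set[str]  # source paths the answer should draw on
    retrieved_sources: list[str]  # ranked source paths returned by the retriever


def source_metrics(result: QuestionResult, top_k: int) -> tuple[float, float, float]:
    """Return (source recall, source precision, reciprocal rank) at top_k."""
    ranked = result.retrieved_sources[:top_k]
    relevant = [path for path in ranked if path in result.expected_sources]
    # Recall: share of expected sources that appear anywhere in the top-k.
    recall = len(set(relevant)) / len(result.expected_sources) if result.expected_sources else 0.0
    # Precision: share of top-k slots that point at an expected source.
    precision = len(relevant) / len(ranked) if ranked else 0.0
    # Reciprocal rank of the first expected source, or 0.0 if none was retrieved.
    reciprocal_rank = next(
        (1.0 / rank for rank, path in enumerate(ranked, start=1) if path in result.expected_sources),
        0.0,
    )
    return recall, precision, reciprocal_rank


def average_metrics(results: list[QuestionResult], top_k: int) -> dict[str, float]:
    """Macro-average the per-question metrics over the benchmark."""
    if not results:
        raise ValueError("need at least one question")
    rows = [source_metrics(result, top_k) for result in results]
    count = len(rows)
    return {
        "average_source_recall": sum(row[0] for row in rows) / count,
        "average_source_precision": sum(row[1] for row in rows) / count,
        "average_reciprocal_rank": sum(row[2] for row in rows) / count,
    }
```

Under these assumed definitions, a recall of 1.0 alongside a precision of about 0.58 at top_k 4 would mean two things. First, every expected source was retrieved. Second, on average roughly 2.3 of the 4 slots held expected sources. Precision below 1.0 is unavoidable whenever a question has fewer than four expected sources.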
Current Verification Status
Configured and verified in this turn:
- Compile checks: present and passed through uv run python -m compileall src tests
- Utility and baseline pytest slice: present and passed through uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py
- Repository smoke test: present and passed through uv run repo-rag smoke-test
- Rust build: present and passed through cargo build --manifest-path rust-cli/Cargo.toml
- Retrieval, hushwheel fixture, benchmark, notebook-scaffolding, project-surface, CLI, and verification tests: present and passed through the focused uv run pytest ... slice above
- Repository-surface verification: present and passed through uv run repo-rag verify-surfaces
- Retrieval-quality evaluation utility: present and passed through uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8
- Installed git-hook pre-push gate: present and passed through PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v
- Full pytest and coverage gate: present and passed through make coverage
- Lint, notebook lint, mypy, basedpyright, complexity, full pytest, and coverage: present and passed through make quality
Still absent or not exercised in this turn:
- UI or browser tests: none found in repository configuration
- Full notebook execution batch: notebook lint and surface checks passed, but make notebook-report was not rerun end-to-end in this turn
- Live Azure OpenAI and Azure AI Inference probes: not rerun in this turn
- Post-push GitHub Actions evidence: not yet available, because this change set had not been pushed when the audit was written
Notes
- The final retrieval change is the paragraph-aware chunker in src/repo_rag_lab/retrieval.py. It keeps concept definitions and similar prose blocks intact and falls back to width-based splitting only for paragraphs that are too long.
- The final scoring tweak also rewards definitional "term is ..." chunks for "what is ..." questions and penalizes chunks that merely repeat the entire question text.
- Path-aware ranking and source diversity remain in place, so docs and implementation files still outrank synthetic question-echo surfaces from tests/, samples/, and data/.
- The hushwheel fixture regression was caught locally during this turn and resolved before the final audit. The passing fixture tests confirm that the document question about the ember index again surfaces context containing both "heat-memory score" and "lantern vowel".
- A stopword-filter scoring variant was explored and rejected in this turn: it improved the hushwheel probe but reduced the repository benchmark's top-4 source coverage. Paragraph-aware chunking fixed the underlying issue without sacrificing benchmark completeness.
- The root Makefile and the hushwheel fixture Makefile now route pytest and doc-build temp data through cache-backed directories under $(HOME)/.cache, which keeps the verification path stable even when /tmp is saturated on the host.
- The final rebased branch was validated against the current upstream hook configuration, so the push path no longer mutates tracked publication or inventory surfaces as a side effect of running the utility tests.
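The scoring rules in these notes can be sketched as a small lexical scorer. The sketch combines a term-overlap base score with a definition bonus for "what is X?" questions, a question-echo penalty, and a path demotion for tests/, samples/, and data/. All weights, the regular expressions, and the names score_chunk and tokenize are hypothetical. The sketch also assumes repo-relative POSIX paths. Its echo penalty is a simplification: it fires for any chunk that contains the whole normalized question. Source diversity is not modeled.

```python
import re

# Hypothetical weights and prefixes for illustration; the audit does not give the real values.
DEFINITION_BONUS = 2.0
ECHO_PENALTY = 3.0
PATH_PENALTY = 1.0
LOW_PRIORITY_PREFIXES = ("tests/", "samples/", "data/")

# Captures X from "what is X?", "what is the X?", or "what is a/an X?".
WHAT_IS = re.compile(r"\s*what\s+is\s+(?:the\s+|an?\s+)?(.+?)\??\s*$", re.IGNORECASE)


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation and hyphens act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_chunk(question: str, chunk: str, path: str) -> float:
    """Score one chunk for one question (illustrative lexical scorer)."""
    question_tokens = tokenize(question)
    chunk_tokens = tokenize(chunk)
    # Base score: number of distinct question terms that the chunk mentions.
    score = float(len(set(question_tokens) & set(chunk_tokens)))

    # Definition bonus: a "what is X?" question meets a chunk stating "X is ...".
    match = WHAT_IS.match(question)
    if match:
        term = " ".join(tokenize(match.group(1)))
        if term and re.search(rf"\b{re.escape(term)}\s+is\b", " ".join(chunk_tokens)):
            score += DEFINITION_BONUS

    # Question-echo penalty (simplified): the chunk contains the whole question.
    if question_tokens and f" {' '.join(question_tokens)} " in f" {' '.join(chunk_tokens)} ":
        score -= ECHO_PENALTY

    # Path-aware ranking: demote test, sample, and data files.
    if path.startswith(LOW_PRIORITY_PREFIXES):
        score -= PATH_PENALTY
    return score
```

For "What is the ember index?", a chunk that only restates the question matches all five question terms. A chunk reading "The ember index is a heat-memory score." matches four. On raw overlap the echo would therefore win. The bonus and penalty exist to reverse that ordering, which is the failure mode these notes describe.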