Skip to content

Retrieval Ranking Refresh Audit

  • Audit date: 2026-03-18 (Asia/Tbilisi)
  • Repository root: /home/standard/dspy_rag_in_repo_docs_and_impl1_retrieval_clean
  • Git HEAD during verification: da10ad70b3fc1e1971a2baf2c2d23ae1542dc112

Scope

This audit captures a retrieval-quality refresh to the baseline repository RAG path. The final implementation keeps the existing path-aware ranking and source-diversity behavior, but changes the chunker to preserve paragraph boundaries before falling back to fixed-width slices. The final scorer also adds a definition bonus for what is ... questions and a penalty for chunks that only echo the full question text. That was driven by a real regression caught in the hushwheel fixture: the retriever needed to keep definition sections coherent enough to surface heat-memory score and lantern vowel evidence for the question What is the ember index?. During final push validation, this turn also hardened the pytest and fixture-doc build paths to use cache-backed temp directories instead of the host's full /tmp tmpfs. The final branch state was then rebased onto the latest origin/master, which already carried upstream fixes that made the utility-sync tests side-effect free under git-hook execution.

Executed Commands

Executed successfully in this turn:

  • make hooks-install
  • uv run python -m compileall src tests
  • uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py
  • uv run repo-rag smoke-test
  • cargo build --manifest-path rust-cli/Cargo.toml
  • uv run pytest tests/test_retrieval.py tests/test_hushwheel_fixture.py tests/test_benchmarks_and_notebook_scaffolding.py tests/test_project_surfaces.py tests/test_cli_and_dspy.py tests/test_verification.py
  • uv run repo-rag verify-surfaces
  • uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8
  • PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v
  • make coverage
  • make quality

Notable Results

  • uv run python -m compileall src tests: passed
  • uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py: passed, 11 tests
  • uv run repo-rag smoke-test: passed with answer_contains_repository: true, mcp_candidate_count: 1, and manifest_path: artifacts/azure/repo-rag-smoke.json
  • cargo build --manifest-path rust-cli/Cargo.toml: passed
  • focused retrieval, hushwheel fixture, notebook, project-surface, CLI, and verification pytest slice: passed, 46 tests
  • uv run repo-rag verify-surfaces: passed with issue_count: 0
  • uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8: passed and reported:
  • default top_k: 4
  • pass_rate: 1.0
  • fully_covered_rate: 1.0
  • average_source_recall: 1.0
  • average_source_precision: 0.5833333333333334
  • average_reciprocal_rank: 1.0
  • best_pass_rate_top_k: 4
  • PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v: passed; mypy, basedpyright, coverage, and repository-surface verification all completed cleanly
  • make coverage: passed with 119 tests and 87.98% total coverage
  • make quality: passed with 119 tests and 87.98% total coverage

Current Verification Status

Configured and verified in this turn:

  • Compile checks: present and passed through uv run python -m compileall src tests
  • Utility and baseline pytest slice: present and passed through uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py
  • Repository smoke test: present and passed through uv run repo-rag smoke-test
  • Rust build: present and passed through cargo build --manifest-path rust-cli/Cargo.toml
  • Retrieval, hushwheel fixture, benchmark, notebook-scaffolding, project-surface, CLI, and verification tests: present and passed through the focused uv run pytest ... slice above
  • Repository-surface verification: present and passed through uv run repo-rag verify-surfaces
  • Retrieval-quality evaluation utility: present and passed through uv run repo-rag retrieval-eval --root . --top-k 4 --top-k-sweep 1,2,4,8
  • Installed git-hook pre-push gate: present and passed through PRE_COMMIT_HOME=.pre-commit-cache uv run pre-commit run --all-files --hook-stage pre-push -v
  • Full pytest and coverage gate: present and passed through make coverage
  • Lint, notebook lint, mypy, basedpyright, complexity, full pytest, and coverage: present and passed through make quality

Still absent or not exercised in this turn:

  • UI or browser tests: none found in repository configuration
  • Full notebook execution batch: notebook lint and surface checks passed, but make notebook-report was not rerun end-to-end in this turn
  • Live Azure OpenAI and Azure AI Inference probes: not rerun in this turn
  • Post-push GitHub Actions evidence: not yet available before the push for this change set

Notes

  • The final retrieval change is the paragraph-aware chunker in src/repo_rag_lab/retrieval.py. It keeps concept definitions and similar prose blocks intact before splitting long paragraphs by width.
  • The final scoring tweak also rewards definitional term is ... chunks for what is ... questions and penalizes chunks that merely repeat the entire question text.
  • Path-aware ranking and source diversity remain in place, so docs and implementation files still outrank synthetic question-echo surfaces from tests/, samples/, and data/.
  • The hushwheel fixture regression was caught locally during this turn and resolved before the final audit. The passing fixture tests confirm that the document question about the ember index again surfaces context containing both heat-memory score and lantern vowel.
  • A stopword-filter scoring variant was explored and rejected in this turn because it improved the hushwheel probe but reduced the repository benchmark's top-4 source coverage. Paragraph-aware chunking fixed the real issue without sacrificing benchmark completeness.
  • The root Makefile and the hushwheel fixture Makefile now route pytest and doc-build temp data through cache-backed directories under $(HOME)/.cache, which keeps the verification path stable even when /tmp is saturated on the host.
  • The final rebased branch was validated against the current upstream hook configuration, so the push path no longer mutates tracked publication or inventory surfaces as a side effect of running the utility tests.