
Repository Agent Instructions

Agents working in this repository should stay inside the shared uv-managed workflow whenever possible. Prefer repository-local utilities over ad hoc commands when the repository already exposes the behavior through a make target or a uv run repo-rag ... subcommand.

Additional standing instructions live in AGENTS.md.d/*.md. Read them before reporting verification results, and base every status statement on the newest file in docs/audit/.

Codex Skills

  • Repo-local skill implementations live under .codex/skills/.
  • Use file-summary-sync when tracked files are added, removed, or renamed, or when FILES.md and FILES.csv need to be realigned with the repository tree.
  • Use exploratorium-translation-sync when bilingual file/link/fetch-state summaries, bibliography fetch reporting, or the publication/exploratorium_translation/ subdocument are in scope.
  • Use rust-sqlite-lookup when a repo question needs exact file hits, path discovery, or a cheap local search pass before make ask-dspy.
  • Use repo-verification-audit-loop when verification work, audit-note updates, or repository health reporting are part of the task.
  • Use post-push-gh-run-logging immediately after each push and whenever GitHub Actions failures or samples/logs/ updates are in scope.
  • Use notebook-playbook-sync when editing notebooks, notebook scaffolding, notebook-facing docs, or training and population sample surfaces.
  • Use todo-backlog-sync when editing todo-backlog.yaml, TODO.MD, the publication backlog table, or backlog-facing workflow documentation.
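As a rough illustration of how the skill triggers above map onto changed paths, the sketch below routes a list of changed files to skill names. The routing table and the skills_for_changes helper are hypothetical: the patterns are inferred from the bullet descriptions rather than read from .codex/skills/, and topic-based triggers (bibliography fetch reporting, verification work, rust-sqlite-lookup searches) cannot be detected from paths alone.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table (glob pattern -> skill), inferred from the
# skill descriptions; it is not loaded from .codex/skills/.
SKILL_PATTERNS: list[tuple[str, str]] = [
    ("FILES.md", "file-summary-sync"),
    ("FILES.csv", "file-summary-sync"),
    ("publication/exploratorium_translation/*", "exploratorium-translation-sync"),
    ("docs/audit/*", "repo-verification-audit-loop"),
    ("samples/logs/*", "post-push-gh-run-logging"),
    ("notebooks/*", "notebook-playbook-sync"),
    ("todo-backlog.yaml", "todo-backlog-sync"),
    ("TODO.MD", "todo-backlog-sync"),
]


def skills_for_changes(changed: list[str], tree_changed: bool = False) -> list[str]:
    """Return suggested skills for the changed paths, without duplicates, in first-hit order."""
    selected: list[str] = []
    if tree_changed:  # files were added, removed, or renamed
        selected.append("file-summary-sync")
    for path in changed:
        for pattern, skill in SKILL_PATTERNS:
            # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
            if fnmatchcase(path, pattern) and skill not in selected:
                selected.append(skill)
    return selected
```

For instance, a turn that edits a notebook and TODO.MD would be routed to notebook-playbook-sync and todo-backlog-sync.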

Primary Utilities

  • make utility-summary
  • make files-sync
  • make rust-lookup-index
  • make rust-lookup QUERY="..."
  • make todo-sync
  • make exploratorium-sync
  • make ask QUESTION="..."
  • make ask-live QUESTION="..."
  • make retrieval-eval
  • make discover-mcp
  • make azure-openai-probe
  • make azure-inference-probe
  • make smoke-test
  • make verify-surfaces
  • make gh-runs
  • make gh-watch
  • make gh-failed-logs
  • uv run repo-rag utility-summary
  • uv run repo-rag sync-file-summaries --root .
  • uv run repo-rag sync-exploratorium-translation --root .
  • cargo run --manifest-path rust-cli/Cargo.toml -- index
  • cargo run --manifest-path rust-cli/Cargo.toml -- lookup "dspy training"
  • uv run repo-rag ask-live --question "..." --provider azure-openai
  • uv run repo-rag retrieval-eval --top-k-sweep "1,2,4,8"
  • uv run repo-rag ask --question "..." --use-dspy
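The make rust-lookup-index and make rust-lookup targets appear to front the cargo run --manifest-path rust-cli/Cargo.toml -- index and lookup commands, which build and query a local SQLite index. The following Python sketch only illustrates the general idea of an exact-hit lookup pass over such an index; the files table, its columns, and the build_index and lookup helpers are hypothetical and do not describe the rust-cli schema or its ranking.

```python
import sqlite3


def build_index(files: dict[str, str]) -> sqlite3.Connection:
    """Load path -> text pairs into an in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.executemany("INSERT INTO files (path, body) VALUES (?, ?)", files.items())
    return conn


def _like_pattern(term: str) -> str:
    """Escape LIKE wildcards so each term matches literally as a substring."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def lookup(conn: sqlite3.Connection, query: str, limit: int = 8) -> list[str]:
    """Return paths whose path or body contains every query term, case-insensitively."""
    terms = query.lower().split()
    if not terms:
        return []
    # One clause per term; all terms must match (AND), each in the path or the body.
    clause = "(lower(path) LIKE ? ESCAPE '\\' OR lower(body) LIKE ? ESCAPE '\\')"
    where = " AND ".join(clause for _ in terms)
    params: list[object] = []
    for term in terms:
        pattern = _like_pattern(term)
        params.extend([pattern, pattern])
    params.append(limit)
    sql = f"SELECT path FROM files WHERE {where} ORDER BY path LIMIT ?"
    return [row[0] for row in conn.execute(sql, params).fetchall()]
```

A cheap pass like this narrows the candidate files to a handful of exact hits, which can then be inspected directly before escalating to a heavier step such as make ask-dspy.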

Working Rules

  1. Start with an existing make target or uv run repo-rag ... command before inventing a new workflow.
  2. Keep notebooks, tests, CLI behavior, and docs aligned around the same package helpers.
  3. When changing retrieval, MCP discovery, deployment metadata, or verification behavior, update tests and notebook guidance in the same turn.
  4. If adding a new user-facing utility, expose it through both the Python CLI and the Makefile when practical.
  5. make ask and uv run repo-rag ask now run a Rust SQLite lookup first to narrow the candidate files automatically. Use make rust-lookup when you want to inspect those direct hits yourself before escalating to make ask-dspy.
  6. Prefer tests that validate user-visible behavior instead of only internal helpers.
  7. After every push, use post-push-gh-run-logging: run make gh-runs, then make gh-watch, and write a summary log into samples/logs/. If the watched run fails, inspect it with make gh-failed-logs, fix the repository, rerun local validation, and push again. If the only follow-up would be a recursive log-only commit for a prior log-only push, summarize the result instead of creating endless log churn.
  8. If a permission-gated action is blocked, explicitly offer the user the option to make that permission permanent in Codex settings before retrying.
  9. Keep reusable notebook logic in src/ with doctests or normal pytest coverage instead of embedding it in notebook cells.
  10. Keep the repository fully uv-managed unless uv no longer covers a required workflow.
  11. Treat README.AGENTS.md as the overarching research narrative for the repository; when a turn materially changes workflow stages, DSPy capabilities, notebooks, verification posture, publication scope, or deployment handoff, update README.AGENTS.md in the same turn.
  12. When tracked files, publication inventories, or bibliography-linked fetch summaries change, refresh FILES.md, FILES.csv, and the exploratorium outputs in the same turn.
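The recursion guard in rule 7 can be expressed as a small predicate. In this sketch a push counts as log-only when every changed path sits under samples/logs/; the helper names are hypothetical, and collecting the changed paths (for example with git diff --name-only) is left to the caller.

```python
from pathlib import PurePosixPath

LOG_DIR = PurePosixPath("samples/logs")


def is_log_only(changed_paths: list[str]) -> bool:
    """True when the push touched at least one file and every file lives under samples/logs/."""
    return bool(changed_paths) and all(
        LOG_DIR in PurePosixPath(path).parents for path in changed_paths
    )


def should_commit_followup_log(pushed_paths: list[str]) -> bool:
    """Rule 7 guard: a push that was itself log-only gets a summary, not another log commit."""
    return not is_log_only(pushed_paths)
```

A code push therefore still produces a samples/logs/ entry, while the push that only adds that entry is summarized in the conversation instead of triggering another log commit.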

Research Narrative

  • README.AGENTS.md is the top-level narrative that explains how the repository's research story fits together across code, notebooks, DSPy, audits, CI logs, publication outputs, and deployment metadata.
  • Keep it current continuously, not as occasional cleanup. If the repo story changes, the narrative doc should change with it.

Validation Expectations

  • For Python changes, run uv run python -m compileall src tests.
  • For utility changes, run uv run pytest tests/test_utilities.py tests/test_repository_rag_bdd.py.
  • For quality-sensitive changes, run make quality.
  • For coverage-specific checks, run make coverage.
  • After syncing the environment, run make hooks-install.
  • If cargo is installed, also run cargo build --manifest-path rust-cli/Cargo.toml.
  • After pushing, run make gh-runs GH_RUN_LIMIT=10, then make gh-watch, and store the relevant run details in samples/logs/.
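One way to keep the local checks above consistent is to derive the command list from the kind of change. The validation_plan helper below is a hypothetical sketch: it returns argv lists for the commands named in this section without running them, and it covers only the local checks, not make hooks-install or the post-push GitHub Actions steps.

```python
import shutil
from typing import Optional


def validation_plan(
    *,
    python: bool = False,
    utilities: bool = False,
    quality: bool = False,
    coverage: bool = False,
    cargo_available: Optional[bool] = None,
) -> list[list[str]]:
    """Return the local validation commands (as argv lists) for a change; nothing is executed."""
    if cargo_available is None:
        # "If cargo is installed": approximate by looking for cargo on PATH.
        cargo_available = shutil.which("cargo") is not None
    plan: list[list[str]] = []
    if python:
        plan.append(["uv", "run", "python", "-m", "compileall", "src", "tests"])
    if utilities:
        plan.append(
            ["uv", "run", "pytest", "tests/test_utilities.py", "tests/test_repository_rag_bdd.py"]
        )
    if quality:
        plan.append(["make", "quality"])
    if coverage:
        plan.append(["make", "coverage"])
    if cargo_available:
        plan.append(["cargo", "build", "--manifest-path", "rust-cli/Cargo.toml"])
    return plan
```

A caller could then run each entry with subprocess.run(cmd, check=True), so the first failing check stops the sequence.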

Audit Files

  • Use repo-verification-audit-loop when verification evidence, audit-note updates, or repository status reporting are part of the task.
  • Review docs/audit/README.md and the newest dated audit note before describing repository health.
  • When verification status changes, update the relevant docs/audit/*.md files in the same turn.
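Finding the newest dated audit note can be sketched as follows. This assumes, without confirmation from the repository, that dated notes carry an ISO YYYY-MM-DD date somewhere in their filenames. newest_audit_note and newest_audit_note_in are hypothetical helpers, and README.md is skipped because it is reviewed separately.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Assumed naming convention: an ISO date such as 2024-11-20 inside the filename.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def newest_audit_note(names: list[str]) -> Optional[str]:
    """Pick the .md note with the latest valid date in its name; ties go to the later name."""
    best: Optional[tuple[date, str]] = None
    for name in names:
        if name == "README.md" or not name.endswith(".md"):
            continue
        match = DATE_RE.search(name)
        if match is None:
            continue
        try:
            stamp = date(*map(int, match.groups()))
        except ValueError:  # e.g. 2024-13-40 is not a real date
            continue
        if best is None or (stamp, name) > best:
            best = (stamp, name)
    return None if best is None else best[1]


def newest_audit_note_in(directory: Path = Path("docs/audit")) -> Optional[str]:
    """Apply newest_audit_note to the Markdown files in the audit directory."""
    return newest_audit_note([p.name for p in directory.glob("*.md")])
```

Parsing the date, rather than relying on file modification times, keeps the choice stable across fresh clones, where checkout order rather than authorship determines mtimes.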

Notebook Expectations

Notebooks in notebooks/ should read like research playbooks:

  • Use notebook-playbook-sync when notebook files or notebook-facing helpers are in scope.
  • Use Markdown headers and subheaders to explain each step.
  • Keep code cells short and tied to one research action.
  • Reuse repository-local utilities and package APIs instead of duplicating logic inline.
  • Move training and corpus-population logic into doctested Python modules under src/.
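For example, a corpus-population helper moved out of a notebook might look like the following, with its doctests providing the coverage that Working Rule 9 asks for. The chunk_words function and any module path such as src/<package>/corpus.py are hypothetical. The doctests can be run with pytest's --doctest-modules option if the repository's pytest configuration does not already collect them.

```python
def chunk_words(text: str, size: int) -> list[str]:
    """Split text into chunks of at most ``size`` whitespace-separated words.

    >>> chunk_words("a b c d e", 2)
    ['a b', 'c d', 'e']
    >>> chunk_words("", 3)
    []
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    words = text.split()
    # Step through the word list in strides of `size`; the last chunk may be shorter.
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    import doctest

    doctest.testmod()
```

A notebook cell then reduces to a single call such as chunk_words(document_text, 200), which keeps the cell short and tied to one research action.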