Reproducibility auditing for ML GitHub repositories using AST-based static analysis and AI semantic checks, with CI integration via GitHub Actions.
RepoAudit is an automated machine-learning repository auditor that scans GitHub projects and generates a 0–100 reproducibility score. It combines Python AST analysis with LLM-based README validation to detect issues in experiment determinism, environment setup, dataset handling, and documentation.
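The six category results described below have to be combined into the single 0–100 score. A minimal sketch of one plausible aggregation, assuming a weighted average; the category names follow this README, but the weight values are illustrative assumptions, not the project's actual weights:

```python
# Hypothetical category weights (assumed, not taken from RepoAudit itself).
CATEGORY_WEIGHTS = {
    "environment": 0.20,
    "determinism": 0.20,
    "dataset": 0.15,
    "semantic_alignment": 0.15,
    "entry_points": 0.15,
    "readme": 0.15,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores, each on a 0-100 scale."""
    total = sum(
        CATEGORY_WEIGHTS[name] * category_scores.get(name, 0.0)
        for name in CATEGORY_WEIGHTS
    )
    return round(total, 1)
```

A repo scoring 100 in every category yields an overall 100.0; missing categories count as 0.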
Live website: https://repo-audit.vercel.app
Automated Repo Analysis:
Clones public GitHub repositories and evaluates reproducibility across six categories: environment setup, determinism, dataset usage, semantic documentation alignment, execution entry points, and README completeness.
Determinism Detection:
Python AST inspection identifies missing seeds for PyTorch, NumPy, TensorFlow, and Python random, highlighting experiments that may produce non-reproducible results.
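A simplified sketch of what such an AST check could look like. The seed-setting function names (`torch.manual_seed`, `np.random.seed`, `tf.random.set_seed`, `random.seed`) are the real APIs, but the detection logic here is an illustrative approximation, not RepoAudit's implementation:

```python
import ast

# Seed-setting call patterns, keyed by the library that needs them.
SEED_CALLS = {
    "random": {("random", "seed")},
    "numpy": {("np", "random", "seed"), ("numpy", "random", "seed")},
    "torch": {("torch", "manual_seed")},
    "tensorflow": {("tf", "random", "set_seed"), ("tensorflow", "random", "set_seed")},
}

def _dotted_name(node: ast.AST) -> tuple[str, ...]:
    """Flatten an attribute chain like np.random.seed into a tuple."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return tuple(reversed(parts))

def missing_seeds(source: str) -> list[str]:
    """Return libraries that are imported but never seeded."""
    tree = ast.parse(source)
    imported, seeded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            imported.add((node.module or "").split(".")[0])
        elif isinstance(node, ast.Call):
            called = _dotted_name(node.func)
            for lib, patterns in SEED_CALLS.items():
                if called in patterns:
                    seeded.add(lib)
    return sorted(lib for lib in SEED_CALLS if lib in imported and lib not in seeded)
```

For a file that imports both `torch` and `numpy` but only calls `np.random.seed(0)`, this reports `["torch"]`.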
Dataset Path Validation:
Detects hardcoded local paths and checks for dataset documentation or download instructions.
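The hardcoded-path check can be sketched as a scan over string literals in the AST. The path prefixes below are a heuristic assumption about what counts as a machine-specific path:

```python
import ast
import re

# Heuristic: absolute paths rooted in a user's home directory or a
# Windows drive letter are almost certainly machine-specific.
LOCAL_PATH_RE = re.compile(r"^(/home/|/Users/|[A-Za-z]:\\)")

def hardcoded_paths(source: str) -> list[str]:
    """Collect string literals that look like hardcoded local paths."""
    tree = ast.parse(source)
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and LOCAL_PATH_RE.match(node.value)
    ]
```

Walking the AST rather than grepping raw text avoids false positives from paths that only appear inside comments about, say, example output.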
README–Code Consistency Checks:
LLM audit compares README instructions with repository structure, flagging missing scripts or mismatched setup steps.
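The LLM call itself is not shown here, but the structural half of this check can be sketched deterministically: extract script filenames the README mentions and verify they exist in the clone. The regex and function name are illustrative assumptions:

```python
import re
from pathlib import Path

def missing_readme_scripts(readme_text: str, repo_root: str) -> list[str]:
    """Flag scripts mentioned in the README that don't exist in the repo.

    A structural pre-check only; comparing the *meaning* of setup steps
    against the code is what the LLM audit handles.
    """
    mentioned = set(re.findall(r"\b[\w./-]+\.(?:py|sh)\b", readme_text))
    root = Path(repo_root)
    return sorted(m for m in mentioned if not (root / m).exists())
```

A README instructing `python train.py` in a repo without `train.py` is a strong reproducibility red flag that needs no semantic model at all.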
Dependency & Environment Checks:
Verifies reproducible environments through requirements.txt, environment.yml, or Dockerfiles and identifies unpinned dependencies.
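The unpinned-dependency check can be illustrated for the `requirements.txt` case. This sketch deliberately treats anything without an exact `==` pin as unpinned and skips pip option lines; real requirements syntax (environment markers, VCS URLs, extras) is richer than this:

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that don't pin an exact version with ==."""
    unpinned = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

Range specifiers like `torch>=2.0` are flagged alongside bare names like `pandas`, since both can resolve to different versions on different days.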
Score History Tracking:
Stores audits per commit and visualizes reproducibility trends across repository updates.
GitHub Action Integration:
Runs RepoAudit in CI pipelines, posts PR reports, and optionally fails builds below a reproducibility threshold.
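The fail-below-threshold behaviour reduces to mapping the audit score to a process exit code. A minimal sketch; the default threshold of 70 is an illustrative assumption, not the Action's actual default:

```python
def enforce_threshold(score: float, threshold: float = 70.0) -> int:
    """Map an audit score to a CI exit code (non-zero fails the build)."""
    if score >= threshold:
        print(f"RepoAudit: score {score} meets threshold {threshold}")
        return 0
    print(f"RepoAudit: score {score} is below threshold {threshold}")
    return 1
```

A CI step would call this and pass the return value to `sys.exit`, so the workflow run turns red whenever the repository regresses below the configured bar.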
Commit-Based Caching:
Uses commit hashes to reuse previous audit results and avoid redundant analysis.
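The caching scheme can be sketched as one cached result per (repository, commit) pair. The key format and helper names are hypothetical; a plain dict stands in for Redis:

```python
import hashlib

def cache_key(repo_url: str, commit_sha: str) -> str:
    """Derive a stable cache key for a repo at a specific commit."""
    digest = hashlib.sha256(f"{repo_url}@{commit_sha}".encode()).hexdigest()
    return f"audit:{digest[:16]}"

def get_or_audit(cache: dict, repo_url: str, commit_sha: str, run_audit):
    """Reuse a cached result when this commit was already analysed."""
    key = cache_key(repo_url, commit_sha)
    if key not in cache:
        cache[key] = run_audit(repo_url, commit_sha)
    return cache[key]
```

Because the key includes the commit SHA, re-auditing an unchanged repository is a cache hit, while any new push naturally invalidates the entry.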
Architecture:
Distributed audit pipeline with asynchronous repository analysis.
Frontend (Next.js):
Dashboard for submitting repos and visualizing scores, category breakdowns, and history charts.
API Layer (FastAPI):
Handles audit requests, status polling, and result retrieval.
Worker System (Celery + Redis):
Executes repository scans asynchronously and manages queued analysis jobs.
Analysis Engine (Python):
Clones repositories, performs AST-based checks, dataset path analysis, dependency inspection, import graph tracing, and LLM semantic auditing.
Storage (Supabase Postgres):
Stores repository metadata, audit results, and historical scores.
Caching (Upstash Redis):
Commit-hash cache for fast reuse of previous analysis results.
Deployment:
Frontend on Vercel; backend and workers on Render using a containerized Docker setup.