CODEBASE TRAINING DATA

Codebase Intelligence

Full-history private codebases - commits, pull requests, reviews, and linked issues - packaged as training data for coding agents and SWE evals.

FULL HISTORY · Git · JSON · Parquet · One-time archive or periodic snapshot
1.4K+ private repos
150K+ commits with diffs
6 linked artifact types
Polyglot: web, ML, systems
01 Sample

See the data

Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.

Resolved issue - webhook retry (TypeScript)

Representative shape, not real customer code. fail_to_pass tests fail at base_commit and pass after the gold PR.

codebase-intelligence.jsonl (representative)
{
  "task_id": "cb-000142",
  "repo": "org/service-api",
  "language": "typescript",
  "issue_title": "Webhook retries drop events on 5xx",
  "issue_body": "When a delivery target returns 5xx, the event is marked failed and never retried. Expected: exponential backoff up to 5 attempts before dead-lettering.",
  "base_commit": "889b72b0d9c5c8",
  "gold_pr": "PR #318",
  "changed_files": ["src/webhook/retry.ts", "src/webhook/retry.test.ts"],
  "test_command": "npm test -- retry",
  "fail_to_pass": ["retry > backs off on 5xx", "retry > dead-letters after 5 attempts"],
  "ci_status": "passed",
  "release_tag": "v2.14.0"
}
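
The fail_to_pass rule in the caption above can be derived mechanically from two test runs. Below is a minimal Python sketch. It assumes each run is available as a mapping from test name to pass/fail. The function name is hypothetical. The handling of tests that do not exist at base_commit (they count as failing, as when the gold PR adds the test file itself) is our assumption, not a documented part of the pipeline.

```python
from typing import Mapping


def derive_fail_to_pass(
    base_results: Mapping[str, bool],
    gold_results: Mapping[str, bool],
) -> list[str]:
    """Tests that fail at base_commit and pass after the gold PR.

    Each mapping is test name -> passed. Assumption: a test missing from the
    base run (e.g. one added by the gold PR itself) counts as failing there.
    """
    return sorted(
        name
        for name, passed_after in gold_results.items()
        if passed_after and not base_results.get(name, False)
    )
```

Consider a base run where one retry test fails and an unrelated test passes. If the gold run passes both and also adds "retry > dead-letters after 5 attempts", the result is the failing test plus the newly added one. The unrelated test, which passed both times, is excluded.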

Resolved issue - N+1 query (Python)

Representative shape. The gold PR is withheld from the agent; only the issue is posed, against the repo as it stood at base_commit.

codebase-intelligence.jsonl (representative)
{
  "task_id": "cb-000291",
  "repo": "org/billing-worker",
  "language": "python",
  "issue_title": "Invoice export issues one query per line item",
  "issue_body": "Exporting a 500-line invoice fires 500 queries and times out. Batch the line-item fetch into a single query.",
  "base_commit": "4f1c0aa7d2",
  "gold_pr": "PR #1204",
  "changed_files": ["billing/export.py", "tests/test_export.py"],
  "test_command": "pytest tests/test_export.py -k batch",
  "fail_to_pass": ["test_export_batches_line_items"],
  "ci_status": "passed",
  "release_tag": null
}

Resolved issue - null deref on empty config (Go)

Representative shape. review_thread carries the inline comments that reshaped the merged change.

codebase-intelligence.jsonl (representative)
{
  "task_id": "cb-000377",
  "repo": "org/edge-proxy",
  "language": "go",
  "issue_title": "Panic when upstream list is empty",
  "issue_body": "Starting the proxy with no upstreams configured panics on first request instead of returning a clear config error at boot.",
  "base_commit": "a90b12e4c7",
  "gold_pr": "PR #642",
  "changed_files": ["proxy/router.go", "proxy/router_test.go"],
  "test_command": "go test ./proxy/ -run EmptyUpstream",
  "fail_to_pass": ["TestEmptyUpstreamReturnsConfigError"],
  "review_thread": "validate at boot, not per-request; return wrapped error",
  "ci_status": "passed",
  "release_tag": "v0.9.3"
}
02 Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

| Field | Type | Constraint | Description | Example |
|---|---|---|---|---|
| task_id | string | required | Stable identifier for one issue-to-code trace. | cb-000142 |
| repo | string | required | Repository slug the task is drawn from (owner/name). | org/service-api |
| language | string | required | Primary language of the changed files. | typescript |
| issue_title | string | required | Title of the linked issue that motivated the change. | Webhook retries drop events on 5xx |
| issue_body | string | required | Full problem statement posed to the agent. The resolving commits and PR are withheld. | |
| base_commit | string | required | SHA of the repo state just before the resolving change. The agent sees only this state. | 889b72b0d9c5c8 |
| gold_pr | string | required | Reference to the human pull request that actually merged and resolved the issue. | PR #318 |
| gold_diff | string | required | Unified diff of the merged change, used as the reference for comparison. | |
| changed_files | string[] | required | Files touched by the gold PR. | ["src/webhook/retry.ts"] |
| test_command | string | required | Command that builds and runs the relevant tests for execution-based verification. | npm test -- retry |
| fail_to_pass | string[] | nullable | Tests that fail at base_commit and pass after the gold PR. Empty when not isolable. | |
| review_thread | string or JSON | nullable | Inline review comments and requested changes attached to the gold PR, when present. | |
| ci_status | string | nullable | Build and test signal recorded at merge time for the gold PR. | passed |
| release_tag | string | nullable | Release that shipped the change, anchoring it in a version. | v2.14.0 |
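
A record can be checked against this table before it enters a harness. The sketch below is hypothetical: validate_record and its error strings are illustrative. Two behaviors are assumptions. First, an absent nullable field is treated as null, since the first two samples omit review_thread. Second, base_commit must be a 7- to 40-character lowercase hex SHA.

```python
import re
from typing import Any

# Field -> (accepted Python types, nullable), transcribed from the schema table.
SCHEMA: dict[str, tuple[tuple[type, ...], bool]] = {
    "task_id": ((str,), False),
    "repo": ((str,), False),
    "language": ((str,), False),
    "issue_title": ((str,), False),
    "issue_body": ((str,), False),
    "base_commit": ((str,), False),
    "gold_pr": ((str,), False),
    "gold_diff": ((str,), False),
    "changed_files": ((list,), False),
    "test_command": ((str,), False),
    "fail_to_pass": ((list,), True),
    "review_thread": ((str, dict, list), True),  # "string or JSON"
    "ci_status": ((str,), True),
    "release_tag": ((str,), True),
}
STRING_LISTS = {"changed_files", "fail_to_pass"}
HEX_SHA = re.compile(r"[0-9a-f]{7,40}")  # assumption: abbreviated or full SHA-1


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    errors: list[str] = []
    for name, (types, nullable) in SCHEMA.items():
        value = record.get(name)  # absent and explicit null are treated alike
        if value is None:
            if not nullable:
                errors.append(f"{name}: missing or null required field")
            continue
        if not isinstance(value, types):
            errors.append(f"{name}: unexpected type {type(value).__name__}")
        elif name in STRING_LISTS and not all(isinstance(v, str) for v in value):
            errors.append(f"{name}: every item must be a string")
        elif name == "base_commit" and not HEX_SHA.fullmatch(value):
            errors.append("base_commit: not a hex SHA")
    return errors
```

Applied to the samples above, it would flag gold_diff, which those excerpts leave out.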
03 What's included

Source & Review History

Commits, diffs, pull requests, and review threads with author and repo entity resolution. Real engineering decisions, not synthetic tasks.

Issue-to-Code Linkage

Tickets and project history joined to the commits and PRs that resolved them - end-to-end traces for SWE-agent evals.

Build & Release Signal

CI runs, release tags, and deploy history - the full lifecycle from issue to shipped code.

04 Methodology

How it is built

  1. Consented collection

    Repositories are founder-owned and contributed with consent, then sanitized before use. This is what separates the corpus from scraped public code of unknown license.

  2. Artifact capture

    Each repo is captured with its full per-commit diff history plus the surrounding artifacts: pull requests with discussion, code reviews with inline comments, linked issues, CI runs, and release tags.

  3. Issue-to-code linkage

    Tickets are joined to the specific commits and pull requests that resolved them, so each resolved issue becomes an end-to-end trace from a stated problem to the change that shipped, through the review and CI that accompanied it.

  4. Sanitization and filtering

    Material is processed to remove sensitive content prior to inclusion. Traces without a clean issue-to-PR join are not promoted to task records.

  5. Point-in-time reconstruction

    For evaluation, the repository is reconstructed from commit history and release tags exactly as it stood just before the resolving change, so an agent sees only what was available then, with no look-ahead into the fix.
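
The linkage and filtering steps above (issue-to-code linkage, then dropping traces without a clean join) can be pictured as a join followed by a uniqueness filter. This sketch assumes the linkage signal is a GitHub-style closing keyword ("Fixes #123") in a merged PR's description. The corpus may use other signals, and PullRequest and link_issues are hypothetical names. An issue claimed by no merged PR, or by several, has no clean join and is dropped.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# GitHub-style closing keywords: close/closes/closed, fix/fixes/fixed,
# resolve/resolves/resolved, followed by an issue reference like "#123".
CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


@dataclass
class PullRequest:
    number: int
    body: str
    merged: bool


def link_issues(prs: list[PullRequest]) -> dict[int, int]:
    """Map issue number -> the single merged PR that claims to resolve it."""
    claims: dict[int, set[int]] = defaultdict(set)
    for pr in prs:
        if not pr.merged:  # unmerged PRs never count as the resolving change
            continue
        for issue in CLOSING_REF.findall(pr.body):
            claims[int(issue)].add(pr.number)
    # Keep only clean joins: exactly one merged PR per issue.
    return {issue: min(nums) for issue, nums in claims.items() if len(nums) == 1}
```

The issue and PR numbers in any test of this sketch are made up. For example, with two merged PRs that both claim one issue, that issue is filtered out rather than guessed at.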

05 Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

Point-in-Time SWE Task

Measures

Whether an agent can resolve a real issue the way a human engineer did - reconstruct the repo at the moment the issue was open, pose the issue, and compare the agent's change against the human PR that actually merged.

Method

Reconstruct state from commit history and release tags; pose the linked issue as the problem statement while withholding the resolving commits and PR; score the agent's output against the gold PR using execution-based verification (run the recorded test command) and a diff comparison to the merged change.
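
The Method above has two mechanical parts: rebuilding the repo at base_commit and posing only the issue text. This is a sketch that assumes git is on PATH and that the source is either a clone URL or a per-repo Git bundle, both of which git clone accepts. The helper names are hypothetical.

```python
import subprocess
from typing import Any


def reconstruct_commands(source: str, base_commit: str, workdir: str) -> list[list[str]]:
    """git argv lists that rebuild the repo at base_commit (URL or bundle path)."""
    return [
        ["git", "clone", "--no-checkout", source, workdir],
        ["git", "-C", workdir, "checkout", "--detach", base_commit],
    ]


def run_commands(commands: list[list[str]]) -> None:
    for argv in commands:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure


# Allowlist rather than denylist: only the issue text reaches the agent, so
# gold_pr, gold_diff, changed_files, fail_to_pass and review_thread stay hidden.
POSED_FIELDS = ("issue_title", "issue_body")


def pose_issue(record: dict[str, Any]) -> str:
    title, body = (record[f] for f in POSED_FIELDS)
    return f"{title}\n\n{body}"
```

One design caveat: a clone still contains later commits in .git, so an agent with shell access could read the fix from history. Handing the agent a plain tree export at base_commit (for example, produced with git archive) is one way to close that gap.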

Result

Methodology-stage. No benchmark numbers are published; this is a data and methodology contribution, and quantifying agent performance is future work.

06 Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

The human pull request that actually merged and resolved the linked issue - the change that passed the repo's own review and CI - together with the repository's existing test suite at that point in time.

How it is established

Execution-based verification plus diff comparison. A reference harness (repo_evaluator.py) clones the target repo, reconstructs the base commit, runs the agent-under-test against the posed issue, then runs the recorded test command and compares the resulting change to the gold PR. The same harness has been driven end-to-end with both an OpenAI model and an Anthropic model as the agent-under-test. It is an early harness: the clone, agent, and check stages are wired together end-to-end, but it is not a polished benchmark with published scores.
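
The two checks named here can be combined as follows. This is not repo_evaluator.py. It is a hypothetical sketch that assumes the agent's change and the gold PR are both available as unified diffs and that the tests alone decide pass/fail. It also assumes "diff comparison" means file-set overlap plus a difflib text similarity, both reported alongside the verdict.

```python
import difflib
import subprocess
from dataclasses import dataclass


@dataclass
class Score:
    tests_passed: bool      # execution-based verification
    file_overlap: float     # Jaccard similarity of the files each diff touches
    diff_similarity: float  # difflib ratio between the two diff texts

    @property
    def verdict(self) -> str:
        # Assumption: tests decide pass/fail; diff metrics are reported only.
        return "pass" if self.tests_passed else "fail"


def run_tests(test_command: str, repo_dir: str) -> bool:
    # test_command is recorded as a shell string, e.g. "npm test -- retry".
    return subprocess.run(test_command, shell=True, cwd=repo_dir).returncode == 0


def files_in_diff(diff: str) -> set[str]:
    # New-side paths from "+++ b/<path>" headers; deletions (+++ /dev/null) are skipped.
    return {line[len("+++ b/"):] for line in diff.splitlines() if line.startswith("+++ b/")}


def score(agent_diff: str, gold_diff: str, tests_passed: bool) -> Score:
    agent_files, gold_files = files_in_diff(agent_diff), files_in_diff(gold_diff)
    union = agent_files | gold_files
    overlap = len(agent_files & gold_files) / len(union) if union else 1.0
    similarity = difflib.SequenceMatcher(None, agent_diff, gold_diff).ratio()
    return Score(tests_passed, overlap, similarity)
```

Keeping the tests as the deciding signal matches the document's framing of the repo's own tests as ground truth. A low diff similarity then flags a solution that passes the tests but diverges from what the engineers actually merged, without failing it outright.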

Agreement

Correctness is anchored to the repo's own tests and the merged human change rather than to a separate human-rater pass, so the ground truth is the production outcome itself. No inter-rater agreement figure is published at this stage.

07 Application

Coding-Agent Training

Real multi-year SDLC behavior across many codebases - grounded supervision for agents that read, modify, and ship inside real repos.

SWE Eval Harness

Point-in-time repo snapshots with the linked issue and the human PR that fixed it. Score an agent against what real engineers did.

Provenance-Clean Pretraining

A consented code corpus with surrounding history - distinct from scraped public code of unknown license.

08 Environment & integration

How you load it

Delivery

Restricted license / data room, Git bundle export, S3 staging

Formats

JSONL task records, Git bundles (per-repo history), Parquet

Auth

Restricted-access data room under a signed license; per-recipient export. Raw repositories stay gated; only consented, sanitized material is shared.

Cadence

One-time archive or periodic snapshot.

quickstart.sh
# Reference harness flow (repo_evaluator.py): one invocation runs every
# stage listed below for the given task record
python repo_evaluator.py org/repo \
  --repo-path ./work/repo \
  --task tasks/cb-000142.jsonl \
  --json --output result.json
 
# Under the hood, per task record:
# clone -> fetch repo from its git bundle
# reconstruct base -> checkout base_commit (state before the fix)
# pose issue -> hand issue_title + issue_body to the agent-under-test
# agent attempts -> agent reads/edits the repo to resolve the issue
# run tests -> execute test_command (execution-based verification)
# compare -> diff the agent change against gold_pr
# verdict -> pass / fail + per-stage report
 
# The agent-under-test is provider-pluggable; runs to date have used both
# an OpenAI model and an Anthropic model. Early harness - end-to-end flow,
# not a published benchmark.
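
The per-stage report in the flow above can be modeled as an ordered list of stages that stops at the first failure. This is a sketch with hypothetical names (run_pipeline, StageResult, Report). Each stage is a zero-argument callable that returns (ok, detail), so the agent stage can wrap whichever provider is under test.

```python
from dataclasses import dataclass, field
from typing import Callable

# A stage is a name plus a callable returning (ok, detail).
Stage = tuple[str, Callable[[], tuple[bool, str]]]


@dataclass
class StageResult:
    stage: str
    ok: bool
    detail: str = ""


@dataclass
class Report:
    stages: list[StageResult] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Pass only if at least one stage ran and every recorded stage succeeded.
        return "pass" if self.stages and all(s.ok for s in self.stages) else "fail"


def run_pipeline(stages: list[Stage]) -> Report:
    report = Report()
    for name, step in stages:
        try:
            ok, detail = step()
        except Exception as exc:  # record a crashing stage instead of aborting the run
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        report.stages.append(StageResult(name, ok, detail))
        if not ok:
            break  # later stages depend on earlier ones (no tests without a checkout)
    return report
```

Stage names would mirror the comments above (clone, reconstruct base, pose issue, agent attempts, run tests, compare). Because the run stops early, the report shows exactly which stage a failed task got stuck on.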

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.