CODEBASE TRAINING DATA
Codebase Intelligence
Full-history private codebases - commits, pull requests, reviews, and linked issues - packaged as training data for coding agents and SWE evals.
See the data
Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.
Resolved issue - webhook retry (TypeScript)
Representative shape, not real customer code. fail_to_pass tests fail at base_commit and pass after the gold PR.
{
"task_id": "cb-000142",
"repo": "org/service-api",
"language": "typescript",
"issue_title": "Webhook retries drop events on 5xx",
"issue_body": "When a delivery target returns 5xx, the event is marked failed and never retried. Expected: exponential backoff up to 5 attempts before dead-lettering.",
"base_commit": "889b72b0d9c5c8",
"gold_pr": "PR #318",
"changed_files": ["src/webhook/retry.ts", "src/webhook/retry.test.ts"],
"test_command": "npm test -- retry",
"fail_to_pass": ["retry > backs off on 5xx", "retry > dead-letters after 5 attempts"],
"ci_status": "passed",
"release_tag": "v2.14.0"
}
Resolved issue - N+1 query (Python)
Representative shape. The gold PR is withheld from the agent; only the issue and base state are posed.
{
"task_id": "cb-000291",
"repo": "org/billing-worker",
"language": "python",
"issue_title": "Invoice export issues one query per line item",
"issue_body": "Exporting a 500-line invoice fires 500 queries and times out. Batch the line-item fetch into a single query.",
"base_commit": "4f1c0aa7d2",
"gold_pr": "PR #1204",
"changed_files": ["billing/export.py", "tests/test_export.py"],
"test_command": "pytest tests/test_export.py -k batch",
"fail_to_pass": ["test_export_batches_line_items"],
"ci_status": "passed",
"release_tag": null
}
Resolved issue - null deref on empty config (Go)
Representative shape. review_thread carries the inline comments that reshaped the merged change.
{
"task_id": "cb-000377",
"repo": "org/edge-proxy",
"language": "go",
"issue_title": "Panic when upstream list is empty",
"issue_body": "Starting the proxy with no upstreams configured panics on first request instead of returning a clear config error at boot.",
"base_commit": "a90b12e4c7",
"gold_pr": "PR #642",
"changed_files": ["proxy/router.go", "proxy/router_test.go"],
"test_command": "go test ./proxy/ -run EmptyUpstream",
"fail_to_pass": ["TestEmptyUpstreamReturnsConfigError"],
"review_thread": "validate at boot, not per-request; return wrapped error",
"ci_status": "passed",
"release_tag": "v0.9.3"
}
Record shape
Every field, its type, whether it can be null, and a representative value.
| Field | Type | Constraint | Description |
|---|---|---|---|
| task_id | string | required | Stable identifier for one issue-to-code trace. e.g. cb-000142 |
| repo | string | required | Repository slug the task is drawn from (owner/name). e.g. org/service-api |
| language | string | required | Primary language of the changed files. e.g. typescript |
| issue_title | string | required | Title of the linked issue that motivated the change. e.g. Webhook retries drop events on 5xx |
| issue_body | string | required | Full problem statement posed to the agent. The resolving commits and PR are withheld. |
| base_commit | string | required | SHA of the repo state just before the resolving change. The agent sees only this state. e.g. 889b72b0d9c5c8 |
| gold_pr | string | required | Reference to the human pull request that actually merged and resolved the issue. e.g. PR #318 |
| gold_diff | string | required | Unified diff of the merged change, used as the reference for comparison. |
| changed_files | string[] | required | Files touched by the gold PR. e.g. ["src/webhook/retry.ts"] |
| test_command | string | required | Command that builds and runs the relevant tests for execution-based verification. e.g. npm test -- retry |
| fail_to_pass | string[] | nullable | Tests that fail at base_commit and pass after the gold PR. Empty when no such tests can be isolated. |
| review_thread | string · json | nullable | Inline review comments and requested changes attached to the gold PR, when present. |
| ci_status | string | nullable | Build and test signal recorded at merge time for the gold PR. e.g. passed |
| release_tag | string | nullable | Release that shipped the change, anchoring it in a version. e.g. v2.14.0 |
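A minimal sketch of how a consumer might check a record against the table above, assuming records arrive as plain dictionaries parsed from JSONL. The `validate_record` helper and the `REQUIRED` / `NULLABLE` constants are hypothetical names for illustration, not part of the delivered tooling.

```python
from typing import Any

# Field sets copied from the record-shape table; the constant names are illustrative.
REQUIRED = {
    "task_id": str, "repo": str, "language": str, "issue_title": str,
    "issue_body": str, "base_commit": str, "gold_pr": str, "gold_diff": str,
    "changed_files": list, "test_command": str,
}
# review_thread is listed as "string · json", so several JSON types are accepted here.
NULLABLE = {
    "fail_to_pass": list, "review_thread": (str, list, dict),
    "ci_status": str, "release_tag": str,
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the shape."""
    problems = []
    for field, expected in REQUIRED.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing required field: {field}")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {field}")
    for field, expected in NULLABLE.items():
        value = record.get(field)
        # Nullable fields may be absent or null, but must be well-typed when present.
        if value is not None and not isinstance(value, expected):
            problems.append(f"wrong type for {field}")
    return problems
```

The abbreviated sample records shown on this page do not carry gold_diff, so this check would report them as incomplete; it is meant for full delivered records.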
Source & Review History
Commits, diffs, pull requests, and review threads with author and repo entity resolution. Real engineering decisions, not synthetic tasks.
Issue-to-Code Linkage
Tickets and project history joined to the commits and PRs that resolved them - end-to-end traces for SWE-agent evals.
Build & Release Signal
CI runs, release tags, and deploy history - the full lifecycle from issue to shipped code.
How it is built
- 01
Consented collection
Repositories are founder-owned and contributed with consent, then sanitized before use. This is what separates the corpus from scraped public code of unknown license.
- 02
Artifact capture
Each repo is captured with its full per-commit diff history plus the surrounding artifacts: pull requests with discussion, code reviews with inline comments, linked issues, CI runs, and release tags.
- 03
Issue-to-code linkage
Tickets are joined to the specific commits and pull requests that resolved them, so each resolved issue becomes an end-to-end trace from a stated problem to the change that shipped, through the review and CI that accompanied it.
- 04
Sanitization and filtering
Material is processed to remove sensitive content prior to inclusion. Traces without a clean issue-to-PR join are not promoted to task records.
- 05
Point-in-time reconstruction
For evaluation use, commit history and release tags recover the repository exactly as it stood just before the resolving change, so an agent sees only what was available then - no look-ahead into the fix.
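The reconstruction in step 05 can be sketched as two git invocations, assuming the repository is delivered as a Git bundle. The `reconstruct_commands` and `reconstruct` functions, the paths, and the choice of a detached checkout are illustrative assumptions, not the harness's actual code.

```python
import subprocess
from pathlib import Path


def reconstruct_commands(bundle: str, workdir: str, base_commit: str) -> list[list[str]]:
    """Build the git commands that recreate the repo as it stood at base_commit."""
    return [
        # A git bundle file can be cloned like any other remote.
        ["git", "clone", "--quiet", bundle, workdir],
        # A detached checkout pins the working tree to the pre-fix state.
        ["git", "-C", workdir, "checkout", "--quiet", "--detach", base_commit],
    ]


def reconstruct(bundle: str, workdir: str, base_commit: str) -> Path:
    """Run the commands in order; raises CalledProcessError if git fails."""
    for argv in reconstruct_commands(bundle, workdir, base_commit):
        subprocess.run(argv, check=True)
    return Path(workdir)
```

A full clone still carries later commits in its object store, so a stricter setup would need to hide or strip them before the agent runs. This sketch does not do that.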
How we validate
What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.
Point-in-Time SWE Task
Measures
Whether an agent can resolve a real issue the way a human engineer did - reconstruct the repo at the moment the issue was open, pose the issue, and compare the agent's change against the human PR that actually merged.
Method
Reconstruct state from commit history and release tags; pose the linked issue as the problem statement while withholding the resolving commits and PR; score the agent's output against the gold PR using execution-based verification (run the recorded test command) and a diff comparison to the merged change.
Result
Methodology-stage. No benchmark numbers are published; this is a data and methodology contribution, and quantifying agent performance is future work.
Ground truth
What correct means for this data, and how it is established.
Ground truth
The human pull request that actually merged and resolved the linked issue - the change that passed the repo's own review and CI - together with the repository's existing test suite at that point in time.
How it is established
Execution-based verification plus diff comparison. A reference harness (repo_evaluator.py) clones the target repo, reconstructs the base commit, runs the agent-under-test against the posed issue, then runs the recorded test command and compares the resulting change to the gold PR. The same harness has been driven end-to-end with both an OpenAI model and an Anthropic model as the agent-under-test. It is an early harness: it covers the clone, agent, and check stages in one end-to-end flow, but it is not a polished benchmark with published scores.
Agreement
Correctness is anchored to the repo's own tests and the merged human change rather than to a separate human-rater pass, so the ground truth is the production outcome itself. No inter-rater agreement figure is published at this stage.
Coding-Agent Training
Real multi-year SDLC behavior across many codebases - grounded supervision for agents that read, modify, and ship inside real repos.
SWE Eval Harness
Point-in-time repo snapshots with the linked issue and the human PR that fixed it. Score an agent against what real engineers did.
Provenance-Clean Pretraining
A consented code corpus with surrounding history - distinct from scraped public code of unknown license.
How you load it
Delivery
Restricted license / data room, Git bundle export, S3 staging
Formats
JSONL task records, Git bundles (per-repo history), Parquet
Auth
Restricted-access data room under a signed license; per-recipient export. Raw repositories stay gated; only consented, sanitized material is shared.
Cadence
One-time archive or periodic snapshot.
# Reference harness flow (repo_evaluator.py)
# 1. Clone the target repo and pin the task's base commit
python repo_evaluator.py org/repo \
  --repo-path ./work/repo \
  --task tasks/cb-000142.jsonl \
  --json --output result.json

# Under the hood, per task record:
#   clone            -> fetch repo from its git bundle
#   reconstruct base -> checkout base_commit (state before the fix)
#   pose issue       -> hand issue_title + issue_body to the agent-under-test
#   agent attempts   -> agent reads/edits the repo to resolve the issue
#   run tests        -> execute test_command (execution-based verification)
#   compare          -> diff the agent change against gold_pr
#   verdict          -> pass / fail + per-stage report

# The agent-under-test is provider-pluggable; runs to date have used both
# an OpenAI model and an Anthropic model. Early harness - end-to-end flow,
# not a published benchmark.
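For the JSONL task records, a minimal loading sketch using only the Python standard library. The `iter_task_records` name is hypothetical, and it assumes one JSON object per line.

```python
import json
from typing import Iterator


def iter_task_records(path: str) -> Iterator[dict]:
    """Yield one task record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Point at the offending line so a bad export is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
```

Streaming line by line keeps memory flat for large archives; wrap the call in `list(...)` when a slice is small enough to hold in memory.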
Request access.
Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.