SPORTS + FINANCE NLP
Sports Intelligence
The only sports data source that powers real-time AI tool calls for the largest language models.
See the data
Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.
Table-type, leaderboard
Representative shape, not real data. Cell values are strings; column and row are order-aligned; row_count equals the number of rows.
{
"sport": "nba",
"query": "who leads the nba in scoring this season",
"type": "table",
"column": ["RANK", "NAME", "PPG", "SEASON", "TM", "GP"],
"row": [
["1", "Player A", "31.4", "2025-26", "XYZ", "44"],
["2", "Player B", "30.2", "2025-26", "ABC", "49"]
],
"row_count": 2
}
Answer-type, single value
Representative. On answer-type, answer is populated and column / row / row_count are absent (equivalent to null, as listed in the record shape table).
{
"sport": "mlb",
"query": "who won the world series last year",
"type": "answer",
"answer": "Team X won the most recent World Series."
}
Record shape
Every field, its type, whether it can be null, and a representative value.
| Field | Type | Constraint | Description |
|---|---|---|---|
| sport | string | required | Sport category: nba, nfl, mlb, nhl, wnba, cfb, pga, fc. e.g. nba |
| query | string | required | The natural-language query as posed. e.g. who leads the nba in scoring this season |
| type | string | required | Response type: table (structured rows) or answer (single value). e.g. table |
| column | string[] | nullable | Column headers, order-aligned to each row. Null on answer-type. e.g. ["RANK","NAME","PPG"] |
| row | string[][] | nullable | 2D row data; each inner array matches column order. Null on answer-type. e.g. [["1","Player A","31.4"]] |
| row_count | int · rows | nullable | Number of data rows returned. Null on answer-type. e.g. 25 |
| answer | string | nullable | Natural-language answer for single-value responses. Present on answer-type only. e.g. Team X won the most recent title. |
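As a minimal sketch of consuming this shape, the Python below pairs each row with its column headers, relying only on the order alignment described above. The helper name `rows_as_dicts` is hypothetical and not part of any delivered client; the record is the representative sample from this page, not real data.

```python
import json


def rows_as_dicts(record):
    """Turn a table-type record into a list of {column: cell} dicts.

    Returns an empty list for answer-type records, whose column/row
    fields are absent or null.
    """
    if record.get("type") != "table":
        return []
    columns = record.get("column") or []
    rows = record.get("row") or []
    # column and row are order-aligned, so zip pairs header with cell.
    return [dict(zip(columns, row)) for row in rows]


raw = """
{
  "sport": "nba",
  "query": "who leads the nba in scoring this season",
  "type": "table",
  "column": ["RANK", "NAME", "PPG", "SEASON", "TM", "GP"],
  "row": [
    ["1", "Player A", "31.4", "2025-26", "XYZ", "44"],
    ["2", "Player B", "30.2", "2025-26", "ABC", "49"]
  ],
  "row_count": 2
}
"""

record = json.loads(raw)
leaders = rows_as_dicts(record)

# Cell values are strings, so cast before doing arithmetic.
top_ppg = float(leaders[0]["PPG"])
```

Because every cell arrives as a string, numeric casting is left to the consumer, who knows which columns are numeric for a given sport.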
NLP Query Layer
Natural language in, structured data out. The conversational query layer that powers live AI tool calls.
Historical Stats
All major sports, all historical seasons, all available statistics. Complete structured coverage.
Finance Overlay
Sports performance data mapped to publicly traded tickers (DKNG, FLUT, DIS, apparel brands).
How it is built
- 01
NL query capture
The dataset is a corpus of natural-language sports questions paired with their resolved structured responses. The unit of data is the query-to-structured-answer pair, not a raw stats table.
- 02
Structured resolution
Each query resolves to either a tabular result (column, row, row_count) or a single-value answer. Observed intents include leaderboard, career, season, standings, and historical.
- 03
Sport and stat coverage
Eight sports across all historical seasons and all available statistics. Per-sport stat vocabularies differ; for example, an NBA leaderboard carries many more columns than a standings table.
- 04
Normalization to a common envelope
Heterogeneous stat responses across sports are wrapped in one consistent record envelope so consumers parse every sport identically.
- 05
Delivery and refresh
Served in real time during live events, or as a daily batch.
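Step 04 can be illustrated with a short Python sketch. It assumes, purely for illustration, that an upstream resolver hands back per-row dicts with mixed value types; the function names `to_table_envelope` and `to_answer_envelope` are hypothetical, and this is not a description of the production pipeline.

```python
def to_table_envelope(sport, query, records):
    """Wrap per-row dicts (an assumed upstream shape) in the table envelope.

    All rows are assumed to share the keys of the first row.
    """
    columns = list(records[0].keys()) if records else []
    # Cells are stringified so every sport parses the same way.
    rows = [[str(rec[col]) for col in columns] for rec in records]
    return {
        "sport": sport,
        "query": query,
        "type": "table",
        "column": columns,
        "row": rows,
        "row_count": len(rows),
    }


def to_answer_envelope(sport, query, answer):
    """Wrap a single-value response; table fields are left out."""
    return {"sport": sport, "query": query, "type": "answer", "answer": answer}


nba = to_table_envelope(
    "nba",
    "who leads the nba in scoring this season",
    [
        {"RANK": 1, "NAME": "Player A", "PPG": 31.4},
        {"RANK": 2, "NAME": "Player B", "PPG": 30.2},
    ],
)
mlb = to_answer_envelope(
    "mlb",
    "who won the world series last year",
    "Team X won the most recent World Series.",
)
```

Both results carry the same three leading fields (sport, query, type), so a consumer can branch on type before touching any sport-specific columns.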
How we validate
What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.
NL-to-structured resolution accuracy
Measures
Whether a query resolves to the correct structured answer - right stat, right entities, right ranking.
Method
A held-out set of NL queries is scored against ground-truth stat tables by exact match on returned rows and values, with column alignment checked.
Result
Methodology-stage. Not yet run or published.
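Since no grader is published, the following Python is only a sketch of how exact-match scoring of this kind could look. The names `exact_match` and `resolution_accuracy` are hypothetical, and strict string equality on answer-type records is a simplifying assumption standing in for a value match against the known fact.

```python
def exact_match(predicted, truth):
    """True when a predicted record matches the ground-truth record exactly."""
    if predicted.get("type") != truth.get("type"):
        return False
    if truth.get("type") == "table":
        # Same headers in the same order, and the same rows in the same order,
        # so a wrong ranking counts as a miss.
        return (
            predicted.get("column") == truth.get("column")
            and predicted.get("row") == truth.get("row")
        )
    # Assumption: answers are compared as exact strings.
    return predicted.get("answer") == truth.get("answer")


def resolution_accuracy(pairs):
    """Fraction of (predicted, truth) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(exact_match(p, t) for p, t in pairs) / len(pairs)


truth = {
    "sport": "nba",
    "query": "who leads the nba in scoring this season",
    "type": "table",
    "column": ["RANK", "NAME", "PPG"],
    "row": [["1", "Player A", "31.4"], ["2", "Player B", "30.2"]],
    "row_count": 2,
}
good = dict(truth)
swapped = dict(truth, row=list(reversed(truth["row"])))  # wrong ranking

score = resolution_accuracy([(good, truth), (swapped, truth)])
```

In this toy run, one of the two predictions has its rows in the wrong order, so half the pairs match.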
Cross-sport schema consistency
Measures
Whether the record envelope parses uniformly across all eight sports.
Method
A schema-validation pass over records per sport: envelope conformance, column and row alignment, and row_count equal to the number of rows.
Result
Methodology-stage. No published figure.
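A validation pass along these lines might be sketched in Python as follows. The sport codes and the three checks come from this page; the function names `validate_record` and `failures_by_sport` and the error strings are hypothetical, and this is not the vendor's validator.

```python
from collections import Counter

SPORTS = {"nba", "nfl", "mlb", "nhl", "wnba", "cfb", "pga", "fc"}


def validate_record(rec):
    """Return a list of envelope problems; an empty list means the record conforms."""
    errors = []
    if rec.get("sport") not in SPORTS:
        errors.append("unknown sport")
    if not isinstance(rec.get("query"), str):
        errors.append("query must be a string")

    kind = rec.get("type")
    if kind == "table":
        columns, rows = rec.get("column"), rec.get("row")
        if not isinstance(columns, list) or not isinstance(rows, list):
            errors.append("table record needs column and row lists")
        else:
            # Column and row alignment: every row has one cell per header.
            if any(not isinstance(r, list) or len(r) != len(columns) for r in rows):
                errors.append("row length does not match column length")
            if rec.get("row_count") != len(rows):
                errors.append("row_count does not equal number of rows")
    elif kind == "answer":
        if not isinstance(rec.get("answer"), str):
            errors.append("answer record needs a string answer")
    else:
        errors.append("type must be 'table' or 'answer'")
    return errors


def failures_by_sport(records):
    """Count non-conforming records per sport."""
    counts = Counter()
    for rec in records:
        if validate_record(rec):
            counts[rec.get("sport")] += 1
    return dict(counts)


ok = {"sport": "nba", "query": "q", "type": "table",
      "column": ["RANK", "NAME"], "row": [["1", "Player A"]], "row_count": 1}
bad = dict(ok, row_count=3)
report = failures_by_sport([ok, bad])
```

Reporting failures per sport makes it easy to see whether one sport's stat vocabulary is breaking the shared envelope.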
Ground truth
What correct means for this data, and how it is established.
Ground truth
Verified official statistics for the queried season and entity - the canonical numeric stats a query should resolve to.
How it is established
Compare the returned rows or answer against ground-truth stats by exact value match with column-order alignment; for answer-type, value match against the known fact.
Agreement
No grader implementation or agreement figure is published at this stage.
Sports Betting Intelligence
Query volume patterns reveal where attention and money flow before lines move. Track which matchups and players drive the most analytical interest.
Ticker-Mapped Performance
Correlate sports outcomes and engagement with stock performance of gaming, media, and apparel companies.
AI Model Training
The same NLP-to-structured-data pipeline that powers major AI models. Licensed for model training with full historical coverage.
How you load it
Delivery
S3, REST API, WebSocket
Formats
JSON, CSV, Parquet
Auth
Licensed access. Derived natural-language and structured data.
Cadence
Real-time during live events, or daily batch.
Request access.
Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.