SPORTS + FINANCE NLP

Sports Intelligence

The only sports data source that powers real-time AI tool calls for the largest language models.

12 years · JSON · CSV · Parquet · Real-time (during live events) or daily batch
1B+ NL queries
8 sports covered
12 yrs of history
2 response shapes
01 Sample

See the data

Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.

Table-type, leaderboard

Representative shape, not real data. Cell values are strings; each inner array in row follows the order of column; row_count equals the number of rows.

queries.jsonl (representative; pretty-printed here, one record per line in the file)
{
  "sport": "nba",
  "query": "who leads the nba in scoring this season",
  "type": "table",
  "column": ["RANK", "NAME", "PPG", "SEASON", "TM", "GP"],
  "row": [
    ["1", "Player A", "31.4", "2025-26", "XYZ", "44"],
    ["2", "Player B", "30.2", "2025-26", "ABC", "49"]
  ],
  "row_count": 2
}
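As a minimal Python sketch of consuming a table-type record, the function below pairs each row with the column headers and enforces the two invariants stated above. The name `table_rows` is hypothetical and not part of any delivered client; numeric values have to be converted explicitly because cells are strings.

```python
import json

# Representative table-type record, copied from the sample above.
RECORD = json.loads("""
{
  "sport": "nba",
  "query": "who leads the nba in scoring this season",
  "type": "table",
  "column": ["RANK", "NAME", "PPG", "SEASON", "TM", "GP"],
  "row": [
    ["1", "Player A", "31.4", "2025-26", "XYZ", "44"],
    ["2", "Player B", "30.2", "2025-26", "ABC", "49"]
  ],
  "row_count": 2
}
""")


def table_rows(record: dict) -> list[dict[str, str]]:
    """Hypothetical helper: zip each row with the headers after checking invariants."""
    columns = record["column"]
    rows = record["row"]
    if record["row_count"] != len(rows):
        raise ValueError("row_count does not match the number of rows")
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(columns)}")
    return [dict(zip(columns, row)) for row in rows]


for entry in table_rows(RECORD):
    # Cell values are strings, so numeric columns are converted explicitly.
    print(entry["NAME"], float(entry["PPG"]))
```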

Answer-type, single value

Representative. On answer-type, answer is populated and column, row, and row_count are absent; treat an absent field the same as null.

queries.jsonl (representative)
{
  "sport": "mlb",
  "query": "who won the world series last year",
  "type": "answer",
  "answer": "Team X won the most recent World Series."
}
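Because the two response shapes carry different fields, a consumer has to branch on type before touching column, row, or answer. The hypothetical `render` helper below is one way to do that in Python; it uses `dict.get` so that a field that is absent and one that is null behave the same.

```python
import json


def render(record: dict) -> str:
    """Hypothetical helper: turn either response shape into display text."""
    if record["type"] == "answer":
        return record["answer"]
    if record["type"] == "table":
        # .get(...) or [] treats an absent field and a null field the same way.
        columns = record.get("column") or []
        rows = record.get("row") or []
        lines = [" | ".join(columns)]
        lines.extend(" | ".join(row) for row in rows)
        return "\n".join(lines)
    raise ValueError(f"unknown response type: {record['type']!r}")


answer_record = json.loads(
    '{"sport": "mlb", "query": "who won the world series last year", '
    '"type": "answer", "answer": "Team X won the most recent World Series."}'
)
print(render(answer_record))  # Team X won the most recent World Series.
```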
02 Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

Field · Type · Constraint · Description
sport · string · required · Sport category: one of nba, nfl, mlb, nhl, wnba, cfb, pga, fc.
e.g. nba
query · string · required · The natural-language query as posed.
e.g. who leads the nba in scoring this season
type · string · required · Response type: table (structured rows) or answer (single value).
e.g. table
column · string[] · nullable · Column headers, order-aligned to each row. Null or absent on answer-type.
e.g. ["RANK","NAME","PPG"]
row · string[][] · nullable · 2D row data; each inner array follows column order. Null or absent on answer-type.
e.g. [["1","Player A","31.4"]]
row_count · int (rows) · nullable · Number of data rows returned. Null or absent on answer-type.
e.g. 25
answer · string · nullable · Natural-language answer for single-value responses. Present on answer-type only.
e.g. Team X won the most recent title.
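The schema can be mirrored in Python as a dataclass in which every nullable field defaults to None, so absent and null fields load identically. The class and method names are illustrative assumptions; the sport check uses the eight codes listed in the table.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# The eight sport codes listed in the schema.
SPORTS = {"nba", "nfl", "mlb", "nhl", "wnba", "cfb", "pga", "fc"}


@dataclass(frozen=True)
class QueryRecord:
    """Illustrative typed view of one record; not an official client class."""

    sport: str
    query: str
    type: Literal["table", "answer"]
    column: Optional[list[str]] = None
    row: Optional[list[list[str]]] = None
    row_count: Optional[int] = None
    answer: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "QueryRecord":
        # Nullable fields missing from the JSON default to None, same as explicit nulls.
        record = cls(
            sport=data["sport"],
            query=data["query"],
            type=data["type"],
            column=data.get("column"),
            row=data.get("row"),
            row_count=data.get("row_count"),
            answer=data.get("answer"),
        )
        # Literal is not enforced at runtime, so the enumerations are checked here.
        if record.sport not in SPORTS:
            raise ValueError(f"unknown sport: {record.sport!r}")
        if record.type not in ("table", "answer"):
            raise ValueError(f"unknown type: {record.type!r}")
        return record
```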
03 What's included

NLP Query Layer

Natural language in, structured data out. The conversational query layer that powers live AI tool calls.

Historical Stats

All eight covered sports, all historical seasons, all available statistics, in complete structured form.

Finance Overlay

Sports performance data mapped to publicly traded tickers (for example DKNG, FLUT, DIS, and apparel-brand tickers).

04 Methodology

How it is built

  1. NL query capture

    The dataset is a corpus of natural-language sports questions paired with their resolved structured responses. The unit of data is the query-to-structured-answer pair, not a raw stats table.

  2. Structured resolution

    Each query resolves to either a tabular result (column, row, row_count) or a single-value answer. Observed query intents include leaderboard, career, season, standings, and historical lookups.

  3. Sport and stat coverage

    Eight sports, covering all historical seasons and all available statistics. Per-sport stat vocabularies differ: an NBA leaderboard, for example, carries many more columns than a standings table.

  4. Normalization to a common envelope

    Heterogeneous stat responses across sports are wrapped in one consistent record envelope so consumers parse every sport identically.

  5. Delivery and refresh

    Served in real time during live events, or as a daily batch.
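The normalization step can be illustrated with a function that wraps either a list of per-row dictionaries or a single answer string in the shared envelope. This is a hypothetical Python sketch of the idea, not the production pipeline; the input shapes it accepts are assumptions.

```python
import json
from typing import Union

# Assumed input shape for tabular results: one dict per row, keyed by column name.
Table = list[dict[str, object]]


def to_envelope(sport: str, query: str, result: Union[Table, str]) -> dict:
    """Hypothetical sketch: wrap a sport-specific result in the common envelope."""
    if isinstance(result, str):
        return {"sport": sport, "query": query, "type": "answer", "answer": result}
    # Column order comes from the first row; a later row missing one of
    # those keys raises KeyError, and extra keys are dropped.
    columns = list(result[0].keys()) if result else []
    # Cells are stringified because the envelope carries string values only.
    rows = [[str(item[col]) for col in columns] for item in result]
    return {
        "sport": sport,
        "query": query,
        "type": "table",
        "column": columns,
        "row": rows,
        "row_count": len(rows),
    }


standings = [{"TEAM": "Team X", "PTS": 71}, {"TEAM": "Team Y", "PTS": 68}]
print(json.dumps(to_envelope("nhl", "which team has the most points", standings)))
```

Deriving column order from the first row keeps the sketch short; a fixed column list per sport and stat type would be stricter.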

05 Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

NL-to-structured resolution accuracy

Measures

Whether a query resolves to the correct structured answer: the right stat, the right entities, and the right ranking.

Method

A held-out set of NL queries is scored against ground-truth stat tables, requiring an exact match on the returned rows and values and correct column alignment.

Result

Methodology-stage. Not yet run or published.
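Since this evaluation has not been run, the following only sketches how the described exact-match scoring could be computed for table-type records. The function names and the toy records are hypothetical; a result with the right values but swapped columns counts as a miss, reflecting the column-alignment requirement.

```python
def exact_match(predicted: dict, expected: dict) -> bool:
    """True when the columns (in order) and every cell value agree exactly."""
    return (
        predicted.get("column") == expected.get("column")
        and predicted.get("row") == expected.get("row")
    )


def resolution_accuracy(pairs: list[tuple[dict, dict]]) -> float:
    """Share of (predicted, ground-truth) table pairs that match exactly."""
    if not pairs:
        raise ValueError("evaluation set is empty")
    return sum(exact_match(p, e) for p, e in pairs) / len(pairs)


# Toy held-out pairs: one exact hit, one with the columns swapped.
truth = {"column": ["RANK", "NAME", "PPG"], "row": [["1", "Player A", "31.4"]]}
good = {"column": ["RANK", "NAME", "PPG"], "row": [["1", "Player A", "31.4"]]}
swapped = {"column": ["NAME", "RANK", "PPG"], "row": [["Player A", "1", "31.4"]]}
print(resolution_accuracy([(good, truth), (swapped, truth)]))  # 0.5
```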

Cross-sport schema consistency

Measures

Whether the record envelope parses uniformly across all eight sports.

Method

A schema-validation pass over each sport's records checks envelope conformance, column and row alignment, and that row_count equals the number of rows.

Result

Methodology-stage. No published figure.
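A per-sport validation pass of the kind described could look like the Python sketch below. It checks the envelope rules stated on this page (required string fields, a known sport code, column/row alignment, row_count) and tallies failing records by sport; the function names are hypothetical.

```python
from collections import Counter

# The eight sport codes listed in the schema.
SPORTS = {"nba", "nfl", "mlb", "nhl", "wnba", "cfb", "pga", "fc"}


def envelope_errors(record: dict) -> list[str]:
    """Return the envelope violations found in one record (empty if it conforms)."""
    errors = []
    for field in ("sport", "query", "type"):
        if not isinstance(record.get(field), str):
            errors.append(f"{field} missing or not a string")
    if record.get("sport") not in SPORTS:
        errors.append("unknown sport")
    if record.get("type") == "table":
        columns, rows = record.get("column"), record.get("row")
        if not isinstance(columns, list) or not isinstance(rows, list):
            errors.append("table record without column/row arrays")
        else:
            if any(not isinstance(r, list) or len(r) != len(columns) for r in rows):
                errors.append("row length differs from column length")
            if record.get("row_count") != len(rows):
                errors.append("row_count differs from number of rows")
    elif record.get("type") == "answer":
        if not isinstance(record.get("answer"), str):
            errors.append("answer record without answer string")
    else:
        errors.append("type is neither table nor answer")
    return errors


def failures_by_sport(records: list[dict]) -> Counter:
    """Count non-conforming records per sport code."""
    return Counter(r.get("sport") for r in records if envelope_errors(r))
```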

06 Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

Verified official statistics for the queried season and entity: the canonical numeric stats a query should resolve to.

How it is established

For table-type responses, compare the returned rows against ground-truth stats by exact value match, with columns in the expected order; for answer-type, match the answer against the known fact.

Agreement

No grader implementation or agreement figure is published at this stage.
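No grader is published, so the following Python function is only an illustration of the comparison described above. It reports the first mismatch it finds, which helps when triaging failures. The answer-type rule (case-insensitive containment of a short known fact) is an assumption, since this page does not define how a natural-language answer is matched.

```python
def grade(returned: dict, truth: dict) -> tuple[bool, str]:
    """Illustrative grader: compare a returned record with ground truth."""
    if returned.get("type") != truth["type"]:
        return False, "response type differs"
    if truth["type"] == "answer":
        # Assumption: truth["answer"] holds the known fact as a short string, and a
        # pass means it appears in the returned answer (case-insensitive).
        ok = truth["answer"].casefold() in (returned.get("answer") or "").casefold()
        return ok, ("answer matches" if ok else "known fact not found in answer")
    if returned.get("column") != truth["column"]:
        return False, "columns differ or are out of order"
    got, want = returned.get("row") or [], truth["row"]
    if len(got) != len(want):
        return False, f"expected {len(want)} rows, got {len(got)}"
    for i, (g, w) in enumerate(zip(got, want)):
        if len(g) != len(w):
            return False, f"row {i} has {len(g)} cells, expected {len(w)}"
        for col, gv, wv in zip(truth["column"], g, w):
            if gv != wv:
                return False, f"row {i}, column {col}: {gv!r} != {wv!r}"
    return True, "exact match"
```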

07 Application

Sports Betting Intelligence

Query volume patterns reveal where attention and money flow before lines move. Track which matchups and players drive the most analytical interest.

Ticker-Mapped Performance

Correlate sports outcomes and engagement with stock performance of gaming, media, and apparel companies.

AI Model Training

The same NLP-to-structured-data pipeline that serves live tool calls from major AI models. Licensed for model training with full historical coverage.

08 Environment & integration

How you load it

Delivery

S3, REST API, WebSocket

Formats

JSON, CSV, Parquet

Auth

Licensed access. Derived natural-language and structured data.

Cadence

Real-time during live events, or daily batch.
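For the JSON delivery, a file such as queries.jsonl can be read with the Python standard library alone. The sketch assumes one record per line (the JSON Lines convention) and groups records by sport; the file name and function names are illustrative. CSV and Parquet layouts are not specified on this page, so they are not covered here.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-blank line (JSON Lines convention)."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


def group_by_sport(records: Iterable[dict]) -> dict[str, list[dict]]:
    """Bucket records by their sport code."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[record["sport"]].append(record)
    return dict(groups)


if __name__ == "__main__":
    path = Path("queries.jsonl")  # hypothetical local copy of a delivered file
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            for sport, records in group_by_sport(parse_jsonl(handle)).items():
                print(sport, len(records))
```

Parsing from any iterable of lines keeps the reader usable with an open file, a decompressed stream, or an in-memory list.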

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.