INFERENCE DEMAND DATA

GPU Inference & Token-Demand Intelligence

Real-time inference economics across hosted-model providers - token prices, throughput, and latency as a demand-side read on AI compute.

1 YEARParquet · JSON · CSVContinuous polling, daily aggregates
488
Token-price observations
4
Providers
~10x
Output price decline
Roadmap
Demand-side metrics
01Sample

See the data

Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.

Current collected record (output price)

Representative. The study measures output price per million tokens on cheapest-available models; tiers are not pooled.

inference_prices.jsonlrepresentative
{
  "observed_at": "2026-04-02T00:00:00Z",
  "provider": "provider-a",
  "model_family": "llama",
  "model_listing": "llama-3.1-8b-instruct",
  "output_price_per_mtok_usd": 0.02,
  "input_price_per_mtok_usd": 0.01,
  "model_tier": "small_open"
}

Proposed demand-side record (not yet collected)

Representative of the proposed schema only. TTFT, throughput, and success rate are not present in the current 488-observation sample - shown here as planned, not collected.

inference_demand.jsonlrepresentative
{
  "observed_at": "2026-04-02T00:05:00Z",
  "provider": "provider-b",
  "model_family": "qwen",
  "ttft_ms": null,
  "throughput_tok_s": null,
  "success_rate": null,
  "_status": "demand-side metrics not yet collected"
}
02Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

FieldTypeConstraintDescription
observed_attimestamp · UTCrequiredWhen the price was collected from the provider pricing surface.
e.g. 2026-04-02T00:00:00Z
providerstringrequiredOne of four hosted-inference providers.
e.g. provider-a
model_familystringrequiredOpen-weight lineage the listing is grouped under (Llama, Qwen, Mixtral, DeepSeek).
e.g. llama
model_listingstringnullableProvider raw model product name before family grouping.
e.g. llama-3.1-8b-instruct
output_price_per_mtok_usdfloat · USD / 1M output tokensrequiredNormalized output-token price - the quantity the study actually measures.
e.g. 0.02
input_price_per_mtok_usdfloat · USD / 1M input tokensnullableInput-token price where listed.
e.g. 0.01
model_tierstringnullableCoarse tier flag; the study does not pool tiers.
e.g. small_open
ttft_msfloat · msnullableTime-to-first-token. Proposed demand-side metric, not yet collected.
e.g. null
throughput_tok_sfloat · tokens/secnullableGeneration throughput. Proposed demand-side metric, not yet collected.
e.g. null
success_ratefloat · 0..1nullableRequest success rate under load. Proposed demand-side metric, not yet collected.
e.g. null
03What's included

Token Price Index

Input and output token rates per model per provider over time - the unit economics of inference, tracked continuously.

Latency & Throughput

Time-to-first-token, generation throughput, and latency percentiles - congestion signals that move before capacity announcements.

Reliability Signal

Request success and error rates across providers - a real-time read on where inference demand is outrunning supply.

04Methodology

How it is built

  1. 01

    Collection

    Collect from the public pricing surfaces of four hosted-inference providers.

  2. 02

    Normalization

    Normalize to a per-million-token basis for output tokens.

  3. 03

    Family grouping

    Group heterogeneous model listings by the underlying open-weight family so comparisons stay within recognizable lineages.

  4. 04

    Tier separation

    Observed models span small open-weight checkpoints to frontier-scale hosted deployments priced very differently; tiers are not pooled into a single level.

  5. 05

    Direction-over-level reporting

    Report the direction - a steep decline at the cheap end - as robust, and treat absolute price levels as preliminary because the cheapest quote in any period may reflect a different tier than in another.

05Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

Cheapest-output price decline

Measures

How the price of the cheapest available output tokens moved over time.

Method

Track the cheapest available output price per million tokens across the four providers over the window.

Result

Descriptive, published as preliminary: fell from about $0.13 per million tokens in mid-2024 into the $0.01 to $0.03 range in 2025-2026. Direction robust; absolute level preliminary due to mixed tiers.

Demand-side leading indicator

Measures

Whether time-to-first-token, throughput, and success rate lead capacity tightening ahead of posted prices.

Method

The hypothesis is stated; the metrics require inference-provider API access that is not yet wired up.

Result

Methodology-stage. Explicitly not built and not collected. The demand-side fields are a roadmap, shown in the schema as not-yet-collected.

06Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

For the descriptive layer, the observed posted output-token prices across the four providers on a cheapest-available basis. For the proposed demand signal, there is no ground truth yet because the data is uncollected.

How it is established

Descriptive aggregation of the cheapest-available output price over time per provider and family. No predictive grader exists; the proposed demand-side validation is future work gated on API access.

07Application

Demand-Side Compute Read

Throughput collapse and rising latency across providers signal surging model demand days to weeks before it shows up in chip orders.

Inference Margin Tracking

Track the falling price of intelligence per token and model the margin structure of hosted-inference and model-API businesses.

Provider Competitive Map

Compare price, speed, and reliability across providers for the same model - who is winning the inference market in real time.

08Environment & integration

How you load it

Delivery

S3, REST API, Parquet

Formats

Parquet, JSON, CSV

Auth

Licensed for internal research and model development. Sourced from public pricing surfaces; no PII or MNPI.

Cadence

Continuous polling with daily aggregates.

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.