INFERENCE DEMAND DATA

GPU Inference & Token-Demand Intelligence

Real-time inference economics across hosted-model providers - token prices, throughput, and latency as a demand-side read on AI compute.

1 YEAR | Parquet, JSON, CSV | Continuous polling, daily aggregates

Overview

Providers: Together, Fireworks, DeepInfra, Replicate...
Model families: Llama, Qwen, Mixtral, DeepSeek
Metrics: Token price, time to first token (TTFT), throughput, latency, success rate
Signal: Demand-side compute read
Format: Parquet / API
Delivery: S3, REST API, Parquet

What's included

Token Price Index

Input and output token rates per model per provider over time - the unit economics of inference, tracked continuously.
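The following is a minimal sketch of one way to collapse per-model, per-provider input and output rates into a single comparable number. It assumes a hypothetical `PriceQuote` record with prices quoted in USD per million tokens and a 3:1 input-to-output token mix. The record shape and the mix are not taken from the dataset, and the prices are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceQuote:
    # Hypothetical record shape for illustration; not the dataset's actual schema.
    provider: str
    model: str
    input_usd_per_mtok: float   # USD per 1M input tokens
    output_usd_per_mtok: float  # USD per 1M output tokens


def blended_price(q: PriceQuote, input_share: float = 0.75) -> float:
    """Blended USD per 1M tokens when `input_share` of all tokens are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * q.input_usd_per_mtok + (1.0 - input_share) * q.output_usd_per_mtok


def request_cost(q: PriceQuote, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the quoted per-million-token rates."""
    return (input_tokens * q.input_usd_per_mtok
            + output_tokens * q.output_usd_per_mtok) / 1_000_000


# Illustrative prices only, not observed data.
quote = PriceQuote("provider-a", "llama-70b", input_usd_per_mtok=0.60, output_usd_per_mtok=0.80)
print(round(blended_price(quote), 4))   # assumed 3:1 input:output workload
print(request_cost(quote, 2_000, 500))  # one request: 2,000 input, 500 output tokens
```

With these made-up prices, the blend is 0.75 × 0.60 + 0.25 × 0.80 = 0.65 USD per million tokens. Because a blended figure depends heavily on the assumed mix, fixing one mix per workload type before comparing providers over time keeps the comparison consistent.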

Latency & Throughput

Time-to-first-token, generation throughput, and latency percentiles - congestion signals that move before capacity announcements.
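The sketch below shows one way to reduce raw per-request probes to the percentile and throughput figures described above. Several pieces are assumptions for illustration: the `Probe` fields, the nearest-rank percentile method, and the decode-throughput definition (output tokens divided by the time after the first token). Other conventions are common.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    # Hypothetical per-request measurement; field names are illustrative.
    ttft_s: float       # time to first token, seconds
    total_s: float      # end-to-end request latency, seconds
    output_tokens: int  # tokens generated


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    if not values or not 0 < p <= 100:
        raise ValueError("need values and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def decode_tps(probe):
    """Output tokens per second after the first token arrives (one common approximation)."""
    gen_time = probe.total_s - probe.ttft_s
    return probe.output_tokens / gen_time if gen_time > 0 else None


def summarize(probes):
    ttfts = [p.ttft_s for p in probes]
    # Drop probes where generation time is zero or negative (malformed timing).
    tps = [t for t in (decode_tps(p) for p in probes) if t is not None]
    return {
        "ttft_p50": percentile(ttfts, 50),
        "ttft_p95": percentile(ttfts, 95),
        "tps_p50": percentile(tps, 50),
    }


probes = [Probe(0.25, 1.25, 100), Probe(0.5, 1.5, 100),
          Probe(0.75, 1.75, 100), Probe(1.0, 3.0, 100)]
print(summarize(probes))
```

Tail percentiles such as p95 are where congestion tends to appear first, so tracking them alongside the median is more informative than tracking a mean.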

Reliability Signal

Request success and error rates across providers - a real-time read on where inference demand is outrunning supply.
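Here is a minimal sketch of a per-provider success rate over one polling window. It assumes each probe has been reduced to a hypothetical `(provider, succeeded)` pair. The 0.98 cutoff is an arbitrary example and is not a threshold used by the dataset.

```python
from collections import Counter


def success_rates(outcomes):
    """outcomes: iterable of (provider, succeeded) pairs -> {provider: success rate}."""
    total = Counter()
    ok = Counter()
    for provider, succeeded in outcomes:
        total[provider] += 1
        ok[provider] += int(bool(succeeded))
    return {p: ok[p] / total[p] for p in total}


def strained_providers(rates, threshold=0.98):
    """Providers whose success rate fell below `threshold` (an arbitrary example cutoff)."""
    return sorted(p for p, r in rates.items() if r < threshold)


window = [("a", True), ("a", True), ("a", False), ("a", True),
          ("b", True), ("b", True)]
rates = success_rates(window)
print(rates, strained_providers(rates))
```

With small samples, a single failure swings the rate sharply. In practice, a minimum request count per window (or a confidence interval around the rate) helps avoid flagging noise.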

Applications

Demand-Side Compute Read

Throughput collapse and rising latency across providers signal surging model demand days to weeks before it shows up in chip orders.
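One way to turn this idea into a rule is to compare a recent window against a trailing baseline. The sketch below assumes aligned daily median latency and throughput series for one model, either on one provider or averaged across providers. The window lengths and thresholds are illustrative rather than calibrated. The sketch only detects congestion; it does not establish any lead time over chip orders.

```python
from statistics import median


def congestion_flag(latency, throughput, baseline_days=28, recent_days=3,
                    latency_jump=1.5, throughput_drop=0.7):
    """True when recent median latency rose and recent median throughput fell
    versus a trailing baseline. Inputs are aligned daily series, oldest first.
    All thresholds are illustrative defaults, not calibrated values."""
    if baseline_days < 1 or recent_days < 1:
        raise ValueError("window lengths must be at least 1 day")
    if len(latency) != len(throughput):
        raise ValueError("series must be the same length")
    if len(latency) < baseline_days + recent_days:
        raise ValueError("not enough history for the baseline window")
    base = slice(-(baseline_days + recent_days), -recent_days)
    recent = slice(-recent_days, None)
    latency_ratio = median(latency[recent]) / median(latency[base])
    throughput_ratio = median(throughput[recent]) / median(throughput[base])
    return latency_ratio >= latency_jump and throughput_ratio <= throughput_drop


steady = congestion_flag([1.0] * 31, [100.0] * 31)
spike = congestion_flag([1.0] * 28 + [2.0] * 3, [100.0] * 28 + [50.0] * 3)
print(steady, spike)
```

Requiring both conditions (higher latency and lower throughput) filters out latency changes caused by something other than load, such as a provider switching to longer default outputs.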

Inference Margin Tracking

Track the falling price of intelligence per token and model the margin structure of hosted-inference and model-API businesses.
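A rough margin model pairs the observed token price with an assumed serving cost. The sketch below derives cost per million tokens from four inputs: GPU hourly cost, GPUs per replica, aggregate tokens per second at load, and utilization. All four are hypothetical and are not part of the dataset as described. Note that a replica's aggregate throughput across concurrent requests is generally higher than the single-request throughput a probe measures.

```python
def serving_cost_per_mtok(gpu_usd_per_hour, gpus, aggregate_tps, utilization):
    """Estimated USD cost to serve 1M tokens.

    gpu_usd_per_hour: assumed hourly cost of one GPU.
    gpus: GPUs in the serving replica.
    aggregate_tps: tokens/s the replica sustains across all concurrent requests.
    utilization: fraction of each hour the replica does useful work, in (0, 1].
    """
    if aggregate_tps <= 0 or not 0 < utilization <= 1:
        raise ValueError("aggregate_tps must be > 0 and utilization in (0, 1]")
    tokens_per_hour = aggregate_tps * 3600 * utilization
    return gpus * gpu_usd_per_hour / tokens_per_hour * 1_000_000


def gross_margin(price_usd_per_mtok, cost_usd_per_mtok):
    """Gross margin as a fraction of the observed token price."""
    return (price_usd_per_mtok - cost_usd_per_mtok) / price_usd_per_mtok


# Every input below is a made-up assumption for illustration.
cost = serving_cost_per_mtok(gpu_usd_per_hour=2.25, gpus=8, aggregate_tps=10_000, utilization=0.5)
print(round(cost, 4), round(gross_margin(1.60, cost), 4))
```

Under these made-up inputs, the replica serves 18 million tokens per hour for 18 USD, about 1.00 USD per million tokens. A 1.60 USD price therefore leaves a gross margin of roughly 37.5%. Utilization is usually the most uncertain input and moves the result the most.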

Provider Competitive Map

Compare price, speed, and reliability across providers serving the same model to see who is winning the inference market in real time.
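One way to summarize price, speed, and reliability together for a single model is a Pareto frontier, which keeps only the providers that no other provider beats on all three axes at once. This technique is swapped in for illustration and is not necessarily how the dataset ranks providers. The `ProviderSnapshot` fields and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSnapshot:
    # Hypothetical per-provider summary for one model over one window.
    provider: str
    usd_per_mtok: float   # lower is better
    tps: float            # higher is better
    success_rate: float   # higher is better


def dominates(a, b):
    """a dominates b if it is no worse on every axis and strictly better on at least one."""
    no_worse = (a.usd_per_mtok <= b.usd_per_mtok and a.tps >= b.tps
                and a.success_rate >= b.success_rate)
    better = (a.usd_per_mtok < b.usd_per_mtok or a.tps > b.tps
              or a.success_rate > b.success_rate)
    return no_worse and better


def pareto_frontier(snapshots):
    """Providers not dominated by any other provider for the same model."""
    return sorted(s.provider for s in snapshots
                  if not any(dominates(o, s) for o in snapshots))


# Illustrative numbers only.
same_model = [
    ProviderSnapshot("A", 0.50, 100.0, 0.990),
    ProviderSnapshot("B", 0.60, 90.0, 0.980),
    ProviderSnapshot("C", 0.90, 200.0, 0.995),
]
print(pareto_frontier(same_model))
```

In this example, B is worse than A on every axis, so only A (cheapest) and C (fastest and most reliable) remain. A frontier avoids choosing weights for the three axes; a weighted score is the alternative when a single ranking is needed.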

Request trial access.

90-day trial with restricted scope for evaluation. No commitment required.