INFERENCE DEMAND DATA
GPU Inference & Token-Demand Intelligence
Real-time inference economics across hosted-model providers: token prices, throughput, and latency, read as a demand-side signal for AI compute.
Overview
What's included
Token Price Index
Input and output token prices for each model and provider over time - the unit economics of inference, tracked continuously.
Latency & Throughput
Time-to-first-token, generation throughput, and latency percentiles - congestion signals that move before capacity announcements.
Reliability Signal
Request success and error rates across providers - a real-time read on where inference demand is outrunning supply.
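As a rough illustration of how the latency, throughput, and reliability metrics above can be derived from raw request observations, here is a minimal sketch. The `RequestSample` schema, the field names, and the nearest-rank percentile method are assumptions made for this example; they do not describe the dataset's actual format or methodology.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request. Hypothetical schema, for illustration only."""
    ttft_s: float        # time to first token, in seconds
    total_s: float       # total request duration, in seconds
    output_tokens: int   # number of tokens generated
    ok: bool             # True if the request succeeded


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Summarize success rate, TTFT percentiles, and generation throughput."""
    if not samples:
        raise ValueError("need at least one sample")
    good = [s for s in samples if s.ok]
    summary = {
        "success_rate": len(good) / len(samples),
        "ttft_p50_s": None,
        "ttft_p95_s": None,
        "median_tokens_per_s": None,
    }
    if good:
        ttfts = [s.ttft_s for s in good]
        summary["ttft_p50_s"] = percentile(ttfts, 50)
        summary["ttft_p95_s"] = percentile(ttfts, 95)
        # Generation throughput: tokens emitted per second after the first token.
        rates = [
            s.output_tokens / (s.total_s - s.ttft_s)
            for s in good
            if s.total_s > s.ttft_s
        ]
        if rates:
            summary["median_tokens_per_s"] = percentile(rates, 50)
    return summary
```

Only successful requests feed the latency and throughput figures in this sketch, so a provider that fails fast does not look artificially quick; failures show up in `success_rate` instead.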
Application
Demand-Side Compute Read
Throughput collapse and rising latency across providers signal surging model demand days to weeks before it shows up in chip orders.
Inference Margin Tracking
Track the falling price of intelligence per token and model the margin structure of hosted-inference and model-API businesses.
Provider Competitive Map
Compare price, speed, and reliability across providers serving the same model to see who is winning the inference market in real time.
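The margin and provider comparisons above can be sketched with simple arithmetic. The example below assumes a hypothetical `Quote` record, an assumed 75/25 split between input and output tokens, and a single-GPU cost model; all names and numbers are illustrative rather than drawn from the data.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """A provider's list price for one model. Hypothetical schema."""
    provider: str
    input_usd_per_mtok: float    # USD per million input tokens
    output_usd_per_mtok: float   # USD per million output tokens


def blended_price(quote, input_share=0.75):
    """Blended USD per million tokens, given the workload's input-token share."""
    return (input_share * quote.input_usd_per_mtok
            + (1 - input_share) * quote.output_usd_per_mtok)


def serving_cost_per_mtok(gpu_usd_per_hour, tokens_per_s):
    """Assumed cost model: one GPU at a fixed hourly rate and steady throughput."""
    tokens_per_hour = tokens_per_s * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


def gross_margin(price_per_mtok, cost_per_mtok):
    """Gross margin as a fraction of price."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok


def cheapest(quotes, input_share=0.75):
    """Return the quote with the lowest blended price for the given mix."""
    return min(quotes, key=lambda q: blended_price(q, input_share))
```

The blended price depends heavily on the assumed input share: a provider with cheap input tokens and expensive output tokens can rank first for retrieval-heavy workloads and last for generation-heavy ones, so the mix should be chosen to match the workload being modeled.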
Request trial access.
90-day trial with restricted scope for evaluation. No commitment required.