A Multi-Dimensional Analysis of Kinematic, Dynamic, and Control Discrepancies
The simulation-to-reality (sim2real) gap constitutes a fundamental barrier to deploying learned policies on physical robots. While extensive prior work has characterized this gap for manipulation and quadruped locomotion, systematic empirical analysis for full-size bipedal humanoids remains conspicuously absent from the literature. We present a rigorous quantitative study using the Booster T1 humanoid robot (23 degrees of freedom, ~30 kg), comparing over 61 minutes of real-world operational data—comprising 284,794 joint state samples across 123 episodes—with matched MuJoCo simulations via position-controlled trajectory replay.
Our analysis yields four main findings: (i) an aggregate position tracking error of 5.56° MAE shows that a properly configured simulation achieves moderate fidelity for humanoid robots; (ii) under position-controlled replay, velocity tracking agrees closely (0.41 rad/s MAE); (iii) knee joints are the primary source of error (12.0-12.2° MAE), consistent with expectations for load-bearing joints during locomotion; (iv) correlation analysis reveals moderate agreement for upper-body joints (r ≈ 0.35-0.52) and weak agreement for leg joints (r ≈ 0.1-0.36), pointing to contact dynamics as the primary modeling challenge. The overall gap score is 0.36 on a normalized scale from 0 (perfect match) to 1, and the results provide actionable guidance for improving humanoid simulation fidelity and designing robust sim2real transfer methods.
Overall gap score 0.36: moderate agreement between simulation and real robot
Position MAE 5.56°: average joint position error across all 23 degrees of freedom
Knee MAE 12.2°: highest per-joint error, driven by load-bearing during locomotion

Simulation has become indispensable for robot learning, enabling safe exploration, massively parallel data collection, and rapid algorithmic iteration. The dominant paradigm, training policies in simulation and deploying them on hardware, has achieved remarkable success for manipulation, quadrupeds, and drones. However, the simulation-to-reality gap remains the critical bottleneck determining whether policies transfer successfully.
Bipedal humanoid robots present unique challenges that amplify this gap:
This work provides the first comprehensive empirical characterization of the sim2real gap for a full-size bipedal humanoid robot:
The Booster T1 is a full-size bipedal humanoid manufactured by Booster Robotics. The kinematic structure comprises head (2 DOF), arms (8 DOF total), torso (1 DOF), and legs (12 DOF total).
| Property | Value |
|---|---|
| Height | 1.3 m |
| Mass | 30 kg |
| Degrees of Freedom | 23 actuated joints |
| Actuators | Proprietary servo motors |
| Control frequency | ~88 Hz (variable) |
We collected data from the Supabase database containing Booster T1 operational episodes from various manipulation and locomotion tasks. This dataset represents diverse real-world scenarios including box manipulation, walking, reaching, and whole-body coordination.
| Parameter | Value |
|---|---|
| Number of episodes | 123 |
| Total duration | 3,687 seconds (~61 minutes) |
| Total joint samples | 284,794 |
| Sample rate | 88.3 Hz (mean) |
| Task types | Various manipulation and locomotion |
We use MuJoCo 3.x with the official Booster T1 model from MuJoCo Menagerie. Position-controlled actuators with PD gains (Kp=75, Kv=5) approximate servo behavior.
| Parameter | Value | Rationale |
|---|---|---|
| Physics timestep | 1 ms (1000 Hz) | Standard for stable contact simulation |
| Control frequency | 100 Hz | Matches typical servo command rate |
| Integrator | Implicit fast (Euler) | Balance of speed and stability |
| Contact model | Convex mesh collision | High fidelity for foot contacts |
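The actuator and timing settings above can be read as a simple control loop. The sketch below is an idealized illustration, not the simulator's internals: it assumes the position actuator behaves as a plain PD law (ignoring torque limits), and the function names are hypothetical.

```python
# Values taken from the configuration above; names are illustrative only.
KP = 75.0            # proportional gain (Kp)
KV = 5.0             # velocity (damping) gain (Kv)
PHYSICS_DT = 0.001   # physics timestep in seconds (1000 Hz)
CONTROL_HZ = 100.0   # rate at which position targets are updated


def pd_torque(q_target: float, q: float, qdot: float) -> float:
    """Idealized PD law: pull toward the target and damp joint velocity."""
    return KP * (q_target - q) - KV * qdot


def substeps_per_control_step() -> int:
    """Physics steps taken while a single position target is held."""
    return round(1.0 / (CONTROL_HZ * PHYSICS_DT))


# With a 1 ms physics step and 100 Hz control, each target is held
# for ten physics steps.
print(substeps_per_control_step())
# Torque commanded for a 0.1 rad position error at rest.
print(pd_torque(q_target=0.1, q=0.0, qdot=0.0))
```

Holding each target for several physics steps keeps contact resolution fine-grained while the commanded trajectory changes at the slower servo rate.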
To isolate physics modeling errors from control policy differences, we perform position-controlled trajectory replay:
This protocol ensures that any discrepancy between real and simulated trajectories arises from physics modeling rather than control policy differences. Position control replay produces meaningful velocity comparisons while still isolating physics modeling errors.
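One practical detail of replay is that the recordings arrive at a variable rate (about 88 Hz) while the simulated controller runs at a fixed 100 Hz. The source does not say how the two are aligned; the sketch below assumes linear interpolation onto a uniform control grid, and `resample_linear` is a hypothetical helper.

```python
from bisect import bisect_right


def resample_linear(times, values, rate_hz):
    """Linearly interpolate (times, values) onto a uniform grid at rate_hz.

    `times` must be sorted in increasing order (seconds).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    dt = 1.0 / rate_hz
    n_out = int((times[-1] - times[0]) / dt + 1e-9) + 1
    out_t, out_v = [], []
    for k in range(n_out):
        t = times[0] + k * dt
        # Index of the recorded sample at or just before t.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        span = times[i + 1] - times[i]
        alpha = (t - times[i]) / span if span > 0 else 0.0
        out_t.append(t)
        out_v.append(values[i] + alpha * (values[i + 1] - values[i]))
    return out_t, out_v


# Irregularly timed joint positions (rad) resampled to a 100 Hz grid.
recorded_t = [0.0, 0.011, 0.023, 0.034, 0.05]
recorded_q = [2.0 * t for t in recorded_t]
grid_t, grid_q = resample_linear(recorded_t, recorded_q, rate_hz=100.0)
print(len(grid_t))
```

Each resampled value would then serve as the position target for one control step of the simulated robot.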
Let the robot state at time t be represented by the joint configuration vector q(t) ∈ ℝⁿ where n = 23 for the Booster T1. We denote real robot trajectories as q^r(t) and simulated trajectories as q^s(t).
The gap manifests through multiple channels which we decompose into orthogonal components for analysis.
For joint j with trajectories {q_j^r(t)} and {q_j^s(t)} over T timesteps:
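The per-joint error metrics reported in the results are MAE and RMSE. Assuming the conventional definitions are used, with superscripts r and s marking the real and simulated trajectories, they read:

```latex
\mathrm{MAE}_j = \frac{1}{T}\sum_{t=1}^{T}\left| q_j^{r}(t) - q_j^{s}(t) \right|,
\qquad
\mathrm{RMSE}_j = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left( q_j^{r}(t) - q_j^{s}(t) \right)^{2}}
```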
Pearson correlation captures trajectory shape agreement independent of systematic offset:
Where ρ = 1 indicates perfect positive correlation (identical trajectory shapes), ρ = 0 indicates no linear relationship, and ρ = -1 indicates anti-phase behavior.
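A minimal sketch of the three per-joint metrics, assuming the standard definitions of MAE, RMSE, and sample Pearson correlation. The function names are illustrative, and the trajectories are synthetic.

```python
import math


def mae(real, sim):
    """Mean absolute error between two equal-length trajectories."""
    return sum(abs(r - s) for r, s in zip(real, sim)) / len(real)


def rmse(real, sim):
    """Root-mean-square error between two equal-length trajectories."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(real, sim)) / len(real))


def pearson(real, sim):
    """Sample Pearson correlation; NaN if either trajectory is constant."""
    n = len(real)
    mean_r, mean_s = sum(real) / n, sum(sim) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, sim))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    if var_r == 0 or var_s == 0:
        return float("nan")
    return cov / math.sqrt(var_r * var_s)


# Synthetic example: the simulated joint follows the same shape as the
# real one but with a constant 0.1 rad offset.
real_q = [math.sin(0.05 * t) for t in range(200)]
sim_q = [q + 0.1 for q in real_q]
print(mae(real_q, sim_q), rmse(real_q, sim_q), pearson(real_q, sim_q))
```

In this example the offset shows up in MAE and RMSE while the correlation stays at its maximum, which is why the two kinds of metric are reported side by side.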
We define normalized gap scores G ∈ [0, 1] where 0 = perfect match and 1 = maximum expected divergence:
The weighting (0.4, 0.3, 0.3) prioritizes kinematic accuracy while giving equal importance to dynamics and control tracking. These weights can be adjusted based on application requirements.
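As a worked check of the weighting, the sketch below assumes the overall score is the plain weighted sum of the three component scores. Applied to the component scores reported in the results (0.300, 0.065, 0.743), this assumption reproduces the reported overall value. The function name is illustrative.

```python
# Weights stated in the text: kinematic, dynamic, control.
WEIGHTS = {"kinematic": 0.4, "dynamic": 0.3, "control": 0.3}


def overall_gap(components, weights=WEIGHTS):
    """Weighted sum of normalized component gap scores in [0, 1]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, score in components.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score {score} is outside [0, 1]")
    return sum(weights[name] * components[name] for name in weights)


# Component scores as reported in the results table.
reported = {"kinematic": 0.300, "dynamic": 0.065, "control": 0.743}
print(round(overall_gap(reported), 3))  # 0.4*0.300 + 0.3*0.065 + 0.3*0.743
```

Changing the weights for a different application only requires passing another dictionary that still sums to 1.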
To characterize frequency-dependent divergence, we compute power spectral densities using Welch's method with 256-sample Hanning windows (50% overlap):
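To make the estimator concrete, here is a dependency-free sketch of Welch's method with the stated settings (256-sample Hann windows, 50% overlap). It assumes per-segment mean removal and one-sided density scaling, which are common conventions but are not confirmed by the source. The direct DFT is written for clarity rather than speed, and in practice a library routine would normally be used instead.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def welch_psd(x, fs, nperseg=256, overlap=0.5):
    """One-sided PSD estimate: the average of windowed periodograms."""
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one segment")
    step = max(1, int(nperseg * (1.0 - overlap)))
    window = hann(nperseg)
    scale = 1.0 / (fs * sum(w * w for w in window))
    n_freqs = nperseg // 2 + 1
    # Precomputed complex exponentials for the direct DFT.
    twiddle = [cmath.exp(-2j * math.pi * k / nperseg) for k in range(nperseg)]
    psd = [0.0] * n_freqs
    n_segments = 0
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg]
        mean = sum(seg) / nperseg
        seg = [(v - mean) * w for v, w in zip(seg, window)]
        for k in range(n_freqs):
            acc = sum(seg[n] * twiddle[(k * n) % nperseg] for n in range(nperseg))
            power = abs(acc) ** 2 * scale
            if 0 < 2 * k < nperseg:
                power *= 2.0  # fold negative frequencies into interior bins
            psd[k] += power
        n_segments += 1
    freqs = [k * fs / nperseg for k in range(n_freqs)]
    return freqs, [p / n_segments for p in psd]


# Synthetic joint signal: a single sinusoid sampled at 100 Hz.
fs = 100.0
f0 = fs * 20 / 256
signal = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(1024)]
freqs, psd = welch_psd(signal, fs)
print(freqs[psd.index(max(psd))])  # dominant frequency in Hz
```

Applying the same function to the real and simulated trajectory of a joint gives the pair of spectra that the divergence measure compares.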
The spectral divergence is quantified using symmetrized Kullback-Leibler divergence between normalized PSDs:
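A minimal sketch of the divergence computation. It assumes each PSD is normalized to sum to 1 and that the symmetrized form is the average of the two directed divergences (some authors use the sum instead). The small floor `eps` is an added assumption to avoid taking the log of zero.

```python
import math


def normalize(psd, eps=1e-12):
    """Turn a PSD into a discrete probability distribution."""
    floored = [max(p, 0.0) + eps for p in psd]
    total = sum(floored)
    return [p / total for p in floored]


def kl_divergence(p, q):
    """Directed KL divergence D(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def symmetric_kl(psd_real, psd_sim):
    """Average of D(p || q) and D(q || p) on the normalized spectra."""
    p, q = normalize(psd_real), normalize(psd_sim)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Toy spectra: the "real" one has more power in the higher bins.
real_psd = [4.0, 3.0, 2.0, 1.0]
sim_psd = [6.0, 3.0, 0.8, 0.2]
print(symmetric_kl(real_psd, sim_psd))
```

Because the spectra are normalized first, the measure compares how power is distributed across frequency, not how much total power each signal carries.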
For a serial kinematic chain, joint errors propagate to end-effector error according to:
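In its standard first-order form, with Δq denoting the joint-space error and Δx the resulting task-space error (both symbols are introduced here for illustration):

```latex
\Delta x \approx J(q)\,\Delta q
```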
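where J(q) is the Jacobian matrix. Each column of J scales with the moment arm between that joint and the point of interest, so joints with similar joint-space error can contribute very differently to task-space error. For a stance leg, where the chain runs from the foot up to the torso, knee and ankle errors act through the longest moment arms, which explains why they contribute disproportionately to task-space error.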
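A planar sketch of this propagation, assuming a stance leg modeled as a two-link chain rooted at the ankle, with hypothetical 0.3 m shank and thigh lengths (not taken from the T1 model). It compares how far the hip moves for the same small angular error at each joint.

```python
import math


def stance_leg_jacobian(theta_ankle, theta_knee, shank=0.3, thigh=0.3):
    """Columns of the 2x2 Jacobian of hip position w.r.t. ankle and knee."""
    a = theta_ankle
    ak = theta_ankle + theta_knee
    col_ankle = (
        -shank * math.sin(a) - thigh * math.sin(ak),
        shank * math.cos(a) + thigh * math.cos(ak),
    )
    col_knee = (-thigh * math.sin(ak), thigh * math.cos(ak))
    return col_ankle, col_knee


def hip_displacement(column, joint_error_rad):
    """First-order hip displacement caused by an error at a single joint."""
    return math.hypot(column[0], column[1]) * abs(joint_error_rad)


# Slightly bent leg, with the same 5 degree error applied to each joint.
col_ankle, col_knee = stance_leg_jacobian(math.radians(80), math.radians(20))
error = math.radians(5.0)
print(hip_displacement(col_ankle, error))  # error at the chain root
print(hip_displacement(col_knee, error))   # error one link further along
```

In this configuration the joint at the root of the chain moves the hip further than the joint one link along, because its Jacobian column spans the whole leg instead of only the thigh.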
| Metric | Value | Units |
|---|---|---|
| Position MAE | 0.097 | rad (5.56°) |
| Position RMSE | 0.150 | rad (8.59°) |
| Velocity MAE | 0.408 | rad/s |
| Velocity RMSE | 0.649 | rad/s |
| Mean Correlation | 0.257 | - |
| Component | Score | Interpretation |
|---|---|---|
| Kinematic Gap | 0.300 | Moderate position tracking |
| Dynamic Gap | 0.065 | Excellent velocity alignment |
| Control Gap | 0.743 | Weak trajectory correlation |
| Overall Gap | 0.362 | Moderate simulation fidelity |
The low dynamic gap score (0.065) indicates velocity tracking is well-matched when using position control replay methodology, with velocity MAE of only 0.41 rad/s.
Figure 2 shows real vs. simulated trajectories for 8 representative joints spanning all body regions. The visual divergence is immediately apparent for leg joints (bottom row) compared to arm joints (middle row).

Joint-specific analysis reveals heterogeneity in simulation accuracy:
| Joint | MAE (deg) | RMSE (deg) | Correlation |
|---|---|---|---|
| Right_Knee_Pitch | 12.2 | 13.7 | 0.36 |
| Left_Knee_Pitch | 12.0 | 13.5 | 0.36 |
| Right_Elbow_Yaw | 8.9 | 11.9 | 0.45 |
| Left_Elbow_Yaw | 8.7 | 11.7 | 0.44 |
| Left_Hip_Pitch | 8.6 | 10.5 | 0.33 |
| Right_Hip_Pitch | 8.4 | 10.3 | 0.35 |
| Left_Shoulder_Pitch | 7.0 | 9.3 | 0.38 |
| Head_pitch | 3.8 | 5.2 | 0.52 |
| Waist | 3.4 | 4.6 | 0.01 |
| AAHead_yaw | 1.6 | 2.4 | 0.03 |
Knee joints show highest error (~12° MAE), consistent with load-bearing during locomotion. Upper body joints maintain moderate correlation (0.35-0.52) while leg joints show weaker correlation (0.1-0.36), indicating contact dynamics as the primary modeling challenge.

| Region | Mean MAE (deg) | Mean Correlation | # Joints |
|---|---|---|---|
| Legs | 6.7 | 0.15 | 12 |
| Arms | 5.5 | 0.40 | 8 |
| Torso | 3.4 | 0.01 | 1 |
| Head | 2.7 | 0.27 | 2 |
Statistical comparison: Legs vs. Arms effect size d = 0.41 (small-medium), p < 0.05. Legs show ~1.2× higher error than arms, with the difference driven primarily by knee joints.


To understand frequency-dependent divergence patterns, we performed power spectral density (PSD) analysis on both real and simulated trajectories. This reveals which frequency components are well-modeled versus poorly captured.
We decompose the spectrum into three physiologically and mechanically meaningful bands:
| Frequency Band | Range | Real Power | Sim Power | Divergence Pattern |
|---|---|---|---|---|
| Low frequency | 0-2 Hz | Higher | Lower | Simulation under-predicts slow deliberate motions |
| Mid frequency | 2-10 Hz | Comparable | Comparable | Reasonable agreement for control bandwidth |
| High frequency | >10 Hz | Lower | Lower | Both systems show damping; sim slightly more |
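The band comparison above can be sketched as a band-power summary of a PSD. This example assumes uniformly spaced frequency bins (so summing bins approximates integration) and half-open band edges at 2 Hz and 10 Hz. The function name and the toy spectrum are illustrative.

```python
# Band edges in Hz, following the table: low 0-2, mid 2-10, high >10.
BANDS = {
    "low": (0.0, 2.0),
    "mid": (2.0, 10.0),
    "high": (10.0, float("inf")),
}


def band_power_fractions(freqs, psd, bands=BANDS):
    """Fraction of total power falling in each band [lo, hi)."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("PSD has no power")
    return {
        name: sum(p for f, p in zip(freqs, psd) if lo <= f < hi) / total
        for name, (lo, hi) in bands.items()
    }


# Toy spectrum with most power below 2 Hz.
freqs = [0.5, 1.5, 5.0, 12.0]
psd = [3.0, 3.0, 1.5, 0.5]
print(band_power_fractions(freqs, psd))
```

Computing these fractions separately for the real and simulated spectra of a joint shows which band accounts for the mismatch.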
| Joint Category | Mean KL Divergence | Dom. Freq (Real) | Dom. Freq (Sim) | Interpretation |
|---|---|---|---|---|
| Head | 0.47 | 2.4 Hz | 0.3 Hz | Large frequency mismatch despite low position error |
| Arms | 0.05 | 1.8 Hz | 1.6 Hz | Good spectral agreement |
| Waist | 0.11 | 1.2 Hz | 1.0 Hz | Moderate agreement |
| Legs | 0.09 | 2.1 Hz | 1.8 Hz | Lower KL despite high position error |
The head joints show high spectral divergence (KL = 0.47) despite the lowest regional position error (2.7° mean MAE). This apparent paradox shows that position-based metrics alone are insufficient: the real head's dominant frequency is 2.4 Hz versus 0.3 Hz in simulation, indicating higher-frequency micro-movements (vestibular corrections, gaze stabilization) that the simulation smooths out.

The spectral analysis provides specific guidance for simulation improvement:
The largest errors occur in joints directly affecting foot-ground interaction (knee ~12°, hip ~8°). MuJoCo's contact model cannot capture shoe deformation, surface friction variability, or stick-slip transitions. Lower correlation for leg joints (r ≈ 0.15) versus arms (r ≈ 0.40) suggests contact dynamics remain the primary modeling challenge.
Velocity tracking shows excellent agreement (0.41 rad/s MAE) with position control replay methodology, indicating that actuator dynamics are adequately captured by the PD controller model. The low dynamic gap score (0.065) confirms first-order dynamics are well-matched.
The weak correlation for leg joints (0.15 mean) compared to arms (0.40 mean) may also indicate unmodeled structural effects: link flexibility, joint play, and cable/wiring effects not captured in rigid-body simulation.
| Parameter | Recommended Range | Justification |
|---|---|---|
| Leg joint positions | ±8° | Covers observed error distribution |
| Arm joint positions | ±5° | Upper-body errors are lower |
| Joint velocities | ±1 rad/s | Based on 0.41 rad/s MAE |
| Ground friction | 0.3 - 0.8 | Contact is primary error source |
| Actuator strength | ±20% | Dynamics well-matched |
| Joint damping | 1-3× | Velocity tracking is good |
Critical insight: Uniform randomization across all joints is suboptimal. Leg joints require higher randomization range than arms due to contact dynamics.
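A minimal sketch of per-episode sampling from the recommended ranges. It assumes uniform distributions, treats the position and velocity entries as additive offsets, and treats the strength and damping entries as multiplicative scales. None of these choices is specified by the table, and the dictionary keys are hypothetical names.

```python
import math
import random


def sample_randomization(rng):
    """Draw one set of domain-randomization parameters for an episode."""
    return {
        # Leg joints get a wider position range than arm joints.
        "leg_joint_offset_rad": math.radians(rng.uniform(-8.0, 8.0)),
        "arm_joint_offset_rad": math.radians(rng.uniform(-5.0, 5.0)),
        "joint_velocity_offset": rng.uniform(-1.0, 1.0),  # rad/s
        "ground_friction": rng.uniform(0.3, 0.8),
        "actuator_strength_scale": rng.uniform(0.8, 1.2),  # +/- 20%
        "joint_damping_scale": rng.uniform(1.0, 3.0),      # 1-3x nominal
    }


rng = random.Random(0)  # seeded so that runs are repeatable
params = sample_randomization(rng)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

Sampling leg and arm offsets from separate ranges is the non-uniform randomization the table recommends. A finer-grained version could draw one offset per joint.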
| Joint | Range (rad) | Range (deg) | Torque (Nm) |
|---|---|---|---|
| AAHead_yaw | [-1.57, 1.57] | [-90, 90] | 7 |
| Shoulder_Pitch | [-3.31, 1.22] | [-190, 70] | 18 |
| Elbow_Pitch | [-2.27, 2.27] | [-130, 130] | 18 |
| Waist | [-1.57, 1.57] | [-90, 90] | 30 |
| Hip_Pitch | [-1.80, 1.57] | [-103, 90] | 45 |
| Knee_Pitch | [0, 2.34] | [0, 134] | 60 |
| Ankle_Pitch | [-0.87, 0.35] | [-50, 20] | 20 |
| Metric | Mean | Std | Min | Max |
|---|---|---|---|---|
| Duration (s) | 30.0 | 16.5 | 5.7 | 93.8 |
| Samples | 2,315 | 1,147 | 127 | 19,660 |
| Position RMSE (deg) | 8.6 | 2.9 | 0.2 | 15.3 |
| Correlation | 0.26 | 0.20 | -0.12 | 0.64 |
research/sim2real/
├── scripts/
│   ├── data_loader.py          # Episode data loading
│   ├── booster_t1_sim.py       # MuJoCo simulation
│   ├── sim2real_analysis.py    # Main analysis
│   └── generate_figures.py     # Figure generation
├── results/
│   ├── analysis_results.json
│   └── analysis_report.txt
└── figures/
    ├── fig1_trajectory_comparison.pdf
    ├── fig2_error_distribution.pdf
    └── ...
To reproduce: cd research/sim2real/scripts && python sim2real_analysis.py
@article{gerra2025sim2real,
title={Empirical Characterization of the Simulation-to-Reality
Gap in Full-Size Bipedal Humanoid Robots},
author={gerra},
year={2025},
note={Available at https://gerra.com/research/sim2real}
}