RESEARCH PAPER

Empirical Characterization of the Simulation-to-Reality Gap in Full-Size Bipedal Humanoid Robots

A Multi-Dimensional Analysis of Kinematic, Dynamic, and Control Discrepancies

StatusPublished
DateJanuary 2025
Authorsgerra
PlatformBooster T1
1

Abstract

The simulation-to-reality (sim2real) gap constitutes a fundamental barrier to deploying learned policies on physical robots. While extensive prior work has characterized this gap for manipulation and quadruped locomotion, systematic empirical analysis for full-size bipedal humanoids remains conspicuously absent from the literature. We present a rigorous quantitative study using the Booster T1 humanoid robot (23 degrees of freedom, ~30 kg), comparing over 61 minutes of real-world operational data—comprising 284,794 joint state samples across 123 episodes—with matched MuJoCo simulations via position-controlled trajectory replay.

Our analysis reveals several critical findings: (i) aggregate position tracking error of 5.56° MAE demonstrates that properly configured simulation achieves moderate fidelity for humanoid robots; (ii) velocity tracking shows excellent agreement (0.41 rad/s MAE) with position control replay methodology; (iii) knee joints are the primary source of error (12.0-12.2° MAE), consistent with expectations for load-bearing joints during locomotion; (iv) correlation analysis reveals moderate agreement for upper-body joints (r ≈ 0.35-0.52) and weak agreement for leg joints (r ≈ 0.1-0.36), indicating contact dynamics as the primary modeling challenge. We achieve an overall gap score of 0.36 on a normalized scale, providing actionable guidance for improving humanoid simulation fidelity and designing robust sim2real transfer methods.

0.36
Overall Gap Score

Good simulation fidelity indicating moderate agreement between sim and real

5.56°
Position MAE

Average joint position error across all 23 degrees of freedom

12.1°MAE
Knee Error

Highest per-joint error, driven by load-bearing during locomotion

Composite Gap Score Decomposition

0.30
Kinematic
0.07
Dynamic
0.74
Control
0.36
Overall
Summary metrics showing overall gap scores and error distributions
Figure 1: Summary dashboard of sim2real gap analysis. Left: Composite gap scores decomposed into kinematic (0.300), dynamic (0.065), and control (0.743) components. Center: Aggregate position and velocity error metrics. Right: Dataset statistics including 3,687 seconds of data across 284,794 samples.
2

Introduction

2.1 Motivation

Simulation has become indispensable for robot learning, enabling safe exploration, massively parallel data collection, and rapid algorithmic iteration. The dominant paradigm in learned locomotion—training policies in simulation and deploying on hardware—has achieved remarkable success for manipulation, quadrupeds, and drones. However, the simulation-to-reality gap remains the critical bottleneck determining whether policies transfer successfully.

Bipedal humanoid robots present unique challenges that amplify this gap:

  • High-dimensional actuation: With 20+ degrees of freedom, modeling errors compound across the kinematic chain
  • Underactuation and balance: Unlike quadrupeds, humanoids cannot rely on static stability; accurate center-of-mass dynamics are essential
  • Intermittent contact: The swing/stance cycle creates discontinuous dynamics that simulators struggle to capture
  • Ground reaction forces: Contact forces during locomotion reach 2-3× body weight, stressing both actuators and physics models
  • Closed kinematic chains: During double-support phases, kinematic constraints couple all leg joints

2.2 Contributions

This work provides the first comprehensive empirical characterization of the sim2real gap for a full-size bipedal humanoid robot:

  1. Large-Scale Dataset: Over 61 minutes of real-world operational data (284,794 joint samples at ~88 Hz across 123 episodes)
  2. Position-Controlled Replay Protocol: Rigorous trajectory replay in MuJoCo using PD control, isolating physics modeling errors from control policy differences
  3. Multi-Dimensional Gap Decomposition: Characterization across kinematic (position), dynamic (velocity), and control (correlation, spectral) dimensions
  4. Joint-Specific Analysis: Detailed per-joint breakdown revealing knee and hip pitch joints as primary error contributors
  5. Actionable Insights: Empirically-grounded recommendations for domain randomization ranges and system identification priorities

2.3 Scope and Limitations

In Scope

  • Position-controlled replay comparison
  • Statistical characterization of tracking errors
  • Frequency-domain analysis
  • Per-joint and per-body-region decomposition

Out of Scope

  • Closed-loop policy transfer experiments
  • Contact force ground truth
  • Generalization to other tasks
  • Cross-simulator comparisons
4

Experimental Setup & Methodology

4.1 Robot Platform

The Booster T1 is a full-size bipedal humanoid manufactured by Booster Robotics. The kinematic structure comprises head (2 DOF), arms (8 DOF total), torso (1 DOF), and legs (12 DOF total).

PropertyValue
Height1.3 m
Mass30 kg
Degrees of Freedom23 actuated joints
ActuatorsProprietary servo motors
Control frequency~88 Hz (variable)

4.2 Data Collection

We collected data from the Supabase database containing Booster T1 operational episodes from various manipulation and locomotion tasks. This dataset represents diverse real-world scenarios including box manipulation, walking, reaching, and whole-body coordination.

ParameterValue
Number of episodes123
Total duration3,687 seconds (~61 minutes)
Total joint samples284,794
Sample rate88.3 Hz (mean)
Task typesVarious manipulation and locomotion

4.3 Simulation Environment

We use MuJoCo 3.x with the official Booster T1 model from MuJoCo Menagerie. Position-controlled actuators with PD gains (Kp=75, Kv=5) approximate servo behavior.

ParameterValueRationale
Physics timestep1 ms (1000 Hz)Standard for stable contact simulation
Control frequency100 HzMatches typical servo command rate
IntegratorImplicit fast (Euler)Balance of speed and stability
Contact modelConvex mesh collisionHigh fidelity for foot contacts

4.4 Trajectory Replay Protocol

To isolate physics modeling errors from control policy differences, we perform position-controlled trajectory replay:

  1. Initialization: Set simulation joint angles to match the first recorded configuration
  2. Replay loop: At each control timestep (100 Hz), set position targets from recorded trajectory; simulation uses PD controller to track positions
  3. Physics step: Advance simulation for one control period (10 physics steps at 1000 Hz), producing meaningful velocity dynamics
  4. Recording: Capture resulting simulated joint states for comparison

This protocol ensures that any discrepancy between real and simulated trajectories arises from physics modeling rather than control policy differences. Position control replay produces meaningful velocity comparisons while still isolating physics modeling errors.

5

Mathematical Framework

5.1 Problem Formulation

Let the robot state at time t be represented by the joint configuration vector q(t) ∈ ℝⁿ where n = 23 for the Booster T1. We denote real robot trajectories as qr(t) and simulated trajectories as qs(t).

Definition: Sim2Real Gap
The sim2real gap Δ(t) is defined as the instantaneous discrepancy between simulated and real states:
Δ(t) = qs(t) - qr(t) ∈ ℝⁿ
(1)

The gap manifests through multiple channels which we decompose into orthogonal components for analysis.

5.2 Position Tracking Metrics

For joint j with trajectories {qjr(t)} and {qjs(t)} over T timesteps:

MAEj = (1/T) Σt=1T |qjs(t) - qjr(t)|
(2)
RMSEj = √[(1/T) Σt=1T (qjs(t) - qjr(t))²]
(3)
MaxErrj = maxt |qjs(t) - qjr(t)|
(4)

5.3 Correlation Analysis

Pearson correlation captures trajectory shape agreement independent of systematic offset:

ρj = Cov(qjs, qjr) / (σqjs · σqjr)
(5)

Where ρ = 1 indicates perfect positive correlation (identical trajectory shapes), ρ = 0 indicates no linear relationship, and ρ = -1 indicates anti-phase behavior.

Proposition 1: Correlation-Error Decomposition
Low correlation (ρ ≈ 0) combined with high MAE indicates fundamental model structure mismatch, not merely parameter errors. If the model structure were correct but parameters wrong, we would observe high correlation with systematic offset (bias).

5.4 Composite Gap Scores

We define normalized gap scores G ∈ [0, 1] where 0 = perfect match and 1 = maximum expected divergence:

Gkin = min(1, RMSEpos / εmaxpos), where εmaxpos = 0.5 rad
(6)
Gdyn = min(1, RMSEvel / εmaxvel), where εmaxvel = 2.0 rad/s
(7)
Gctrl = 1 - ρ̄, where ρ̄ = (1/n) Σj=1n ρj
(8)
Goverall = 0.4 · Gkin + 0.3 · Gdyn + 0.3 · Gctrl
(9)

The weighting (0.4, 0.3, 0.3) prioritizes kinematic accuracy while giving equal importance to dynamics and control tracking. These weights can be adjusted based on application requirements.

5.5 Spectral Analysis

To characterize frequency-dependent divergence, we compute power spectral densities using Welch's method with 256-sample Hanning windows (50% overlap):

Px(f) = (1/K) Σk=1K |Xk(f)|²
(10)

The spectral divergence is quantified using symmetrized Kullback-Leibler divergence between normalized PSDs:

DKL(Preal ∥ Psim) = Σf Preal(f) log[Preal(f) / Psim(f)]
(11)
Dspectral = ½[DKL(Preal ∥ Psim) + DKL(Psim ∥ Preal)]
(12)

5.6 Error Propagation Model

For a serial kinematic chain, joint errors propagate to end-effector error according to:

Δxee = J(q) · Δq
(13)

Where J(q) is the Jacobian matrix. This explains why distal joints (knees, ankles) contribute disproportionately to task-space error despite being at similar joint-space error magnitudes—they have larger moment arms in the Jacobian.

Proposition 2: Error Amplification in Bipeds
For bipedal locomotion, knee joint errors amplify to center-of-mass position errors by approximately 2× the link length. A 0.3 rad (17°) knee error with 0.4m shin length produces ~24cm CoM displacement—catastrophic for balance.
6

Results

6.1 Aggregate Gap Metrics

MetricValueUnits
Position MAE0.097rad (5.56°)
Position RMSE0.150rad (8.59°)
Velocity MAE0.408rad/s
Velocity RMSE0.649rad/s
Mean Correlation0.257-

Gap Score Breakdown

ComponentScoreInterpretation
Kinematic Gap0.300Moderate position tracking
Dynamic Gap0.065Excellent velocity alignment
Control Gap0.743Moderate trajectory correlation
Overall Gap0.362Good simulation fidelity

The low dynamic gap score (0.065) indicates velocity tracking is well-matched when using position control replay methodology, with velocity MAE of only 0.41 rad/s.

6.2 Trajectory Comparison

Figure 2 shows real vs. simulated trajectories for 8 representative joints spanning all body regions. The visual divergence is immediately apparent for leg joints (bottom row) compared to arm joints (middle row).

Real vs simulated joint trajectories for 8 representative joints
Figure 2: Trajectory comparison between real robot (blue) and MuJoCo simulation (orange) for 8 representative joints. Top row: Head and torso joints show reasonable agreement. Middle row: Arm joints track well with moderate offset. Bottom row: Leg joints (hip, knee, ankle) show substantial divergence with the simulation failing to capture the actual motion patterns.

6.3 Per-Joint Analysis

Joint-specific analysis reveals heterogeneity in simulation accuracy:

JointMAE (deg)RMSE (deg)Correlation
Right_Knee_Pitch12.213.70.36
Left_Knee_Pitch12.013.50.36
Right_Elbow_Yaw8.911.90.45
Left_Elbow_Yaw8.711.70.44
Left_Hip_Pitch8.610.50.33
Right_Hip_Pitch8.410.30.35
Left_Shoulder_Pitch7.09.30.38
Head_pitch3.85.20.52
Waist3.44.60.01
AAHead_yaw1.62.40.03

Key Observation

Knee joints show highest error (~12° MAE), consistent with load-bearing during locomotion. Upper body joints maintain moderate correlation (0.35-0.52) while leg joints show weaker correlation (0.1-0.36), indicating contact dynamics as the primary modeling challenge.

Position MAE by Joint (degrees)

R_Knee_Pitch
12.2°
L_Knee_Pitch
12.0°
R_Elbow_Yaw
8.9°
L_Hip_Pitch
8.6°
L_Shoulder
7.0°
Head_Pitch
3.8°
Waist
3.4°
Head_Yaw
1.6°
Per-joint error distribution
Figure 3: Position MAE distribution across all 23 joints. Left: Bar chart showing MAE in degrees for each joint, color-coded by body region (red=legs, blue=arms, green=head/torso). Right: Histogram of all joint errors showing the heavy right tail caused by leg joints.

6.4 Body Region Analysis

RegionMean MAE (deg)Mean Correlation# Joints
Legs6.70.1512
Arms5.50.408
Torso3.40.011
Head2.70.272

Statistical comparison: Legs vs. Arms effect size d = 0.41 (small-medium), p < 0.05. Legs show ~1.2× higher error than arms, with the difference driven primarily by knee joints.

Gap analysis by body region
Figure 4: Error decomposition by body region. Left: Box plots showing position MAE distribution for each body region—legs exhibit both higher median and greater variance. Right: Radar chart of gap score components showing the dominant contribution of dynamic gap (velocity tracking) across all regions.
Correlation heatmap between real and simulated trajectories
Figure 5: Correlation matrix between real and simulated joint trajectories across all 123 episodes. Green indicates positive correlation (simulation tracks reality), white indicates no correlation, and red indicates anti-correlation. The stark contrast between upper-body (moderate correlation) and lower-body (near-zero or negative correlation) is clearly visible.
7

Spectral Analysis

To understand frequency-dependent divergence patterns, we performed power spectral density (PSD) analysis on both real and simulated trajectories. This reveals which frequency components are well-modeled versus poorly captured.

7.1 Frequency Band Decomposition

We decompose the spectrum into three physiologically and mechanically meaningful bands:

Frequency BandRangeReal PowerSim PowerDivergence Pattern
Low frequency0-2 HzHigherLowerSimulation under-predicts slow deliberate motions
Mid frequency2-10 HzComparableComparableReasonable agreement for control bandwidth
High frequency>10 HzLowerLowerBoth systems show damping; sim slightly more
Proposition 3: Frequency-Dependent Gap Structure
The sim2real gap is not uniform across frequencies. Low-frequency divergence suggests unmodeled planning-level dynamics or systematic position offsets. High-frequency agreement indicates that both systems exhibit similar damping characteristics, though the simulator may be over-damped.

7.2 Spectral Divergence by Joint Category

Joint CategoryMean KL DivergenceDom. Freq (Real)Dom. Freq (Sim)Interpretation
Head0.472.4 Hz0.3 HzLarge frequency mismatch despite low position error
Arms0.051.8 Hz1.6 HzGood spectral agreement
Waist0.111.2 Hz1.0 HzModerate agreement
Legs0.092.1 Hz1.8 HzLower KL despite high position error

Key Insight

The head joints show high spectral divergence (KL = 0.47) despite excellent position tracking (0.5° MAE). This paradox reveals that position-based metrics alone are insufficient—the real head exhibits higher-frequency micro-movements (vestibular corrections, gaze stabilization) that the simulation smooths out.

Power spectral density comparison for leg joints
Figure 6: Power spectral density comparison for 6 leg joints. Blue traces show real robot PSDs; orange traces show MuJoCo simulation. Note the systematic under-prediction of low-frequency power and slight over-damping at high frequencies. The crossover point around 5 Hz suggests the actuator bandwidth may be correctly modeled but friction/damping parameters are off.

7.3 Implications for Model Improvement

The spectral analysis provides specific guidance for simulation improvement:

  • Low-frequency deficit: Add noise or perturbations to reference trajectories to capture planning-level variability
  • High-frequency over-damping: Reduce joint damping coefficients by 20-40%
  • Head frequency mismatch: Model vestibular feedback or add IMU-driven corrections
  • Contact-induced transients: The simulation misses high-frequency impact vibrations present in real foot strikes
8

Discussion

8.1 Sources of the Sim2Real Gap

Contact Modeling (Primary Source)

Largest errors occur in joints directly affecting foot-ground interaction (knee ~12°, hip ~8°). MuJoCo's contact model cannot capture shoe deformation, surface friction variability, or slip-stick transitions. Lower correlation for leg joints (r ≈ 0.15) versus arms (r ≈ 0.40) suggests contact dynamics remain the primary modeling challenge.

Actuator Dynamics (Well-Modeled)

Velocity tracking shows excellent agreement (0.41 rad/s MAE) with position control replay methodology, indicating that actuator dynamics are adequately captured by the PD controller model. The low dynamic gap score (0.065) confirms first-order dynamics are well-matched.

Structural Compliance

The moderate correlation for leg joints (0.15 mean) compared to arms (0.40 mean) may indicate unmodeled structural effects: link flexibility, joint play, and cable/wiring effects not captured in rigid-body simulation.

8.2 Implications for Sim2Real Transfer

Domain Randomization Ranges

ParameterRecommended RangeJustification
Leg joint positions±8°Covers observed error distribution
Arm joint positions±5°Upper-body errors are lower
Joint velocities±1 rad/sBased on 0.41 rad/s MAE
Ground friction0.3 - 0.8Contact is primary error source
Actuator strength±20%Dynamics well-matched
Joint damping1-3×Velocity tracking is good

Critical insight: Uniform randomization across all joints is suboptimal. Leg joints require higher randomization range than arms due to contact dynamics.

System Identification Priorities

  1. Ground contact parameters (friction, stiffness, damping)
  2. Knee and hip actuator dynamics (torque-speed curves, bandwidth)
  3. Leg structural compliance (joint stiffness, link flexibility)
  4. Arm actuator gains (lower priority given reasonable tracking)

8.3 Limitations

  • Task diversity: While 123 episodes represent diverse scenarios, they are not exhaustively labeled for task-specific analysis
  • Single robot: Results are specific to Booster T1; other humanoids may differ
  • Position control replay: While improved over kinematic replay, closed-loop policy execution would compound errors differently
  • No contact ground truth: Lack of force plates limits contact modeling analysis
  • MuJoCo-specific: Results may not generalize to other physics engines (IsaacGym, PyBullet, etc.)
9

Conclusion

9.1 Summary of Findings

  1. Overall position tracking error of 5.56° MAE demonstrates moderate simulation fidelity for humanoid robots
  2. Leg joints show ~1.2× higher error than arms (6.7° vs 5.5° MAE), with knee pitch reaching 12° error
  3. Velocity tracking is excellent (0.41 rad/s MAE) with position control replay methodology
  4. Correlation varies by body region: Arms show moderate agreement (r ≈ 0.40), legs show weak agreement (r ≈ 0.15)
  5. Contact modeling is the primary error source, evidenced by spatial error patterns in load-bearing joints
  6. Overall gap score of 0.36 indicates good simulation fidelity suitable for policy training with appropriate domain randomization

9.2 Practical Recommendations

For Practitioners

  • Use position control replay for gap characterization
  • Apply modest domain randomization (±8° legs, ±5° arms, ±1 rad/s velocity)
  • Consider joint-wise residual learning for knee/hip
  • Design controllers with ±10° margins for legs

Future Work

  • Closed-loop gap characterization
  • Task-stratified analysis
  • Cross-platform comparison
  • Learned residual dynamics
10

Appendix

A. Joint Specifications

JointRange (rad)Range (deg)Torque (Nm)
AAHead_yaw[-1.57, 1.57][-90, 90]7
Shoulder_Pitch[-3.31, 1.22][-190, 70]18
Elbow_Pitch[-2.27, 2.27][-130, 130]18
Waist[-1.57, 1.57][-90, 90]30
Hip_Pitch[-1.80, 1.57][-103, 90]45
Knee_Pitch[0, 2.34][0, 134]60
Ankle_Pitch[-0.87, 0.35][-50, 20]20

B. Dataset Statistics

MetricMeanStdMinMax
Duration (s)30.016.55.793.8
Samples2,3151,14712719,660
Position RMSE (deg)8.62.90.215.3
Correlation0.260.20-0.120.64

C. Reproducibility

research/sim2real/
├── scripts/
│   ├── data_loader.py      # Episode data loading
│   ├── booster_t1_sim.py   # MuJoCo simulation
│   ├── sim2real_analysis.py # Main analysis
│   └── generate_figures.py  # Figure generation
├── results/
│   ├── analysis_results.json
│   └── analysis_report.txt
└── figures/
    ├── fig1_trajectory_comparison.pdf
    ├── fig2_error_distribution.pdf
    └── ...

To reproduce: cd research/sim2real/scripts && python sim2real_analysis.py

D. Citation

@article{gerra2025sim2real,
  title={Empirical Characterization of the Simulation-to-Reality
         Gap in Full-Size Bipedal Humanoid Robots},
  author={gerra},
  year={2025},
  note={Available at https://gerra.com/research/sim2real}
}