TELEOPERATION DATA

Robot Teleoperation

Success-labeled robot manipulation episodes collected via human teleoperation across diverse tasks and embodiments.

400K+ EPISODES
H.264 · JSONL · NPZ
35K+ new episodes per month
794 accepted episodes
23-DOF named joints
915 Hz top sensor rate
ORTF open format
01 Sample

See the data

Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.

Booster T1 episode, light-box pick-and-place (23-DOF)

Representative of the real metadata shape. Audio is 16 kHz wav; counts re-verified on disk (925 depth frames, 2,338 joint samples).

metadata.json (representative)
{
  "episode": { "id": "episode_f0fe8050-...", "version": "2.0", "duration_seconds": 29.51,
    "task": { "name": "Light Box Table To Chair Behind V0",
              "instruction": "Pick up the light box, turn around, and place it on the chair behind you." } },
  "timing": { "start_unix": 1766526819.30, "fps_target": 30, "fps_actual": 40.06 },
  "sensors": {
    "visual": { "rgb_frames": 1182, "depth_frames": 925, "resolution": { "width": 1280, "height": 720 } },
    "proprioception": { "joint_samples": 2338, "joint_sample_rate_hz": 79.23 },
    "audio": { "format": "wav", "sample_rate": 16000, "channels": 1 } },
  "stats": { "recording_quality": "good", "active_streams_count": 5 },
  "provenance": { "watermarked": true, "signature": "TW7wMQxo...A8=" }
}
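
The counts and the duration in this record can be cross-checked against the reported rates. The sketch below is a minimal illustration, not part of the delivered tooling: the helper name `effective_rate_hz` is hypothetical, and it assumes the reported rates are simple averages over `duration_seconds`, which the record itself does not state.

```python
import json

# Representative metadata, trimmed to the fields used here.
metadata = json.loads("""
{
  "episode": {"id": "episode_f0fe8050-...", "duration_seconds": 29.51},
  "timing": {"fps_target": 30, "fps_actual": 40.06},
  "sensors": {
    "visual": {"rgb_frames": 1182, "depth_frames": 925},
    "proprioception": {"joint_samples": 2338, "joint_sample_rate_hz": 79.23}
  }
}
""")


def effective_rate_hz(sample_count: int, duration_s: float) -> float:
    """Average rate implied by a sample count over the episode duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return sample_count / duration_s


duration = metadata["episode"]["duration_seconds"]
visual = metadata["sensors"]["visual"]
proprio = metadata["sensors"]["proprioception"]

# Hypothetical summary dict: stream name -> derived average rate in Hz.
rates = {
    "rgb": effective_rate_hz(visual["rgb_frames"], duration),
    "depth": effective_rate_hz(visual["depth_frames"], duration),
    "joints": effective_rate_hz(proprio["joint_samples"], duration),
}

for stream, hz in rates.items():
    print(f"{stream}: {hz:.2f} Hz")
```

For this record the derived RGB and joint rates fall within 0.01 Hz of `fps_actual` and `joint_sample_rate_hz`. The depth stream has fewer frames than RGB, so its derived rate is lower.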

Per-step proprioception frame (23 named joints)

Representative. One timestep; positions in rad, velocities in rad/s, efforts in Nm.

joints.jsonl (representative)
{
  "timestamp": 3.0976,
  "unix_timestamp": 1766526822.3984,
  "positions":  { "Left_Knee_Pitch": 0.4141, "Right_Knee_Pitch": 0.4229, "Waist": -0.0048, "Head_pitch": 0.3603 },
  "velocities": { "Left_Knee_Pitch": -0.00929, "Right_Knee_Pitch": 0.00759 },
  "efforts":    { "Left_Knee_Pitch": -4.2377, "Right_Knee_Pitch": -5.0484, "Left_Hip_Yaw": 2.9137 }
}
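
A frame like this can be read with the standard library alone. The sketch below is illustrative: the row layout and the variable names are assumptions, and it treats the three maps as possibly sparse because the representative frame shows different joints in each map.

```python
import json
import math

# One line of joints.jsonl, copied from the representative frame above.
line = (
    '{"timestamp": 3.0976, "unix_timestamp": 1766526822.3984, '
    '"positions": {"Left_Knee_Pitch": 0.4141, "Right_Knee_Pitch": 0.4229, '
    '"Waist": -0.0048, "Head_pitch": 0.3603}, '
    '"velocities": {"Left_Knee_Pitch": -0.00929, "Right_Knee_Pitch": 0.00759}, '
    '"efforts": {"Left_Knee_Pitch": -4.2377, "Right_Knee_Pitch": -5.0484, '
    '"Left_Hip_Yaw": 2.9137}}'
)
frame = json.loads(line)

# Union of joint names across the three maps; a map may omit a joint.
joint_names = sorted(
    set(frame["positions"]) | set(frame["velocities"]) | set(frame["efforts"])
)

# Hypothetical flat row layout: one dict per joint, None where a value is absent.
rows = []
for name in joint_names:
    position_rad = frame["positions"].get(name)
    rows.append({
        "joint": name,
        "position_deg": None if position_rad is None else math.degrees(position_rad),
        "velocity_rad_s": frame["velocities"].get(name),
        "effort_nm": frame["efforts"].get(name),
    })

# Absolute minus episode-relative time gives the episode start on the Unix clock.
episode_start_unix = frame["unix_timestamp"] - frame["timestamp"]
```

The recovered `episode_start_unix` agrees with `timing.start_unix` in the metadata record to within a few milliseconds, which is a quick way to confirm that the two timestamp fields share a clock.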

LimX Tron quadruped, fall-recovery, success-labeled

Representative, cross-embodiment. Shows the success label and per-stream rates including 915 Hz motors.

tron_metadata.json (representative)
{
  "episode": { "id": "tron_success_001", "duration_seconds": 15.6 },
  "machine": { "type": "limx_tron1", "class": "quadruped_robot" },
  "task": { "type": "fall_recovery_v0", "success": true, "failure_reason": null },
  "sensors": { "feed": { "proprioception": {
    "motors":   { "sample_count": 14277, "avg_rate_hz": 915.2 },
    "imu":      { "sample_count": 1518,  "avg_rate_hz": 97.3 },
    "odometry": { "sample_count": 1501,  "avg_rate_hz": 96.2 } } } },
  "files": { "total_size_bytes": 64957399 }
}
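
Because `task.success` can be true, false, or null, filtering needs three buckets rather than two. The sketch below is one way to do that; the bucket names and the two extra records are hypothetical, invented only to exercise the failure and unevaluated branches.

```python
from collections import defaultdict

episodes = [
    # From the representative record above.
    {"id": "tron_success_001",
     "task": {"type": "fall_recovery_v0", "success": True, "failure_reason": None}},
    # Hypothetical records, not real data.
    {"id": "hypothetical_failed",
     "task": {"type": "fall_recovery_v0", "success": False,
              "failure_reason": "hypothetical_reason"}},
    {"id": "hypothetical_unreviewed",
     "task": {"type": "fall_recovery_v0", "success": None, "failure_reason": None}},
]


def outcome_bucket(task: dict) -> str:
    """Map the nullable success flag to one of three bucket names."""
    success = task.get("success")
    if success is None:
        return "unevaluated"  # null must not be treated as a failure
    return "success" if success else "failure"


buckets = defaultdict(list)
for episode in episodes:
    buckets[outcome_bucket(episode["task"])].append(episode["id"])
```

Testing `success is None` before truthiness matters here: a plain `if not success` would silently count unevaluated episodes as failures.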
02 Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

| Field | Type | Constraint | Description | Example |
|---|---|---|---|---|
| episode.id | string | required | Unique episode identifier. | episode_f0fe8050-... |
| duration_seconds | float64 · s | required | Episode wall-clock duration. | 29.51 |
| task.instruction | string | required | Language-conditioned instruction given to the operator. | Pick up the light box and place it on the chair behind you. |
| task.success | bool | nullable | Human-verified task outcome (null if unevaluated). | true |
| task.failure_reason | string | nullable | Reason on failure; null on success. | null |
| timing.fps_target | int · fps | required | Target video frame rate. | 30 |
| timing.fps_actual | float · fps | required | Measured achieved frame rate. | 40.06 |
| sensors.visual.rgb_frames | int · frames | required | RGB frame count (mp4, H.264). | 1182 |
| sensors.visual.depth_frames | int · frames | required | Depth frame count (16-bit PNG). | 925 |
| sensors.visual.resolution | {w,h} · px | required | Image resolution. | 1280 x 720 |
| sensors.proprioception.joint_samples | int · samples | required | Joint-state sample count. | 2338 |
| sensors.proprioception.joint_sample_rate_hz | float64 · Hz | required | Measured joint sampling rate. | 79.23 |
| sensors.audio.sample_rate | int · Hz | required | Audio sample rate (wav, mono). | 16000 |
| license.license_id | string | required | Per-delivery license id. | GRR-20251223-A135CB89 |
| provenance.signature | string | required | Cryptographic integrity signature; episodes are watermarked. | TW7wMQxo...A8= |
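
The required and nullable constraints can be checked mechanically with a dotted-path lookup. The sketch below is an assumption-laden illustration, not the shipped validator: `SPEC` covers only a subset of the fields, the helper names are hypothetical, and it treats a nullable field as one that may be null or absent.

```python
_MISSING = object()

# Dotted path -> (accepted Python types, required?). Subset of the schema only.
SPEC = {
    "episode.id": (str, True),
    "timing.fps_target": (int, True),
    "timing.fps_actual": ((int, float), True),
    "sensors.visual.rgb_frames": (int, True),
    "sensors.proprioception.joint_sample_rate_hz": ((int, float), True),
    "task.success": (bool, False),
    "task.failure_reason": (str, False),
}


def lookup(record: dict, dotted: str):
    """Walk a dotted path; return _MISSING if any key along it is absent."""
    node = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def check_record(record: dict) -> list:
    """Return human-readable problems; an empty list means the record passed."""
    problems = []
    for path, (expected, required) in SPEC.items():
        value = lookup(record, path)
        if value is _MISSING or value is None:
            if required:
                problems.append(f"{path}: required field is missing or null")
            continue
        # bool is a subclass of int in Python, so reject it for non-bool fields.
        if isinstance(value, bool) and expected is not bool:
            problems.append(f"{path}: unexpected bool")
        elif not isinstance(value, expected):
            problems.append(f"{path}: unexpected type {type(value).__name__}")
    return problems


good = {
    "episode": {"id": "episode_f0fe8050-..."},
    "timing": {"fps_target": 30, "fps_actual": 40.06},
    "sensors": {"visual": {"rgb_frames": 1182},
                "proprioception": {"joint_sample_rate_hz": 79.23}},
    "task": {"success": True, "failure_reason": None},
}
bad = {**good, "timing": {"fps_target": "30", "fps_actual": 40.06}}
```

Here `check_record(good)` returns an empty list, while `check_record(bad)` reports the string-valued `timing.fps_target`.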
03 What's included

Manipulation Trajectories

Complete teleoperated task demonstrations with synchronized vision and robot state across diverse manipulation tasks.

State-Action Pairs

Frame-aligned observation and action tuples ready for behavior cloning and inverse RL.

Success Labels

Human-verified task outcomes and metadata for filtering, curriculum design, and reward modeling.

04 Methodology

How it is built

  1. Human teleoperation capture

    Episodes are recorded by human teleoperators driving real robots (Booster T1, LimX Tron, Unitree). Each episode is a single contiguous recording with a task name, description, and language instruction.

  2. Multi-modal synchronized recording

    RGB (mp4/H.264), depth (PNG), proprioception (joint position, velocity, effort), robot state, and audio (wav) are captured concurrently, each stamped with both an episode-relative timestamp and an absolute Unix timestamp at its native rate.

  3. Timestamp normalization

    All streams carry float64 Unix and episode-relative timestamps for cross-stream alignment. Native measurement rates are preserved rather than resampled, with a sync-tolerance budget recorded in the format.

  4. Quality and success labeling

    Each episode gets a recording-quality tag and an active-stream count, and tasks carry a boolean success plus failure reason. Accepted episodes are organized by outcome, and admin review gates customer visibility.

  5. Licensing and watermark

    On delivery each episode is stamped with a license block, a multi-layer watermark, a cryptographic signature, and per-file checksums.

  6. ORTF packaging

    Episodes are packaged in the Open Robot Training Format: a manifest, an episode index, chunked step records, videos, and optional annotations and robot description, convertible to LeRobot and RLDS / Open-X-Embodiment.
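
Because streams keep their native rates, a consumer typically pairs samples by nearest timestamp and discards pairs outside the sync tolerance. The sketch below shows one way to do that with a binary search. The function names, the example timestamps, and the 20 ms tolerance are all hypothetical; the actual tolerance budget is the one recorded in the format.

```python
from bisect import bisect_left


def nearest_index(sorted_ts: list, t: float) -> int:
    """Index of the timestamp in sorted_ts closest to t (ties go to the earlier one)."""
    pos = bisect_left(sorted_ts, t)
    if pos == 0:
        return 0
    if pos == len(sorted_ts):
        return len(sorted_ts) - 1
    before, after = sorted_ts[pos - 1], sorted_ts[pos]
    return pos if (after - t) < (t - before) else pos - 1


def align(ref_ts: list, stream_ts: list, tolerance_s: float) -> list:
    """For each reference time, return (stream index, within tolerance?)."""
    pairs = []
    for t in ref_ts:
        i = nearest_index(stream_ts, t)
        pairs.append((i, abs(stream_ts[i] - t) <= tolerance_s))
    return pairs


# Hypothetical episode-relative timestamps in seconds.
video_ts = [0.0, 0.10, 0.50]
joint_ts = [0.0, 0.04, 0.09, 0.12, 0.30]

aligned = align(video_ts, joint_ts, tolerance_s=0.02)
```

The third video frame has no joint sample within 20 ms, so it is flagged rather than silently paired with a stale reading. This keeps the native-rate data untouched and pushes the resampling decision to the consumer.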

05 Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

Sim2Real Gap Characterization

Measures

How faithfully a MuJoCo simulation reproduces the real teleoperated trajectories, with the captured real data serving as the ground-truth reference.

Method

Position-controlled trajectory replay in MuJoCo over 123 episodes, 3,687 seconds, and 284,794 joint samples, scored per joint with MAE, RMSE, and Pearson correlation plus a composite gap score.

Result

Measured on real data: aggregate position MAE of 5.56 degrees, with knee pitch the highest at 12.2 degrees; velocity MAE of 0.41 rad/s; leg joints at 6.7 degrees versus 5.5 degrees for arm joints (about 1.2x); overall gap score of 0.36. See the sim2real paper.
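
The per-joint metrics named above have standard definitions, sketched below for a single joint. The function name and the sample trajectories are hypothetical, and the composite gap score is not reproduced because its formula is not given here.

```python
import math


def joint_metrics(real: list, sim: list) -> dict:
    """MAE, RMSE, and Pearson correlation between two equal-length series."""
    if len(real) != len(sim) or len(real) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(real)
    errors = [s - r for r, s in zip(real, sim)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_r = sum(real) / n
    mean_s = sum(sim) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, sim))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    if var_r == 0 or var_s == 0:
        raise ValueError("correlation is undefined for a constant series")
    pearson = cov / math.sqrt(var_r * var_s)
    return {"mae": mae, "rmse": rmse, "pearson": pearson}


# Hypothetical joint positions in radians: the simulation tracks the real
# trajectory's shape but carries a constant 0.05 rad offset.
real_positions = [0.0, 0.1, 0.2, 0.3]
sim_positions = [p + 0.05 for p in real_positions]

metrics = joint_metrics(real_positions, sim_positions)
```

A constant offset like this yields MAE and RMSE equal to the offset while the correlation stays at 1, which is why the error metrics and the correlation are reported together.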

Episode acceptance / QA gate

Measures

Whether an episode is clean enough to enter the saleable catalog.

Method

Automated format, stream-presence, and quality checks plus human QA, with a recording-quality tag, an active-stream count, and success/fail bucketing.

Result

Methodology-stage. Pass/fail per episode; no published aggregate acceptance-rate metric.

ORTF schema validation

Measures

Whether a packaged dataset conforms to the format.

Method

A validator checks manifest schema, required files, parquet-to-manifest match, video frame-count integrity, timestamp monotonicity, and action/observation dimension match.

Result

Methodology-stage. Validator pass/fail, not a score.
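
Two of the listed checks, frame-count integrity and timestamp monotonicity, are simple enough to sketch. This is not the ORTF validator itself: the function names are hypothetical, and it assumes timestamps must be strictly increasing, which the description above does not specify.

```python
def first_non_increasing(timestamps: list):
    """Index of the first timestamp that fails to increase, or None if all do."""
    for i in range(1, len(timestamps)):
        if timestamps[i] <= timestamps[i - 1]:
            return i
    return None


def check_stream(timestamps: list, declared_count: int) -> list:
    """Return a list of problems for one stream; empty means both checks passed."""
    problems = []
    if len(timestamps) != declared_count:
        problems.append(
            f"count mismatch: declared {declared_count}, found {len(timestamps)}"
        )
    bad_index = first_non_increasing(timestamps)
    if bad_index is not None:
        problems.append(f"timestamp not increasing at index {bad_index}")
    return problems


# Hypothetical episode-relative timestamps in seconds.
clean = [0.0, 0.033, 0.067]
repeated = [0.0, 0.033, 0.033]

clean_report = check_stream(clean, declared_count=3)
repeated_report = check_stream(repeated, declared_count=4)
```

Collecting every problem instead of stopping at the first one matches the pass/fail framing: the stream fails if the list is non-empty, and the list explains why.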

06 Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

The trajectory executed by the human teleoperator is the demonstration ground truth, and the human-applied success boolean and failure reason are the task-outcome ground truth. For simulation fidelity, the real robot trajectory is the reference the simulation is scored against.

How it is established

Success labels are applied by humans and encoded per episode. Sim2real validation is an execution-based comparison of simulated vs real joint trajectories via per-joint MAE, RMSE, and correlation, with a normalized composite gap score. Integrity is graded by per-file SHA-256 checksums and a signature.
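
A per-file SHA-256 check can be done with the standard library, as sketched below. How the expected digests are stored in a delivery is not described here, so the sketch takes the expected digest as a plain argument; the function names and the temporary demo file are hypothetical, and signature verification is out of scope.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_is_intact(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one, ignoring hex case."""
    return sha256_of(path) == expected_hex.lower()


# Demo on a throwaway file standing in for a delivered artifact.
payload = b"stand-in bytes for a delivered file"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

expected = hashlib.sha256(payload).hexdigest()
intact = file_is_intact(demo_path, expected)
tampered = file_is_intact(demo_path, "0" * 64)
os.remove(demo_path)
```

Chunked reads matter for video and depth files, which can be far larger than is comfortable to load into memory at once.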

Agreement

No inter-rater agreement statistic is published. Correctness for the simulation study is anchored to the recorded real trajectory itself.

07 Application

Imitation Learning

Behavior cloning from success-labeled human demonstrations on real hardware, with full state-action coverage.

VLA Pretraining

Vision-language-action triplets across tasks and embodiments for generalist manipulation policy pretraining.

RL Fine-Tuning

Seed real-world trajectories and success signals for offline RL and policy refinement before deployment.

08 Environment & integration

How you load it

Delivery

S3 / pre-signed URL, REST API, CDN

Formats

ORTF (manifest + parquet + MP4 + PNG depth + WAV), LeRobot v3.0, RLDS / Open-X-Embodiment

Auth

Org-scoped API keys, tenant isolation, per-customer storage paths, and mutual TLS. Artifacts are signed, watermarked, and access-logged under a restricted per-delivery license.

Cadence

The back catalog is a one-time licensable archive of 794 accepted episodes, roughly 6.7 hours and 63 GiB; ongoing direct-collection programs are supported.
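
The archive totals imply per-episode averages that are useful for capacity planning. The arithmetic below uses only the rounded figures quoted above, so the results are approximate.

```python
# Figures quoted for the back catalog (rounded in the source).
accepted_episodes = 794
total_hours = 6.7
total_gib = 63

# Mean episode length in seconds and mean episode size in MiB.
mean_duration_s = total_hours * 3600 / accepted_episodes
mean_size_mib = total_gib * 1024 / accepted_episodes

print(f"mean duration: {mean_duration_s:.1f} s")
print(f"mean size: {mean_size_mib:.1f} MiB")
```

This works out to roughly 30 seconds and roughly 81 MiB per episode, in line with the 29.51-second representative Booster T1 episode.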

quickstart.sh
# Load and convert an ORTF episode set

# 1. REST API: find and download a dataset
GET /datasets                  # list available datasets
GET /datasets/{id}             # metadata + episode index
GET /datasets/{id}/download    # pre-signed URL to the ORTF bundle

# 2. CLI: validate the downloaded bundle
ortf validate ./dataset        # manifest + frame-count + timestamp checks

# 3. Python: convert to LeRobot v3.0
to_lerobot(load_dataset("./ortf"), "out/lerobot")
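
The three REST calls can be prepared in Python as shown in the sketch below, which builds the requests without sending them. Only the endpoint paths come from the quickstart. The base URL, the bearer-token header, the placeholder key, and the dataset id are assumptions for illustration; use the values issued with your access.

```python
from urllib.request import Request

# Hypothetical placeholder; the real base URL is provided with access.
BASE_URL = "https://api.example.invalid"


def build_get(path: str, api_key: str) -> Request:
    """Prepare (but do not send) an authenticated GET request."""
    return Request(
        BASE_URL + path,
        # Assumed auth scheme; the source only says keys are org-scoped.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


api_key = "API_KEY"        # placeholder
dataset_id = "DATASET_ID"  # placeholder

prepared = [
    build_get("/datasets", api_key),                         # list datasets
    build_get(f"/datasets/{dataset_id}", api_key),           # metadata + index
    build_get(f"/datasets/{dataset_id}/download", api_key),  # pre-signed URL
]

for request in prepared:
    print(request.get_method(), request.full_url)
```

Passing each prepared request to `urllib.request.urlopen` would send it; that step is left out because it needs real credentials and a real host.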

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.