Real, as easily as sim. Run your model on a real arm through the same API as sim – one set of metrics, one comparable number.

One catalog, growing. Real rigs plus the sim benchmarks you already run (LIBERO, RoboCasa, ManiSkill) in the same harness – so you stop rebuilding eval infrastructure for every model and robot.

Numbers you can cite. Blinded, randomized, and run for enough trials to mean something – with every run saved alongside multi-view video and telemetry.

Private or public. Keep results to your lab, or put them on the leaderboard alongside OpenPI π0.5, GR00T, SmolVLA, and ACT.

We’re opening this to early users. Leave your email and tell us what you’d run through it.

See it live: the v1.0 leaderboard, every run, the protocol and data, and the methodology.