Metrics

UPH – units per hour. How fast the system works.

MTBA – mean time between assists. How long it runs before a human needs to step in.
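
As a rough illustration of how these two metrics could be computed from run logs, here is a minimal Python sketch. The `RunRecord` schema and its field names are hypothetical and are not the Positronic dataset format. The formulas assume UPH is total units divided by total operating hours, and MTBA is total operating time divided by the number of assists. The white paper remains the authority on the exact scoring.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One evaluation run (hypothetical schema, not the Positronic format)."""
    duration_s: float   # time the system was operating, in seconds
    units_moved: int    # items successfully transferred
    assists: int        # times a human had to step in


def units_per_hour(runs):
    """UPH: total units moved divided by total operating hours."""
    total_hours = sum(r.duration_s for r in runs) / 3600.0
    if total_hours == 0:
        raise ValueError("no operating time recorded")
    return sum(r.units_moved for r in runs) / total_hours


def mean_time_between_assists(runs):
    """MTBA in seconds: total operating time divided by number of assists.

    Returns None when no assist occurred, because the mean is undefined
    in that case (only a lower bound is known).
    """
    total_assists = sum(r.assists for r in runs)
    if total_assists == 0:
        return None
    return sum(r.duration_s for r in runs) / total_assists


# Two made-up 30-minute runs, for illustration only.
runs = [
    RunRecord(duration_s=1800, units_moved=90, assists=2),
    RunRecord(duration_s=1800, units_moved=110, assists=1),
]
print(units_per_hour(runs))             # 200.0
print(mean_time_between_assists(runs))  # 1200.0
```

Pooling time and counts across runs before dividing, rather than averaging per-run values, keeps short runs from being over-weighted. That is a design choice of this sketch, not a statement about the leaderboard's method.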

Tasks & hardware

The first task is bin-to-bin order picking – transferring individual items between containers. Evaluations run on Franka Research 3 arms with Robotiq grippers. More tasks and platforms are coming.

Fine-tuning dataset

All models on the leaderboard are fine-tuned on this DROID teleoperation dataset (352 episodes, 12 GB). It is available for non-commercial use.

uv run --with positronic \
  python -m positronic.cfg.ds.phail

Evaluation runs

Every evaluation run on the leaderboard is available as a downloadable Positronic dataset containing multi-view video and robot telemetry.

uv run --with positronic \
  positronic-server \
  [email protected]_runs

Browse individual runs in the Run explorer.

Methodology

The full evaluation protocol, scoring, and reproducibility details are in the white paper.

Contact

Have questions or feedback, or want to submit a model? Reach out at [email protected].