RGB
15 Hz · 720p or 1080p JPEG, the master clock every stream syncs to.
Record synchronized RGB, LiDAR depth, 6-DoF pose and IMU into one open MCAP file. VLMhub turns every recording into a queryable, training-ready dataset on its own.
synced on header.stamp · depth ±0.1 s · IMU ±0.05 s
A proprietary on-device pipeline produces MCAP that conforms to the recording contract, so each capture is compatible with the open stera-sdk and needs no cloud dependency. Every stream carries a correct header timestamp.
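As an illustration of what syncing on header.stamp can look like downstream, here is a minimal Python sketch that pairs each RGB frame with the nearest depth and IMU sample. It assumes the ±0.1 s and ±0.05 s figures are maximum allowed offsets from the RGB stamp, and that timestamps are already available as sorted lists of seconds. The function names are hypothetical and are not part of stera-sdk.

```python
from bisect import bisect_left

# Tolerances quoted on this page, in seconds (interpreted here as max offsets).
DEPTH_TOL_S = 0.1
IMU_TOL_S = 0.05


def nearest_within(stamps, t, tol):
    """Return the index of the stamp closest to t, or None if none is within tol.

    `stamps` must be sorted ascending (seconds).
    """
    i = bisect_left(stamps, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(stamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda j: abs(stamps[j] - t))
    return best if abs(stamps[best] - t) <= tol else None


def sync_to_rgb(rgb_stamps, depth_stamps, imu_stamps):
    """Pair each RGB frame index with its nearest depth and IMU sample index."""
    return [
        (
            k,
            nearest_within(depth_stamps, t, DEPTH_TOL_S),
            nearest_within(imu_stamps, t, IMU_TOL_S),
        )
        for k, t in enumerate(rgb_stamps)
    ]
```

A frame whose nearest depth or IMU sample falls outside the tolerance gets None in that slot, so the caller can decide whether to drop or keep it.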
720p or 1080p JPEG, the master clock every stream syncs to.
256 x 192, 16-bit millimetre depth, registered to the RGB frame.
ARKit camera-in-world transform, captured every tracked frame.
Linear acceleration, angular velocity and orientation.
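For readers consuming the depth stream, a small Python sketch of turning a 256 x 192, 16-bit millimetre frame into metres follows. The byte order (little-endian), the row-major layout, and the treatment of 0 as "no reading" are assumptions made for illustration, and the function name is hypothetical.

```python
import struct

DEPTH_WIDTH, DEPTH_HEIGHT = 256, 192  # resolution stated above


def depth_mm_to_metres(raw, width=DEPTH_WIDTH, height=DEPTH_HEIGHT):
    """Decode a packed uint16 millimetre depth buffer into rows of metres.

    Assumes little-endian, row-major data; a value of 0 is mapped to None
    (assumed to mean "no reading").
    """
    expected = width * height * 2  # two bytes per pixel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    values = struct.unpack(f"<{width * height}H", raw)
    return [
        [v / 1000.0 if v else None for v in values[r * width:(r + 1) * width]]
        for r in range(height)
    ]
```

With 16 bits of millimetres the representable range tops out at 65.535 m, well beyond what a handheld LiDAR sensor typically reports.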
The app writes one streaming recording.mcap on device; the file stays crash-safe even when thermal throttling drops frames.
VLMhub catalogs each recording in a single atomic transaction, so no failure leaves the catalog inconsistent.
Hand-pose tracking is dispatched as a background worker, decoupled from ingest throughput.
Blur and tracking-confidence scoring decide accept, reject or retry, with no human in the loop.
Timestamp-indexed seeking produces PyTorch-ready episodes for vision-language-action training.
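Timestamp-indexed seeking can be sketched in a few lines of Python. The example below keeps samples sorted by stamp and uses binary search to cut fixed-length episode windows. The class and function names are hypothetical, payloads are placeholders for decoded frames, and wrapping the windows in a PyTorch Dataset is left out to keep the sketch dependency-free.

```python
from bisect import bisect_left


class TimestampIndex:
    """Sorted (stamp, payload) pairs with O(log n) seeking by time."""

    def __init__(self, samples):
        ordered = sorted(samples, key=lambda s: s[0])
        self.stamps = [s[0] for s in ordered]
        self.payloads = [s[1] for s in ordered]

    def window(self, start, end):
        """Return payloads with start <= stamp < end."""
        lo = bisect_left(self.stamps, start)
        hi = bisect_left(self.stamps, end)
        return self.payloads[lo:hi]


def episodes(index, t0, t1, length_s):
    """Yield consecutive fixed-length windows covering [t0, t1)."""
    k = 0
    while t0 + k * length_s < t1:
        start = t0 + k * length_s
        yield index.window(start, min(start + length_s, t1))
        k += 1
```

For example, `list(episodes(index, 0.0, 2.0, 1.0))` returns two windows, one per second of recording.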
VLMhub treats every new recording as a job to be driven to completion on its own. The result is a data flywheel for embodied agents: zero human touch from raw capture to a queryable, training-ready dataset.
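The atomic cataloging step can be pictured with a short Python sketch using SQLite. This is not VLMhub's actual schema or database: the table layout, the topic names and both function names are hypothetical, chosen only to show how one transaction keeps a catalog consistent when an insert fails partway.

```python
import sqlite3


def open_catalog(db_path=":memory:"):
    """Create a toy catalog with a recordings table and a per-stream table."""
    conn = sqlite3.connect(db_path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS recordings (
            id TEXT PRIMARY KEY,
            path TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS streams (
            recording_id TEXT NOT NULL REFERENCES recordings(id),
            topic TEXT NOT NULL,
            message_count INTEGER NOT NULL CHECK (message_count >= 0),
            PRIMARY KEY (recording_id, topic)
        );
        """
    )
    return conn


def catalog_recording(conn, recording_id, path, streams):
    """Insert a recording and its stream rows in one transaction.

    `with conn:` commits if the block succeeds and rolls back if it raises,
    so a failure on any row leaves no partial entry behind.
    """
    with conn:
        conn.execute(
            "INSERT INTO recordings (id, path) VALUES (?, ?)",
            (recording_id, path),
        )
        conn.executemany(
            "INSERT INTO streams (recording_id, topic, message_count) "
            "VALUES (?, ?, ?)",
            [(recording_id, topic, count) for topic, count in streams],
        )
```

If any stream row violates a constraint, the recording row inserted just before it is rolled back as well, which is the property the single-transaction design is meant to provide.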
Tell us about your capture program or private-model project. We will walk you through a self-hosted deployment end to end.