Overview
A comprehensive egocentric video dataset captured from 5,000+ active users who are blind or have low vision, recorded with Google Glass and Solos smart glasses. Each session includes synchronized audio, OCR outputs, conversation transcripts, and head-mounted IMU motion data. Clips feature clearly visible hands and objects in real-world scenarios, making the dataset well suited for training assistive AI systems, accessibility applications, and computer vision models that understand the first-person perspective of visually impaired users.
Highlights
- Real-world POV from blind/low-vision users across diverse daily scenarios
- Rich metadata: OCR outputs with bounding boxes, conversation transcripts, and head IMU motion data
- Hands and objects clearly visible for manipulation understanding and object recognition
- Temporal alignment: All modalities synchronized with precise timestamps (see the alignment sketch after this list)
- Multi-device coverage: Google Glass and Solos smart glasses for device-agnostic training
- Active collection pipeline: Can gather targeted scenarios from 5,000+ opt-in users
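Because every modality carries its own timestamps, downstream tooling typically joins streams by nearest timestamp. The minimal sketch below assumes a hypothetical per-session imu.json whose records each carry a timestamp_ms field; the actual file layout and field names may differ.

```python
# Minimal alignment sketch; "imu.json" and the "timestamp_ms" field are
# assumptions for illustration, not the dataset's documented schema.
import bisect
import json


def load_samples(path):
    """Load a JSON array of timestamped records, sorted by timestamp_ms."""
    with open(path) as f:
        samples = json.load(f)
    return sorted(samples, key=lambda s: s["timestamp_ms"])


def nearest_sample(samples, t_ms):
    """Return the record whose timestamp is closest to a frame time t_ms."""
    times = [s["timestamp_ms"] for s in samples]
    i = bisect.bisect_left(times, t_ms)
    candidates = samples[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s["timestamp_ms"] - t_ms))


# Example: attach the closest head-motion reading to a frame at 12.34 s.
imu = load_samples("session_0001/imu.json")   # hypothetical path
frame_imu = nearest_sample(imu, 12_340)
```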
Deliverables
Files
- MP4 egocentric POV video (various resolutions by device)
- Extracted frames with hand/object visibility annotations
- Synchronized audio tracks (WAV/MP3)
- Conversation transcripts with timestamps
- OCR JSON files with recognized text, bounding boxes, confidence scores, and timestamps (see the parsing sketch after this list)
- Head motion (IMU) sensor data (JSON/CSV)
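As a rough illustration of consuming the OCR deliverable, the snippet below reads a per-session JSON file and filters detections by confidence. The path session_0001/ocr.json and the field names text, bbox, confidence, and timestamp_ms are assumptions, not the published schema.

```python
# Sketch of reading per-session OCR output; all field names below are
# assumptions chosen to match the deliverable description above.
import json


def load_ocr(path, min_confidence=0.5):
    """Yield OCR detections above a confidence threshold."""
    with open(path) as f:
        records = json.load(f)
    for rec in records:
        if rec["confidence"] >= min_confidence:
            yield {
                "text": rec["text"],
                "bbox": rec["bbox"],          # e.g. [x, y, width, height]
                "time_ms": rec["timestamp_ms"],
            }


for det in load_ocr("session_0001/ocr.json"):  # hypothetical path
    print(det["time_ms"], det["text"])
```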
Labels
hand_visibility, object_detection, scene_classification, activity_labels, device_type
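For illustration only, the sketch below models a per-clip label record using the fields listed above; the value types and string conventions are assumptions rather than the dataset's documented schema.

```python
# Hypothetical per-clip label record; field names mirror the label list,
# but the value types and example strings are assumptions.
from typing import List, TypedDict


class ClipLabels(TypedDict):
    hand_visibility: bool            # hands visible in the clip
    object_detection: List[str]      # object classes present
    scene_classification: str        # e.g. "kitchen", "street"
    activity_labels: List[str]       # e.g. ["pouring", "reading"]
    device_type: str                 # e.g. "google_glass" or "solos"


example: ClipLabels = {
    "hand_visibility": True,
    "object_detection": ["mug", "kettle"],
    "scene_classification": "kitchen",
    "activity_labels": ["pouring"],
    "device_type": "google_glass",
}
```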