ActionSense: A multimodal dataset and recording framework

Synchronize wearable sensors with videos and ground-truth labels to train more dexterous robots or create smarter textiles

main project website | poster | conference presentation video | publications

Collaborators: Chao Liu, Yiyue Luo, Michael Foshey,
Yunzhu Li, Antonio Torralba, Wojciech Matusik, and Daniela Rus

Photos and poster by Joseph DelPreto, MIT CSAIL

Conference Presentation: Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

Publications

  • J. DelPreto, C. Liu, Y. Luo, M. Foshey, Y. Li, A. Torralba, W. Matusik, and D. Rus, “ActionSense: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment,” in Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks, 2022.
    [BibTeX] [Abstract] [Download PDF]

    This paper introduces ActionSense, a multimodal dataset and recording framework with an emphasis on wearable sensing in a kitchen environment. It provides rich, synchronized data streams along with ground truth data to facilitate learning pipelines that could extract insights about how humans interact with the physical world during activities of daily living, and help lead to more capable and collaborative robot assistants. The wearable sensing suite captures motion, force, and attention information; it includes eye tracking with a first-person camera, forearm muscle activity sensors, a body-tracking system using 17 inertial sensors, finger-tracking gloves, and custom tactile sensors on the hands that use a matrix of conductive threads. This is coupled with activity labels and with externally-captured data from multiple RGB cameras, a depth camera, and microphones. The specific tasks recorded in ActionSense are designed to highlight lower-level physical skills and higher-level scene reasoning or action planning. They include simple object manipulations (e.g., stacking plates), dexterous actions (e.g., peeling or cutting vegetables), and complex action sequences (e.g., setting a table or loading a dishwasher). The resulting dataset and underlying experiment framework are available at https://action-sense.csail.mit.edu. Preliminary networks and analyses explore modality subsets and cross-modal correlations. ActionSense aims to support applications including learning from demonstrations, dexterous robot control, cross-modal predictions, and fine-grained action segmentation. It could also help inform the next generation of smart textiles that may one day unobtrusively send rich data streams to in-home collaborative or autonomous robot assistants.

    @inproceedings{delpretoLiu2022actionSense,
    title={{ActionSense}: A Multimodal Dataset and Recording Framework for Human Activities Using Wearable Sensors in a Kitchen Environment},
    author={Joseph DelPreto and Chao Liu and Yiyue Luo and Michael Foshey and Yunzhu Li and Antonio Torralba and Wojciech Matusik and Daniela Rus},
    booktitle={Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
    year={2022},
    url={https://action-sense.csail.mit.edu},
    abstract={This paper introduces ActionSense, a multimodal dataset and recording framework with an emphasis on wearable sensing in a kitchen environment. It provides rich, synchronized data streams along with ground truth data to facilitate learning pipelines that could extract insights about how humans interact with the physical world during activities of daily living, and help lead to more capable and collaborative robot assistants. The wearable sensing suite captures motion, force, and attention information; it includes eye tracking with a first-person camera, forearm muscle activity sensors, a body-tracking system using 17 inertial sensors, finger-tracking gloves, and custom tactile sensors on the hands that use a matrix of conductive threads. This is coupled with activity labels and with externally-captured data from multiple RGB cameras, a depth camera, and microphones. The specific tasks recorded in ActionSense are designed to highlight lower-level physical skills and higher-level scene reasoning or action planning. They include simple object manipulations (e.g., stacking plates), dexterous actions (e.g., peeling or cutting vegetables), and complex action sequences (e.g., setting a table or loading a dishwasher). The resulting dataset and underlying experiment framework are available at https://action-sense.csail.mit.edu. Preliminary networks and analyses explore modality subsets and cross-modal correlations. ActionSense aims to support applications including learning from demonstrations, dexterous robot control, cross-modal predictions, and fine-grained action segmentation. It could also help inform the next generation of smart textiles that may one day unobtrusively send rich data streams to in-home collaborative or autonomous robot assistants.}
    }
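
The abstract above emphasizes rich, synchronized data streams from multiple wearable sensors. As a rough illustration of how such streams might be time-aligned and segmented by activity label, here is a minimal Python sketch. The HDF5 layout, group paths ("emg/left_arm", "tactile/left_hand"), field names ("time_s", "data"), the example file name, and the label bounds are all assumptions made for illustration only, not the dataset's documented schema; see the project website for the actual file format and example code.

    # Hedged sketch: aligning two wearable-sensor streams from one recording.
    # All file names, group paths, and field names below are illustrative
    # assumptions, not the dataset's documented schema.
    import h5py
    import numpy as np

    def load_stream(f, group):
        """Return (timestamps, data) for a hypothetical sensor group."""
        ts = np.asarray(f[group]["time_s"]).squeeze()   # assumed per-sample timestamps [s]
        data = np.asarray(f[group]["data"])             # assumed shape: samples x channels
        return ts, data

    def resample_to(ts_src, data_src, ts_target):
        """Linearly interpolate each channel of data_src onto the target timestamps."""
        return np.stack(
            [np.interp(ts_target, ts_src, data_src[:, c]) for c in range(data_src.shape[1])],
            axis=1,
        )

    with h5py.File("example_recording.hdf5", "r") as f:       # hypothetical file name
        emg_ts, emg = load_stream(f, "emg/left_arm")           # hypothetical group path
        tactile_ts, tactile = load_stream(f, "tactile/left_hand")

        # Put the tactile stream on the EMG clock so the two modalities
        # line up sample-for-sample (tactile frames flattened to channels).
        tactile_aligned = resample_to(
            tactile_ts, tactile.reshape(len(tactile_ts), -1), emg_ts
        )

        # Slice both streams to one labeled activity, assuming the label
        # provides start/stop times in seconds (values here are made up).
        start_s, stop_s = 120.0, 150.0
        mask = (emg_ts >= start_s) & (emg_ts <= stop_s)
        emg_segment = emg[mask]
        tactile_segment = tactile_aligned[mask]

In practice, interpolating every modality onto a single reference clock (here, the EMG timestamps) is one simple way to produce the sample-for-sample correspondence that cross-modal prediction or action-segmentation models typically expect; other choices, such as resampling everything to a fixed rate, work equally well.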
