The increasing adoption of human-robot interaction creates opportunities for technology to positively impact lives, particularly the lives of people with visual impairments, through applications such as guide-dog-like assistive robots. To fill a gap in the current landscape of publicly available datasets and to support the development of safer and more robust algorithms, this website hosts and outlines the dataset introduced in our IEEE CASE 2024 paper: Towards Robust Perception for Assistive Robotics: An RGB-Event-LiDAR Dataset and Multi-Modal Detection Pipeline.
The dataset, named 'REveL', contains RGB, event, point cloud, and Inertial Measurement Unit (IMU) data, along with ground-truth poses of the persons in the scene. It is 14.1 minutes in length, split over four ROSBags. To complement existing datasets and help detection models generalise to different scenes, it was collected in an indoor scenario with a handheld sensor suite moving within the field of view of a Vicon motion-capture system. Two people, also tracked by the motion-capture system, move in and out of the sensor suite's field of view.
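For a quick overview of what each bag contains, the rosbag Python API can be used to list the recorded topics and the bag duration. The sketch below assumes a ROS environment with the rosbag package installed; 'dynamic.bag' is one of the four bags in the dataset.

```python
# Minimal sketch: inspect one of the REveL bags with the rosbag Python API.
# Requires a ROS installation; 'dynamic.bag' is one of the four bags.
import rosbag

with rosbag.Bag('dynamic.bag') as bag:
    duration = bag.get_end_time() - bag.get_start_time()
    print('Duration: %.1f s' % duration)
    # List every recorded topic with its message type and message count.
    for topic, info in bag.get_type_and_topic_info().topics.items():
        print('%-40s %-35s %6d msgs' % (topic, info.msg_type, info.message_count))
```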
The sensor suite used to collect the RGB, event, and point cloud data consisted of:
The sensors' measurements and the output of the motion-capture system were recorded with ROS. We use the rpg_dvs_ros driver for the DVS camera and the Blickfeld ROS driver for the LiDAR. The sensor suite and the helmets worn by the persons in the scene were equipped with sets of reflective markers tracked by the Vicon system, providing the 6-DoF poses of the two persons and of the sensor suite in an arbitrarily fixed reference frame. For convenience and utility, the RGB portion of the dataset is labelled with a class identifier corresponding to the colour of the helmet worn by each person. In total, the data collected consisted of approximately:
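As an example of working with the recordings, the sketch below reads the Vicon ground-truth pose messages back out of a bag. The topic names and message types are assumptions for illustration only; the actual names can be listed with rosbag info.

```python
# Hedged sketch: extract the Vicon ground-truth poses from a recorded bag.
# The topic names below are placeholders, not the dataset's actual topics --
# check `rosbag info` for the names published by the Vicon bridge used here.
import rosbag

POSE_TOPICS = ['/vicon/helmet_1', '/vicon/helmet_2']  # assumed helmet pose topics

poses = {topic: [] for topic in POSE_TOPICS}
with rosbag.Bag('dynamic.bag') as bag:
    for topic, msg, t in bag.read_messages(topics=POSE_TOPICS):
        # Vicon bridges typically publish geometry_msgs/TransformStamped or
        # PoseStamped messages; keep the receive time alongside each message.
        poses[topic].append((t.to_sec(), msg))

for topic, samples in poses.items():
    print('%s: %d pose samples' % (topic, len(samples)))
```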
We compared the reliability of pedestrian detection in the Event space with detection in the RGB space using YOLOv4.
Event space
RGB space
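To reproduce a frame-based comparison like the one above, a common approach is to accumulate events over a short time window into an image that a detector such as YOLOv4 can consume. The sketch below is a generic accumulation scheme, not the exact representation used in the paper; the sensor resolution and window length are assumptions.

```python
# Hedged sketch: accumulate DVS events into a fixed-window "event frame" so a
# frame-based detector such as YOLOv4 can be run on the event stream.
# This is a generic scheme, not the paper's pipeline; the resolution and
# window length below are assumptions.
import numpy as np

WIDTH, HEIGHT = 346, 260  # assumed event-camera resolution
WINDOW_S = 0.033          # assumed accumulation window (~30 Hz frames)

def accumulate_events(events, t_start):
    """Build an image from events with timestamps in [t_start, t_start + WINDOW_S).

    `events` is an iterable of (x, y, ts, polarity) tuples, e.g. unpacked from
    dvs_msgs/EventArray messages.
    """
    frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
    for x, y, ts, polarity in events:
        if t_start <= ts < t_start + WINDOW_S:
            # One colour channel per polarity so positive and negative events stay distinct.
            frame[y, x, 0 if polarity else 2] = 255
    return frame
```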
Using the Vicon motion-capture system, the poses of the two persons in the scene are obtained and used as a ground truth for localisation. The video below shows a played-back section of 'dynamic.bag' in RViz, with the acquired poses overlaid on the LiDAR point cloud data. Note that when the Vicon system cannot accurately track a person, the pose defaults to the origin.
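When using these poses as a localisation ground truth, frames where tracking was lost therefore need to be filtered out first. A minimal sketch, assuming positions are expressed as x, y, z coordinates in the Vicon frame and using an assumed tolerance:

```python
# Minimal sketch: drop ground-truth samples that the Vicon system defaulted to
# the origin (lost tracking) before evaluating localisation.
import numpy as np

ORIGIN_TOL = 1e-3  # metres; positions this close to the origin are treated as lost tracking


def valid_positions(positions):
    """Return only the rows of an (N, 3) position array that are not at the origin."""
    positions = np.asarray(positions)
    return positions[np.linalg.norm(positions, axis=1) > ORIGIN_TOL]
```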
If you use the REveL dataset, please cite the paper:

@misc{scicluna2024robustperceptionassistiverobotics,
      title={Towards Robust Perception for Assistive Robotics: An RGB-Event-LiDAR Dataset and Multi-Modal Detection Pipeline},
      author={Adam Scicluna and Cedric Le Gentil and Sheila Sutjipto and Gavin Paul},
      year={2024},
      eprint={2408.13394},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2408.13394},
}