A novel public dataset for multimodal multiview and multispectral driver distraction analysis: 3MDAD

https://doi.org/10.1016/j.image.2020.115960

Highlights

  • Introducing a new extensive, multimodal, multiview and multispectral dataset.

  • Highlighting the issues observed in driving environments in daytime and nighttime.

  • Exploring public databases and providing an overview of some private datasets.

  • Assessing the relevance of using two cameras simultaneously in multiple viewpoints.

Abstract

Driver distraction and fatigue have become leading causes of severe traffic accidents. Hence, driver inattention monitoring systems are crucial. Even with the growing development of advanced driver assistance systems and the introduction of third-level autonomous vehicles, this task remains an active and complex research problem due to challenges such as illumination changes and dynamic backgrounds. Only a limited number of public datasets are available for reliably comparing and validating driver inattention monitoring methods. In this paper, we put forward a public, well-structured and complete dataset, named the Multiview, Multimodal and Multispectral Driver Action Dataset (3MDAD). The dataset is mainly composed of two sets: the first recorded in daytime and the second at nighttime. Each set consists of two synchronized data modalities, each captured from both frontal and side views. More than 60 drivers are asked to execute 16 in-vehicle actions under a wide range of naturalistic driving settings. In contrast to other public datasets, 3MDAD presents multiple modalities, spectrums and views under different time and weather conditions. To highlight the utility of our dataset, we independently analyze the driver action recognition results obtained with each modality and those obtained from several combinations of modalities.

Introduction

Intelligent Transportation Systems (ITS) are becoming an important component of our society. They are meant to enhance transportation safety, efficiency and sustainability, as well as to provide a comfortable driving experience [1], [2]. One of the key focuses of ITS is Advanced Driver Assistance System (ADAS) technology, which plays a crucial role in ensuring the safety and comfort of the vehicle, driver, passengers and pedestrians. Properly used, ADAS technologies can prevent 40% of all vehicle crashes and about 30% of traffic deaths [3]. In the USA, vehicular collisions in 2016 caused 37,461 fatalities (a 5% increase from 2015) and more than 2.4 million debilitating injuries, with an estimated cost of 242 billion dollars [4], [5]. A large share of these fatalities occurred in darkness or twilight, when it was very difficult for drivers to see clearly.

Driver inattention is a major contributing factor to road crashes and incidents. It is defined as diminished attention to activities that are critical for safe driving in the absence of a competing activity [6]. This factor can be divided into two main classes: distraction on the one hand, and fatigue and somnolence on the other [7]. This was confirmed by an online survey in which Peter et al. [4] reported that the two leading contributors to severe crashes were distraction by secondary tasks and poor visibility in low light, which generally makes drivers sleepy.

With the continuous improvement of advanced driver assistance systems up to highly automated driving functions, drivers are allowed to engage temporarily in non-driving related tasks. However, they need to react appropriately to a take-over request when the automated vehicle reaches its limits. To support the driver in such situations, driver inattention monitoring systems can enable adaptive take-over concepts [8]. Thus, monitoring driver inattention remains important and is a trending topic that faces several challenges [9], such as illumination variations and cluttered backgrounds. Non-driving related tasks are numerous and can take many forms. The National Highway Traffic Safety Administration therefore categorized distraction into four groups [10]: cognitive distraction, when the driver's mind is drawn away from the driving task; visual distraction, when the driver neglects to look at areas they should be watching while driving; physical distraction, when one or both of the driver's hands are taken off the steering wheel to manipulate an object; and auditory distraction, when sounds prevent the driver from making the best use of their hearing because their attention is drawn to whatever causes the sound. Most non-driving activities involve more than one of these classes; for example, talking on a cell phone creates cognitive, physical and auditory distraction.
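To make the overlap between these categories concrete, the short Python sketch below encodes the four NHTSA groups as combinable flags; the class and constant names are illustrative choices for this example, not part of any standard API.

```python
from enum import Flag, auto

class Distraction(Flag):
    """NHTSA's four distraction categories, encoded as combinable flags."""
    COGNITIVE = auto()
    VISUAL = auto()
    PHYSICAL = auto()
    AUDITORY = auto()

# A single secondary task can induce several categories at once,
# e.g. talking on a cell phone:
PHONE_CALL = Distraction.COGNITIVE | Distraction.PHYSICAL | Distraction.AUDITORY
assert Distraction.PHYSICAL in PHONE_CALL  # membership test via flag intersection
```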

Crash data confirm that poor visibility at night is one of the leading contributors to fatal collisions. In fact, for decades, nighttime fatality rates have been three to four times higher than daytime rates [4]. Multiple factors contribute to the road safety difference between day and night; the main ones are poor visibility coupled with distracted driving and driver fatigue, the latter being a natural reaction to darkness. Fatigue has no universal definition [11]. In an attempt to avoid accidents, most fatigued and sleepy drivers try to resist falling asleep. Thus, certain physical and physiological phenomena that precede the onset of sleep can be observed. When a driver is tired and begins to fight sleepiness, several symptoms can be noticed, such as an increased frequency of touching the eyes, head and face, repeated yawning, difficulty keeping the eyes open, and slower responses and reactions. Consequently, in our study, we treat fatigue and somnolence as a secondary task activity.

To maintain safe driving, monitoring driver inattention is vital. As a result, several efforts to detect and recognize driver distraction have been made using different acquisition devices. To be effective, such systems need to be human centric and take into account several components, including: driver monitoring (e.g. observing the driver to recognize their activity and attention state), vehicle sensors (e.g. monitoring vehicle speed, steering angle, braking, etc.) and vehicle surroundings (e.g. observing the road and other cars to understand the surrounding situation) [12], [13]. In this paper, we focus mainly on driver monitoring, given that it offers the deepest insight into recognizing common forms of inattention.

Different physiological and physical signals have been captured, such as ECG, EEG and EOG [14], [15], [16], [17], [18], [19]. However, their high cost and installation complexity have led to the use of vision sensors. The latter offer the most direct means of detecting the early onset of distraction [7] and present excellent optimization opportunities, since they can serve as a platform shared with other vision-based driver assistance applications. Multiple types of vision sensors can be used, including monocular cameras, depth cameras, etc. Given their clarity, color images have proven robust, especially in controlled environments. In naturalistic driving settings, however, captured color images are affected by various weather conditions, which degrade image quality [20], [21]. For this reason, researchers tend to supplement, or even replace, images provided by monochrome and color cameras in the visible spectrum with images from other modalities, with the intent of improving the performance of the whole system regardless of weather conditions while keeping, or even improving, the features and classifiers used. Infrared cameras, in particular, can produce clear images by day or night and are designed for poor lighting and poor weather conditions.

In real-world driving settings, the accuracy of vision-based driver inattention monitoring methods remains limited because of the many challenges present in this domain, such as dynamic backgrounds, occlusion and poor visibility [22]. To effectively compare these methods, public datasets are crucial: they allow the direct comparison of numerous methods with the state of the art, and they open challenging questions to a wider community. Therefore, several datasets have been collected, whether in simulated assisted driving, naturalistic driving settings or a parked vehicle. However, most of them are not publicly available.

In order to facilitate research activities in this field, we propose in this paper a complete, useful and publicly available dataset, named the Multiview, Multimodal and Multispectral Driver Action Dataset (3MDAD), which is designed to overcome the limitations of the aforementioned databases. Our new public, multispectral, multimodal and extensive dataset captures the issues observed in naturalistic driving settings, including multiple users, dynamic and cluttered backgrounds, and varying viewpoints and lighting conditions, employing Kinect cameras during both daytime and nighttime. 3MDAD covers a substantial number of the distracting actions reported by the WHO [23]. In daytime, it provides temporally synchronized RGB and depth frames; at nighttime, it contains temporally synchronized infrared and depth frames. Such a dataset is of valuable benefit to researchers working in fields such as image processing, computer vision, sensor fusion [24], [25], and human-centered intelligent driver assistance systems.
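As a minimal illustration of how such temporally synchronized pairs might be consumed, the Python sketch below loads one (RGB-or-infrared, depth) frame pair with OpenCV. The directory layout and file naming are hypothetical assumptions for this example; the actual structure is defined by the dataset release.

```python
from pathlib import Path
import cv2  # OpenCV for image I/O

def load_synchronized_pair(root: Path, driver: int, action: int, frame_idx: int):
    """Load one temporally synchronized (RGB/IR, depth) frame pair.

    The folder layout and file naming below are hypothetical placeholders;
    adapt them to the layout of the actual 3MDAD release.
    """
    base = root / f"driver_{driver:02d}" / f"action_{action:02d}"
    rgb = cv2.imread(str(base / "rgb" / f"{frame_idx:05d}.png"), cv2.IMREAD_COLOR)
    depth = cv2.imread(str(base / "depth" / f"{frame_idx:05d}.png"),
                       cv2.IMREAD_UNCHANGED)  # depth maps are typically 16-bit
    if rgb is None or depth is None:
        raise FileNotFoundError(f"Missing frame {frame_idx} under {base}")
    return rgb, depth
```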

The main contributions of this paper include:

  • (1) Introducing a new extensive, multimodal, multiview and multispectral dataset that highlights the issues observed in naturalistic driving environments, employing two Kinect depth cameras in daytime and at nighttime. The dataset is publicly available.1

  • (2) Reviewing, to the best of our knowledge, the existing public databases while positioning our dataset against them, and providing an overview of some selected private datasets.

  • (3) Assessing the relevance of using two cameras simultaneously from multiple viewpoints, where each delivers a different modality, by applying early fusion at both day and night times, as sketched below.
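For contribution (3), the following sketch shows one common form of early fusion: channel-wise concatenation of two registered modalities into a single input tensor. It is a generic illustration under the assumption of pixel-aligned frames, not the exact fusion scheme evaluated in Section 5.

```python
import numpy as np

def early_fusion(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack two modalities into one multi-channel input (early fusion).

    rgb:   H x W x 3 uint8 frame from one camera
    depth: H x W uint16 map from the other camera, assumed pixel-aligned
    Returns a float32 H x W x 4 array for a downstream classifier.
    """
    rgb_f = rgb.astype(np.float32) / 255.0
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-6)  # normalize depth to [0, 1]
    return np.dstack([rgb_f, d])
```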

The remainder of this paper is organized as follows: In Section 2, the main publicly available datasets and some private datasets related to ours are briefly reviewed. Our new public dataset is described and its main differences from existing public datasets are pointed out in Section 3. The main naturalistic driving challenges are described in Section 4. Section 5 demonstrates the utility of our dataset in recognizing drivers' in-vehicle actions by reporting several experiments based on features extracted from Spatio-Temporal Interest Points (STIPs) and automatically extracted features based on deep learning. The conclusion is finally stated in Section 6.

Section snippets

Related work

Monitoring driver inattention is one of the most active research areas in both machine learning and computer vision. There are several ways to monitor driver inattention depending on the specific purpose. Fig. 1 shows a general overview of the basic components of a common vision-based driver monitoring system. Such systems can use several types of vision sensors, extract different kinds of features that are used separately or merged, and try to detect and recognize driver inattention.

3MDAD dataset

Motivated by the need for public datasets and the limited number of existing ones, we introduce 3MDAD, which addresses multiple aforementioned shortcomings of state-of-the-art datasets, including multimodal synchronized data, the diversity of performed in-vehicle actions, and the variety of drivers. The introduced dataset opens challenging questions about real-world driving settings to a wider community. Table 2 summarizes the existing public datasets compared to 3MDAD.

Dataset design criteria

To be complete, the proposed 3MDAD should contain frame sequences that allow assessing the main issues related to monitoring driver distraction. In this section, we describe the challenges related to naturalistic driving settings that we strive to represent in our dataset.

Driver action recognition experiments

To highlight the utility of our dataset, we choose to perform driver action recognition experiments based on handcrafted features and automatically extracted features.
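As one minimal sketch of the automatically extracted features, the code below computes per-frame descriptors with a pretrained CNN and averages them over a clip. The ResNet-18 backbone and mean pooling are illustrative assumptions, not the exact pipeline reported in the paper.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained backbone with its classification head removed (illustrative choice).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def clip_descriptor(frames) -> torch.Tensor:
    """Average per-frame CNN features over a clip (simple temporal pooling)."""
    feats = [backbone(preprocess(f).unsqueeze(0)) for f in frames]
    return torch.cat(feats).mean(dim=0)  # one 512-d vector per clip
```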

Conclusion

The increasing prevalence of in-vehicle infotainment and busy lifestyles drastically reduces driver attention, making distraction one of the leading causes of severe, often fatal, vehicle crashes. To validate driver inattention monitoring approaches, a new public, complete and well-structured dataset is introduced. The novel dataset contains two sets of naturalistic driving frame sequences, recorded during daytime and nighttime. It is composed of a wide variety of frame sequences recorded on real roads.

CRediT authorship contribution statement

Imen Jegham: Conceptualization, Data curation, Writing - original draft, Software. Anouar Ben Khalifa: Methodology, Visualization, Supervision, Investigation, Software, Validation. Ihsen Alouani: Conceptualization, Methodology, Writing - review & editing, Validation. Mohamed Ali Mahjoub: Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the drivers who have agreed to participate in the collection of the dataset for their support and encouragement.

References (63)

  • McDonald, A., et al., Vehicle owners' experiences with and reactions to advanced driver assistance systems (2018).

  • Mimouna, A., et al., OLIMP: A heterogeneous multimodal dataset for advanced environment perception, Electronics (2020).

  • Lee, J.D., et al., Defining driver distraction, Driver Distract. Theory Effects Mitigat. (2008).

  • Dong, Y., et al., Driver inattention monitoring system for intelligent vehicles: A review.

  • Pech, T., et al., Real time recognition of non-driving related tasks in the context of highly automated driving.

  • Jegham, I., et al., Safe driving: Driver action recognition using SURF keypoints.

  • Ranney, T.A., et al., NHTSA Driver Distraction Research: Past, Present, and Future, Technical Report (2001).

  • Cardoso, M., et al., A pre/post evaluation of fatigue, stress and vigilance amongst commercially licensed truck drivers performing a prolonged driving task, Int. J. Occup. Safety Ergon. (2018).

  • Billah, T., et al., Recognizing distractions for assistive driving by tracking body parts, IEEE Trans. Circuits Syst. Video Technol. (2019).

  • Tran, C., et al., Driver assistance for "keeping hands on the wheel and eyes on the road".

  • Reyes-Muñoz, A., et al., Integration of body sensor networks and vehicular ad-hoc networks for traffic safety, Sensors (2016).

  • Ameur, S., et al., A comprehensive leap motion database for hand gesture recognition.

  • Jafarnejad, S., Castignani, G., Engel, T., Non-intrusive distracted driving detection based on driving sensing data, in:...

  • Mimouna, A., et al., Human action recognition using triaxial accelerometer data: Selective approach.

  • Jegham, I., et al., Pedestrian detection in poor weather conditions using moving camera.

  • Chebli, K., et al., Pedestrian detection based on background compensation with block-matching algorithm.

  • WHO, Distracted driving (2018).

  • Lejmi, W., et al., Fusion strategies for recognition of violence actions.

  • Gao, Z., et al., Adaptive fusion and category-level dictionary learning model for multiview human action recognition, IEEE Internet Things J. (2019).

  • Ohn-Bar, E., et al., Driver hand activity analysis in naturalistic driving studies: challenges, algorithms, and experimental studies, J. Electron. Imaging (2013).

  • Craye, C., et al., Driver distraction detection and recognition using RGB-D sensor (2015).