Pattern Recognition

Volume 40, Issue 8, August 2007, Pages 2341-2355

A visual approach for driver inattention detection

https://doi.org/10.1016/j.patcog.2007.01.018

Abstract

Monitoring driver fatigue, inattention, and lack of sleep is very important in preventing motor vehicle accidents. A visual system for automatic driver vigilance has to address two fundamental problems: it has to analyze the image sequence and detect whether the driver's eyes are open or closed, and it then has to evaluate the temporal occurrence of open eyes to estimate the driver's visual attention level. In this paper we propose a visual approach that solves both problems. Two candidate regions that might contain the eyes are selected using geometrical information about the iris and symmetry, and a neural classifier is then applied to recognize the eyes in the image. The novelty of this work is that the algorithm works on complex images without constraints on the background, without skin color segmentation, and so on. Several experiments were carried out on images of subjects with different eye colors, some of them wearing glasses, in different light conditions. Tests show robustness with respect to situations such as partially occluded eyes and head rotation. In particular, when applied to images in which people have their eyes closed, the proposed algorithm correctly reveals the absence of eyes. Next, the eye occurrence in image sequences is analyzed with a probabilistic model to recognize anomalous behaviors such as driver inattention or sleepiness. Image sequences acquired both in the laboratory and while people were driving a car were used to test the driver behavior analysis and demonstrate the effectiveness of the whole approach.

Introduction

The detection of driver visual attention is very important for developing automatic systems that monitor driver inattention, fatigue, and sleepiness. A great number of the fatalities occurring in motor vehicles could be avoided if these behaviors were detected and alarm signals were provided to the driver. The literature reports many attempts to develop safety systems for reducing the number of automobile accidents: these systems detect the "driving" behavior, by monitoring lane keeping, steering movements, acceleration, braking and gear changing [1], and the "driver" behavior, by tracking the driver's head and eye movements, monitoring the heart and breathing rates and the brain activity [2], and recognizing torso and arm/leg motion [3]. Repeated experiments have shown that, among all the driver performance and bio-behavioral measures tested, the percentage of eyelid closure over time (Perclos) is the most reliable predictor of loss of alertness and the most widely recognized psycho-physiological index of it. In this work we address a problem that is crucial for automotive applications: developing a robust eye recognition algorithm that can be used both for computing the Perclos measure and for modelling different driver behaviors.

Many works in the literature propose real-time methods for eye tracking based on active infrared (IR) illumination. Some commercial products are coming to the market, such as SmartEye and the EyeAlert Fatigue Warning System [4], [7], and many institutions are actively involved in automotive research projects. Different types of IR light sources have been devised to emit non-coherent energy synchronized with the camera frame rate, generating bright and dark pupil images. Pupils can be detected by simply thresholding the difference between the dark and the bright pupil images [9], [10], [11], [12], [13], [14], [15]. The success of these active approaches depends on several factors: the brightness and size of the pupils, which are often functions of face orientation; external illumination interference; the distance of the subject from the camera; and the need for stable lighting conditions (no strong sunlight). In addition, glasses tend to disturb the IR light so much that the red-eye effect may appear very weak. In recent years large improvements have been made in miniaturizing the cameras with compact IR illuminators, in designing configurations that produce diffuse lighting, and in selecting allowable levels of IR irradiation. However, in many cases an initial calibration phase is still required, during which the intensity of the active IR illuminators has to be tuned to cope with different natural light conditions, multiple reflections from glasses, and variable gaze directions.
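To make the bright/dark pupil scheme concrete, here is a minimal sketch (Python/NumPy, our own illustration rather than code from any of the cited systems) of the differencing step these active approaches rely on; the frame names and the threshold value are illustrative assumptions.

    import numpy as np

    def pupil_mask(bright_frame: np.ndarray, dark_frame: np.ndarray,
                   threshold: int = 40) -> np.ndarray:
        """Binary mask of candidate pupil pixels from two IR frames.

        bright_frame, dark_frame: grayscale uint8 images captured on
        alternate frames with the on-axis and off-axis IR illuminators.
        """
        # Pupils retro-reflect the on-axis IR light, so they appear bright
        # in one frame and dark in the other, while the rest of the scene
        # changes very little between the two exposures.
        diff = bright_frame.astype(np.int16) - dark_frame.astype(np.int16)
        return (diff > threshold).astype(np.uint8)

As the paragraph above notes, the usefulness of such a mask degrades whenever the retro-reflection weakens: small or averted pupils, glasses, or strong sunlight.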

Alternative approaches that use standard cameras with classical algorithms for eye detection in cluttered images have also been investigated [8]. SeeingMachine [5] proposes a pair of stereo cameras to determine the 3D position of matching features on the driver's face. Starting from an initial calibration of the subject in a green room, the extracted 3D features are then used to capture the 3D pose of the person's face as well as the eye gaze direction, blink rate and eye closure [6]. In our work, instead, we investigate eye detection with monocular images in the visible spectrum under normal illumination. Generally, eye detection algorithms for cluttered images require two steps: locating the face to extract the eye regions, and then detecting the eyes within the eye windows. The face detection problem has been addressed with different approaches: neural networks, principal components, independent components, skin-color-based methods, and face models [16], [17], [18]. Each imposes some constraints: frontal view, expressionless images, limited variations in light conditions, hairstyle dependence, uniform background, and so on. An exhaustive review has been presented in Ref. [19].

Many works on eye or iris detection assume either that eye windows have been extracted or that rough face regions have already been located [20], [21], [22], [24], [25], [26], [27]. In Ref. [21], eye detection is performed within the possible eye region of the candidate face field, so it can be applied only after a face detection system has extracted a small number of candidate eye regions. Left and right eye templates were used to detect eyes by a method that is unaffected by slight rotation, scaling, and translation (up to 10%). The algorithm proposed in Ref. [20] requires the detection of face regions in order to extract intensity valleys; only at this point do the authors apply a template matching technique to extract iris candidates. The authors deal with the difficult problem of face region extraction both in intensity images and in color images.

Skin color models are strictly related to the images considered and cannot be so general as to be applicable under every light condition and with different skin colors. Region-growing methods or head contour methods on intensity images require strong constraints, such as a plain background. In Refs. [22], [24] the first step is also a face detection algorithm, based on skin color segmentation of the input image under the constraints of there being only one face and a simple background; the facial feature segmentation is then based on gray value reliefs in Ref. [22] or on template matching of edge and color features in Ref. [24]. A probabilistic framework is used in Ref. [23] for precisely locating eyes in face areas previously extracted by a face detector. In Ref. [25] linear and non-linear filters are used for eye detection: oriented Gabor wavelets form an approximation of the eye in gray level images, while non-linear filters are applied to color images to determine the color distribution of the sclera region. In both cases a face detection step is applied, which assumes the face to be the most prominent flesh-tone region in the image. The same algorithm has been used in Ref. [26] for tracking the iris and eyelids in video.

In Ref. [27], lip and skin color predicates are used as a first step to segment the lip and skin regions in the image: the two holes above the lip region that satisfy some fixed size criteria are selected as candidates for the eye regions. A hierarchical strategy is applied to track the eyes in a video sequence and evaluate the driver's visual attention by finite state automata. In Ref. [28] the authors make use of multiple cues extracted from a gray level image to detect the eye windows within an a priori detected face region; the precise iris and eye corner locations are then detected by a variance projection function and an eye variance filter. In Ref. [29], face regions are also initially determined by using rules derived from quadratic polynomial models, and then eye components are extracted after the segmentation of skin pixels and lips. In Ref. [30] the distribution of discriminant features that characterize eye patterns is learnt statistically and provided to probabilistic classifiers to separate eyes from non-eyes. Also in this case the eye localization method requires a face detection step based on hierarchical classifiers [31].

The use of eye detection algorithms in the visible spectrum for automotive applications is not straightforward for several reasons: the problem of face segmentation (distinguishing faces from a cluttered background) should not be avoided, as has been done in many papers by imaging faces against a uniform background; the common use of skin color information to segment the face region is based on computationally expensive initializations and is not so general as to be applicable in different light conditions and with different skin colors; finally, the more precise the location of the eye regions in an initial step, the more reliable the results of the subsequent eye detection algorithms.

The second problem that a visual system for automotive applications has to solve is modelling the eye occurrence in image sequences to detect the driver's status. Substantial amounts of data have been collected to study driver behavior using a suite of vehicle sensors and unobtrusively placed video cameras [32]. Human error is known to be a causal factor in many accidents. Inattention and fatigue play an important role in human errors since they affect cognitive processes, impair perception and the ability to decide how to react, and degrade the actual performance of actions. Detecting the driver's state, and the fatigue level in particular, is a difficult task; research on this subject is still incomplete, and at this time it is not possible to provide an exact quantitative assessment of fatigue. There are also limits to the effectiveness of attention monitoring systems. Monitoring the gaze position or head orientation is one plausible method, but looking at something does not necessarily mean being aware of it [33]. These kinds of analysis require cognitive and physiological studies that are beyond the scope of this paper. However, recent results on driver accidents show that in 93% of all rear-end crashes the driver looked away from the road ahead at least once within 5 s of the crash. This suggests that many crashes occur because drivers do not anticipate events or are unable to respond to unanticipated events in a timely manner [34]. For this reason, redirecting the driver's eyes to the road ahead may be one way to decrease crash rates. In Ref. [27] a simple characterization of driver behavior was introduced by using three different finite state machines that monitor the eye closure rates and long-lasting head rotations to distinguish good visual attention, decreasing visual attention, and low visual attention. The finite state machines have to be defined in advance and are assumed to be valid for any driver. The Perclos measure (the percentage of time a driver's eyes are closed) has been recognized as a valid measure of loss of alertness among drivers. Many commercial and experimental sensors [4], [5], [14] currently use the Perclos measure to evaluate driver fatigue, and many research efforts are centered on the effects of warning systems on driver performance [35]. The Perclos measure is evaluated over a long observation period in order to assess whether a critical level of fatigue has been reached: the higher the Perclos measure, the higher the driver's fatigue level. The warning triggers are associated with Perclos values calculated over at least 1 min.
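As an illustration of how a Perclos trigger over a fixed observation window can be computed, the following sketch (our own, not from the paper or the cited sensors) keeps a sliding window of per-frame eye-closure flags; the window length and alarm level are illustrative assumptions.

    from collections import deque

    class PerclosMonitor:
        """Sliding-window Perclos estimate over the last `window` frames."""

        def __init__(self, window: int = 1800):  # e.g. 1 min at 30 fps
            self.history = deque(maxlen=window)

        def update(self, eyes_closed: bool) -> float:
            # Perclos: fraction of recent frames in which the eyes were closed.
            self.history.append(eyes_closed)
            return sum(self.history) / len(self.history)

    monitor = PerclosMonitor(window=1800)
    for closed in [False, False, True, True, False]:  # toy per-frame flags
        perclos = monitor.update(closed)
    print(f"Perclos over current window: {perclos:.2f}")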

The main objective of our work is to introduce a new visual approach to detect the eye occurrence in image sequences in the visible spectrum, and to learn the model of each driver's normal behavior during an initial training phase. We propose an eye detection algorithm, applicable in the real context of people driving a car, that does not assume predefined acquisition conditions for the background and skips the initial segmentation step that extracts the face region, commonly described in the literature. The proposed approach works on the whole image, looking for regions whose edges have the geometrical configuration expected of the iris. Different iris radii are considered to account for people having different eye dimensions, and light variation is addressed to account for the distance between the camera and the person. The search for similar regions is used to discard false positives that can occur in the image. Experimental results demonstrate that when the eyes are open they are correctly detected. However, when the eyes are closed or occluded, the algorithm produces false positives. To overcome this problem, we introduce a further step for the validation of the results: a neural classifier is trained to recognize the "eyes" and "not-eyes" classes using a large number of examples taken from images of different people. A large number of tests have been carried out on different people, with different eye colors and dimensions, some of them wearing glasses, and with no constraints on hair style or background. Experiments have been carried out both in the laboratory under varying light conditions, such as natural or artificial illumination coming from only one side of the person (half of the face lit by the sun), and in real situations of people driving cars. It should be noted that the system does not work with people wearing sun glasses (dark lenses) that occlude the eyes; however, this is a limitation shared by any eye detection system, whether visual or IR based. The only restriction we impose is that the distance between the observed person and the camera does not change greatly, since our algorithm requires knowledge of an approximate range of radii for the iris. In our opinion, this constraint is not at all restrictive for building a visual system for driver inattention detection: in this context the distance between a camera placed on the dashboard and the driver cannot change greatly. Besides, during the acquisition of video sequences, people are allowed to look around, move their heads and assume any expression. If the eyes are visible in the image, the algorithm is able to detect their presence independently of the gaze direction. Experimental tests demonstrate that, with a camera placed in the central part of the car dashboard, the eyes are clearly visible when the driver is looking ahead or at the central rear-view mirror; but when the driver looks in a side-view mirror, the head rotation means the eyes are no longer visible. For this reason we suggest the use of two cameras placed on the dashboard, as shown in Fig. 1. In this paper we report the results obtained on image sequences acquired with the central camera, but integration with the results obtained from the second camera can easily be achieved.

The proposed algorithm for eye detection can be used to monitor driver alertness by evaluating the Perclos measure over a long observation period. In this work we propose a different, adaptive approach that can be used both to detect driver inattention and to model the normal behavior of each driver in order to recognize anomalous situations. Two concurrent procedures evaluate the eye occurrences over different observation windows. The first continuously evaluates the time the driver spends looking away from the road: if the number of frames without eyes is greater than a fixed threshold, an alarm signal is set off. The detection of more complex inattentive behaviors, such as drivers who shift their attention from the primary task of driving or drivers impaired by increasing fatigue, requires longer observations of the eye occurrence and, above all, the analysis of behavioral parameters that can differ from driver to driver. For this reason we propose a concurrent procedure that examines the eye occurrence over an initial observation period, during which a good attention level is assumed, learns some statistical parameters, and builds a probabilistic model that characterizes the normal behavior of that driver. In our opinion, driver behaviors can be modelled not only by considering the percentage of time a driver's eyes are closed but by evaluating the eye closure duration together with the eye closure frequency. These two parameters carry more information than the percentage alone: for the same percentage value, different behaviors can be recognized from different combinations of eye closure durations and frequencies. A multivariate Gaussian mixture model was used in this work to characterize the two parameters. After the training phase, the same parameters are extracted from further observed sequences and the probability of the behavior being normal is evaluated: low values indicate anomalous situations, and an alarm signal can then be set off.
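A minimal sketch of this train-then-score scheme, using scikit-learn's GaussianMixture as a stand-in for the multivariate Gaussian mixture model described above; the feature values, number of components, and threshold rule are illustrative assumptions rather than the paper's actual settings.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # One row per observation window of the training phase (normal attention
    # assumed): (mean eye-closure duration in frames, closures per window).
    training = np.array([[3.1, 4], [2.8, 5], [3.5, 3], [2.9, 4], [3.2, 5],
                         [3.0, 4], [2.7, 6], [3.4, 4], [3.1, 5], [2.9, 3]],
                        dtype=float)

    gmm = GaussianMixture(n_components=2, covariance_type="full",
                          random_state=0).fit(training)

    # Illustrative alarm rule: flag windows that are less likely than
    # anything seen during training.
    threshold = gmm.score_samples(training).min()

    new_window = np.array([[12.0, 9]])  # long, frequent closures
    if gmm.score_samples(new_window)[0] < threshold:
        print("anomalous behavior: raise alarm")

Note how the two features separate behaviors that a single percentage would conflate: many short closures and a few long ones can yield the same Perclos value but very different likelihoods under the learned mixture.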

The main contribution of this work is the introduction of a reliable eye detection approach that does not impose any constraints on the background and does not need any preprocessing to segment the eye regions. It searches for the eye regions in the whole image, can work in sunlight, and performs well even when people are wearing glasses. In addition, the use of a learning phase to extract the parameters characterizing the normal behavior of each driver avoids the difficult problem of building a general model that has to be valid for any driver.

The rest of the paper is organized as follows: Section 2 describes the eye detection algorithm. The behavior analysis approach is detailed in Section 3. The results of different experiments for testing both the eye detection algorithm and the behavior analysis are reported in Section 4. Finally, in Section 5 conclusions are presented.


Eye detection

The eye detection algorithm consists of three steps. Initially, an iris detection algorithm, which uses a modified Hough transform, is applied to the whole image to detect a candidate region that might contain one eye. Then a symmetrical region is searched for in limited areas of the image. Finally, an eye validation algorithm, based on a neural classifier, is used to confirm the presence of the eyes in the image. If the eyes are correctly found, their position is used in the following image to limit the search area.
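The paper's modified Hough transform is not reproduced here; as a rough stand-in, the sketch below uses OpenCV's standard gradient Hough circle detector to illustrate the first step, the search for iris-like circular edges within an expected radius range. All parameter values are illustrative assumptions, and the symmetry search and neural validation steps are not shown.

    import cv2
    import numpy as np

    def candidate_irises(gray: np.ndarray, r_min: int = 8, r_max: int = 16):
        """Return (x, y, r) triples of circular-edge candidates.

        gray: whole grayscale frame; r_min/r_max: expected iris radius
        range, fixed by the roughly constant camera-driver distance.
        """
        blurred = cv2.medianBlur(gray, 5)  # suppress edge noise
        circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1,
                                   minDist=2 * r_max,  # irises cannot overlap
                                   param1=100,         # Canny edge threshold
                                   param2=20,          # accumulator threshold
                                   minRadius=r_min, maxRadius=r_max)
        return [] if circles is None else circles[0]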

Behavior analysis

Detecting the driver attention level with a visual system is a difficult task, and an automatic system should consider many factors. The occurrence of the eyes in the image sequences needs to be evaluated to monitor the eye closure rates. Many studies of human sleep have been carried out to understand the physiological changes involved, such as accelerated respiration, decreased brain activity, eye movement, and muscle relaxation. The gaze direction needs to be evaluated in order to understand if people are looking at the road ahead.

Experimental results

The experimental phase consists of two parts. First, the eye detection algorithm was tested, using both image sequences of different people taken in our laboratory under different light conditions and image sequences of people driving a car. Then the algorithm for behavior analysis was tested. The eye closure parameters of two people were extracted from a long image sequence during which normal behavior was assumed. The statistical model was then generated and used to recognize anomalous behaviors in the remaining sequences.

Conclusions

In this work we propose a visual framework that, together with other kinds of sensors, can be used to warn against driver inattention. Many research groups are working on the possibility of equipping the car with sensors for monitoring the steering wheel, brakes, accelerator, lane keeping and so on. The prevailing method for detecting driver inattention involves tracking the driver's head and eyes. Active IR illumination approaches have been largely used for detecting eyes, but their success depends on several factors, such as stable lighting conditions and the distance of the subject from the camera.

Acknowledgment

The authors would like to thank Neil Owens for his helpful and critical suggestions during the preparation of this manuscript.


References (39)

  • G. Yang, Y. Lin, P. Bhattacharya, A driver fatigue recognition model using fusion of multiple features, in: IEEE...
  • S. Park, M. Trivedi, Driver activity analysis for intelligent vehicles: issues and development framework, in:...
  • ...
  • ...
  • Y. Matsumoto, A. Zelinsky, An algorithm for real time stereo vision implementation of head pose and gaze direction...
  • ...
  • E. Wahlstrom, O. Masoud, N. Papanikolopoulos, Monitoring Driver Activities, ITS Institute, Research Reports No. CTS...
  • Z. Zhu et al., Eye and gaze tracking for interactive graphic display, Mach. Vision Appl. (2004)
  • Z. Zhu et al., Real time non intrusive monitoring and prediction of driver fatigue, IEEE Trans. Vehicular Technol. (2004)

About the Author—TIZIANA D’ORAZIO received the degree in Computer Science from the University of Bari (Italy) in 1988. Since 1989 she has worked at the Institute of Signal and Image Processing of the Italian National Research Council (CNR). She is currently a researcher at the Institute of Intelligent Systems for Automation of the CNR. Her research interests include pattern recognition, artificial intelligence, image processing for robotic applications, and intelligent systems.

About the Author—MARCO LEO was born in Gallipoli, Lecce, Italy in 1974. He received the degree in Computer Science Engineering from the University of Lecce in 2001. Since then, he has been a research collaborator at the Italian National Research Council (CNR), Institute of Intelligent Systems for Automation (ISSIA) in Bari, Italy, where he is currently a researcher. His research interests are in the area of image and signal processing, neural networks, and pattern recognition.

About the Author—CATALDO GUARAGNELLA was born in Italy in 1964. He graduated in electronic engineering in 1990 at the University of Bari, Italy, and received the Ph.D. degree in Telecommunications from the Politecnico di Bari in 1994. In 1996 he joined the Electrical and Electronics Department of the Politecnico di Bari as an assistant professor in Telecommunications. His main research interests include signal, image and video processing/coding, motion estimation in video sequences, and multidimensional signal processing.

About the Author—ARCANGELO DISTANTE received the degree in Computer Science from the University of Bari, Italy in 1976. He worked at the National Institute for Nuclear Physics until 1983 on various theoretical and computational aspects of 3D reconstruction and pattern recognition of nuclear events. Since 1984 he has been working with the Institute for Signal and Image Processing (IESI) of the Italian National Research Council (CNR). Currently he is the coordinator of the Robot Vision Group and the Director of the Institute of Intelligent Systems for Automation (ISSIA-CNR). In 1996 he joined the University of Lecce, where he is an associate professor in Theory and Practice of Image Processing at the Faculty of Engineering. His current research deals with computer vision, pattern recognition, machine learning, neural computation, robot navigation, and architectures for computer vision. Dr. Arcangelo Distante is a member of IAPR and SPIE.
