doi:10.1016/j.cviu.2004.07.009
Copyright © 2004 Published by Elsevier Inc.
An embedded system for an eye-detection sensor
(a) IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
(b) University of California Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
(c) Department of EECS, University of California Berkeley, CA 94720, USA
Received 27 July 2004; accepted 27 July 2004.
Available online 1 October 2004.
Abstract
Real-time eye detection is important for many HCI applications, including eye-gaze tracking, autostereoscopic displays, video conferencing, face detection, and face recognition. Current commercial and research systems implement detection in software and require a dedicated computer for the image-processing task, which makes for a large, expensive, and complicated-to-use solution. To make eye-gaze tracking ubiquitous, the system's complexity, size, and price must be substantially reduced. This paper presents a hardware-based embedded system for eye detection, implemented using simple logic gates, with no CPU and no addressable frame buffers. The image-processing algorithm was redesigned to enable a highly parallel, single-pass implementation. A prototype system uses a CMOS digital imaging sensor and an FPGA for the image processing. It processes 640 × 480 progressive-scan frames at 60 fps and outputs a compact list of sub-pixel-accurate (x, y) eye coordinates via USB. Experiments with the detection of human eyes and synthetic targets are reported. This new logic design, operating at the sensor’s pixel clock, is suitable for single-chip eye-detection and eye-gaze-tracking sensors, and is thus an important step toward low-cost, mass-produced systems.
Keywords: Eye detection; Eye-gaze tracking; Real-time image processing; FPGA-based image processing; Embedded systems design
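The frame size and rate quoted in the abstract imply a minimum sustained pixel rate for a pipeline that handles one pixel per clock. The arithmetic below counts active pixels only. Real sensors add horizontal and vertical blanking intervals, so the actual pixel clock (not stated on this page) is assumed to be somewhat higher.

```latex
640 \times 480 = 307\,200\ \text{pixels/frame}, \qquad
307\,200 \times 60\ \text{frames/s} = 18\,432\,000\ \text{pixels/s} \approx 18.4\ \text{MHz},
\qquad \frac{1}{18.432\ \text{MHz}} \approx 54.3\ \text{ns per pixel}
```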
Fig. 1. Processing steps of the basic sequential algorithm.
Fig. 2. System block diagram, showing the main hardware components and pipeline processing.
Fig. 3. Subtraction and thresholding.
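A minimal sketch of a per-pixel subtraction-and-threshold step. It assumes two aligned frames: one captured under the on-axis IR LEDs, where pupils appear bright, and one under illumination that leaves pupils dark. The off-axis illumination, the function name `subtract_and_threshold`, and the `threshold` parameter are illustrative assumptions, not details taken from the paper. Each output bit depends only on the two co-located input pixels, which is what lets this step run at one comparison per pixel clock.

```python
from typing import List


def subtract_and_threshold(on_axis: List[int], off_axis: List[int], threshold: int) -> List[int]:
    """Hypothetical helper: binarize one row of the difference image.

    A pixel becomes 1 where the on-axis (bright-pupil) value exceeds the
    co-located second-frame value by more than `threshold`, else 0.
    """
    if len(on_axis) != len(off_axis):
        raise ValueError("rows must have equal length")
    # Purely pixel-local: no neighbourhood access, so it streams row by row.
    return [1 if a - b > threshold else 0 for a, b in zip(on_axis, off_axis)]
```

For example, `subtract_and_threshold([10, 200, 210, 30], [12, 40, 50, 28], 100)` yields `[0, 1, 1, 0]`. Only the two pixels whose difference is large survive.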
Fig. 4. Four cases of overlap between two consecutive line components are identified by their start–end location relationships. This merging operates on two line-component lists (as opposed to pixel lines).
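The caption describes merging on lists of line components (runs) rather than on pixel lines. The sketch below shows that idea under stated assumptions. It collapses a binary row into `(start, end)` runs, then walks the previous and current run lists with two pointers to find overlapping pairs. It folds the four start/end cases of Fig. 4 into a single overlap test instead of reproducing the paper's case enumeration. The names `extract_runs` and `overlapping_pairs` are hypothetical.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (start, end) column indices, inclusive


def extract_runs(row: List[int]) -> List[Run]:
    """Collapse a binary row into maximal runs of 1-pixels (line components)."""
    runs: List[Run] = []
    start: Optional[int] = None
    for x, v in enumerate(row):
        if v and start is None:
            start = x                      # a run opens
        elif not v and start is not None:
            runs.append((start, x - 1))    # a run closes on the first 0
            start = None
    if start is not None:                  # run touching the right edge
        runs.append((start, len(row) - 1))
    return runs


def overlapping_pairs(prev: List[Run], curr: List[Run]) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where prev[i] and curr[j] share a column (4-connectivity).

    Both lists are sorted and disjoint, so the loop runs at most
    len(prev) + len(curr) times.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(prev) and j < len(curr):
        (ps, pe), (cs, ce) = prev[i], curr[j]
        if ps <= ce and cs <= pe:
            pairs.append((i, j))
        # The run that ends first cannot overlap anything further right.
        if pe < ce:
            i += 1
        else:
            j += 1
    return pairs
```

For 8-connectivity, the test can be widened to `ps <= ce + 1 and cs <= pe + 1`. The pointer-advance rule stays valid because runs in one row are separated by at least one zero pixel.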
Fig. 5. An example in which two regions are merged into one while the lower pair of lines is processed. The middle region, which was first assigned ID = 3, is merged with the region of ID = 1; their moments and bounding-box properties are merged, and ID 3 is recycled.
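The caption's merge step can be sketched as follows, assuming each region keeps only its zeroth and first moments plus a bounding box. The paper may track additional quantities. The sub-pixel centroid is computed as first moment divided by zeroth moment. That is the standard way moments give a sub-pixel position, but this page does not state the paper's exact formula. `Region`, `RegionTable`, and their methods are hypothetical names. A Python dict stands in for what would be a fixed-size table in hardware.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    """Running moments and bounding box of one connected region."""
    m00: int = 0          # zeroth moment: pixel count
    m10: int = 0          # first moment in x: sum of x over pixels
    m01: int = 0          # first moment in y: sum of y over pixels
    x_min: int = 10**9
    x_max: int = -1
    y_min: int = 10**9
    y_max: int = -1

    def add_run(self, y: int, start: int, end: int) -> None:
        """Accumulate a line component covering columns start..end on row y."""
        n = end - start + 1
        self.m00 += n
        self.m10 += (start + end) * n // 2  # arithmetic series, always exact
        self.m01 += y * n
        self.x_min, self.x_max = min(self.x_min, start), max(self.x_max, end)
        self.y_min, self.y_max = min(self.y_min, y), max(self.y_max, y)

    def absorb(self, other: "Region") -> None:
        """Merge another region: moments add, bounding boxes take the union."""
        self.m00 += other.m00
        self.m10 += other.m10
        self.m01 += other.m01
        self.x_min, self.x_max = min(self.x_min, other.x_min), max(self.x_max, other.x_max)
        self.y_min, self.y_max = min(self.y_min, other.y_min), max(self.y_max, other.y_max)

    def centroid(self) -> Tuple[float, float]:
        """Sub-pixel (x, y) centre: first moments divided by pixel count."""
        return self.m10 / self.m00, self.m01 / self.m00


class RegionTable:
    """Region store whose merged-away IDs go on a free list for reuse."""

    def __init__(self) -> None:
        self.regions: Dict[int, Region] = {}
        self.free_ids: List[int] = []
        self.next_id = 1

    def new_region(self) -> int:
        if self.free_ids:
            rid = self.free_ids.pop()      # reuse a recycled ID first
        else:
            rid = self.next_id
            self.next_id += 1
        self.regions[rid] = Region()
        return rid

    def merge(self, keep: int, drop: int) -> None:
        """Fold region `drop` into region `keep` and recycle `drop`'s ID."""
        self.regions[keep].absorb(self.regions.pop(drop))
        self.free_ids.append(drop)
```

Mirroring the caption, merging region 3 into region 1 removes ID 3 from the table. The next call to `new_region()` hands ID 3 out again, so the ID space stays bounded by the number of simultaneously open regions.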
Fig. 6. First, the system behavior model is designed and tested. Then the hardware/software architecture is set, and the functions are mapped onto the selected architecture. Finally, this model is refined and converted into the actual system implementation.
Fig. 7. System prototype. The lens is encircled by the on-axis IR LEDs. Most of the development board is left unused, except for one FPGA and a frame buffer.
Fig. 8. Results of one experiment. The detected (x, y) coordinates of four linearly moving targets over approx. 1500 frames are marked with blue ‘+’ signs, which form a wide line because of their high plotting density. The deviation from the four superimposed second-degree polynomial fits (red) is hardly noticeable because of the achieved sub-pixel accuracy. See Table 1 for the calculated error rates. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this paper.)
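One way to quantify the deviation from a second-degree polynomial fit is sketched below. It is a least-squares fit of y = c0 + c1·x + c2·x² through the 3×3 normal equations, followed by the root-mean-square residual. This assumes y is fitted as a function of x along each target's track. The exact error measure behind Table 1 is not given on this page, and `fit_quadratic` and `rms_residual` are hypothetical names. The fit needs at least three distinct x values; otherwise the system is singular.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2."""
    # Power sums s[k] = sum x^k (k = 0..4) and t[k] = sum y * x^k (k = 0..2).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented normal-equation matrix [A | b] with A[i][j] = s[i + j].
    m = [[float(s[i + j]) for j in range(3)] + [float(t[i])] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in range(2, -1, -1):
        acc = m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = acc / m[i][i]
    return coef[0], coef[1], coef[2]


def rms_residual(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Root-mean-square distance (in y) of the points from their quadratic fit."""
    c0, c1, c2 = fit_quadratic(xs, ys)
    sq = [(y - (c0 + c1 * x + c2 * x * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(sq) / len(sq))
```

Points that lie exactly on a parabola give a residual of essentially zero. Scattered detections give a residual expressed in pixels, which is comparable against the one-pixel grid when judging sub-pixel accuracy.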
Fig. 9. Synthetic-target (‘+’ sign) and eye (‘.’ sign) detections are shown for the 4 ft (left graph, 980 frames in total) and 6 ft (right graph, 870 frames) experiments. The eye-detection trajectory follows the head motion. For this subject, S13, head motion was much more noticeable in the second (6 ft) session. The synthetic targets are static and thus show no trajectory.
Fig. 10. One frame of the system’s monitor video output, generated by the FPGA, passed to an on-board VGA converter, and projected at 60 fps on a white screen. This optional video channel is used only for debugging and demonstration and is not required for regular operation. The gray-level input frame is directed to the red channel, and detected pupils are superimposed as bright green marks. The image was captured by a digital camera from the projected screen. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this paper.)
Table 1. Experimentation results with synthetic targets

Table 2. Experimentation results with detection of human eyes and synthetic targets
