I propose STARE, a wearable system that performs real-time, simultaneous eye tracking and focused object recognition for daily-life applications under varied illumination. The proposed system extracts both the gaze direction and scene information from eye images captured by a single RGB camera facing the user's eye. In particular, the method requires neither infrared sensors nor a front-facing camera to capture the scene, making it more socially acceptable when embedded in a wearable device. This approach is made possible by recent technological advances: the increased resolution and reduced size of camera sensors, as well as significantly more powerful image processing techniques based on deep learning.
First, I describe a model-based approach to estimate the gaze direction from RGB eye images. A 3D eye model is constructed from an image of the eye by fitting an ellipse onto the iris. The gaze direction is then continuously tracked by rotating the model to simulate projections of the iris area for different eye poses and matching the iris area of subsequent images against the corresponding projections obtained from the model. With an additional one-time calibration, the point of regard (POR) is computed, which makes it possible to identify where the user is looking in the scene image reflected on the cornea.
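To make the geometry behind this step concrete, the sketch below fits an ellipse to the iris and derives an approximate gaze direction from it: a circular iris projects onto the image plane as an ellipse whose axis ratio encodes the eye's tilt. The segmentation threshold, the orthographic-projection approximation, and all function names are illustrative assumptions, not the exact pipeline of the proposed method.

```python
import cv2
import numpy as np

def fit_iris_ellipse(eye_gray):
    """Fit an ellipse to the iris boundary in a grayscale eye image.

    Simplified sketch: a robust pipeline would need proper iris
    segmentation; the fixed threshold here is illustrative only.
    """
    # Rough segmentation: the iris is darker than the surrounding sclera.
    _, mask = cv2.threshold(eye_gray, 60, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    iris = max(contours, key=cv2.contourArea)  # largest dark blob
    return cv2.fitEllipse(iris)  # ((cx, cy), (d1, d2), angle in degrees)

def gaze_from_ellipse(ellipse):
    """Approximate the gaze direction from the fitted iris ellipse.

    Modeling the iris as a circular disk, its projection is an ellipse
    with cos(tilt) = minor_axis / major_axis; the projected disk normal
    lies along the ellipse's minor axis.
    """
    (cx, cy), (d1, d2), angle_deg = ellipse
    if d1 <= d2:
        minor, major = d1, d2
        phi = np.deg2rad(angle_deg)          # minor axis along `angle`
    else:
        minor, major = d2, d1
        phi = np.deg2rad(angle_deg + 90.0)   # minor axis perpendicular
    tilt = np.arccos(np.clip(minor / major, 0.0, 1.0))
    # Unit gaze vector in camera coordinates. A single ellipse leaves a
    # two-fold sign ambiguity in the tilt; in the actual method this is
    # resolved by matching against projections of the full 3D eye model.
    return np.array([np.sin(tilt) * np.cos(phi),
                     np.sin(tilt) * np.sin(phi),
                     -np.cos(tilt)])
```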
Next, I propose a method to recognize, in real time, objects in the scene reflected on the cornea using the gaze direction information. Deep learning algorithms are applied to automatically label and then recognize the focused object in the area surrounding the reflected POR on the eye image. Additional processing based on High Dynamic Range (HDR) imaging demonstrates that the proposed method can perform under varied illumination conditions.
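As an illustration of this step, the sketch below crops a patch around the reflected POR and classifies it with an off-the-shelf network, with an optional HDR merge of differently exposed frames. The crop size, the choice of ResNet-18, and the Debevec merge are stand-in assumptions, not the exact components of the proposed system.

```python
import cv2
import numpy as np
import torch
from torchvision import models

CROP = 64  # hypothetical half-size of the patch around the reflected POR

# Off-the-shelf ImageNet classifier standing in for the actual network.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]

def merge_hdr(exposures, times):
    """Merge differently exposed eye images (Debevec method) so the
    corneal reflection stays usable under extreme illumination.

    `exposures` is a list of uint8 BGR frames; `times` is a float32
    array of exposure times in seconds.
    """
    response = cv2.createCalibrateDebevec().process(exposures, times)
    hdr = cv2.createMergeDebevec().process(exposures, times, response)
    ldr = cv2.createTonemap(gamma=2.2).process(hdr)
    return np.clip(ldr * 255, 0, 255).astype(np.uint8)

def recognize_focused_object(eye_bgr, por_xy):
    """Classify the patch around the reflected POR on the eye image."""
    x, y = por_xy
    patch = eye_bgr[max(y - CROP, 0):y + CROP, max(x - CROP, 0):x + CROP]
    rgb = cv2.cvtColor(patch, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb.copy()).permute(2, 0, 1)  # HWC -> CHW
    with torch.no_grad():
        probs = model(preprocess(tensor).unsqueeze(0)).softmax(dim=1)
    conf, idx = probs.max(dim=1)
    return labels[idx], float(conf)
```

In practice the automatic labeling described above could reuse this loop: patches cropped around the reflected POR, together with the predicted labels, form the raw material for the datasets of focused objects mentioned in the contributions.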
Finally, I verify the validity of the approach experimentally with a 3D-printable prototype of a wearable device equipped with dual 720p cameras, and with a high-sensitivity camera under extreme illumination conditions. Further, I set up a proof-of-concept implementation of a state-of-the-art neural network, which shows that the focused object recognition can be performed in real time.
To summarize, the proposed method and prototype contribute a novel, complete framework to 1) simultaneously perform eye tracking and focused object analysis in real time, 2) automatically generate datasets of focused objects by using the reflected POR, 3) reduce the number of sensors in current gaze trackers to a single RGB camera, and 4) enable daily-life applications under all kinds of illumination. The combination of these features makes the framework an attractive choice for eye-based human behavior analysis, as well as for creating large datasets of the objects a user focuses on during daily tasks.