CASIR Gaze Estimation Database

Open-source gaze and head pose estimation database.


Head movements are commonly interpreted as a vehicle of interpersonal communication. For example, in daily life, human­beings observe head movements as the expression of agreement or disagreement in a conversation,or even as a sign of confusion. Gaze direction can also be used for directing a person to observe a specific location. As artificial cognitive systems with social capabilities become more and more important due to the recent evolution of robotics towards applications where complex and human­like interactions are needed, gaze and head pose estimates become an important feature.

During the CASIR project we developed a robust solution based on a probabilistic approach that inherently deals with the uncertainty of sensor models, but also and in particular with uncertainty arising from distance, incomplete data and scene dynamics. This solution comprises a hierarchical formulation in the form of a mixture model that loosely follows how geometrical cues provided by facial features are believed to be used by the human perceptual system for gaze estimation.

To evaluate our solution, a set of experiments were conducted so as to thoroughly and quantitatively evaluate the performance of the hierarchy. The experiments were devised to test system performance for many different combinations of head poses and eye gaze directions, but also for different distances in a dynamic setting context. Eleven subjects (3 female and 8 male, 8 with white complexion and 3 with dark skin colour), with ages ranging from 8 to 43 and heights ranging from 1.3m to 1.89m, were instructed to fixate a target placed in a position with known 3D coordinates throughout each trial.

a) isr



Experimental setup. In (a), the experimental plan and protocol are presented, showing the sensor rig mounted on a robotic head, the Optotrak Certus used to obtain 3D ground truth and the gaze target. In (b), a photo of the actual experimental setup and the Kinect 3D sensor is shown – note that, as opposed to the schematic in (a), the targets (three coloured stripes, respectively at 1.43m, 1.57m and 1.68m from the ground) are on the same side as the Optotrak. This is because the Optotrak can only track its optical devices within a “cone of visibility” (markings on the floor show the cone’s limits), therefore constraining the usable space available for the experiments to take place. The subjects were asked to travel from a predefined distance at about 3m towards about 1.5m from the Kinect sensor, following a relatively free path within that usable area always keeping their gaze fixed on one of the targets throughout each trial. They were also instructed to stop in three different midway positions along the path, and at those positions slowly turn their head up, down, left and right, without losing their gaze focus on the designated target.

The following features are extracted by the detectors, and are used as inputs:
  • rough estimate for head orientation (RH);
  • the 5 features relating to the main facial anatomical landmarks: the sellion (mid-point between eyes), the tip of the nose, the left and right pupils.
  • the 2 pairs of eye extremities (i.e. the corners of the eyes).

A Northern Digital Inc. (NDI) Optotrak Certus motion capture system was used to obtain 3D ground truth pertaining the position of the subjects and their head pose throughout the experiment, which was calibrated with the Kinect for registration. The data processing flow is shown below:



The database is composed of 9 different data types, stored using ZIP compression.

Data Type

  • RGB images – Stored on a folder in .png format.

  • Depth images – Stored on a folder in .pcd format (point cloud data).

  • Gaze vectoy – "Ground truth" 5D gaze vector (sellion position,phi,theta_head) in .txt format.

  • Sellion position – "Ground truth" head center 3D position (x, y, z) in .txt format.

  • Fixation point – "Ground truth" point of fixation 3D position (x, y, z) in .txt format.

  • Phi angle – "Yaw" gaze vector orientation in .txt format.

  • Theta angle – "Pitch" gaze vector orientation in .txt format.

  • 2D feature positions – "Hand labeled" 2D facial features in .txt format.

  • 3D feature positions – "Hand labeled" 3D facial features in .txt format.

  • Experimental setup – Transformation matrix, rotation matrix and the translation vector used in each session in .txt format.


The CASIR Gaze Estimation Database is made available under the Open Database License v1.0. Any rights in individual contents of the database are licensed under the Database Contents License v1.0.


Session 1
Session 2
Session 3
Session 4
Session 5
Session 6
Session 7
Session 8
Session 9
Session 10
Session 11


If you have any questions regarding the database, please send an e-mail to: Cláudio Araújo, João Filipe Ferreira or

Pablo Lanillos, using the following form:


Version Description


Date: 2015.06.16
Updated the information provided in: introduction and structure sections, updated addicional files to the download section and applied some general bug fixes.


Date: 201.05.26
Initial release of the CASIR Gaze Estimation Database.