For the driver monitoring task, BRAVE had to tackle several sub-problems. The first was to detect the driver's face, for which we tried two approaches: one based on a Histogram of Oriented Gradients with a Support Vector Machine (HOG+SVM), and the other based on the Single Shot Detector (SSD) with a ResNet base network. Comparing the two methods, the first turned out to be slightly more stable, but the second is much faster. Hence, we decided to implement the second option.
Once the face was detected, we used the pre-trained facial landmark detector in the dlib library to estimate the location of 68 (x, y) coordinates that map to facial structures. This detector is an implementation of the paper "One Millisecond Face Alignment with an Ensemble of Regression Trees" by Kazemi and Sullivan.
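As a minimal sketch of how the 68 points can be grouped by facial structure, the snippet below slices a landmark list into named regions. The index ranges follow the 68-point annotation scheme used by dlib's pre-trained predictor; the names `LANDMARK_REGIONS` and `split_landmarks` are hypothetical helpers introduced here for illustration and are not part of dlib or of the BRAVE code.

```python
# Index ranges (start inclusive, end exclusive) in the 68-point scheme.
# "right"/"left" refer to the subject's own right and left side.
LANDMARK_REGIONS = {
    "jaw": (0, 17),
    "right_eyebrow": (17, 22),
    "left_eyebrow": (22, 27),
    "nose": (27, 36),
    "right_eye": (36, 42),
    "left_eye": (42, 48),
    "mouth": (48, 68),        # outer (48-59) and inner (60-67) lip contours
}


def split_landmarks(points):
    """Group a list of 68 (x, y) landmarks into named facial regions."""
    if len(points) != 68:
        raise ValueError("expected 68 landmarks, got %d" % len(points))
    return {name: points[start:end]
            for name, (start, end) in LANDMARK_REGIONS.items()}
```

Given the points returned by the landmark detector for one frame, `split_landmarks(points)["left_eye"]` would yield the six eyelid points of the left eye.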
With this set of points we can extract the position of the eyes in each frame. As the image above shows, six points per eye delimit the eyelids. From the horizontal and vertical distances between these points we calculate the Eye Aspect Ratio (EAR), which indicates whether the eye is open or closed. From the EAR we can then easily derive the blink frequency, PERCLOS (percentage of eye closure), the amplitude of the eye opening, etc.
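The EAR computation described above can be sketched as follows. The formula used is the commonly cited one, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), with the six eye points ordered as in the 68-point model. The closure threshold of 0.2 and the minimum blink length of two frames are illustrative assumptions, not values taken from the BRAVE system, and `blink_stats` is a hypothetical helper.

```python
import math


def eye_aspect_ratio(eye):
    """EAR for six (x, y) eye points ordered p1..p6.

    p1 and p4 are the eye corners, p2 and p3 lie on the upper eyelid,
    p6 and p5 on the lower eyelid (p2 above p6, p3 above p5).
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def blink_stats(ear_per_frame, closed_threshold=0.2, min_closed_frames=2):
    """Return (blink_count, perclos) for a sequence of per-frame EAR values.

    A frame counts as "closed" when its EAR is below closed_threshold
    (assumed value). A blink is a run of at least min_closed_frames closed
    frames that is followed by an open frame; a closure still in progress
    at the end of the sequence is not counted. PERCLOS is the fraction of
    closed frames in the sequence.
    """
    closed = [ear < closed_threshold for ear in ear_per_frame]
    perclos = sum(closed) / len(closed) if closed else 0.0

    blinks = 0
    run = 0  # length of the current run of closed frames
    for is_closed in closed:
        if is_closed:
            run += 1
        else:
            if run >= min_closed_frames:
                blinks += 1
            run = 0
    return blinks, perclos
```

Dividing the blink count by the duration of the analysed window gives the blink frequency. In practice the threshold would need tuning per camera setup and possibly per driver.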
Similarly, using the set of points delimiting the lips, we can detect when the driver is yawning. Furthermore, we measure the initial face structure in order to track the head position.
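The text does not specify how yawning is decided from the lip points. One plausible sketch, by analogy with the EAR, is a mouth aspect ratio computed on the inner-lip points (indices 60 to 67 of the 68-point model), flagged as a yawn when it stays high for a sustained number of frames. The function names, the threshold of 0.6 and the duration of 15 frames are all assumptions made for illustration.

```python
import math


def mouth_aspect_ratio(landmarks):
    """Mouth opening ratio from the inner-lip points of a 68-point list.

    Points 60 and 64 are the inner mouth corners; 61-63 lie on the upper
    inner lip and 67-65 on the lower inner lip directly below them.
    """
    width = math.dist(landmarks[60], landmarks[64])
    pairs = [(61, 67), (62, 66), (63, 65)]
    height = sum(math.dist(landmarks[a], landmarks[b]) for a, b in pairs)
    return height / (3.0 * width)


def is_yawning(mar_per_frame, open_threshold=0.6, min_frames=15):
    """True if the mouth stays wide open for min_frames consecutive frames.

    Both open_threshold and min_frames are assumed values; requiring a
    sustained opening helps separate yawns from speech.
    """
    run = 0
    for mar in mar_per_frame:
        run = run + 1 if mar > open_threshold else 0
        if run >= min_frames:
            return True
    return False
```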
The following block diagram summarizes the algorithm flow:
The last task to address is gaze estimation. This requires a different approach, since we cannot detect the pupils with the previous tools. As we saw, the facial structure only allows a rough estimate of the gaze direction, resolved into nine sectors of the frame. However, the driver may keep the head fixed while looking in a different direction.
Hence, the proposed method is to use a CNN to obtain a more accurate gaze estimate; this is the task we are currently working on.
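For reference, the coarse nine-sector estimate mentioned above can be illustrated with a small sketch. It assumes that head yaw and pitch angles (in degrees) have already been derived from the facial structure, which the text does not detail. The function name `gaze_sector`, the sign conventions and the 10-degree dead zone are hypothetical choices made for this example.

```python
def gaze_sector(yaw_deg, pitch_deg, dead_zone_deg=10.0):
    """Map head yaw/pitch to one of nine sectors of a 3x3 grid.

    Assumed conventions: negative yaw means the head is turned left,
    positive pitch means the head is tilted up. Angles within
    +/- dead_zone_deg are treated as centered.
    """
    if yaw_deg < -dead_zone_deg:
        col = "left"
    elif yaw_deg > dead_zone_deg:
        col = "right"
    else:
        col = "center"

    if pitch_deg > dead_zone_deg:
        row = "up"
    elif pitch_deg < -dead_zone_deg:
        row = "down"
    else:
        row = "center"

    if row == "center":
        return col            # "left", "right" or "center"
    if col == "center":
        return row            # "up" or "down"
    return row + "-" + col    # e.g. "up-left"
```

Because this mapping depends only on head orientation, it returns the same sector whenever the head is still, regardless of where the eyes are looking, which is the limitation that motivates the CNN-based approach.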