Hello, my name is Christopher Mitchell, and I'll be presenting my Master's thesis
Applications of Convolutional Neural Networks to Facial Detection and Recognition for Augmented Reality and Wearable Computing
Facial recognition and detection are two widely-researched areas in the field of image processing.
Facial detection is the task of taking a large image, such as a video frame or a photograph,
and finding the approximate location and scale of any face that may be in the picture.
Facial recognition is a related task, wherein an image that has already been found to contain a face is examined
to determine the identity of each individual pictured, or whether the individual is unknown.
Augmented Reality refers to systems that overlay digital data on top of a reproduction of the real world,
such as a video feed.
Wearable Computing systems are often used to implement Augmented Reality systems.
Wearable computing devices are generally lightweight, have a long battery life, and are as powerful as possible.
They use unique input and output devices, since a computer that is carried around
could not be used with a traditional keyboard, mouse, and monitor.
My project, informally termed "Scouter" after a popular Japanese television show,
takes video from a webcam, and detects and recognizes faces within the video feed,
augmenting the detections onto the feed and displaying it to the user in a heads-up display.
Scouter consists of a hardware and a software component.
The software component was by far the more complex of the two to design and develop.
Extensive research indicated that the best method for facial detection would be a Convolutional Neural Network,
also called a CNN.
CNNs are modeled after the structure of the vision center of the human brain.
A CNN is generally very robust to changes in the environment, such as lighting, face rotation and pose,
skin color, and accessories such as eyeglasses and mustaches on individuals.
CNNs are trained by creating a large database of face and nonface images, and showing those examples to the CNN to teach it to differentiate between the two.
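The core operation a CNN applies, sliding a small filter across the image and recording its response at each position, can be sketched in plain Python. This is an illustrative sketch only: the filter weights here are hand-picked, whereas a trained CNN learns its filter weights from the face/nonface database, and a real network stacks many such filters with nonlinearities and subsampling layers.

```python
# Minimal sketch of the 2D "valid" convolution at the heart of a CNN layer.
# Illustrative only: a real CNN learns the kernel weights during training.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (both lists of lists) and return the
    valid-mode response map of dot products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            row.append(acc)
        out.append(row)
    return out

# A hand-picked horizontal-edge filter responds strongly where image
# intensity changes from one row to the next.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 9, 9, 9],
    [9, 9, 9, 9],
]
kernel = [
    [-1, -1, -1],
    [ 1,  1,  1],
]
response = convolve2d(image, kernel)  # peaks along the dark-to-bright edge
```

The strong responses appear exactly where the dark rows meet the bright rows, which is how stacked convolutional layers build up detectors for edges, then facial parts, then whole faces.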
After the system finds each face in each input frame with the convolutional neural network,
it uses something called a Haar Feature Cascade to find the left eye, right eye, and mouth of each individual.
This allows every face to be normalized to the same scale and rotation,
after which the Fisherface algorithm is used to identify the individual pictured by each face.
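The normalization step above can be sketched as follows: from the two detected eye centers, compute the rotation and scale that level the eyes and bring them to a fixed canonical distance, so every face reaches the Fisherface stage at the same size and orientation. The canonical eye distance of 48 pixels is a hypothetical value chosen for illustration, not one taken from the thesis.

```python
import math

# Sketch of eye-based face normalization. Given the detected left- and
# right-eye centers, return the rotation angle (degrees) and scale factor
# that map the face to a canonical pose. A warp built from these values
# (e.g. a rotation about the eye midpoint plus resizing) would then be
# applied to the cropped face image.

CANONICAL_EYE_DIST = 48.0  # assumed canonical spacing, in pixels

def normalization_params(left_eye, right_eye):
    """Return (angle_degrees, scale) that level the eyes and bring them
    to the canonical distance."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))  # rotate by -angle to level eyes
    dist = math.hypot(dx, dy)
    scale = CANONICAL_EYE_DIST / dist         # resize so eyes sit 48 px apart
    return angle, scale

# Eyes found at (30, 40) and (78, 40): already level and already 48 px apart,
# so no rotation and no resizing are needed.
angle, scale = normalization_params((30, 40), (78, 40))
```

Normalizing in this way is what lets a holistic recognizer like Fisherfaces compare faces pixel-for-pixel without being thrown off by head tilt or distance from the camera.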
The hardware is also unique, built around a commodity netbook, an Asus Eee PC.
The system is lightweight, about 2.5 pounds.
It is powerful, using a 1.6GHz Intel Atom processor and 1GB of memory,
and has a long battery life, three to four hours under normal operation of the Scouter system.
For input, I use a commodity webcam, mounted to the front of a heads-up display,
the output device, which contains two small LCD displays in front of the left and right eyes of the user.
One of the primary goals of the system is to perform *real-time* detection and recognition,
and the system succeeds at this goal.
It was found in experimentation that it can achieve at least 6 frames per second of augmented video,
and that at most 3.25 seconds elapse between a face first being seen by the camera and its recognition details being augmented onto the video feed.
It was found that 95% of the total processing time is spent on facial detection,
whereas less than 1% is spent performing facial recognition.
In the future, algorithmic and technological improvements will be examined
to determine ways to reduce this processing time.
In conclusion, I created a system that can successfully detect and recognize faces at high accuracy and high speeds
in a wearable computing system, using state-of-the-art algorithms.
I have considered many possible applications of the system, primarily those related to individuals with memory impairment
or physical disabilities such as visual impairment or total blindness, and how the system could help them.
For example, for individuals with memory impairment, the system could learn the people who are important to that user,
and overlay information about each person, such as their name and their significance, on top of the video feed.
For people who are blind or have partial vision loss,
the system could either show or speak the names and details of those around them.
I also looked at how the system could be used for law enforcement,
in identifying wanted individuals in a large group or crowd,
and for medical applications, such as emergency room doctors and EMTs
who need to quickly pull up the medical records of individuals, even when those individuals are unconscious or carry no identification.
In the future, I'll look at how faster computing hardware and algorithmic improvements for facial detection and recognition
can be used to make the system faster and more affordable.
Thank you.