Text Detection in Images using Deep Learning

Text detection in natural scene images is an important preprocessing step for many content-based image analysis tasks. Deep learning is a family of brain-inspired algorithms built on deep, multi-layered neural networks. These networks are trained to find a set of features that represent the input data, which enables machine learning and computer vision applications such as classification and detection. This approach is the state of the art in computer vision, voice recognition and natural language processing, and is used by Google, Microsoft, Yahoo and others.

In the last few years, text detection has become one of the biggest open problems in computer vision. Classical feature detectors such as SIFT or SURF have shown their limits: text appears in images at different scales and in different colors and fonts, so it is very hard to find characteristics common to all kinds of text. Furthermore, humans detect text very easily, which suggests that a detection system should resemble the human brain: it should be built from neurons and be able to improve itself with proper training. This approach, known as Deep Learning, already provides state-of-the-art results in the field. Our approach in this project deviates from classical Deep Learning systems, since our goal was to classify at the pixel level and not simply to detect text zones in images, as was done previously. Consequently, we had to build a database of examples labeled at the pixel level. This database allowed us to train our network and obtain satisfactory results.
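
To make the difference between pixel-level labels and text zones concrete, here is a minimal sketch in Python. It is an illustration only and not the format of the database described above: the tiny mask, the 0/1 label encoding, and the helper names bounding_box and box_area are all assumptions. It shows that a zone (a bounding box) can be derived from a pixel-level label mask, while the box itself also covers many background pixels.

```python
# Hypothetical 6x8 label mask: 1 marks a text pixel, 0 marks background.
# The size and the 0/1 encoding are assumptions made for illustration.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of all text pixels.

    Returns None when the mask contains no text pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def box_area(box):
    """Number of pixels covered by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


box = bounding_box(mask)          # zone-level description of the text
text_pixels = sum(map(sum, mask)) # pixel-level count of text pixels

print("bounding box:", box)
print("pixels inside box:", box_area(box))
print("pixels labeled as text:", text_pixels)
```

In this toy mask, the zone-level box covers twice as many pixels as are actually labeled as text, which is the information a pixel-level classifier is meant to recover and a zone detector discards.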