This course immerses learners in deep learning and prepares them to solve computer vision problems. Learners explore the area of computer vision that deals with recognizing, identifying, and understanding visual information, whether it comes from a single image or a video sequence. Topics include object detection, face detection and recognition (using AdaBoost and Eigenfaces), and the progression of deep learning techniques (CNNs, AlexNet, ResNet, and generative models).
This course is ideal for anyone interested in exploring the concepts of visual recognition and deep learning for computer vision. Learners should have basic programming skills and experience (an understanding of for loops and if/else statements), specifically in MATLAB (free introductory tutorial: https://www.mathworks.com/learn/tutorials/matlab-onramp.html). Learners should also be familiar with the following: basic linear algebra (matrix-vector operations and notation), 3D coordinate systems and transformations, basic calculus (derivatives and integration), and basic probability (random variables). It is highly recommended that learners take the “Deep Learning Onramp” course available at https://matlabacademy.mathworks.com/.
Material includes online lectures, videos, demos, hands-on exercises, project work, readings and discussions. Learners gain experience writing computer vision programs through online labs using MATLAB* and supporting toolboxes.
This is the fourth course in the Computer Vision specialization that lays the groundwork necessary for designing sophisticated vision applications. To learn more about the specialization, check out a video overview at https://youtu.be/OfxVUSCPXd0.
* A free license to install MATLAB for the duration of the course is available from MathWorks.

Taught by

Radhakrishna Dasari

Instructor

Junsong Yuan

Associate Professor and Director of Visual Computing Lab

Transcript

Optical Character Recognition, or OCR, is a technique used to digitize handwritten or printed text from images. The digitized text can be edited, searched, and stored compactly on a computer. OCR is still an active area of research in computer vision. Although OCR works very well in controlled settings, like vehicle number plate recognition and mobile check deposits, it is still an open problem when it comes to recognizing text in the wild. Nevertheless, OCR is an easier problem to solve relative to other recognition problems. Given sufficient training data, current state-of-the-art techniques based on deep learning can achieve high accuracy. When it comes to research in OCR, the MNIST dataset is inseparable from the field. MNIST is an acronym for the Modified National Institute of Standards and Technology database. It is a large database of handwritten digits, commonly used for training various image processing systems. It contains 60,000 training images and 10,000 test images, each 28 by 28 pixels in size. Almost every research effort pertaining to OCR uses the MNIST dataset to compare performance against the benchmarks set by other OCR techniques. Let us perform optical character recognition using neural networks on the MNIST data.
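To make the image and dataset sizes quoted above concrete, here is a minimal sketch of how a 28 by 28 digit image might be presented to a neural network classifier. It is written in plain Python purely for illustration (the course labs themselves use MATLAB), and it is not the network built in the lecture. The helper names `flatten`, `one_hot`, `softmax`, and `predict`, the random stand-in image, and the single untrained linear layer are all assumptions made for this example; no real MNIST data is loaded.

```python
import math
import random

# Sizes quoted in the transcript.
NUM_TRAIN = 60_000
NUM_TEST = 10_000
IMAGE_SIDE = 28
NUM_CLASSES = 10  # the digits 0 through 9

# A 28 x 28 image flattened into one vector has 28 * 28 entries.
INPUT_SIZE = IMAGE_SIDE * IMAGE_SIDE
# Total number of pixel values across the whole training set.
TRAIN_PIXELS = NUM_TRAIN * INPUT_SIZE


def flatten(image):
    """Turn a nested list of pixel rows into a single flat list."""
    return [pixel for row in image for pixel in row]


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode a digit label as a vector with a single 1.0 entry."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(pixels, weights, biases):
    """One linear layer followed by softmax (untrained, illustration only)."""
    scores = [
        sum(w * x for w, x in zip(row, pixels)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical stand-in for a digit image: random values in [0, 1).
rng = random.Random(0)
fake_image = [[rng.random() for _ in range(IMAGE_SIDE)] for _ in range(IMAGE_SIDE)]
pixels = flatten(fake_image)

# Small random weights: one row of INPUT_SIZE weights per digit class.
weights = [
    [rng.uniform(-0.01, 0.01) for _ in range(INPUT_SIZE)]
    for _ in range(NUM_CLASSES)
]
biases = [0.0] * NUM_CLASSES

probs = predict(pixels, weights, biases)
predicted_digit = probs.index(max(probs))
target = one_hot(7)  # what the training target for a "7" would look like
```

Because the weights here are random, `predicted_digit` is meaningless; training would adjust `weights` and `biases` so that `probs` moves toward the one-hot `target` for each labeled image.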