State-of-the-Art Computer Vision Technologies

After you upload a video to YouTube, a thumbnail image is displayed that links to your video. A thumbnail is usually small, but it is very important: it delivers the first visual impression of your video to an audience browsing millions of videos on the web.

How does YouTube generate thumbnails for videos?

According to YouTube’s blog, YouTube’s previous approach to video thumbnail generation was to provide three thumbnails, auto-generated from the 20%/50%/75% points in the video index. However, these three auto-generated thumbnails may not be representative of the video content. Clearly, something smarter can be done to improve the process.
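To make the fixed-point scheme concrete, here is a minimal Python sketch of how candidate frames could be picked at fixed percentages of a video's length. The function name `fixed_point_frame_indices` and the use of frame counts (rather than timestamps) are my own assumptions for illustration; this is not YouTube's actual code.

```python
from typing import List, Sequence


def fixed_point_frame_indices(
    total_frames: int,
    percents: Sequence[int] = (20, 50, 75),
) -> List[int]:
    """Return the frame indices located at the given percentages of a video.

    Hypothetical helper: it only illustrates the fixed-point idea and
    never looks at the video content at all.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    indices = []
    for p in percents:
        if not 0 <= p <= 100:
            raise ValueError("percentages must lie between 0 and 100")
        # Integer arithmetic avoids floating-point rounding surprises.
        index = total_frames * p // 100
        # Clamp so that 100% still maps to the last valid frame.
        indices.append(min(index, total_frames - 1))
    return indices


print(fixed_point_frame_indices(1000))  # [200, 500, 750]
```

Because the indices depend only on the video length, the chosen frames can easily land on a fade, a title card, or a blurry transition, which is exactly the weakness described above.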

Recently, YouTube released a smart thumbnail generation feature, which uses computer vision and video analytics algorithms to generate a set of images that are visually informative of the video content. This reminds me of “text thumbnails” for web content or “image thumbnails” for images. For example, Google displays text “thumbnails” for your web search results in a similar way, and Adobe’s Content-Aware Image Resizing could be used for automatic image thumbnail generation in image browsing.

In terms of the technology behind YouTube’s video thumbnail generation, a simple image/video color histogram would do a decent job. More advanced computer vision algorithms, such as human action clustering/recognition/categorization, salient motion detection, audio-visual analysis, or face recognition, could generate more robust results. However, given their high computational complexity and the huge amount of video data (13 hours of video uploaded to YouTube every minute), I tend to believe YouTube is not using these advanced features yet. Designing efficient and robust approaches for automatic video thumbnail generation could be an interesting project for computer vision graduate students.
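As a rough illustration of the color histogram idea, the sketch below picks the frame whose color distribution is closest to the average over all sampled frames. This is only one plausible way to use histograms, not a description of what YouTube does. The names `color_histogram` and `most_representative_frame`, the 4-bins-per-channel quantization, and the L1 distance are all my own assumptions, and frames are represented as plain lists of RGB tuples to keep the example self-contained.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
Frame = Sequence[Pixel]       # a frame flattened into a sequence of pixels


def color_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Normalized, coarsely quantized RGB histogram of one frame."""
    if not frame:
        raise ValueError("frame has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Map each 0..255 channel value to a bin in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(frame))
    return [count / total for count in hist]


def most_representative_frame(
    frames: Sequence[Frame], bins_per_channel: int = 4
) -> int:
    """Index of the frame whose histogram is nearest (L1) to the mean histogram."""
    if not frames:
        raise ValueError("no frames given")
    hists = [color_histogram(f, bins_per_channel) for f in frames]
    n = len(hists)
    mean = [sum(column) / n for column in zip(*hists)]

    def distance_to_mean(h: List[float]) -> float:
        return sum(abs(a - b) for a, b in zip(h, mean))

    # min() returns the first index on ties.
    return min(range(n), key=lambda i: distance_to_mean(hists[i]))


# Toy example: two mostly-red frames and one blue frame.
red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
print(most_representative_frame([blue, red, red]))  # 1
```

Each frame costs one pass over its pixels, so this stays cheap compared with the action recognition or face recognition approaches mentioned above, at the price of ignoring motion, faces, and semantics entirely.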