Rosetta is here – Facebook’s AI system to identify contextual content on images

Facebook has built an AI system called Rosetta to identify text and content in pictures.

Many of the pictures shared on Facebook and Instagram every day contain text. It might be overlaid on an image in a meme, or inlaid in a photo of a storefront, street sign, or restaurant menu. Given the sheer volume of photos shared each day on Facebook and Instagram, the number of languages supported on the global platform, and the variations of the text, the problem of understanding text in images is quite different from the problems solved by traditional optical character recognition (OCR) systems, which recognize the characters but don’t understand the context of the associated image.

To address these specific needs, Facebook built and deployed a large-scale machine learning system named Rosetta. It extracts text from more than a billion public Facebook and Instagram images and video frames (in a wide variety of languages), daily and in real time, and inputs it into a text recognition model that has been trained on classifiers to understand the context of the text and the image together.

Rosetta extracts text in two steps: detection and recognition. “In the first step, we detect rectangular regions that potentially contain text. In the second step, we perform text recognition, where, for each of the detected regions, we use a convolutional neural network (CNN) to recognize and transcribe the word in the region,” Facebook’s official blog read.
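To make the two-step flow concrete, here is a minimal Python sketch. It is not Facebook’s code: the "image" is a toy grid of characters, and the names `Box`, `detect_regions`, `recognize_word` and `extract_text` are hypothetical, chosen only for illustration. The detector and recognizer are trivial stand-ins for the learned models the blog describes; only the shape of the pipeline (find rectangular regions first, then transcribe each one) follows the description above.

```python
from dataclasses import dataclass
from typing import List

# Toy stand-in for an image: a grid of characters, where "." is background.
Image = List[str]


@dataclass(frozen=True)
class Box:
    """Rectangular region on one row, covering columns [left, right)."""
    row: int
    left: int
    right: int


def detect_regions(image: Image) -> List[Box]:
    """Step 1 (detection): return boxes that potentially contain text.

    Hypothetical stand-in: each horizontal run of non-background cells
    becomes one box. A real system would use a learned detector here.
    """
    boxes = []
    for row, line in enumerate(image):
        start = None
        # The appended "." is a sentinel that closes a run ending at the edge.
        for col, cell in enumerate(line + "."):
            if cell != "." and start is None:
                start = col
            elif cell == "." and start is not None:
                boxes.append(Box(row, start, col))
                start = None
    return boxes


def recognize_word(image: Image, box: Box) -> str:
    """Step 2 (recognition): transcribe the word inside one detected box.

    Hypothetical stand-in for the CNN recognizer: it simply reads the cells.
    """
    return image[box.row][box.left:box.right]


def extract_text(image: Image) -> List[str]:
    """Run detection, then recognition on each detected region."""
    return [recognize_word(image, box) for box in detect_regions(image)]


toy_image = [
    "..........",
    ".OPEN.....",
    "....24HRS.",
]
words = extract_text(toy_image)
print(words)
```

Keeping the two steps as separate functions mirrors the split the blog describes: either stage could be swapped for a better model without touching the other.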

Facebook would also face several challenges in running this procedure: the many languages used across the countries where it operates, the relationship between the text and the image, and the question of whether content is disrespectful or vulgar, or merely meant for entertainment.

If the system achieves its aim, it could remove offensive content to a great extent. But a single glitch in the system could turn the tables. Down the line, Facebook could also leverage this data to spot trending content for ad purposes.
