Introducing: Azure Media Indexer

Introduction

Internet video is growing at an extraordinary pace: the Cisco VNI Forecast expects video content to make up 70% of all consumer Internet traffic in 2014, rising to 79% by 2018. Video already accounts for the majority of Internet traffic across the globe, and this growth raises the problem of content discovery. The Internet was designed around text-based documents, and as such has mature infrastructure for searching and discovering text across the entire web. Video files, on the other hand, are not natively “searchable”, and usually require complex classification systems powered primarily by large amounts of manually tagged metadata. But what if there were a way to extract this kind of meaningful metadata automatically?

Azure Media Indexer is a media processor that uses natural language processing (NLP) technology from Microsoft Research to make media files searchable. It exposes this metadata automatically in the form of a keyword file (XML), a set of closed caption files (SAMI/TTML), and a binary index file (AIB).

The growth of multimedia also brings an increased focus on making video content accessible to users with hearing impairments. The status quo is to transcribe videos manually, at high cost, in order to create closed caption tracks. Azure Media Indexer’s speech recognition engine automatically creates a time-aligned subtitle track for English speech in the input media file, turning an arduous process that takes many person-hours into an automated job.

By using the output files of Azure Media Indexer in conjunction with a search engine such as SQL Server or Apache Lucene/Solr, developers can create a full-text search experience.
Users can then search content libraries with a text query and get back a page of results, each of which can seek to the timestamp at which the query word is uttered. This deep integration of metadata and video reduces the friction between searching a vast content library and reaching the moment in a video that the user actually wants. Implementing this search layer is out of scope for this blog post, but look for upcoming posts on the Azure blog detailing how to create a search portal for your media files using Azure Media Indexer.
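To make the search-and-seek idea concrete, here is a minimal sketch of an inverted index that maps each word to the videos and timestamps where it is spoken. It assumes the captions have already been extracted as (video_id, start_seconds, text) tuples; the Hit, build_index and search names are hypothetical, and a production system would use SQL Server full-text search or Lucene/Solr rather than an in-memory dictionary.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit: a video and the caption start time to seek to."""
    video_id: str
    seconds: float


def tokenize(text):
    # Lower-case and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(captions):
    """Map each word to the set of Hits whose caption contains it.

    captions: iterable of (video_id, start_seconds, caption_text) tuples.
    """
    index = defaultdict(set)
    for video_id, start, text in captions:
        for word in tokenize(text):
            index[word].add(Hit(video_id, start))
    return index


def search(index, query):
    """Return Hits whose caption contains every query word, sorted by video and time."""
    hit_sets = [index.get(word, set()) for word in tokenize(query)]
    if not hit_sets:
        return []
    matches = set.intersection(*hit_sets)
    return sorted(matches, key=lambda hit: (hit.video_id, hit.seconds))
```

For example, indexing the caption “Welcome to Azure Media Indexer” from a video named intro at 12.5 seconds lets a query for “media indexer” return a single hit for intro at 12.5 seconds, which a player could use as its seek position.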

Indexing Your First Asset

With Azure Media Indexer, users can run indexing jobs on a variety of file types either from their local file system or from Azure Media Services. For your first Azure Media Indexer job, you will start with a file from your local disk, upload it to Azure Media Services, and process it in the Azure cloud. For this tutorial, let’s use this sample Channel9 video. Save the MP4 file to your computer and rename it to Index.mp4. Let’s assume for the purpose of this tutorial that your target video file can be found at the following path: “C:\Users\<<USERNAME>>\Videos\Index.mp4”. The completed sample project can be downloaded here.

Creating an Asset

An asset is the Azure Media Services container for media files. An asset contains the media file itself, along with any other required files, such as manifest files for streaming or thumbnail files for previewing. In this case, you are going to use the .NET SDK to create an asset that holds your video file. (You can also upload assets using the Azure Management Portal.) Media processing jobs take an input asset and save the results into a specified output asset. First, you need to import some dependencies and declare some constants that will come in handy in Program.cs:

You will need to instantiate a CloudMediaContext object to establish a programmatic connection to the Media Services cloud. This allows you to upload the file by first creating a new Asset and then uploading the file as an AssetFile within that Asset. Start by adding the following lines to your Main function, specifying where to find the video and where to put the output files:

Note: If you want to use the same paths as this example project, replace <<USERNAME>> with your local Windows username.
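If you would rather not edit the path by hand, it can be built from the current user’s name. The snippet below is a small Python sketch of that idea (the default_video_path helper is hypothetical); in the C# project, Environment.UserName plays the role that getpass.getuser() plays here.

```python
import getpass
from pathlib import PureWindowsPath


def default_video_path(username=None):
    """Return C:\\Users\\<username>\\Videos\\Index.mp4 as a Windows path.

    Hypothetical helper; defaults to the name of the logged-in user.
    """
    if username is None:
        username = getpass.getuser()
    return PureWindowsPath("C:/", "Users", username, "Videos", "Index.mp4")
```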

Submitting an Indexing Job

With your file now in the Azure Media Services cloud as an Asset, the next step is to obtain a reference to the Azure Media Indexer media processor and create the Job itself. Jobs on Media Services are made up of one or more Tasks, each specifying the details of a processing operation (encoding, packaging, etc.). A Task can optionally take a configuration file specifying details about the task itself. In this instance, you will create an Indexing task on your new Asset using an optional configuration file called “default.config” containing some useful metadata, explained below.
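The relationships between assets, jobs and tasks can be summarized in a small data model. This is a hypothetical Python sketch for illustration only, not the Media Services .NET SDK: the Asset, Task and Job classes and the add_task method are invented names, and in the real project these objects are created through the CloudMediaContext.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """Container for media files (hypothetical model, not the SDK type)."""
    name: str
    files: List[str] = field(default_factory=list)


@dataclass
class Task:
    """One processing operation: input assets in, one output asset out."""
    name: str
    processor: str                 # name of the media processor to run
    configuration: Optional[str]   # task configuration text, or None if omitted
    inputs: List[Asset]
    output: Asset


@dataclass
class Job:
    """A job groups one or more tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def add_task(self, name, processor, input_asset, configuration=None):
        # Each task writes its results to a fresh, initially empty output asset.
        output = Asset(name=f"{name} output")
        task = Task(name, processor, configuration, [input_asset], output)
        self.tasks.append(task)
        return task
```

The configuration argument is optional, mirroring the point above that a task configuration file is not required.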

Task Configuration

A task configuration file for Azure Media Indexer is an XML file containing key-value pairs that improve speech recognition accuracy. In this release of Azure Media Indexer, the configuration can describe the title and description of the input media file, allowing the adaptive natural language processing engine to augment its vocabulary based on the specific subject matter at hand. For example, if you have a video about Geico, it may be useful to include this term in your task configuration file. This reduces the likelihood of a transcription of “guy co” in place of the desired proper noun “Geico”. Furthermore, if your title includes the term “hypertension”, for example, the engine searches the Internet for related documents with which to further augment the language model. This reduces the likelihood that a related spoken term such as “aortic aneurysm” will be misrecognized as something unintelligible like “A or tick canner is um,” improving the accuracy of your output files.
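If you prefer to generate the configuration rather than type it, the XML can be built programmatically. This Python sketch assumes a layout in which each key-value pair is a metadata element with key and value attributes inside an input element under a configuration root; that layout, the version attribute and the build_task_config name are assumptions, so compare the output against the configuration format documented for Azure Media Indexer before using it.

```python
import xml.etree.ElementTree as ET


def build_task_config(title, description):
    """Return task configuration XML holding the title and description keys.

    ASSUMED layout (verify against the Indexer documentation):
      <configuration version="2.0">
        <input>
          <metadata key="title" value="..." />
          <metadata key="description" value="..." />
        </input>
      </configuration>
    """
    root = ET.Element("configuration", version="2.0")
    input_element = ET.SubElement(root, "input")
    for key, value in (("title", title), ("description", description)):
        # ElementTree escapes quotes, ampersands and angle brackets in values.
        ET.SubElement(input_element, "metadata", key=key, value=value)
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to default.config produces a file that plays the same role as a hand-written one, with XML escaping handled for you.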

Note: Best results are achieved using 4-5 sentences spanning the title and description keys.
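The note above is a rule of thumb, so a quick check before submitting a job can be useful. This is a rough Python sketch: it counts sentences by splitting on “.”, “!” and “?” followed by whitespace or the end of the text, so abbreviations such as “e.g.” will inflate the count, and count_sentences and check_config_text are hypothetical helper names.

```python
import re


def count_sentences(text):
    """Roughly count sentences: split on ., ! or ? followed by whitespace or the end."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for part in parts if part.strip())


def check_config_text(title, description, low=4, high=5):
    """Return (within_range, total) for the sentences across title and description."""
    total = count_sentences(title) + count_sentences(description)
    return low <= total <= high, total
```

For example, check_config_text("Intro to Azure Media Indexer", "It indexes video. It makes captions! Is it fast? Not really.") returns (True, 5): the title has no terminal punctuation but still counts as one sentence.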

Create a new config file by right-clicking the project, clicking Add > New Item, and choosing XML File. Paste the following text into the new file and save it as “default.config”. In this case, use the information from the Channel9 website to fill in the optional “title” and “description” keys in the configuration file to improve accuracy:

In this post, you simply downloaded all of these files to a local folder. In future blog posts, you will explore the specific usage scenarios of these various outputs. At a high level, the SAMI and TTML files contain structured data about the words spoken in the video along with their timestamps, and can be used as rough-draft captions for the video. The keyword file contains algorithmically determined keywords from the input video along with their confidence levels.

The AIB file contains a binary data structure that describes the same data as the SAMI and TTML files, along with extensive word alternatives for words whose transcription was uncertain. This enables rich search functionality and can greatly increase the accuracy of search results. In order to use the AIB file, you will need a SQL Server instance with the Azure Media Indexer SQL Add-on. Read Part 2 of this blog series on Azure Media Indexer to learn more about this scenario. Feel free to reach out to us with any questions or comments at indexer@microsoft.com.
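As an illustration of consuming the caption output, here is a Python sketch that reads (start time, text) pairs from a TTML document using only the standard library. It assumes each caption is a p element in the W3C TTML namespace (http://www.w3.org/ns/ttml) with a begin attribute in HH:MM:SS.fff clock-time form; we have not reproduced the exact layout of the Indexer’s TTML output here, so check a real output file before relying on these assumptions. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TTML_NS = {"tt": "http://www.w3.org/ns/ttml"}


def clock_to_seconds(value):
    """Convert an HH:MM:SS(.fff) clock time such as '00:01:02.500' to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def read_ttml_captions(xml_text):
    """Return a list of (begin_seconds, text) for every caption paragraph."""
    root = ET.fromstring(xml_text)
    captions = []
    for paragraph in root.iterfind(".//tt:p", TTML_NS):
        # Join text from nested elements (e.g. <span>) and collapse whitespace.
        text = " ".join("".join(paragraph.itertext()).split())
        captions.append((clock_to_seconds(paragraph.get("begin")), text))
    return captions
```

Each pair gives a caption’s text and the offset a video player could seek to, which is the kind of input a full-text search layer needs.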

Other Information

This blog post introduces Azure Media Indexer but does not cover all of its usage scenarios. For example, you can submit jobs with a manifest file to index multiple files in a single job.
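As a rough illustration of preparing such a batch, the Python sketch below writes a manifest that lists one input file name per line and checks that every listed file is actually in the asset. The one-name-per-line format and the build_manifest helper are assumptions made for illustration; consult the Azure Media Indexer documentation for the manifest format (including the file extension) it actually expects.

```python
def build_manifest(asset_files, files_to_index):
    """Return manifest text with one file name per line.

    ASSUMED format: plain text, one input file name per line.
    Raises ValueError if a listed file is not present in the asset.
    """
    available = set(asset_files)
    missing = [name for name in files_to_index if name not in available]
    if missing:
        raise ValueError(f"files not found in asset: {missing}")
    return "".join(f"{name}\n" for name in files_to_index)
```

Validating against the asset’s file list up front catches typos before a job is submitted rather than after it fails.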

Azure Media Indexer is best suited to scenarios that prioritize accuracy over speed: a job takes approximately three times the duration of the input media. This makes it a poor fit for scenarios that require near real-time results.
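The rule of thumb above can be turned into a quick planning check. This Python sketch simply multiplies the media duration by three, per the approximate figure in this post; actual job times will vary, and the helper names are hypothetical.

```python
from datetime import timedelta

# Approximate ratio of indexing time to media duration quoted in this post.
INDEXING_FACTOR = 3


def estimated_indexing_time(media_duration):
    """Rough estimate of how long an indexing job will take."""
    return media_duration * INDEXING_FACTOR


def fits_deadline(media_duration, deadline):
    """True if the estimated indexing time is within the allowed turnaround."""
    return estimated_indexing_time(media_duration) <= deadline
```

For example, a 20-minute video is estimated at about one hour of indexing, so it would not meet a 30-minute turnaround.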