The Multimedia Commons is a collection of audio and visual features computed for the nearly 100 million Creative Commons-licensed Flickr images and videos in the YFCC100M dataset from Yahoo! Labs, along with ground-truth annotations for selected subsets. The International Computer Science Institute (ICSI) and Lawrence Livermore National Laboratory are producing and distributing a core set of derived feature sets and annotations as part of an effort to enable large-scale video search capabilities. They have released this feature corpus into the public domain under Creative Commons License 0 (CC0), so it is free for anyone to use for any purpose.

This dataset has seen applied use in emergency management, commercial market research, and other fields. More broadly, it could be useful for the next generation of computer vision, human mobility, machine learning, and social computing research.

AWS has made the images, videos, feature corpus, and annotation sets for the Multimedia Commons freely available on Amazon S3. Now anyone can use the data on demand in the cloud without worrying about storage costs or download time.

The Multimedia Commons feature data and subset annotations, along with the original images and videos, are publicly available in the Multimedia Commons Amazon S3 bucket: http://multimedia-commons.s3.amazonaws.com/. You can browse the data set using the index page. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs or the AWS CLI (Command Line Interface).
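Because the bucket is public, individual objects can be fetched with plain HTTP GETs once you know their key. A minimal sketch in Python using only the standard library (the example key is a hypothetical placeholder; browse the index page to find real object keys):

```python
# Build HTTP URLs for objects in the public multimedia-commons S3 bucket.
# The example key below is a hypothetical placeholder, not a guaranteed path.
from urllib.parse import quote

BUCKET_URL = "http://multimedia-commons.s3.amazonaws.com"

def object_url(key: str) -> str:
    """Return the HTTP URL for an object key in the bucket."""
    return f"{BUCKET_URL}/{quote(key)}"

# A hypothetical key following the data/images layout.
print(object_url("data/images/example.jpg"))
```

An object URL built this way can be passed to `curl`, `urllib.request.urlopen`, or any HTTP client; no AWS credentials are needed for a public bucket.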

multimedia-commons
|-- data
|   |-- images     : images in jpg format
|   `-- videos     : videos in mp4 format
|-- features
|   |-- audio      : aural features extracted from the audio tracks of the videos
|   |-- image      : visual features extracted from the static images
|   `-- keyframe   : visual features extracted from keyframes (one image per second) for the videos
|-- subsets
|   |-- YLI-GEO    : a subset of the YFCC100M with features, used for the MediaEval Benchmark's Placing Task
|   `-- YLI-MED    : annotations for a subset of the YFCC100M specialized for Multimedia Event Detection, with features
`-- tools
    |-- audioCaffe : a demo of a deep neural net-based audio content analysis tool
    `-- etc        : templates, scripts, and other useful miscellanea
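The layout above can be navigated programmatically by composing key prefixes for listing or retrieval. A small sketch, assuming only the directory names shown in the tree (the helper and its section names are illustrative, not part of any official SDK):

```python
# Map named sections of the Multimedia Commons corpus to S3 key prefixes.
# The prefixes mirror the directory tree above; the section names used as
# dictionary keys here are illustrative conveniences, not official labels.
LAYOUT = {
    "images": "data/images/",
    "videos": "data/videos/",
    "audio-features": "features/audio/",
    "image-features": "features/image/",
    "keyframe-features": "features/keyframe/",
    "yli-geo": "subsets/YLI-GEO/",
    "yli-med": "subsets/YLI-MED/",
}

def prefix_for(section: str) -> str:
    """Return the bucket key prefix for a named part of the corpus."""
    try:
        return LAYOUT[section]
    except KeyError:
        raise ValueError(f"unknown corpus section: {section!r}")

print(prefix_for("keyframe-features"))
```

A prefix like this can be supplied as the `Prefix` argument to an S3 listing call, or appended to the bucket URL when browsing over HTTP.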

You can access the original metadata via Yahoo! Labs. Note that Flickr users may choose to delete their content at any time, so a small subset of the images and videos in the YFCC100M were no longer available when the collection was uploaded to the AWS Public Data Set. If you require a full snapshot of the original image and video content, please email multimedia-commons@icsi.berkeley.edu for more information.

AudioCaffe is a content-analysis tool based on deep neural networks. The demonstration experiment included with the dataset uses audioCaffe to analyze data from the YLI Multimedia Event Detection (MED) subcorpus. This demonstration gives you a taste of what you can do with a big corpus of computed audio features like the Multimedia Commons and a flexible set of analysis tools. You can get started right away with the AWS CloudFormation template and documentation available here. You can also get the current build of audioCaffe at GitHub or read more about the project at the SMASH web page.

ICSI and LLNL have released the feature corpus and annotations under Creative Commons 0 (public domain), so there are no restrictions on use. More information on licensing and citation of the original metadata and the underlying images and videos is available from Yahoo! Labs.

Educators, researchers, and students can also apply for free credits to take advantage of the utility computing platform offered by AWS, along with Public Datasets such as Multimedia Commons on AWS. If you have a research project that could take advantage of Multimedia Commons on AWS, you can apply for AWS Cloud Credits for Research.