Content is a prime example of the features that contributed to the success of the wireless Internet. Mobile platforms such as the Snapdragon™ processor have special hardware and software capabilities that make acquisition, processing, and rendering of multimedia content efficient and cost-effective.
In this course, you will learn the principles of video and audio codecs used for media content in iTunes, Google Play, YouTube, Netflix, etc. You will learn the file formats and codec settings for optimizing quality and media bandwidth and apply them in developing a basic media player application.
Learning Goals: After completing this course, you will be able to:
1. Explain the tradeoffs between media quality and bandwidth for content delivery.
2. Extract and display metadata from media files.
3. Implement and demonstrate a simple media player application using DragonBoard™ 410c.

JC

excellent, I like the digital image processing, and the use of different free software tools for it.

MI

Mar 03, 2017


So much I have learned in this course and so thankful to UCSD and Coursera.

In the lesson

Codecs

In this module our esteemed Professor Harinath Garudadri will talk about coders and decoders (codecs). This will allow us to make better use of our multimedia choices when working with the DragonBoard™ 410c. We want to look at the motivation behind using codecs, the different ways to take advantage of redundancies when using codecs, and finally the ability to take advantage of different receiver/transmitter combinations. If we understand the way that information is sent and received over the data plane, we can create and use the right codecs.

Taught by

Harinath Garudadri

Associate Research Scientist

Ganz Chockalingam

Principal Engineer

Transcript

By image processing, we normally mean still images. And the one that most people are familiar with is JPEG. It was developed by the Joint Photographic Experts Group at ISO and IEC; those are the international standards bodies for this. There is a good review on the Wiki here. I just want to map the JPEG algorithms onto the general scheme that we have been working on so far. So, before we jump into JPEG, I would like to talk briefly about color representation. Most camera sensors detect R, G, and B: red, green, and blue. And we need eight bits per pixel. Now, there are new sensors that have 10 and 12 bits for very high dynamic range. It's very advantageous to work in a different representation than the RGB space. And this representation is called YUV. Y stands for luminance; U and V are the chrominance components. These are also called Cr and Cb, as in YCrCb. You can go from RGB to YUV space, back and forth, and the only loss is the rounding-off errors. The brightness is represented by Y, the luminance component. This has two main advantages. One is that you get compatibility with black and white TV. We basically process Y, U, and V separately; color TVs use the additional chrominance data, and black and white sets just use the luminance data. The second advantage is human perception. We are very sensitive to brightness. So this gives us an opportunity to work on the luma a lot more carefully, and in fact the U and V can also be down-sampled with very, very little loss in the visual representation. YUV444 is the unsubsampled format; in YUV422, the U and V are down-sampled by a factor of 2, so this gives us right off the bat about 30% data savings. There is also YUV420, which gives about 50% data reduction. So the picture on the left that you see has the color picture followed by the Y, U, and V components.
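As a rough sketch of these ideas (not the course's own code), the full-range BT.601 RGB-to-YCbCr conversion commonly used with JPEG and the 4:2:0 chroma subsampling described above can be written in a few lines of NumPy. The matrix coefficients are the standard full-range BT.601 values:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr, the convention used with JPEG (8-bit)."""
    m = np.array([[ 0.299,     0.587,     0.114   ],
                  [-0.168736, -0.331264,  0.5     ],
                  [ 0.5,      -0.418688, -0.081312]])
    ycbcr = rgb.astype(np.float64) @ m.T
    ycbcr[..., 1:] += 128.0                  # center the chroma channels at 128
    return ycbcr

def subsample_420(chroma):
    """4:2:0 subsampling: average each 2x2 block of a chroma plane."""
    h, w = chroma.shape
    return chroma[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).mean(axis=(1, 3))

rgb = np.random.randint(0, 256, (8, 8, 3))
ycc = rgb_to_ycbcr(rgb)
cb_small = subsample_420(ycc[..., 1])
full = rgb.size                               # 8*8*3 samples in RGB
sub = ycc[..., 0].size + 2 * cb_small.size    # full-res Y + two quarter-res planes
print(f"4:2:0 keeps {sub}/{full} samples -> {100*(1 - sub/full):.0f}% reduction")
```

The printed reduction is 50%, matching the YUV420 figure in the lecture; keeping U and V at half horizontal resolution only (4:2:2) gives the roughly one-third savings mentioned for YUV422.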
And the Y component always carries this prime, denoting the gamma-corrected luminance, which is called the luma; but even if we drop the prime, it is still understood to be the luma representation. The table here gives some values and the correlation between the YUV values and the RGB values. I put in a very nice Wiki link that explains this in detail. So in JPEG, what we normally do is take the image and break it into eight-by-eight blocks of pixels, and on each eight-by-eight block we do a discrete cosine transform (DCT). That's the time-frequency decomposition that we've talked about; here it's a spatial-to-frequency decomposition. So visually, the picture on the left is what the DCT basis functions look like. The cell in the top-left corner is just the DC component, and the cell at the bottom-right corner is the very highest frequency component, combining the horizontal frequency terms and the vertical frequency terms. And the picture on the right-hand side shows that once we do the DCT transform for each of the blocks, we scan all the DCT terms in a zig-zag order to make a linear vector. When we do this, there are lots of terms that are typically zero, so we look at only the components that have significant value and record only them, using Huffman coding. So here is the block diagram of the complete JPEG encoder and decoder. You can see the similarity between this and the generic block diagram that we have been looking at before, with a lot of the optional components removed. In the entropy coding, as I said, there are lots of terms that are usually zero for most natural images, so we use what is called run-length encoding, RLE, to represent a long sequence of zeros. Other than that, we have basically looked at all the blocks in the encoder and the decoder. Just like what we did with the Miracle 13 DV demo, it is very instructive to see the noise that we are introducing during the encoding and reconstruction process.
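A minimal sketch of the 8-by-8 DCT and zig-zag scan described above (illustrative only; a real JPEG encoder also quantizes the coefficients before entropy coding, and the smooth test block below is my own choice of input):

```python
import numpy as np

N = 8
# Orthonormal DCT-II matrix: forward transform is C @ block @ C.T,
# inverse is C.T @ coef @ C.
k = np.arange(N).reshape(-1, 1)
n = np.arange(N).reshape(1, -1)
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0] /= np.sqrt(2)

def dct2(block):  return C @ block @ C.T
def idct2(coef):  return C.T @ coef @ C

# JPEG zig-zag scan: walk the anti-diagonals, alternating direction,
# so low-frequency terms come first in the linear vector.
zigzag = sorted(((i, j) for i in range(N) for j in range(N)),
                key=lambda p: (p[0] + p[1],
                               p[0] if (p[0] + p[1]) % 2 else p[1]))

x = np.arange(N)
block = 128.0 + 4 * np.add.outer(x, x)        # a smooth ramp, like natural image data
coef = dct2(block - 128)                      # level-shift by 128, as JPEG does
scanned = np.array([coef[i, j] for i, j in zigzag])
significant = int((np.abs(scanned) > 1.0).sum())
# energy concentrates in a few low-frequency terms near the start of the scan
print("significant coefficients:", significant, "of", N * N)
```

For smooth blocks, almost everything after the first few zig-zag positions is (near) zero, which is exactly what makes run-length encoding of the scanned vector so effective.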
Depending on the encoder settings, we can get different reconstruction quality. In this particular one, the image on the right-hand side shows the delta, the difference from the original, with a quality setting of about 50. For most images this roughly gives about a 15-to-1 compression ratio, and you can see the kind of distortion that we introduce. But when you put it on top of the real image, most of the time it is not very objectionable. This comes down to the psycho-visual aspects of human perception.
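The "quality setting" mentioned here is commonly implemented by scaling a base quantization table. The sketch below uses the standard luminance table from the JPEG specification (ITU-T T.81, Annex K) and the IJG/libjpeg scaling convention, in which quality 50 leaves the table unchanged; other encoders may map quality to step sizes differently:

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K)
BASE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]])

def quant_table(quality):
    """IJG-style scaling: quality 50 -> base table; lower quality -> coarser steps."""
    q = min(max(int(quality), 1), 100)
    scale = 5000 / q if q < 50 else 200 - 2 * q
    return np.clip(np.floor((BASE * scale + 50) / 100), 1, 255).astype(int)

# Quantizing then dequantizing a DCT coefficient introduces the error
# (the "delta" image) that the lecture visualizes.
coef = 73.0
step = quant_table(50)[0, 0]            # DC step size at quality 50 is 16
recon = np.round(coef / step) * step
print("step:", step, "reconstructed:", recon, "error:", coef - recon)
```

Lowering the quality setting scales every step size up, zeroing out more high-frequency coefficients; that is where both the extra compression and the visible distortion come from.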