The code uses libavformat for demuxing, and then passes the video stream through an OpenMAX decoder/encoder pipeline and writes the raw encoded video stream to the output file. No resizing or deinterlacing is done (the appropriate components would need plumbing in), and it doesn't remux the output stream.

See the comments at the top of omxtx.c for more info.

I'm happy to co-ordinate this project to try and turn this into a useful transcoding app, but neither I nor the original author have much time available for developing it further at the moment.

It would be very helpful if you could give a descriptive explanation of what is happening.

Something like:

1st: the input is demuxed via libavformat ... part of code ...
2nd: we decode a frame and put it in a buffer (what buffer, what size, why, etc.)
3rd: encode the frame from the buffer (or maybe a bunch of frames, I am only guessing here)
4th: loop from 2nd until the last frame

Is it a frame-by-frame transcoding loop? Maybe some tutorial explanation of how the hardware acceleration works on the RPi?

My interest is primarily in transcoding to a different (smaller) resolution. Where in the code would be the wise place to do that? Can that be done in hardware, or do I need libav?

I am not a professional programmer, and the OMX code is pretty hard to understand (I would say somewhat ill-documented), so any comment on the code is welcome. Make it easy for us non-rocket scientists.

It's a small lump of code I knocked up with about a week's effort over a period of about a month, to prove to me that it could be done, to see if it would be useful for my current contract (MPEG 2 -> H.264 in less than half real-time and 5W of power? Yes, please), and as an exercise in learning OpenMAX. I've not come across it before, despite my work involving a lot of media.

As such, the code is very much a first-pass, and not something I would normally release (in particular, the horrific #define at the top isn't something I usually write; it's there because it started out small and grew, and I didn't think to turn it into a proper function. It's on the list); there will be bugs, there are stray bits left over from testing approaches which didn't work out, and simple things like changing the output bitrate currently require a recompile. Not ideal.

So:

At the top of main() we declare and define a bunch of structures for configuring the various components. We open the video, using libavformat (why would you use anything else?), probe it, and look for a video stream. We only use the first video stream found; everything else is ignored and disposed of.

Anywhere you see 'OERR(something);', read it as 'do <something>, and if it errors, print the error code, line number, whatever 'something' is, and quit'; it's a handy wrapper to save a lot of typing. Obviously, in a proper program you want to handle errors more gracefully, but when playing with this sort of thing I find that if something's gone wrong no amount of attempting to fix it will help. Best to bail out.

So we initialise 'm2' (the decoder; 'm2' as it's MPEG 2 I'm intending to feed it) and 'm4' (the encoder, which will be producing MPEG 4 AVC (AKA H.264)). Next we look for the port numbers -- these are fixed, but worth probing for -- and disable them. Each component has an input port and an output port numbered input+1; this is in the spec somewhere.
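The probe-and-disable step looks roughly like this. A sketch only: it assumes the VideoCore OpenMAX IL headers (IL/OMX_Core.h etc.), an already-initialised handle `m2`, and the OERR() wrapper; the other names are illustrative.

```c
OMX_PORT_PARAM_TYPE ports;
memset(&ports, 0, sizeof(ports));
ports.nSize = sizeof(ports);
ports.nVersion.nVersion = OMX_VERSION;  /* boilerplate every IL struct carries */

/* Ask the component where its video ports start: */
OERR(OMX_GetParameter(m2, OMX_IndexParamVideoInit, &ports));
int inport  = ports.nStartPortNumber;      /* decoder input        */
int outport = ports.nStartPortNumber + 1;  /* output is input + 1  */

/* Disable both ports before any configuration: */
OERR(OMX_SendCommand(m2, OMX_CommandPortDisable, inport,  NULL));
OERR(OMX_SendCommand(m2, OMX_CommandPortDisable, outport, NULL));
```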

We set up the vidddef struct with some basic frame info gleaned from the container, and pass it to the decoder. You'll see a lot of DUMPPORT() macros around; I got into the habit of doing these on a regular basis so I could check that what I thought the state would be actually was. This involves a fair bit of reading structure definitions.

Next we transition the decoder to the Idle state, and allocate some buffers for it. Then we transition to state Executing: the decoder is now set up to do some actual decoding.
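In IL terms, that sequence is roughly the following (a sketch under the same assumptions as above; `bufs` is an illustrative array of buffer-header pointers):

```c
OERR(OMX_SendCommand(m2, OMX_CommandStateSet, OMX_StateIdle, NULL));
OERR(OMX_SendCommand(m2, OMX_CommandPortEnable, inport, NULL));

/* The port definition says how many buffers of what size it wants: */
vidddef.nPortIndex = inport;
OERR(OMX_GetParameter(m2, OMX_IndexParamPortDefinition, &vidddef));
for (int i = 0; i < vidddef.nBufferCountActual; i++)
        OERR(OMX_AllocateBuffer(m2, &bufs[i], inport, NULL,
                                vidddef.nBufferSize));

/* With its buffers in place the Idle transition completes, and we
 * can start it running: */
OERR(OMX_SendCommand(m2, OMX_CommandStateSet, OMX_StateExecuting, NULL));
```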

First loop.

Googling around a bit netted me https://www.daphne-emu.com:9443/mediawiki/index.php/OpenMAX which helpfully details what needs to happen here. We read a frame, encapsulate it (or as much as will fit) in a buffer, and send it to the hardware. We do this up to 120 times (which is just shy of 5s of video) to allow the decoder to fetch enough data to understand what it's dealing with. At some point, we hope that the thing will have announced that its output port has changed state; if it hasn't, we bail out as it doesn't think there's any video to decode. Potentially, if you have a file with some very long GOPs, and it doesn't start at the beginning of a GOP (e.g. you've got a slice of broadcast video that's not quite within spec) 120 frames won't be enough. Typically you won't need more than 12.
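A sketch of that priming loop, with the buffer bookkeeping elided. `get_free_buffer()` and `port_settings_changed` are hypothetical stand-ins; the flag would be raised by the component's event handler on OMX_EventPortSettingsChanged.

```c
for (int i = 0; i < 120 && !port_settings_changed; i++) {
        if (av_read_frame(ic, &pkt) < 0)
                break;                  /* EOF before the decoder synced */
        if (pkt.stream_index != vidindex) {
                av_free_packet(&pkt);
                continue;               /* only the first video stream */
        }

        OMX_BUFFERHEADERTYPE *buf = get_free_buffer();
        memcpy(buf->pBuffer, pkt.data, pkt.size);  /* assume it fits */
        buf->nFilledLen = pkt.size;
        buf->nOffset = 0;
        OERR(OMX_EmptyThisBuffer(m2, buf));        /* off to the GPU */
        av_free_packet(&pkt);
}
if (!port_settings_changed)
        exit(1);        /* it never recognised the stream; bail */
```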

You'll see some av_bitstream_filter stuff in there; if you feed it H.264 (to reduce the bitrate, for example), you may find that without that filter, the decoder won't recognise it; the decoder is expecting Annex B framing, and that's what those calls patch up. You can ignore this if you're just using MPEG 2. I don't know about VC1.
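For reference, the Annex B conversion with the libav bitstream-filter API of the period looks something like this (a sketch; `codecctx` and `pkt` are assumed to come from the libavformat setup):

```c
AVBitStreamFilterContext *bsfc =
        av_bitstream_filter_init("h264_mp4toannexb");

uint8_t *outbuf;
int outsize;
av_bitstream_filter_filter(bsfc, codecctx, NULL,
                           &outbuf, &outsize,
                           pkt.data, pkt.size, 0);
/* outbuf/outsize now hold the packet with Annex B start codes
 * prepended; feed that to the decoder instead of pkt.data.
 * MPEG 2 needs none of this -- it is already self-framing. */
```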

After the decoder has announced its output port has changed state, we get the state of it, and feed it, unchanged, to the encoder's input port. That configures the encoder with details like the frame size and how it's laid out in memory. Then we set up the tunnel between the decoder's output and the encoder's input, transition the encoder to Idle, and configure the bits we want: bitrate, colour format, and whatnot.
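A sketch of that hand-off: clone the decoder's negotiated output format onto the encoder's input, then tunnel the two. `decport` and `encport` are illustrative port numbers; the IL headers and OERR() are assumed as before.

```c
OMX_PARAM_PORTDEFINITIONTYPE def;
memset(&def, 0, sizeof(def));
def.nSize = sizeof(def);
def.nVersion.nVersion = OMX_VERSION;

def.nPortIndex = decport;               /* decoder output port */
OERR(OMX_GetParameter(m2, OMX_IndexParamPortDefinition, &def));

def.nPortIndex = encport;               /* encoder input port  */
OERR(OMX_SetParameter(m4, OMX_IndexParamPortDefinition, &def));

/* With formats agreed, let the GPU move frames between them itself: */
OERR(OMX_SetupTunnel(m2, decport, m4, encport));
OERR(OMX_SendCommand(m4, OMX_CommandStateSet, OMX_StateIdle, NULL));
/* ...then set bitrate, colour format, etc. on the encoder's output. */
```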

(The aspect ratio should be set here, too, but for some reason the value we need for Freeview broadcast 16:9 aspect ratio content -- 64/45 -- is invalid. I don't know why this is. Patch it up in the container.)

For interest's sake, I now dump some information about the encoder: what it can encode to, etc.

Next we enable the relevant ports, set them to executing, and enter the main loop.

The main loop is basically the same as the first loop: obtain a frame from avformat, feed it to the decoder, wait for a frame from the encoder, write it to disc. Repeat until EOF. Really, this shouldn't be a copy of the first loop, but the setup guff between them should be in a function that's called once when the output port state changes.
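A sketch of one pass round the main loop. `feed_decoder()` is a hypothetical stand-in for the buffer-filling shown in the first loop, and `encbuf`/`encoder_buffer_ready` would be managed by the FillBufferDone callback.

```c
while (av_read_frame(ic, &pkt) >= 0) {
        if (pkt.stream_index == vidindex)
                feed_decoder(&pkt);
        av_free_packet(&pkt);

        if (encoder_buffer_ready) {
                fwrite(encbuf->pBuffer + encbuf->nOffset, 1,
                       encbuf->nFilledLen, outfile);     /* raw NALs */
                encoder_buffer_ready = 0;
                OERR(OMX_FillThisBuffer(m4, encbuf));    /* hand back */
        }
}
```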

The output is a file containing raw H.264 NAL units. It's playable with ffplay, but not much else (including, much to my surprise, mplayer).

Things to do:

* You say you want to scale the output: you'll want to bolt in the scaler between the decoder output and the encoder input; you'll need to do that between the two loops.

* Timings. At no point do I bother with timings, of any form. This is bad, and needs fixing. You cannot currently feed it video and remux it back into a container expecting the audio in sync.

* Consolidate the two loops. Really, that should come first.

* Remove the abomination that is DUMPPORT(). There is no universe in which I will accept things like that in production code.

* Rather than disposing of anything that isn't the first video stream, remux them back into a container with libavformat.

* Handle the EOF state more gracefully. ATM it just bombs out, but really a special last-frame buffer should be sent through the pipeline, which will cause the encoder to dump the last frames of data.

* Do some commandline parsing, to set things like the bitrate and potentially the frame size if you're scaling.

port 60/61. So the decoder output must be tunneled to the resizer and then to the encoder.

Yup. You'll need to instantiate the resizer at some point. I'd do it near the start -- there's no point doing something vaguely complex some seconds of runtime later only to find it's wrong so you've got to do it all over again, wasting time -- then, when the decoder has changed state, set the parameters of its output port on the input port of the resizer, and set up the tunnel between them. The bit I'm hazy on is when you should then plumb the output of the resizer to the input of the encoder; I'll leave that in your clearly capable hands.
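Plumbed together, the decoder -> resizer -> encoder chain would look roughly like this. A sketch only: `rz` is a hypothetical handle to the resize component (ports 60/61, as noted above), `decport`/`encport` are illustrative, and the resizer is an image-domain component, so the video geometry is carried over into its image definition.

```c
OMX_PARAM_PORTDEFINITIONTYPE vdef, idef;
/* ...nSize/nVersion boilerplate on both, as before... */

vdef.nPortIndex = decport;              /* decoder output */
OERR(OMX_GetParameter(m2, OMX_IndexParamPortDefinition, &vdef));

idef.nPortIndex = 60;                   /* resizer input  */
OERR(OMX_GetParameter(rz, OMX_IndexParamPortDefinition, &idef));
idef.format.image.nFrameWidth  = vdef.format.video.nFrameWidth;
idef.format.image.nFrameHeight = vdef.format.video.nFrameHeight;
idef.format.image.nStride      = vdef.format.video.nStride;
idef.format.image.nSliceHeight = vdef.format.video.nSliceHeight;
idef.format.image.eColorFormat = vdef.format.video.eColorFormat;
OERR(OMX_SetParameter(rz, OMX_IndexParamPortDefinition, &idef));

/* Pick the target size on the resizer's output: */
idef.nPortIndex = 61;
OERR(OMX_GetParameter(rz, OMX_IndexParamPortDefinition, &idef));
idef.format.image.nFrameWidth  = 352;
idef.format.image.nFrameHeight = 288;
OERR(OMX_SetParameter(rz, OMX_IndexParamPortDefinition, &idef));

/* Chain the tunnels; the GPU allocates the intermediate buffers: */
OERR(OMX_SetupTunnel(m2, decport, rz, 60));
OERR(OMX_SetupTunnel(rz, 61, m4, encport));
```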

Christmas is coming up, and if I get a chance, I'll tidy up the code as it is. Feel free to either email me patches or submit them via linuxstb's github thing (I need to read up on how that all works; it's under his account as I've never needed to work with it beyond a simple 'git clone' / 'git pull' before) and I'll integrate them into whatever I come up with.

Well, the OpenMAX definition is hard to chew; it's fudge. Lots of things are pretty unclear.

As for your code, I managed the aspect ratio bit. The values are restricted to 1:1, 10:11, 16:11, 40:33, 59:54, and 118:81, the closest to 16:9 being the latter. I didn't actually have a chance to see the output, but at runtime I am getting no errors. UPDATE: It does work! (BTW I am using VLC)

The program queries the codecs from the encoder (as a dump). I have an MPEG 2 license, yet that particular codec does not appear in this dump. Why not?

At some point you query and set the profile and level of the encoder. I did not grasp what that does. What is the definition of profile and level? It looks like it has something to do with the encoder used, but that's already set in eCompressionFormat, is it not? The necessity eludes me.

I need to set up some buffers. However: what buffers do I need? Do I need buffers between the tunneled components? Do I need an input buffer and an output buffer? Very unclear for now.

And finally (well, for now): what is the exact sequence of events I have to go through to set up the tunneling? I read that you set them up in the Idle state; there seems to be a required sequence to put them in the enabled state (something to do with the buffers) and then again into the Executing state.

Be assured that I had found that documentation. As I said, it is fudge. What I need are some straightforward examples, a descriptive whitepaper, or a simple tutorial/101. But it looks like I am pioneering here.

Beach wrote:As for your code, I managed the aspect ratio bit. The values are restricted to 1:1, 10:11, 16:11, 40:33, 59:54, and 118:81, the closest to 16:9 being the latter. I didn't actually have a chance to see the output, but at runtime I am getting no errors. UPDATE: It does work! (BTW I am using VLC)

I'm not surprised. I didn't bother setting it to the closest value as aspect is one of those things that if it's wrong it's just wrong. There's really no almost-right; it'll need fixing up in the container anyway.

Beach wrote:The program queries the codecs from the encoder (as a dump). I have a MPEG2 license, yet that particular codec is not outputted by this dump. Why not?

Because it doesn't query the codecs from the decoder, which is what the licence you have bought enables. I wanted to see if it could encode to anything else, and it can: 3GP is in there, from memory.

Beach wrote:At some point you query and set the profile and level of the encoder. I did not grasp what that does? What is the definition of profile and level? Looks like it has something todo with the encoder used, but that's already set in the eCompressionFormat is it not? The necessity eludes me.

Profiles and levels are codec settings, which constrain the encoder in certain ways: how many key frames may be present; the maximum / average bit rate; things like that. The details for H.264 are in Annex A of ISO 14496 part 10.
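As a concrete (hedged) illustration, querying and then pinning the encoder's AVC profile/level goes through the standard IL index; `encport` is illustrative and the usual IL assumptions apply:

```c
OMX_VIDEO_PARAM_PROFILELEVELTYPE pl;
memset(&pl, 0, sizeof(pl));
pl.nSize = sizeof(pl);
pl.nVersion.nVersion = OMX_VERSION;
pl.nPortIndex = encport;

OERR(OMX_GetParameter(m4, OMX_IndexParamVideoProfileLevelCurrent, &pl));

/* The profile caps the toolset (CABAC, B-frames, 8x8 transforms...);
 * the level caps throughput (macroblocks/sec, max bitrate, buffer
 * sizes).  e.g. High profile, level 4.0: */
pl.eProfile = OMX_VIDEO_AVCProfileHigh;
pl.eLevel   = OMX_VIDEO_AVCLevel4;
OERR(OMX_SetParameter(m4, OMX_IndexParamVideoProfileLevelCurrent, &pl));
```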

Beach wrote:I need to set up some buffers. However: what buffers do I need? Do I need buffers between the tunneled components? Do I need an input buffer and an output buffer? Very unclear for now.

And finally (well, for now): what is the exact sequence of events I have to go through to set up the tunneling? I read that you set them up in the Idle state; there seems to be a required sequence to put them in the enabled state (something to do with the buffers) and then again into the Executing state.

Anyone who knows the answers: you cannot be elaborate enough :D

As Dom said, you need to read the OpenMAX specs from Khronos. The answers to your questions are in there, although not terribly well explained.

For now I'll struggle on and study (deep sigh) the OpenMAX definition documentation. Before you plan to tidy things up, please contact me; I used your code as a template and have added a fair bit already. It's not working at the moment -- as said, I have to understand the tunneling and buffer sequences -- but it would be a waste not to use that.

I know nothing about OpenMAX, but could you document your findings in the eLinux.org wiki? It would be sad if beginners who want to do multimedia code were set back just because they have to redo everything you already accomplished!

ghans


Beach wrote:Before you plan to tidy things up, please contact me; I used your code as a template and have added a fair bit already.

Apologies; I found I had a spare few hours this evening, so have attacked the code and beaten it into a nicer shape. It's on GitHub either at the above URL or https://github.com/dickontoo/omxtx, which will probably become the new master branch.

Beach wrote:Although it's not working at the moment -- as said, I have to understand the tunneling and buffer sequences -- it would be a waste not to use that.

Your code probably mostly slotted in between the two loops. It should be moved into the new configure() function, which now holds all the setup guff that used to sit between the two loops and should always have been a function of its own. That code is mostly untouched.

When you see it, I think you'll agree it's a better structure. I've removed one of the loops, stuck the configuration in its own function, removed the DUMPPORT() macro (turned it into a function, removed the side effects), removed the goto, and made a few other minor tweaks. Changing the bitrate should no longer require a recompile.

Guess what? I managed to tunnel the decoder<-->resizer<-->encoder and now have a working program that scales and transcodes my recorded video to a quarter of its original size (352x288). However, it is based on the original omxtx.c; I still have to port it to the newly restructured program. In the end it turned out to be easier than I thought: the GPU allocates the intermediate buffers if you set up a tunnel, so no action is required from the IL client (= the program at hand).

I am not familiar with GitHub. I also do not know the difference between a fork and a branch -- which would this be? And where on GitHub could I upload my (dirty) code? Or is this a new program to be put under a separate GitHub handle?

This means that a live transcoded stream from my DVB-T USB dongle would be feasible!

I will now try and see how I can get the audio transposed onto the video (in sync) and how to pack it into a container. And there is the cleaning-up-after-the-transcoding bit to program (no cleaning up now, hey... I am a guy).

Beach wrote:I am not familiar with GitHub. I also do not know the difference between a fork and a branch -- which would this be? And where on GitHub could I upload my (dirty) code? Or is this a new program to be put under a separate GitHub handle?

Go to the original repository's page on GitHub and click fork. That will give you a forked copy of the original tree. Check in your changes, and now people can access your version of the code.

What would normally happen now, is you select your commits and click "Pull Request", and the original repo gets the option to accept them and your code is added to their repo.

From the original post, it sounds like the initial github repo may not have any further development done on it. If that is the case, then issuing pull requests may not be needed, as your tree may become the dominant one (which others may use to fork from and possibly contribute to). But start with issuing the pull request, and as long as they are responsive, submit any changes that seem useful.

I don't know if anyone's still following this, but FWIW my code now has resizing support, and can optionally display the decoder's output during operation. There's rudimentary muxer code present (which sometimes seems to produce something), but there are big troubles with timestamps still on every stream I currently have available, and I'm not sure where I'm going wrong.