The program runs forever, and after each iteration an action is taken depending on RNG: either try to decode one of the above files or reset the decoder. After a lot of iterations (hours, even days) the program gets stuck resetting the decoder, right at this line:

Is something wrong in the code? At first I thought the program got stuck when resetting the decoder (destroy+init) after trying to load a JPEG file with incomplete data, but it also happens when resetting after decoding a good JPEG file. Thanks.

Give me everything I need to run your test and I can take a look. Seeing as there is no main() in that code I can't run it without making assumptions about how you are setting things up.
It sounds like a memory leak, but the components have had a fairly good hammering over the years to fix those. Hopefully any issues would be noticeable after a few iterations, and you're just seeing the final stages where resources are totally drained.

I don't want to rush you, but did you find anything? I'm running the test app on 3 Raspberry Pis (2x RPi 3B+, 1x RPi 2B) and the test got stuck at different times on all of them.

I've wasted time trying to get your application to compile.
You haven't provided a makefile or the full gcc line, so I've wasted 15 minutes trying to sort out includes etc.

You didn't even provide instructions for where the data had to go, so I wasted time on that too and had to reverse engineer your code to find out that the files need to go in a folder called data. Without that I just get

That means you are referencing a buffer that you no longer "own". nFilledLen and nFlags will probably have immediately been zeroed, and the contents of the buffer may have changed.
If you call FillThisBuffer you MUST wait for the FillBufferDone callback, or the equivalent via ilclient (ie being able to retrieve the buffer via ilclient_get_output_buffer). Once you have processed the output buffer, then call OMX_FillThisBuffer with it.
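The ownership rule above can be illustrated with a self-contained mock (mock_buffer_t and the two mock_ functions are illustrative stand-ins, not the real OMX_BUFFERHEADERTYPE or IL calls):

```c
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for OMX_BUFFERHEADERTYPE; only the fields
 * relevant to the ownership rule are mocked here. */
typedef struct {
    uint8_t  data[64];
    uint32_t nFilledLen;
    uint32_t nFlags;
    int      owned_by_client;   /* not a real OMX field, for illustration */
} mock_buffer_t;

/* Stands in for the FillBufferDone callback: the component has filled
 * the buffer and ownership returns to the client. */
static void mock_fill_buffer_done(mock_buffer_t *buf) {
    memset(buf->data, 0xAB, sizeof(buf->data));
    buf->nFilledLen = sizeof(buf->data);
    buf->owned_by_client = 1;
}

/* Stands in for OMX_FillThisBuffer: the client gives up ownership, and
 * the component may zero nFilledLen/nFlags immediately. */
static void mock_fill_this_buffer(mock_buffer_t *buf) {
    buf->owned_by_client = 0;
    buf->nFilledLen = 0;
    buf->nFlags = 0;
}
```

The safe order is: wait for the FillBufferDone equivalent, read nFilledLen/nFlags and copy the data out, and only then hand the buffer back with the FillThisBuffer equivalent.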

I don't see any obvious memory leaks. I'll leave it running overnight with the debugger to see if that throws anything up.

Sorry for wasting your time. I should have included the full gcc line but that didn't come to mind. I'm really sorry.

I know that OMX_FillThisBuffer doesn't block; that's why I check for errors and wait until EOS before reading the supplied buffer. So I assumed the workflow was:
1. Request an empty buffer from the component (ilclient_get_output_buffer)
2. Tell the component to write into that buffer (OMX_FillThisBuffer)
3. Wait for the component to process data and write to the provided buffer if no errors were found (ilclient_wait_for_event(deco->comp_resz, OMX_EventBufferFlag, deco->out_resz, 0, OMX_BUFFERFLAG_EOS, 0, ILCLIENT_EVENT_ERROR | ILCLIENT_CONFIG_CHANGED | ILCLIENT_BUFFER_FLAG_EOS, 500))
4. Now the buffer holds reliable data; copy it to our memory and work with it (dst.pixels = std::vector<uint8_t>(buffer->pBuffer, buffer->pBuffer + dst.stride * dst.height))

But from your response I understand that OMX_FillThisBuffer must be called after processing the buffer, to recycle it. Is this right, and was my logic wrong?

Certainly at the lowest level EOS is transferred via a buffer with nFlags including OMX_BUFFERFLAG_EOS. The core picks up on that flag and signals an event on it, but should also deliver that buffer. Some components send the EOS on the last filled buffer, others as an empty buffer with the flag set.

Should you have more than one buffer on the port then you are going to get very odd behaviour.

Steps 1 & 2 are correct. As you get the buffers delivered to your app there is little point in waiting on the EOS event; just look at the nFlags field of the buffers that are returned. The EOS callback is more for audio_render and video_render, where they consume data and need to notify the client at some point that the stream has finished.
Steps 3 & 4 are risky.

I've had a quick hack around with your code to handle it as I'd expect it - see https://github.com/6by9/omx_imgdecode_resize
I hate IL so it's not the cleanest, but it demonstrates the point. As noted, you really want a semaphore or completion that is waited on (instead of a 10ms sleep) which is set in at least errorcb, fillbuffercb and eoscb.
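Checking nFlags on the returned buffers, as suggested above, can be sketched as follows (mock_header_t is an illustrative stand-in for OMX_BUFFERHEADERTYPE; the OMX_BUFFERFLAG_EOS value matches OMX_Core.h):

```c
#include <stdint.h>

/* Flag value as defined in OMX_Core.h */
#define OMX_BUFFERFLAG_EOS 0x00000001

/* Illustrative stand-in for OMX_BUFFERHEADERTYPE with just the fields
 * inspected here. */
typedef struct {
    uint32_t nFlags;
    uint32_t nFilledLen;
} mock_header_t;

/* Nonzero when a returned buffer carries end-of-stream; note the flag
 * may arrive on the last filled buffer or on an empty one, so the
 * check must not assume nFilledLen > 0. */
static int buffer_is_eos(const mock_header_t *buf) {
    return (buf->nFlags & OMX_BUFFERFLAG_EOS) != 0;
}
```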

Thank you for your patience. I'm running the test with your modifications to see how it works in the long run.

When you say a semaphore is needed, I assume its initial value would be the number of output buffers available in the resizer. But since there's state information involved, like errors, isn't a condition variable plus a status flag a more accurate approach?

TBH I'm not fussed over exactly which construct is used.
You want to sleep in the while loop until something happens that needs acting on, and the callbacks want to trigger the thread to wake up whenever any of them occurs.

1) The loop wants restructuring so that all buffers get handled and passed to the component in the first pass.
I've added a commit that does that. (I've also checked that it works should you have more than one buffer on the resizer output port).

2) You don't want a counting semaphore as it'll be triggered by more than just the buffer callback, so the counting will go wrong. Fixing 1 avoids the issue I think you were thinking of.
We want to wake up on event, process everything that is available, and then sleep again.
VCOS has vcos_event_flags_[create | set | get | delete], which offer the correct construct: wait on something to change, atomically clear the mask of what has happened, and wake up the thread. The Linux kernel completion will do the same sort of thing without the bitmask of events (which aren't needed in this case). I don't think a condition variable and status flag will, but they aren't constructs I tend to use.
I've pushed a change with that too.
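The wait-and-atomically-clear behaviour described for vcos_event_flags can be approximated with a pthread mutex, condition variable and flag mask; a minimal sketch (illustrative names, not the VCOS API):

```c
#include <pthread.h>
#include <stdint.h>

/* Callbacks set bits; the worker thread sleeps until any bit is set,
 * then fetches and clears the whole mask in one step. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    uint32_t        flags;
} event_flags_t;

static void event_flags_init(event_flags_t *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->flags = 0;
}

/* Called from the error/fill-buffer/EOS callbacks. */
static void event_flags_set(event_flags_t *e, uint32_t bits) {
    pthread_mutex_lock(&e->lock);
    e->flags |= bits;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Called from the main loop: blocks until something happens, returns
 * the accumulated mask, and clears it atomically under the lock. */
static uint32_t event_flags_get(event_flags_t *e) {
    uint32_t got;
    pthread_mutex_lock(&e->lock);
    while (e->flags == 0)
        pthread_cond_wait(&e->cond, &e->lock);
    got = e->flags;
    e->flags = 0;
    pthread_mutex_unlock(&e->lock);
    return got;
}
```

The callbacks (errorcb, fillbuffercb, eoscb) would each call event_flags_set with a distinct bit; the main loop blocks in event_flags_get, handles whatever bits came back, and sleeps again, replacing the 10ms poll.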

That was exactly my question. I'm debugging the changes right now. Thank you a lot for your support, 10/10.

It did stall for me overnight. The GPU was waiting for buffers to be allocated before it could complete the port enable call.

Having looked at the code you appear to set up the tunnel in odd places. Why create and enable the tunnel before you've had the OMX_EventPortSettingsChanged from the decoder output?
There's also no reason to wait for a OMX_EventPortSettingsChanged on the resizer output port as you explicitly set that resolution.
IL is always a touch fussy about when/why/how things occur, so I've hacked away further at your code to use ilclient_setup_tunnel instead. That was the "approved" way of driving IL, so always safest.

I'm leaving it running over lunch to see what happens.

Sorry, really daft question. What is resize doing for you? I can't see a resize operation occurring, and image_decode natively supports OMX_COLOR_Format32bitARGB8888, so you may as well drop the resize totally.

Yes that's the problem that I'm also facing.

In my first approach I used to create and enable the tunnel after receiving OMX_EventPortSettingsChanged on the decoder output, but since I had those strange hangs I thought that building the tunnel earlier would give the components more stability, even though the port definitions could be unreliable at that moment.
You're right, but when I started this project I used to call apply_port_settings not only as a handler for OMX_EventPortSettingsChanged, and that zero-second wait would remove the event from the ilclient event manager anyway.
Well, at first I tried to use the ilclient_setup_tunnel function but I had problems with the query to OMX_IndexParamNumAvailableStreams, due to setting up the tunnel before any EventPortSettingsChanged was received. So I examined the ilclient source code and addressed the issue with my limited resources. Also, I understood that that query was related to media with multiple audio/video streams, which was not the case.

Well... the resizer was there to change the colorspace. I started the project on an RPi 2B and image_decode did not support OMX_COLOR_Format32bitARGB8888 in the past. I'm going to check this again, but some time ago I did some (failed) tests with every RGB format available.

The updated version ran for 2 hours from just before lunch. I'll kick it off again tonight.
I've pushed my hacked version - I commented out large sections instead of removing, so that could be cleaned up.

marranxo wrote:In my first approach I used to create and enable the tunnel after receiving OMX_EventPortSettingsChanged on the decoder output, but since I had those strange hangs I thought that building the tunnel earlier would give the components more stability, even though the port definitions could be unreliable at that moment.
You're right, but when I started this project I used to call apply_port_settings not only as a handler for OMX_EventPortSettingsChanged, and that zero-second wait would remove the event from the ilclient event manager anyway.
Well, at first I tried to use the ilclient_setup_tunnel function but I had problems with the query to OMX_IndexParamNumAvailableStreams, due to setting up the tunnel before any EventPortSettingsChanged was received. So I examined the ilclient source code and addressed the issue with my limited resources. Also, I understood that that query was related to media with multiple audio/video streams, which was not the case.

JPEG can support two streams - the main image and the thumbnail.

marranxo wrote:Well... the resizer was there to change the colorspace. I started the project on an RPi 2B and image_decode did not support OMX_COLOR_Format32bitARGB8888 in the past. I'm going to check this again, but some time ago I did some (failed) tests with every RGB format available.

Hang on, my bad. If the codec spits out RGB (eg GIF, BMP, etc) then OMX_COLOR_Format32bitARGB8888 is valid. No, the component doesn't do format conversion, and the codec always decodes directly into the provided buffer.
video_splitter also supports format conversion and won't go through a resize path at the same time.

Thank you a lot for all your time and effort. I see that you've simplified most of my code.

About your changes:
1. Is it healthy to change port settings even if PortSettingsChangedEvent was not received?
2. I used to call ilclient_change_component_state to make state transitions of tunneled components, but that can hang the tunnel because that call is OMX_SendCommand + ilclient_wait_for_event/ilclient_wait_for_command_complete (which blocks before sending the OMX_SendCommand to the next tunneled component). A state transition must first OMX_SendCommand to all components and then wait for all the state change events.
3. You're creating a tunnel each iteration (apply_port_settings) but never tearing them down until the destroy call. That seems strange to me.
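The ordering constraint described in point 2 can be sketched with stubbed calls that just record the call sequence (all names here are illustrative stubs, not the real IL or ilclient API):

```c
#include <string.h>

#define MAX_COMPONENTS 8

/* Records the order of stubbed calls so the pattern can be checked:
 * 'S' = send command, 'W' = wait for completion. */
static char call_log[2 * MAX_COMPONENTS + 1];
static int  call_count;

static void mock_send_state_command(int comp, int state) {
    (void)comp; (void)state;
    call_log[call_count++] = 'S';   /* non-blocking, like OMX_SendCommand */
}

static void mock_wait_state_complete(int comp) {
    (void)comp;
    call_log[call_count++] = 'W';   /* blocks on the event in real IL */
}

/* Two-phase transition: issue ALL the commands first, then wait on each.
 * Interleaving send+wait per component can hang a tunnel, because one
 * component may not complete the transition until its tunneled peer has
 * also received the command. */
static void transition_all(int n, int state) {
    int i;
    for (i = 0; i < n; i++)
        mock_send_state_command(i, state);
    for (i = 0; i < n; i++)
        mock_wait_state_complete(i);
    call_log[call_count] = '\0';
}
```

With three components this produces the sequence SSSWWW rather than the SWSWSW interleaving that can deadlock a tunnel.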

marranxo wrote:1. Is it healthy to change port settings even if PortSettingsChangedEvent was not received?

Potentially unnecessary, but no harm. I did that because if you decode the same image twice, the output format on the decoder doesn't change, so you get no PortSettingsChangedEvent, but I still want the tunnel set up.

marranxo wrote:2. I used to call ilclient_change_component_state to make state transitions of tunneled components, but that can hang the tunnel because that call is OMX_SendCommand + ilclient_wait_for_event/ilclient_wait_for_command_complete (which blocks before sending the OMX_SendCommand to the next tunneled component). A state transition must first OMX_SendCommand to all components and then wait for all the state change events.

It's been a fair while since I had to write IL code (it truly is horrid - use MMAL!). Generally for decode paths you'd

1. Set up the first component's input port (disable any outputs)
2. Transition it to Executing
3. Feed in the first few packets
4. Wait for the PortSettingsChangedEvent
5. Set up the tunnel to the next component
6. Enable the tunnel
7. Transition the component to Executing
8. Go back to 4 for the next component

Use ilclient_state_transition to change the state of lots of components at the same time.
Use ilclient_disable_tunnel or ilclient_teardown_tunnels to clear tunnels safely.

Mixing raw OMX calls with ilclient generally means doing something dubious. The exception was setting up clock tunnels which always got done manually for some reason (never worked out why).

marranxo wrote:3. You're creating a tunnel each iteration (apply_port_settings) but never tearing down them until destroy call. That seems strange to me.

Not intentional, but it seems to be working. The tunnel is always between the same two components and ports, so I probably get away with it.

The problem with MMAL is that there aren't many examples available, and I haven't found any documentation for it.

There's Doxygen markup in all the headers. Someone has compiled it and hosts it at http://www.jvcref.com/files/PI/document ... index.html
(I have asked if we can get it and the IL component docs hosted on raspberrypi.org but apparently it's difficult to host anything except Github style markup).
I did start trying to create MMAL equivalents of the hello_pi apps. https://github.com/6by9/userland/commits/hello_mmal has a video_decode and image_decode example, although some further cleanup may be required (the memory is going these days). There are other more pressing tasks for today, but I will see if I can do a simple extension to hello_mmal_jpeg to include video_splitter as a format converter.

Sorry, I can't invest any more effort in looking at this. GetState is a simple call, so it's either a really subtle timing issue or a message getting dropped.
If you can narrow down the failure to make an easily reproducible use case then I may be able to find more time to take it further, but there is a desire to deprecate IL so significant investigation is wasted effort.

6by9 wrote:
There's Doxygen markup in all the headers. Someone has compiled it and hosts it at http://www.jvcref.com/files/PI/document ... index.html
(I have asked if we can get it and the IL component docs hosted on raspberrypi.org but apparently it's difficult to host anything except Github style markup).

That's some good documentation to start with. Thanks.

6by9 wrote:
I did start trying to create MMAL equivalents of the hello_pi apps. https://github.com/6by9/userland/commits/hello_mmal has a video_decode and image_decode example, although some further cleanup may be required (the memory is going these days). There are other more pressing tasks for today, but I will see if I can do a simple extension to hello_mmal_jpeg to include video_splitter as a format converter.

That would be great, because if IL is going to be deprecated it would be good for me to migrate to MMAL. I need three abstracted components: JPEG decode->RGB (image_decode+resize/splitter), an H264 encoder and a JPEG encoder. The H264/JPEG encoders look very easy to implement with the component_wrapper, but I have no idea yet about connecting the JPEG decoder to the video splitter (I still haven't read the documentation you provided).

6by9 wrote:
Sorry, I can't invest any more effort in looking at this. GetState is a simple call, so it's either a really subtle timing issue or a message getting dropped.
If you can narrow down the failure to make an easily reproducible use case then I may be able to find more time to take it further, but there is a desire to deprecate IL so significant investigation is wasted effort.

I've been playing with your sample https://github.com/6by9/userland/blob/4 ... l_encode.c and it's working fine with common resolutions (320x240, 640x480, 1024x768, 1920x1080...) but it's not working with uncommon resolutions, e.g. 80x60 (mmal_port_format_commit fails with EINVAL).

With the ilclient API I used to adjust the stride and the slice height to the next multiple of 16, like this:
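The snippet referred to was not included in the post; a typical round-up-to-16 helper looks like this (a sketch, assuming the port definition's stride and slice-height fields want 16-alignment, as is usual on VideoCore):

```c
#include <stdint.h>

/* Round a dimension up to the next multiple of 16. The result would
 * typically be assigned to the port definition's stride (in bytes, so
 * multiplied by bytes-per-pixel) and slice height fields. */
static uint32_t align16(uint32_t x) {
    return (x + 15u) & ~15u;
}
```

For 80x60 this gives align16(80) = 80 and align16(60) = 64, so at that resolution it is the slice height rather than the width that needs padding.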

I'm testing with latest firmware plus a couple of bits and the port_format_commit succeeds at 80x60. However I'm also on the 4.19 kernel and appear to be seeing VCHIQ issues so some of the output files are corrupt. Sorry, investigating those takes priority (and also explains some of the other issues I've been having).

I'm on the 4.14 kernel version. If you find some information of interest about this issue please let me know.