VPU wrapper library suitable for 64 bit i.MX8?

I'm working on porting a video decoder application from the i.MX6 to the i.MX8 platform. However, I think I have run into a problem. In the vpuwrapper.h header file, some memory addresses in the API C structures are declared as type 'unsigned long' (not pointers to 'unsigned long', but unsigned longs that store a memory address as a number).

When I run the application it can read the library version OK (which says ARM64 bit) but opening the decoder fails (error 2: invalid parameters). My assumption is that the header file may not be completely up-to-date and it uses the wrong types for the memory addresses.

I skimmed through the documentation and it seems I can push frames into the V4L buffer without needing to compress them first. The downside is that I'll need to rewrite the video player for this. I'm still kind of hoping someone chimes in and tells me how the VPU wrapper can be used on a 64-bit platform, because that would save me time (and thus money).

I just tried the fsl-image-validation-imx-imx8mqevk demo image (on the i.MX8 EVK board) but this also shows only /dev/video0 and /dev/video1 (/dev/v4l/by-path/platform-30a90000.csi1_bridge-video-index0 -> ../../video0 ). I'm starting to get the feeling that there is no output device for video4linux in the current kernel, so I wonder whether having the decompression in v4l (if it is accessible through these devices) will do me any good. Unfortunately the documentation has all the different platforms mixed in one document and it is not clear what is supported on the i.MX8 and what is not.

I'm using the i.MX8MQ (EVK board). The article (interesting!) shows that GStreamer is using glimagesink, which is an OpenGL video back-end for GStreamer. So it seems the decoded data must be drawn onto a GL canvas instead of going to a direct video output. I think I can get going from here. Many thanks for your quick replies.

I'll probably use the Hantro API directly or modify the VPU wrapper library and create a GTK+ application for Weston/Wayland to have easy access to an OpenGL canvas.

I have made some progress with the VPU wrapper: I have modified it to work on 64 bit. It turns out that with GCC on 64-bit ARM Linux the 'long' type is as wide as the native word (and thus as wide as a pointer), so I used that for everything which deals with addresses. Another problem is that the original only decodes half the frames. There seems to be something wrong with how the decoding is handled: the state of the codec is queried when feeding a new frame, which causes one out of two frames to be skipped because the function that feeds the decoder exits early. One issue which remains is that the VPU wrapper says it doesn't have enough data in the buffer. The attached version seems to work OK for me though.
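To illustrate the address-width point, here is a minimal sketch in C. The helper names (addr_from_ptr, ptr_from_addr, long_holds_pointer) are made up for this example and are not part of the VPU wrapper API. It shows that an address survives a round trip through an integer type that is wide enough, and how to check whether 'unsigned long' qualifies on a given compiler. uintptr_t from <stdint.h> is the portable spelling of "an integer that can hold a pointer".

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the VPU wrapper API. */

/* Store a pointer as a plain integer, the way the wrapper structures
 * store addresses as numbers. */
uintptr_t addr_from_ptr(const void *p)
{
    return (uintptr_t)p;
}

/* Turn the stored number back into a pointer. */
void *ptr_from_addr(uintptr_t addr)
{
    return (void *)addr;
}

/* Returns 1 when 'unsigned long' is wide enough to hold a pointer.
 * This is the case on LP64 targets such as 64-bit ARM Linux with GCC;
 * it is not the case on every 64-bit platform. */
int long_holds_pointer(void)
{
    return sizeof(unsigned long) >= sizeof(void *);
}
```

If long_holds_pointer() returns 0 on some target, address fields declared as 'unsigned long' would truncate 64-bit addresses there, and uintptr_t would be the safer type.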

One other remark: I have not been able to get the example programs to work. The example fails to open the VPU wrapper successfully.

Next problem... it turns out reading the data from the decoder buffer is extremely slow. It takes about 28 ms (yes: milliseconds!) to transfer a 1280x1024 image into CPU memory when I do a memcpy (no OpenGL involvement at all).
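To put that number in perspective, here is a small sketch that converts a copy time into an effective transfer rate, plus a helper to time a single memcpy. The function names are made up for this example, and the frame size assumes a 4:2:0 format at 12 bits per pixel, which is my assumption and may not match the actual buffer layout.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Bytes in one 4:2:0 frame (12 bits per pixel). The pixel format is an
 * assumption made for this estimate. */
size_t yuv420_frame_bytes(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Effective copy rate in MB/s, with 1 MB = 10^6 bytes. */
double copy_rate_mb_per_s(size_t bytes, double milliseconds)
{
    return ((double)bytes / 1e6) / (milliseconds / 1e3);
}

/* Time one memcpy and return the elapsed wall-clock time in ms. */
double timed_memcpy_ms(void *dst, const void *src, size_t bytes)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Under that assumption a 1280x1024 frame is 1,966,080 bytes, and 28 ms works out to roughly 70 MB/s, which is very low for a plain memory-to-memory copy.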

Any ideas on how to speed this up? I noticed that (when using VPU_DecGetMem to allocate result buffer memory) the codec library uses a DWL memory allocator layer, which in turn uses the Android ION memory manager, which uses CMA memory buffers. I suspect the problem is somewhere in there.

More problems... it turns out the i.MX8 Hantro decoder can only output YUV semi-planar formats (the same layout you get by setting the ChromaInterleave flag to 1 in the VPU wrapper DecOpen() parameters). I'm not happy about this. Where is the face-palm emoticon?
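For anyone who needs fully planar data anyway, the interleaved chroma plane can be split on the CPU. Below is a minimal sketch; the function name is made up, and it assumes an NV12-style layout (Cb byte first, then Cr, subsampled 2x2) with a row stride given in bytes. I have not confirmed the exact byte order or stride the Hantro decoder produces, so treat those as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved chroma plane (Cb Cr Cb Cr ...) into separate,
 * tightly packed U and V planes.
 * width/height are the luma dimensions; the chroma plane has width/2
 * Cb/Cr pairs per row and height/2 rows. uv_stride is the length of one
 * source chroma row in bytes, including any padding. */
void split_interleaved_chroma(const uint8_t *uv, size_t uv_stride,
                              size_t width, size_t height,
                              uint8_t *u, uint8_t *v)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(uv_stride >= width);

    const size_t pairs = width / 2;
    const size_t rows = height / 2;

    for (size_t y = 0; y < rows; y++) {
        const uint8_t *src = uv + y * uv_stride;
        for (size_t x = 0; x < pairs; x++) {
            u[y * pairs + x] = src[2 * x];      /* Cb */
            v[y * pairs + x] = src[2 * x + 1];  /* Cr */
        }
    }
}
```

The luma plane needs no conversion. Note that this is an extra pass over the frame, so it only makes sense once reads from the decoder buffer are fast.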

I hadn't noticed there was a new release out yet so thanks for bringing that to my attention. I have checked the changes in the VPU wrapper but the new version won't solve my problems (skipping JPEG frames, copying speed to OpenGL textures and semi-planar YUV).

The speed problem has to do with caching being disabled on the DMA buffers. To solve that I have created my own ION DMA framebuffer memory allocator which turns caching ON for the buffer. The ION driver also returns a file descriptor to the underlying DMA buffer so the physical address of the buffer can be obtained. If necessary it is also possible to control caching and cache flushing of the DMA buffer from user space so all in all the solution around the ION buffers is pretty neat.
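As an illustration of the cache control from user space, here is a sketch using the generic Linux dma-buf sync ioctl (DMA_BUF_IOCTL_SYNC from <linux/dma-buf.h>). This is an assumption about the mechanism: it is the standard interface for bracketing CPU access to a dma-buf, but whether a particular ION driver and kernel version honour it has to be checked on the target. The helper name is made up, and it is not the allocator code itself.

```c
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Hypothetical helper: bracket CPU access to a cached DMA buffer.
 * dmabuf_fd is the file descriptor of the underlying dma-buf.
 * begin != 0 marks the start of CPU access, begin == 0 marks the end.
 * write != 0 means the CPU also writes; otherwise it only reads.
 * Returns 0 on success, -1 on error (errno is set by ioctl). */
int dmabuf_cpu_access(int dmabuf_fd, int begin, int write)
{
    struct dma_buf_sync sync = {0};

    sync.flags = (begin ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END)
               | (write ? DMA_BUF_SYNC_RW : DMA_BUF_SYNC_READ);

    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

The intended usage would be dmabuf_cpu_access(fd, 1, 0) before reading a decoded frame from the mapped buffer and dmabuf_cpu_access(fd, 0, 0) afterwards, so that the CPU does not read stale cache lines after the decoder has written a new frame.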