Error using IPUV3 Library from Android

Hi, I'm trying to use IPUv3 block to do some scaling/cropping of the camera capture buffer. Platform is Android r13.4, board is MCIMX6Q-SDP board.

First I tried to use sp<PostProcessDeviceInterface> class (interface is in hardware/imx/mx6/libcamera/PostProcessDeviceInterface.cpp) to access IPUV3 Library but I get failure. I also tried to access imx_ipu device directly through ioctl(..., IPU_QUEUE_TASK, ...) interface but again it fails. I do see /dev/mxc_ipu device when I do ls -la /dev/mxc_* from the serial console. From within Android LogCat I get the following errors:

The second error is from me when I tried to do open("/dev/mxc_ipu", ...) from camera HAL.

I assumed that the open call ends up in mxc_ipu_open() in kernel_imx/drivers/mxc/ipu3/ipu_device.c so I instrumented this function but I don't see this log show up when open("/dev/mxc_ipu", ...) is made. I do see that ipu_device.c is built when the kernel is built.

At this point I'd like some guidance on how to get IPUv3 driver working from within Android so that I can try to use it.

Hi Xiaowen, I found in iMX6 SDK firmware guide that IPU has limitation of width=1024 output pixels because of FIFO line limitation. Can you confirm that? If this is indeed the case I'll not use IPU because my min output width is 1296.

Is it possible to use DMA from the user mode to accelerate s/w memcpy() in a similar way that IPU is programmed through imx_ipu device? Can you point me to some examples?

Hi Xiaowen, I got 3x scaling from 640x480 to (640*3)x480 working. I have one more memcpy into native surface for display which I tried to implement as 1x scale but it fails. It appears that IPU requires at least something different between source and destinqtion. So instead I scale 2x horizontally. After these changes it worked. The speed improvement over s/w scaling is unbeliveible (5 times).

The same 3x scale from 2592x1944 failed. Do you think it's alignment issue?

I couldn't make built in driver banding feature work but I found a workaround which looks pretty good to me (i.e. no visible artifacts). Idea is that work is divided into chunks that are no larger than 1024x1024 but instead of bands, buffer is divided into blocks. So for example, to scale rgb8888 432x968 to rgb8888 1296x968 (this is 3x scale horizontally), 1296 exceeds 1024 output limitation and 968 doesn't. To get correct number of passes, start from the output. 1296/2=648 which is within output limit and 968 can be left unchanged because it's within limit. Work can be done in 2 passes with the following code (Buf_input and Buf_temp are Android DMA_BUFFER pointers but they could be allocated differently, (please refer to rogeriorps/ipu-examples · GitHub for examples how to setup IPU, allocate and map memory in a generic Linux fashion):

struct ipu_task task;

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.input.paddr = Buf_input->phy_offset;

task.output.paddr = Buf_temp->phy_offset;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.input.paddr = Buf_input->phy_offset+216*968*4;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

In case when both horizontal and vertical size exceed 1024x1024 limit work can be split similarly but may need more than 2 passes. For example to scale 864x1944 to 2592x1944 the following code will work (requires 8 equal size passes and 9th pass for the leftover):

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset;

task.output.paddr = Buf_temp->phy_offset;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*2;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*2;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*3;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*3;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*4;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*4;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*5;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*5;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*6;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*6;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*7;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*7;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

memset(&task, 0, sizeof(task));

task.input.width = 216;

task.input.height = 32;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 32;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_input->phy_offset+216*968*4*8;

task.output.paddr = Buf_temp->phy_offset+216*3*968*4*8;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("scale x3: ioct(IPU_QUEUE_TASK) fail");

}

As a side note, I also found a way to accelerate memcpy() with IPU block. 1:1 scale won't work, but if some change is specified (for example horizontal flip) it will work:

memset(&task, 0, sizeof(task));

task.input.width = 216*3;

task.input.height = 968;

task.input.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.width = 216*3;

task.output.height = 968;

task.output.format = v4l2_fourcc('R', 'G', 'B', '4');

task.output.rotate = IPU_ROTATE_HORIZ_FLIP;

task.input.paddr = Buf_temp->phy_offset;

task.output.paddr = Buf_input->phy_offset;

if (ioctl(mIpuFd, IPU_QUEUE_TASK, &task) < 0) {

CAMERA_LOG_ERR("ioct(IPU_QUEUE_TASK) fail");

}

To undo horizontal flip another such operation will be required, so with a temp buffer, fast memcpy can be achieved.

could be used as the input arguments to IPU, and in android HAL a temp buffer could be used as the output result from IPU, and after that, the temp buffer is gived to android uppler layer for their further operation, am I right?

Could you share your code for me to learn further, anyway, thank you for your posts, really helpful.