i.MX7D spidev

We have an issue with writing data through the SPI port using spidev that has been exposed to userspace by modifying the device tree.

We have some software that writes two blocks of data out to an FPGA using SPI port 2, which is exposed to user space as spidev1.0. I can see that spidev1.0 exists in the /dev directory in embedded Linux as it should. The way this works is that, initially, two blocks of data are written out through the SPI port. Once this is done, the FPGA toggles GPIO line 128, which generates an interrupt. We have an interrupt handler in place that uses the Linux poll() function to wait for the interrupt to occur and then sends a block of data out to the FPGA. This happens on each interrupt.
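A minimal sketch of this kind of wait-then-write loop, for readers who want to reproduce the setup. It is not the actual application code. It assumes the GPIO is exported through the sysfs GPIO interface with its edge attribute already configured, so that poll() reports POLLPRI on the value file. The helper names wait_for_gpio_edge and spi_write_block are hypothetical.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: block until the sysfs GPIO "value" file reports an
 * edge. Returns 1 on an edge, 0 on timeout, -1 on error. Assumes the GPIO
 * was exported and its "edge" attribute was set beforehand. */
static int wait_for_gpio_edge(int gpio_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = gpio_fd, .events = POLLPRI | POLLERR };
    char value[8];

    int ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret;            /* 0 = timeout, -1 = poll() failed */

    /* Rewind and read the value to consume the event; otherwise the
     * next poll() would return immediately. */
    if (lseek(gpio_fd, 0, SEEK_SET) < 0 ||
        read(gpio_fd, value, sizeof(value)) < 0)
        return -1;
    return 1;
}

/* Hypothetical helper: send one block as a single SPI_IOC_MESSAGE transfer.
 * Returns the number of bytes transferred, or -1 on error. */
static int spi_write_block(int spi_fd, const void *buf, uint32_t len,
                           uint32_t speed_hz, uint8_t bits_per_word)
{
    struct spi_ioc_transfer xfer;

    memset(&xfer, 0, sizeof(xfer));
    xfer.tx_buf = (uintptr_t)buf;      /* spidev expects the pointer as a 64-bit integer */
    xfer.len = len;                    /* length in bytes, not words */
    xfer.speed_hz = speed_hz;
    xfer.bits_per_word = bits_per_word;

    return ioctl(spi_fd, SPI_IOC_MESSAGE(1), &xfer);
}
```

In a loop, the application would call wait_for_gpio_edge() on the open GPIO value file and, on a return of 1, call spi_write_block() on a descriptor opened from /dev/spidev1.0.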

When we run our software and watch the bus with a logic analyzer, we see one block of data sent out to the FPGA using SPI port 3 as it should be, but it takes a long time before the second block is written. Also, the interrupt handler detects an interrupt and sends out a block of data as it should, but then nothing else seems to happen, and occasionally the embedded Linux OS locks up and we have to power cycle our device.

Here is how we have the device tree set up to use SPI port 2 and SPI port 3 from user space:

SPI port 3's GPIO lines are already configured in the device tree that we received from Toradex, so we just disabled mcp258x0: mcp258x@0 and enabled spidev1: spidev@1, as seen in the ecspi3 device node above.

We don't understand the above, since these are error messages that are located in the spi-imx.c file. We can see in the imx7s.dtsi file that the ecspi3 node's compatible property is set to: "fsl,imx7d-ecspi", "fsl,imx6sx-ecspi", "fsl,imx51-ecspi", which would indicate that SPI port 3 uses the SPI DMA device driver built from the spi-imx.c file. But the spidev1: spidev@1 node has a compatible property of "toradex,evalspi", and that is defined in the spidev.c file.

So, when accessing spidev from user space, shouldn't that be accessing the non-DMA device driver because of the compatible setting?

If that is not the case, then how would we use the non-DMA SPI device driver, since we are not configuring or using DMA to transfer data to our FPGA?

From what I'm seeing, and the errors that are generated, the issue is that the wrong DMA device driver is being used to transfer data, but maybe I'm not fully understanding what is going on and something else is the cause of our problem.

spidev is the generic user space interface to the SPI driver in the kernel. toradex,evalspi is used to activate spidev, while the ecspi compatible property activates the i.MX SPI driver specific to the i.MX7. spidev sits on top of the i.MX SPI kernel driver, which does use DMA. What exact version of the Linux image are you using? What is the output of cat /etc/issue and uname -a?

Sorry for the late response. I've been busy working on other things, but I was finally able to create a small QtWidget test application in QtCreator which simulates how we're using the SPI port to write data to an FPGA whenever an FPGA interrupt occurs on GPIO line 63, as you can see in the code. The interrupt will occur on the GPIO line at most once every 200 microseconds. If we write a buffer that is 240 words, or 480 bytes, in size, we can see that it takes a large amount of time, around 3 seconds, between SPI writes. We also see continuous error messages displayed on the serial terminal as stated above:

But if we use a buffer size for the SPI write that is 198 words, or 396 bytes, the write works as it should, and we see interrupts occurring at intervals of around 13.5 milliseconds.
The longer time between interrupts is due to the fact that the FPGA FIFO is receiving the data it needs at the proper rate and does not need to make interrupt requests for data as often. I do occasionally see the error output above from the SPI driver only once on startup, but the data output from the SPI port seems to be correct.

To test the above you can change the value of the line:

static const uint32_t bufferSize = 240;

in the file interrupt.h to either 198 or 240.

I have attached the QtCreator project and the source files so that you can use them to run the test. If there is anything else you need please let me know.

@Gage05 It seems the problem you report has been fixed, at least in the mainline kernel. A quick way to try would be to replace the spi-imx.c SPI driver with the one from here and check. Can you please check this at your end and confirm that it fixes the issue for you? I checked at my end using the test code you provided, and the mainline kernel driver does not trigger the RX DMA error. In the meantime, I will be looking into exactly which patches we need to backport to fix this. The issue is probably related to the DMA timeout being hardcoded in the downstream NXP kernel, while the mainline kernel calculates it dynamically.
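To illustrate the difference between a fixed timeout and one derived from the transfer, here is a small userspace sketch. It is not the kernel's code: the function name estimate_dma_timeout_ms, the allowance of 12 bit times per byte, the one-second margin, and the final doubling are all assumptions chosen for illustration. The point is only that the wait scales with the transfer length and the bus clock.

```c
#include <stdint.h>

/* Hypothetical estimate of how long to wait for a DMA transfer, in ms.
 * Assumes about 12 bit times per byte (8 data bits plus overhead) and a
 * fixed margin for scheduling delays, then doubles the result.
 * Illustrative only; not taken from any driver. */
static uint64_t estimate_dma_timeout_ms(uint32_t len_bytes, uint32_t bus_clk_hz)
{
    const uint64_t margin_ms = 1000;   /* assumed scheduling margin */
    uint64_t wire_ms = 0;

    if (bus_clk_hz != 0)
        wire_ms = (12ULL * len_bytes * 1000ULL) / bus_clk_hz;

    return 2 * (wire_ms + margin_ms);
}
```

With these assumptions, a longer buffer or a slower clock yields a longer wait, whereas a hardcoded value can expire before a large transfer has finished.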

Well, it turns out that the issue with the current SPI device driver is a problem for us. With the current SPI device driver interrupts are occasionally missed and the output we see from the SPI port is not correct as a result. There are inconsistencies in the output.

I'm currently looking at the source files for the original SPI device driver code and the new code that you provided in spi-imx.c in an attempt to determine why this problem is occurring.

With the original SPI device driver code that we have everything seems to be working as it should, but occasionally it appears that the interrupt received is not serviced.

What is the CPU load when your application runs? At least with the mainline kernel driver I was not able to reproduce the issue with the test Qt application you had provided or spidev test code. Can you also try by disabling DMA by adding the below to ecspi3 node in your carrier board dts file?

dma-names = " ", " ";

Can you also check with 8-bit mode for transfers? The issue seems to be more prevalent in 16-bit mode.
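For the 8-bit test, the word size can be changed from user space through the spidev ioctl interface. A minimal sketch, assuming the spidev device is already open; the helper name spi_set_word_size is hypothetical:

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: set the SPI word size on an open spidev file
 * descriptor and read it back. Returns the word size reported by the
 * driver, or -1 if either ioctl fails. */
static int spi_set_word_size(int spi_fd, uint8_t bits)
{
    uint8_t readback = 0;

    if (ioctl(spi_fd, SPI_IOC_WR_BITS_PER_WORD, &bits) < 0)
        return -1;
    if (ioctl(spi_fd, SPI_IOC_RD_BITS_PER_WORD, &readback) < 0)
        return -1;

    return readback;
}
```

Note that a non-zero bits_per_word field in struct spi_ioc_transfer overrides this default for that transfer, so an application that sets it per transfer would need to change it there as well. The 480-byte buffer would then go out as 480 8-bit words instead of 240 16-bit words.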

Do you have any idea why this might be the case?

Sorry, but not at the moment. We will have to reproduce it reliably and investigate.

Thank you for these suggestions. I will try them and get back with you on the results.

The new SPI device driver code is better in one way. It allows us to choose buffer transfer sizes that were causing issues before. Now these buffer transfer sizes seem to work better, and there is no longer a pause between writing out sections of the buffer over the SPI port. We still have the occasional unserviced interrupt signal that is an issue for us, but as I said, I will try your suggestions and let you know how things turn out.

The unserviced interrupt is related to the frequency of the interrupt signals sent by the SPI device driver. At lower frequencies we never see the issue that I'm referring to. At higher frequencies, the frequencies at which we will need the SPI port to operate and send data, we notice the problem.

This output seems to be indicating that the DMA is still being used in relation to the SPI port we're using.

In the &ecspi3 node I added the line that you provided, dma-names = " ", " ";, to the main part of the node, and then we tried running the test with the line moved to the &ecspi3 sub node spidev1: spidev@1. I don't believe that the location should make a difference in this case, since the sub node spidev1: spidev@1 is what is enabled for use anyway.

If the line that you provided indeed disabled the use of DMA by the SPI port device driver then I would not have expected to see the DMA error above.

Looking at the SPI port device driver code, I can see that the "I/O Error in DMA RX" error comes from the spi_imx_dma_transfer function in spi-imx.c, at the point where wait_for_completion_timeout is called to wait for data to be received. The timeout is occurring and generating the output that we are seeing.

One thing about our tests is that if you run the test code I gave you as is, it will work fine and normally you will not see an issue. It is when you increase the frequency of the interrupts to the processor that the problems we're seeing start to occur. I believe that you would see this problem on your end if you increased the frequency of the interrupts occurring on the GPIO line.

Also, what tool can we use to see the total load on the CPU that you're talking about?

Also, what tool can we use to see the total load on the CPU that you're talking about?

htop

I have started looking into this issue now. I do not have a setup to reproduce the RX side of things that you have, though the issue seems to be easily reproducible for TX on our/NXP downstream kernel. The DMA implementation does not seem to be correct.