I am currently testing data transfer with the FX3, from GPIF II to USB 3.0. I took the GpifToUsb project as a basis. Since I want a 32 MHz clock, I reduced the PCLK frequency (clkdiv=12 instead of 4), and I use Streamer C# (a modified version that does data logging).
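For reference, the PCLK values in this thread are the FX3 system clock divided by clkDiv. The sketch below shows that arithmetic. It assumes a 384 MHz system clock, which is the value consistent with "clkdiv=12 gives 32 MHz". The helper name is hypothetical and is not an FX3 SDK function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: FX3 system clock of 384 MHz. A different system clock
 * setting changes every result below. */
#define SYS_CLK_HZ 384000000u

/* Hypothetical helper (not part of the FX3 SDK): GPIF PCLK obtained with
 * an integer clock divider. */
static uint32_t gpif_pclk_hz(uint32_t clk_div)
{
    assert(clk_div > 0);
    return SYS_CLK_HZ / clk_div;
}
```

Under this assumption, clkDiv = 4 gives 96 MHz and clkDiv = 12 gives 32 MHz. The 16 MHz used later in the thread would correspond to clkDiv = 24.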

First, I noticed that 4 buffers of 32 kB do not work properly: the throughput is very low. To solve this, I reduced the buffers to 4x8 kB and got almost the expected throughput (32 bits x 32 MHz = 4 bytes x 32 MHz = 128 MB/s; I got about 120-125 MB/s).
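The expected figure is simply the bus width in bytes times PCLK. Here is a small sketch of that arithmetic. The function names are illustrative. It assumes the state machine transfers one 32-bit word on every PCLK cycle with no idle states, and it uses MB = 10^6 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical GPIF II throughput in bytes/s, assuming one bus word is
 * transferred on every PCLK cycle. */
static uint64_t gpif_bytes_per_s(uint32_t bus_width_bits, uint64_t pclk_hz)
{
    assert(bus_width_bits % 8 == 0);
    return (uint64_t)(bus_width_bits / 8) * pclk_hz;
}

/* Measured throughput as an integer percentage of the theoretical value. */
static uint32_t efficiency_pct(uint64_t measured, uint64_t theoretical)
{
    assert(theoretical > 0);
    return (uint32_t)(measured * 100u / theoretical);
}
```

With these assumptions, 32 bits at 32 MHz gives 128 MB/s. The 120-125 MB/s measured with 4x8 kB buffers is therefore roughly 93-97% of the theoretical rate.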

So, my first question is: why does the system not work well with large buffers but work well with small ones? It is really strange, because I would expect the opposite...

Then I tried connecting an 8-bit counter to 8 GPIF pins to check whether I was losing data. Because of latency and long wires (I am using a breadboard), I had to reduce PCLK to 16 MHz. Then, to get a throughput of about 64 MB/s, I had to select buffers of 4x2 kB, because 4x8 kB gave me a very low throughput...
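One way to check a capture for loss with such a counter is to scan for gaps in the counter sequence. The sketch below assumes, hypothetically, that the 8-bit counter is wired to the low byte of each 32-bit word and increments by exactly 1 per PCLK cycle, wrapping from 255 to 0. The function name is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts samples missing from a captured stream, assuming an 8-bit
 * free-running counter in the low byte of each 32-bit word that
 * increments by 1 per PCLK cycle and wraps 255 -> 0. */
static size_t count_missing_samples(const uint32_t *words, size_t n)
{
    size_t missing = 0;
    for (size_t i = 1; i < n; i++) {
        uint8_t prev = (uint8_t)(words[i - 1] & 0xFFu);
        uint8_t cur  = (uint8_t)(words[i] & 0xFFu);
        /* Modulo-256 difference: 1 means consecutive samples. */
        uint8_t step = (uint8_t)(cur - prev);
        /* A step of k > 1 means k - 1 samples were dropped. Gaps are only
         * known modulo 256, so larger losses are under-reported. A step of
         * 0 (repeated word) is not counted. */
        if (step > 1)
            missing += (size_t)(step - 1);
    }
    return missing;
}
```

Running this over each logged file gives a direct count of dropped words, independent of the throughput figure.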

So, with the original firmware, I lose 28 32-bit words every time a buffer is full. This seems normal, given that there is only 1 thread/socket and there is a latency while switching buffers (as explained in AN75779).
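To put the 28-word loss in perspective, the sketch below converts it into a time gap and a loss ratio. It is a minimal model with two assumptions: each lost word corresponds to one PCLK cycle, and every full buffer is followed by the same number of dropped words. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Duration in ns of a gap of `lost_words` PCLK cycles (one word/cycle). */
static uint64_t gap_ns(uint32_t lost_words, uint64_t pclk_hz)
{
    assert(pclk_hz > 0);
    return (uint64_t)lost_words * 1000000000ull / pclk_hz;
}

/* Fraction of samples lost, in parts per million, if each buffer of
 * `buf_bytes` (32-bit words) is followed by `lost_words` dropped words. */
static uint64_t loss_ppm(uint32_t buf_bytes, uint32_t lost_words)
{
    uint64_t kept = buf_bytes / 4u;  /* 32-bit words stored per buffer */
    return (uint64_t)lost_words * 1000000ull / (kept + lost_words);
}
```

For example, with 2 kB buffers at 16 MHz, 28 lost words correspond to a 1750 ns gap and about 5.2% of the samples. A single lost word per buffer would still be about 0.19%.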

So I made some modifications to use 2 sockets/threads to transfer data from GPIF II to FX3 memory. In the code, I use CyU3PDmaMultiChannelConfig_t dmaMultiCfg; instead of CyU3PDmaChannelConfig_t dmaCfg;.

Basically, I modified the code to use the "DmaMulti" objects instead of the "Dma" ones. I also adapted the state machine. There are 2 threads: the state machine fills the 1st thread until the !DMA_RDY_TH0 flag is set, then switches to the 2nd thread until the !DMA_RDY_TH1 flag is set. Please see the attached file.

With this configuration, the results are better but not perfect. Instead of losing 28 32-bit words while GPIF II switches buffers, I lose only one 32-bit word.

So I have 2 issues:

- Why do I have to decrease the buffer size to get a correct transfer?

I am encountering the same problem: lost data! My FPGA data rate is 44 MB/s (bytes). I first store the data in a 2K-deep FIFO (32 bits wide), then read the FIFO at 90 MHz into GPIF (32 bits), and use the USB3014 DMA thread to send the data to the USB bus. On the PC, I use pBulkEpIn->XferData(pBulkBuf, nBulkLen) to receive the USB data, with nBulkLen = 4*1024; with any other nBulkLen, I get no data. As a result, I only get about 30 MB/s on the PC.
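A quick way to reason about this setup is to compute how long the FPGA FIFO can absorb a stall on the FX3 side. The sketch below assumes "2K depth" means 2048 entries. It also assumes the drain side is completely blocked during a stall, for example when no DMA buffer is free. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* FIFO capacity in bytes. */
static uint64_t fifo_bytes(uint32_t depth_words, uint32_t width_bits)
{
    assert(width_bits % 8 == 0);
    return (uint64_t)depth_words * (width_bits / 8);
}

/* Time in ns until the FIFO overflows when it is filled at fill_Bps while
 * the drain side is completely stalled. */
static uint64_t stall_budget_ns(uint64_t capacity_bytes, uint64_t fill_Bps)
{
    assert(fill_Bps > 0);
    return capacity_bytes * 1000000000ull / fill_Bps;
}
```

Under these assumptions, a 2K x 32-bit FIFO holds 8 kB, which covers only about 186 us of a full stall at 44 MB/s. If the host side averages only 30 MB/s, data arrives faster than it leaves, and no finite FIFO can prevent loss.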

Glad to know that I am not the only one who encounters issues! With certain buffer sizes, I can get almost all my data, but I lose 1 sample every time the DMA switches buffers, even though I use 2 DMA threads as suggested in the UVC example!

I am back from vacation and will try to find out why! If anyone has a suggestion about my first problem (the loss of 1 sample) or my second problem (the same as Cngaoxf_2607281's), it would be much appreciated!

I implemented the PIB error handling described in AN65974 (page 20 of 68) and got CYU3P_PIB_ERR_THR0_WR_OVERRUN and CYU3P_PIB_ERR_THR1_WR_OVERRUN errors (or only one of them when I use a single thread).

How can I solve this problem?

I read that I have to set up DMA descriptors in the firmware, but I can't find any example of this, not even in the "cyfxbulklpautomanytoone" example... Could this be why I always lose 1 sample when the DMA buffer switches?

EDIT: OK, according to what I have read, DMA descriptors are set up when the DMA channel is created, and this complexity is hidden from the user. Is that right?

Since I use DMA_RDY_TH0/1 as the flag to move to the next state of the state machine, I suspect there is a latency before the flag is set. As a result, the producer tries to write into a full buffer (which is why I get a WR_OVERRUN error and lose 1 sample), and the next buffer starts being filled 1 clock too late.
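The suspected mechanism can be made concrete with a toy model. It is written in C but is not FX3 firmware and not a cycle-accurate GPIF II model; the function name is hypothetical. The model makes four assumptions. One word is written per clock. The state machine sees the "buffer full" condition `flag_latency` clocks late. The other thread's buffer is always free. A word written into an already full buffer is dropped, as with WR_OVERRUN.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a two-thread write state machine that switches threads
 * only when it observes the "buffer full" flag. Returns dropped words. */
static uint32_t words_lost(uint32_t words_per_buf, uint32_t flag_latency,
                           uint32_t n_clocks)
{
    uint32_t fill = 0;        /* words stored in the active buffer        */
    uint32_t clocks_full = 0; /* clocks elapsed since that buffer filled  */
    uint32_t lost = 0;

    assert(words_per_buf > 0);
    for (uint32_t clk = 0; clk < n_clocks; clk++) {
        int full = (fill == words_per_buf);
        /* The flag is observed flag_latency clocks after the buffer fills;
         * only then does the state machine switch to the other thread. */
        if (full && clocks_full >= flag_latency) {
            fill = 0;          /* other thread's buffer, assumed empty */
            clocks_full = 0;
            full = 0;
        }
        if (full) {
            lost++;            /* write into a full buffer: overrun */
            clocks_full++;
        } else {
            fill++;            /* word stored */
        }
    }
    return lost;
}
```

With flag_latency = 1, the model drops exactly one word per buffer, which matches the 1-sample loss and the WR_OVERRUN errors described above. With zero latency, nothing is lost.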

Regarding the very low throughput when I choose a large DMA buffer: on my oscilloscope, I saw that the DMA_RDY_TH0/1 flag is sometimes low with some high 'peaks'. That means the buffer is full and the flag then goes high because the buffer has to be emptied by the consumer, which is fine... but after a few transfers, the same flag gets stuck high, which means the buffer cannot be filled, and that is why I get such a low throughput... I guess I need to fix this first before going further...

1. As you know, throughput depends on various factors, as described in AN86947 (http://www.cypress.com/documentation/application-notes/an86947-optimizing-usb-30-throughput-ez-usb-fx3). You said that the throughput increases when you reduce the buffer size. What burst size, packets per xfer and xfers per queue did you set with the 4x32 KB DMA buffers at 32 MHz PCLK? Did you keep the same values when you switched to 4x8 KB buffers? With 4x32 KB buffers, have you checked the throughput while varying the packets per xfer and xfers per queue values? If yes, was there any improvement?

2. Yes, you have found the cause of the 1-word (32-bit) data loss.

1. Lower throughput with a larger buffer size (your first issue in this thread):

I reproduced your test and can see the same results as you.

With the clock frequency reduced to 32 MHz, a 32 KB buffer takes a long time to fill, and until it is full, the USB side is waiting for the buffer.

If you decrease the buffer size to 8 KB and increase the buffer count to 16 to keep the same total buffer space, you can achieve a throughput of around 123 MB/s.

Here I used only a one-to-one channel.

dmaCfg.size = 8192;  // was CY_FX_DMA_BUF_SIZE
dmaCfg.count = 16;   // was CY_FX_DMA_BUF_COUNT
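The explanation above can be quantified with a minimal sketch. It computes how long the producer needs to fill one buffer and checks that both configurations reserve the same total space. It assumes a constant 128 MB/s fill rate (10^6 bytes per MB) and KB = 1024 bytes. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Time in ns for the GPIF producer to fill one DMA buffer at rate_Bps. */
static uint64_t buffer_fill_ns(uint32_t buf_bytes, uint64_t rate_Bps)
{
    assert(rate_Bps > 0);
    return (uint64_t)buf_bytes * 1000000000ull / rate_Bps;
}

/* Total DMA buffer space reserved by one channel. */
static uint32_t total_buffer_bytes(uint32_t buf_bytes, uint32_t count)
{
    return buf_bytes * count;
}
```

At 128 MB/s, a 32 KB buffer takes 256 us to fill, while an 8 KB buffer takes 64 us. Both 4 x 32 KB and 16 x 8 KB total 128 KB, so the smaller buffers hand data to the USB side four times as often without using more memory.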

2. Losing one sample at each buffer switch and getting an OVERRUN error:

This may be due to socket switching. Is it possible to put a counter in the external processor, so that it stops driving data when the counter hits its limit and waits for the DMA flag to indicate that the next buffer is ready? This would avoid the data loss.

3. Issue with the 4x32 KB buffer size:

Since you are using a many-to-one DMA channel and configured 4x32 KB of buffers for each path (1. Prod_0 to Cons_0, 2. Prod_1 to Cons_0), it effectively needs 4 x 32 KB x 2 = 256 KB.

By default, on chips with 512 KB of RAM, only 224 KB is allocated to the buffer area. You can see this in the cyfxtx.c file.
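The memory budget described above can be checked with a short sketch. It follows the explanation in this reply: `count` buffers of `buf_bytes` for each producer socket, compared against the 224 KB default buffer area. It ignores buffers taken by any other DMA channels in the firmware, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default buffer area for 512 KB parts, as stated above: 224 KB. */
#define BUFFER_AREA_BYTES (224u * 1024u)

/* Buffer memory for a many-to-one channel: `count` buffers of `buf_bytes`
 * for each producer socket (assumption taken from the reply above). */
static uint32_t many_to_one_bytes(uint32_t buf_bytes, uint32_t count,
                                  uint32_t producers)
{
    assert(producers > 0);
    return buf_bytes * count * producers;
}

static bool fits_in_buffer_area(uint32_t needed_bytes)
{
    return needed_bytes <= BUFFER_AREA_BYTES;
}
```

With 2 producers and 4 x 32 KB each, the channel needs 256 KB, which exceeds the 224 KB default. With 4 x 8 KB each, it needs 64 KB and fits comfortably.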

I'll investigate your last message in detail. But what I want to point out is that I cannot lose any data (not even 1 bit), because I will use the FX3 to send raw data over USB to be processed by a computer or a single-board computer. So if I set a throughput of 128 MB/s, I really need to receive all of that data, not only 123 MB/s.

So, since my last message, has Cypress made an example that transfers continuous data over 2 threads without any loss and without external signals, like the image sensor example (AN75779, if I remember correctly)?

Good news! I managed to get the correct throughput! With 16 MHz and 32 bits, I got exactly 64 MB/s! I solved my issue by using a counter in the state machine to switch buffers instead of the DMA_RDY_THx flags, which are set 1 clock cycle too late. I checked the acquired data and nothing is missing anymore. OK, using a state machine counter is not the most elegant solution, but at least it works!
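The counter-based approach can be illustrated with a small sketch. It is not the poster's actual state machine, which is not shown in the thread, and the names are hypothetical. With a counter, the destination of every 32-bit word is fixed in advance: the counter wraps every N words, where N is the buffer size divided by 4, and each wrap toggles the thread. This assumes the other thread's buffer is free whenever the switch happens. Whether the GPIF II Designer counter limit should be N or N - 1 depends on how the counter and its "hit" condition are evaluated, so check that against the GPIF II Designer documentation. An off-by-one there reintroduces exactly a 1-word error.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t thread;  /* 0 or 1: DMA thread that receives the word   */
    uint32_t offset;  /* word position in that thread's current buffer */
} word_slot;

/* Number of 32-bit words that exactly fill one DMA buffer. */
static uint32_t words_per_buffer(uint32_t buf_bytes)
{
    assert(buf_bytes > 0 && buf_bytes % 4u == 0);
    return buf_bytes / 4u;
}

/* With counter-based switching, word k always lands in a known slot:
 * the counter wraps every n words and each wrap toggles the thread, so
 * no word is written into a full buffer. */
static word_slot slot_for_word(uint64_t k, uint32_t n)
{
    word_slot s;
    assert(n > 0);
    s.thread = (uint32_t)((k / n) % 2u);
    s.offset = (uint32_t)(k % n);
    return s;
}
```

For 8 KB buffers, N = 2048. Words 0 to 2047 go to thread 0, and word 2048 is the first word of thread 1, with no clock spent waiting on a flag.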

Next, I will try to interface a quad 24-bit delta-sigma A/D converter and see whether I can control it with the FX3 state machine alone or whether I need to add a CPLD as master! I'm so happy!!!