If you send say 80 bytes to the serial port (e.g. simply via an echo to /dev/ttyACM1), the sketch prints series of numbers like this : 0 0 0 63 0 0, which indicates that (at least) 17 bytes are lost.

It can easily be fixed in CDC.cpp: the code defines a ring buffer of size CDC_SERIAL_BUFFER_SIZE (==512). But the code that uses it uses the SERIAL_BUFFER_SIZE macro which is 64. So replacing SERIAL_BUFFER_SIZE with CDC_SERIAL_BUFFER_SIZE in CDC.cpp fixed the problem for me (I wanted to run ArduinoISP which requires receiving 128 byte messages from avrdude).

It will go wrong if you feed the due more than 511 bytes at once. Therefore it would be better to use USB's built in handshaking mechanism (make the device reply with NAK or NYET until it has room to store more incoming data). The same was done for the Leonardo. I could look into how to do that with the due's otg controller, but if someone already knows, that would save time...

It will go wrong if you feed the due more than 511 bytes at once. Therefore it would be better to use USB's built in handshaking mechanism (make the device reply with NAK or NYET until it has room to store more incoming data). The same was done for the Leonardo. I could look into how to do that with the due's otg controller, but if someone already knows, that would save time...

Thanks for the suggestion I've fixed the buffer macros. Can you point me out how the Leonardo implements the (NAK,NYET) handshaking?

The idea is to consume only as much data from the usb controller as you can store. As long as you don't have room for new data, the usb controller must be instructed to reply NACK or NYET to the host (it does that autonomously).

On the sam, the usb controller sends these NAKs/NYETs as long you don't "return" the fifo (or rather the memory bank(s) attached to it) to the the contoller. This is done by the udd_ack_fifocon(CDC_RX) function. There is an excellent diagram in the data sheet at pg. 1092 (40.5.2.13 Management of OUT Endpoints).

// if we should be storing the received character into the location // just before the tail (meaning that the head would advance to the // current location of the tail), we're about to overflow the buffer // and so we don't write the character or advance the head. while (i != buffer->tail) { uint32_t c; if (!USBD_Available(CDC_RX)) { udd_ack_fifocon(CDC_RX); break; } c = USBD_Recv(CDC_RX); // c = UDD_Recv8(CDC_RX & 0xF); buffer->buffer[buffer->head] = c; buffer->head = i;

i = (i + 1) % CDC_SERIAL_BUFFER_SIZE; }}In read(), after we consumed some data we must call accept again, thi might return the fifo to the controller:

// if we should be storing the received character into the location // just before the tail (meaning that the head would advance to the // current location of the tail), we're about to overflow the buffer // and so we don't write the character or advance the head. while (i != buffer->tail) { uint32_t c; if (!USBD_Available(CDC_RX)) { udd_ack_fifocon(CDC_RX); break; } c = USBD_Recv(CDC_RX); // c = UDD_Recv8(CDC_RX & 0xF); buffer->buffer[buffer->head] = c; buffer->head = i;

i = (i + 1) % CDC_SERIAL_BUFFER_SIZE; } r=0;

}

It's still theoretically possible for accept() to get called twice even with that guard in place although it hasn't happened to me yet in over 20 megabytes of testing. There is probably a better way to solve the problem. But I think we are getting somewhere

12.5.7.1A Load-Exclusive instructionUsed to read the value of a memory location, requesting exclusive access to that location.

12.5.7.2A Store-Exclusive instructionUsed to attempt to write to the same memory location, returning a status bit to a register. If thisbit is:0: it indicates that the thread or process gained exclusive access to the memory, and the writesucceeds,1: it indicates that the thread or process did not gain exclusive access to the memory, and nowrite is performed,The pairs of Load-Exclusive and Store-Exclusive instructions are:• the word instructions LDREX and STREX• the halfword instructions LDREXH and STREXH• the byte instructions LDREXB and STREXB.Software must use a Load-Exclusive instruction with the corresponding Store-Exclusiveinstruction.

// if we should be storing the received character into the location // just before the tail (meaning that the head would advance to the // current location of the tail), we're about to overflow the buffer // and so we don't write the character or advance the head. while (i != buffer->tail) { uint32_t c; if (!USBD_Available(CDC_RX)) { udd_ack_fifocon(CDC_RX); break; } c = USBD_Recv(CDC_RX); // c = UDD_Recv8(CDC_RX & 0xF); buffer->buffer[buffer->head] = c; buffer->head = i;

i = (i + 1) % CDC_SERIAL_BUFFER_SIZE; } r=0; }

(cmaglie - I just saw your code, looks good, I'll try it out)

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable - no diff in 50MBytes of testing so far

Woohoo! I just got back from work and noticed the problem is diagnosed, fixed, tested and committed in git-hub!Thanks guys.

Nevertheless I still feel we should think about the approach taken to avoid entering accept() from the isr. This is what happens without the guards:- bytes arrive for the 1st time from the host- the isr fills up the serial rx buffer, it can't free the fifo because it can only write 511 bytes and it has 512 bytes available.- the sketch performs SerialUSB.read() so it reads 1 byte.- so we end up in accept() which calls USBD_Recv(). This function automatically frees the fifo if we read the last byte from it! I think this is where the problem lies. This causes a new irq to arrive while we are still in our loop and did not yet update the head ptr of the rx buffer.

Therefore I tried with the function UDD_Recv8(): it does not automatically free the fifo, so we can finish our loop and call udd_ack_fifocon(CDC_RX) when we are done accessing the state in rx buffer.

I could not see the problem yesterday, but now, after seeing your fix, I can. Following code works without guards:

// if we should be storing the received character into the location // just before the tail (meaning that the head would advance to the // current location of the tail), we're about to overflow the buffer // and so we don't write the character or advance the head. while (i != buffer->tail) { // UDD_Recv8() does not automatically free fifo uint32_t c = UDD_Recv8(CDC_RX & 0xF); buffer->buffer[buffer->head] = c; buffer->head = i;

Safer would be to disable the rx interrupt during accept(). The various low level api's like USBD_Recv() already do something like that via the LockEP class. (though I don't like the idea to use a class for this)

Let us further think about this...

Quote

It's not amazingly fast (I am getting a 7MByte file through in about 50s) but it is reliable

Indeed, consuming these bytes one by one is really inefficient. But now we have something reliable, we can think about optimization strategies.B.t.w if we define CDC_SERIAL_BUFFER_SIZE as 513, would this be noticable? (then we have room to accept an entire fifo bank).

513-byte buffers might cause the compiler to calculate all those % operators by division. In any case I did try the code with 2048-byte buffers, but got data loss for some unknown reason. I'll have to look into that further (I can't think of any logical reason why it should happen). Shortening the buffer down to 4 bytes surprisingly didn't cause much of a performance hit.

We aren't being bottlenecked by write speed - I get nearly 10MByte/sec from the Due to the computer and transferred 800MBytes with perfect reliability.

Transferring more than one byte at a time is probably the best idea. I noticed there is a UDD_Recv(ep,*data,len) which can be used to get multiple bytes at once, and maybe we could have an overloaded SerialUSB::read(*data,len) too.

Update: PeterVH, I tried your fix without the guard but I can't get it to work reliably. It works for 100KB or so, then corruption, then nothing. Are there any other changes I have to make elsewhere?

I have been trying to debug and simplify the code, and also add in multi-byte reads. The (modified) loopback test now runs at 4.3 MB/s (that's about 45 million baud). My 888888888-byte test passes through in about 3 min 30 sec. I also added in some code to control the LEDs. I did this by removing the troublesome ringbuffer code and using UDD_ calls directly (My reasoning was, since there is 2 banks of USB fifo for buffering, why do we need more?)

I do seem to be getting occasional errors but only about once every several gigabytes, which makes it hard to debug as I don't know if it's a problem with the code or the cable or with the way I am testing it. Testing so far indicates it is a problem with transmission (SerialUSB.write())

Please test thoroughly! I am especially interested to see if removing the ringbuffer causes performance degradation in any situation - I haven't found anything yet which runs slower despite having less buffering. Also I am unable to test on Windows or Mac.

Albert: not really - programming port serial doesn't have a direct multi-byte read, it is too slow to need one. And an 888888888 byte test would take over 21 hours to run

If you need a programming port loopback sketch use MultiSerialMega and connect pins 18 and 19. As for the 16u2 firmware it is almost the same as the Uno firmware and I have tested that to death 2 years ago - it can be tested by uploading Blink and linking pins 0 and 1.