It gets much more interesting about 2.1 seconds after the first falling edge. This looks like it's probably the actual data transfer. Even then, check out the lengthy gaps. It's spending almost as much time waiting as it is actually moving bytes... and that's after it took its time (2.1 seconds) to even begin sending.

Obviously it's not even beginning data transfer for 2 seconds. Why, I have no idea.

My guess is those gaps are latency added mostly by the firmware in the 16u2 chip. I believe the upload protocol, which is created by Atmel and permanently burned into a non-upgradeable ROM inside the SAM3X chip, involves a command-response approach. That's never great, since there's always some latency (especially on Windows) from the USB frame times. But the 16u2 probably adds much more, because it probably waits for a timeout before sending any buffered serial data as a partial USB packet.

The protocol itself involves ASCII encoding of data, and 115200 baud is only about 11.5 kbytes/sec, so even without the dead times, the speed can't be great over serial.
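To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes 8N1 framing (10 bits on the wire per character) and, purely as an illustration, 2 ASCII characters per payload byte; I haven't checked the actual encoding overhead, so treat that factor as a guess. The function names are made up for this example.

```c
#include <assert.h>

/* Payload bytes per second on an async serial link.
 * bits_per_frame: 10 for 8N1 (start + 8 data + stop), an assumption here.
 * chars_per_byte: wire characters per payload byte. 1 means raw binary;
 *                 2 is a guess at a hex-style ASCII encoding. */
double payload_rate(double baud, int bits_per_frame, int chars_per_byte)
{
    assert(baud > 0 && bits_per_frame > 0 && chars_per_byte > 0);
    return baud / bits_per_frame / chars_per_byte;
}

/* Best-case seconds to move nbytes at the given rate, ignoring all dead time. */
double min_seconds(double nbytes, double rate)
{
    assert(nbytes >= 0 && rate > 0);
    return nbytes / rate;
}
```

With those assumptions, 115200 baud gives 11520 bytes/sec raw and 5760 bytes/sec of payload, so a 100K sketch would need close to 18 seconds even with zero gaps.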

Upload protocols and speeds are something of an obsession of mine. For Teensy 1.0 & 2.0, I used USB control transfers, but also still a 1-at-a-time approach requiring the control transfer's final ACK. That's reasonably fast, but still far from optimal. For Teensy 3.0, I tried a new approach that allows streaming with substantial buffering by the bootloader, and an ACK/NAK approach to allow the transmitter to sense the board's ability to accept more data. It turned out a lot of the limitation in speed was due to the operating system's latency in scheduling the userspace program to run. The streaming, buffering and ACK/NAK solves that problem and lets the upload happen at a speed paced only by the flash write timing (if you try a Teensy3, I think you'll be impressed how fast 100K uploads). Still, even with all those measures, detecting the request to upload, disconnecting from the USB and re-enumerating are slow, taking about 1 to 2 seconds. At least with Teensy they are, since the USB disconnects. I have a few ideas about speeding that up... but they're extremely difficult. Did I mention I tend to obsess about upload protocols and speeds?
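Here's a toy timing model of why streaming helps so much. It's only a sketch: the function names and any numbers you feed it are hypothetical, and it assumes the buffering is deep enough that the latency is paid once rather than per block.

```c
#include <assert.h>

/* Command-response (1-at-a-time): every block waits a full round trip,
 * including OS scheduling latency, before the next block is sent. */
double stop_and_wait_seconds(int blocks, double block_time, double round_trip)
{
    assert(blocks >= 0 && block_time >= 0 && round_trip >= 0);
    return blocks * (block_time + round_trip);
}

/* Streaming with enough buffering: the round trip is paid roughly once,
 * after which the transfer is paced only by the per-block cost
 * (for example, the flash write time). */
double streaming_seconds(int blocks, double block_time, double round_trip)
{
    assert(blocks >= 0 && block_time >= 0 && round_trip >= 0);
    return blocks > 0 ? blocks * block_time + round_trip : 0.0;
}
```

For example, with 100 blocks at 5 ms each and a 10 ms round trip (made-up figures), the 1-at-a-time approach takes about 1.5 seconds while streaming takes about 0.51 seconds.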

Anyway, for Arduino Due, the upload speed could probably be improved considerably if someone put a lot of programming work into making the 16u2 chip more aware of the upload protocol. Actually, it probably has no need to be aware of the PC-to-Due stream... it's the Due-to-PC responses that it could recognize. If it could detect those and quickly transmit a partial USB packet, rather than sitting there waiting for more data, you'd probably see a substantial boost in speed. It might also be possible to increase the baud rate? I don't know what baud rates Atmel's bootloader can support, but the 16u2 is theoretically capable of up to 2 Mbit/sec. In practice, a 16 MHz AVR can do about 0.5 Mbit/sec with well written code in C while also polling the USB stuff.
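Something like this is what I have in mind for the flush decision. It's a sketch only, not the actual 16u2 firmware: the function name, the 64 byte packet size, and especially the idea that a response ends with one recognizable terminator byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USB_PACKET_SIZE 64  /* assumed endpoint size */

/* Decide whether bytes buffered from the UART (Due-to-PC direction)
 * should be sent to the host as a USB packet right now.
 * terminator is a hypothetical end-of-response byte. */
bool should_flush(const uint8_t *buf, size_t len, uint8_t terminator,
                  unsigned idle_ticks, unsigned timeout_ticks)
{
    assert(buf != NULL && len <= USB_PACKET_SIZE);
    if (len == 0)
        return false;                 /* nothing buffered */
    if (len == USB_PACKET_SIZE)
        return true;                  /* full packet, always send */
    if (buf[len - 1] == terminator)
        return true;                  /* response looks complete: send the
                                         partial packet without waiting */
    return idle_ticks >= timeout_ticks;  /* otherwise fall back to a timeout */
}
```

The third check is the whole point: a recognized response goes out immediately instead of sitting in the buffer until the timeout expires.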

But ultimately, some of the slowness is the non-optimal protocol Atmel designed, both the non-binary data format and the command-response nature. Unfortunately, there's nothing you can do about those issues, since the bootloader is permanently burned into a ROM on the chip that can never be upgraded.

Unfortunately, these delays are all workarounds to fix some upload issues with the SAMBA bootloader, and the bootloader itself is burned into the SAM3X ROM and cannot be changed in any way.

If you want a complete list of the patches to bossac, you can take a look here:

https://github.com/shumatech/BOSSA/commits/arduino

Atmel actually uses another trick to improve upload speed in their client: they use SAMBA to upload a small app into the SAM3X SRAM and then run it. The app takes over the CPU and does the real flashing in an efficient way.

BTW, it's quite complex and needs some stack machinery to work. Arduino is not going to change the way code is uploaded soon.

When Massimo gave me one of the early Due betas (Maker Faire in May 2011), I remember playing with it that first week before the beta site opened. I didn't know about bossac. I didn't have ANY code from anyone. I only had Atmel's datasheet. So as a first experiment starting from nothing, I wrote this shell script to blink some LEDs:

// slowecho - like echo, but slowly, for certain bootloaders that can't
// accept data quickly, even when running on a fast processor and
// communicating with USB protocols that have end-to-end flow control.
//
// this code is in the public domain
//
// compile with:
// gcc -Wall -O2 -o slowecho slowecho.c

If Atmel ever did fix it, maybe bossac could begin by detecting the bootloader version?

The idea of uploading an optimized bootloader to RAM is actually pretty crafty. If someone ever did go to all that trouble and added it all to a new version of bossac, wouldn't you consider accepting the contribution? Of course, I think the odds of anyone ever going to all that trouble are pretty slim. But maybe someone could convince Atmel to release their bootloader helper code under an open source license, so someone could use it as a starting point?

But really, I just obsess too much about data transfer speeds and protocols. I should probably stop now and get back to more urgent work.....