Programming STM32F103 in Production, medium-high volume

As part of my bachelor thesis, I have to investigate the programming techniques offered by STM32 microcontrollers.
For R&D I would recommend the JTAG port, since debugging is also important there. But what is the best "programming technique" for production? The volume is about 100-1000 microcontrollers a day. What matters for our production is flexibility (ISP is definitely preferred over external programming adapters), easy handling, hardware cost and programming time.
Does anyone have experience with this?
You would definitely help me a lot!
Thanks in advance
regards, Manuel

Use the USART1 with the built in boot-loader. Have a method to drive RESET, and BOOT0. Use a header, or test point array, or hardware/connectors already on the final product. Have a multi-headed serial port adapter, and use as high a baud rate as you can. You can use non RX/TX pins to signal/drive the process.
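
To make the built-in loader route a bit more concrete, here is a minimal Python sketch of the byte framing a host would send. The constants and checksum rules are my reading of ST's AN3155 (USART bootloader protocol) and should be checked against the current revision; the function names are made up for illustration, and nothing here touches a serial port. The loader normally expects 8 data bits, even parity, 1 stop bit.

```python
from functools import reduce

# Values as I understand them from AN3155; verify before relying on them.
SYNC = 0x7F              # sent once after reset so the loader can detect the baud rate
ACK = 0x79               # loader's positive reply
NACK = 0x1F              # loader's negative reply
CMD_WRITE_MEMORY = 0x31


def xor_all(data: bytes) -> int:
    """XOR of all bytes, used as the checksum for address and data frames."""
    return reduce(lambda a, b: a ^ b, data, 0)


def command_frame(cmd: int) -> bytes:
    """A command is the command byte followed by its bitwise complement."""
    return bytes([cmd, cmd ^ 0xFF])


def address_frame(addr: int) -> bytes:
    """32-bit address, most significant byte first, plus XOR checksum."""
    raw = addr.to_bytes(4, "big")
    return raw + bytes([xor_all(raw)])


def data_frame(payload: bytes) -> bytes:
    """Length byte (N-1), up to 256 data bytes, then XOR over length and data."""
    if not 1 <= len(payload) <= 256:
        raise ValueError("payload must be 1..256 bytes")
    body = bytes([len(payload) - 1]) + payload
    return body + bytes([xor_all(body)])
```

A write would then be: send `command_frame(CMD_WRITE_MEMORY)`, wait for ACK, send `address_frame(0x08000000)`, wait for ACK, send `data_frame(chunk)`, wait for ACK. Driving RESET and BOOT0 (for example from the adapter's spare modem-control lines) is left out because it depends on the fixture.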

JTAG adapters are expensive, hard to program, and prone to damage. You can get stand-alone push-button JTAG programmers, but these are even more expensive, often several times more than a scratch PC. We accumulate obsolete/junker computers as people upgrade, and then use them to build programming stations.

Are you looking to automate, program 1-up, 4-up, 10-up, etc? Use multiple stations? Do you understand the protocols well enough to write your own software and build a programming station?

Does the programmer also do final testing, calibration or serialization, or are these things more efficient for you to do as separate steps, or with test code within a boot loader?

This isn't specific to STM32 controllers: generally available debug adapters are often used for programming in production. Regardless of the interface (be it JTAG, SWD, SWIM, BDI or whatever), test pads are reserved on the PCB and contacted with a bed of nails after placement/soldering. Then the programming application is started, often as an external command-line call from within another application. Usually this is the first step in the production test of the device.

Thanks for your opinion.
So in your opinion the loss of programming speed from using the bootloader instead of JTAG/SWD is not as significant as the hardware cost, extra hardware inventory etc. that JTAG/SWD needs?
Because the programming speed of SWD should be a lot faster than UART/CAN if you use a bootloader.
Yes, we are looking to integrate the programming into the end-test procedure of our product. Using multiple stations to program in parallel would also be a good idea, but it requires extra inventory (I have to check if this is possible).
I won't be the person responsible for the programming code and so on; my task is to find a solution for programming and testing the µC/PCB in our production (well, to find the best possible solution of course :-) )

Because the programming speed of SWD should be a lot faster than UART/CAN if you use a bootloader..

Really, do you have anything to prove this assertion? What if I run my
USART at 460800 or 921600 baud, with a streaming protocol? I think the
flash would be the bottleneck, any improvement from SWD would be
fractional at best, not some order of magnitude. CAN would clearly blow
given the packet size.

For JTAG/SWD you'd need to buy a lot of hardware and software, or be
particularly skilled in the art. Cost per pod w/basic drivers, perhaps
$1K licensed, development costs in the order of $10K. Contrast that to
some USB serial ports, and cables that might run $20 a head, and
software dev time about a quarter, ie 1 week vs 4 weeks. Cost savings
there could be applied to replicating the test set up.

A single station could program multiple devices, would depend on how
good you are at scaling your solution. Multiple devices could be
programmed in parallel, a whole rack/array sequentially. Put a red/green
LED on the individual test fixtures.
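
As a rough illustration of one station driving several fixtures at once, here is a minimal Python sketch. `program_one` is a hypothetical callable that you would write yourself (it programs whatever sits on one port and returns True on success); the port names are placeholders too.

```python
from concurrent.futures import ThreadPoolExecutor


def program_all(ports, program_one):
    """Run program_one(port) for every port in parallel; return {port: passed}."""
    def safe(port):
        # Treat any exception (timeout, missing device, ...) as a failed unit
        # so one bad board cannot stop the rest of the rack.
        try:
            return bool(program_one(port))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max(1, len(ports))) as pool:
        results = list(pool.map(safe, ports))
    return dict(zip(ports, results))


def failing(results):
    """Ports whose unit should get the red LED and go into the retest bin."""
    return sorted(port for port, ok in results.items() if not ok)
```

Threads are enough here because the work is almost entirely waiting on serial I/O. The result dictionary maps directly onto the red/green LEDs per fixture.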

For manual labour, you'd want a big enough fixture that the test
operator would be kept busy installing/removing devices as the tester
rolled from one row to the next. Bin the failing devices for a single up
retesting.

According to this page: http://www.arm.com/products/system-ip/debug-trace/coresight-soc-components/serial-wire-debug.php
The programming speed of SWD should be up to 4MB/s.
The maximum USART1 speed of the STM32F103 should be 115.2KB/s (highest speed verified by the datasheet).
CAN programming speed is up to 1MB/s, if I am correct.
I am still a student and certainly lack the experience you have, so the information given above may be incorrect or only partly correct.
What do you mean by a streaming protocol?

So in your opinion the procedure with a Bootloader that allows programming via CAN is superior to JTAG/SWD?

As far as I am informed, the production team assembles a product and then tests it. While it is being tested, they already start assembling the next one. At the moment, testing (including programming) takes longer than assembling a whole product, but I guess this problem will be solved in the future.

Manuel, please look at some device data sheets and consider both the rate at which programming data can be transferred and also the time taken to program a memory location. For example, the STM32F407 may take up to 875ms to erase a 128KB sector and may take up to 100us to program each double word.

The problem isn't just how fast data can be transferred ... by whatever means.
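
A back-of-envelope model shows how the two parts add up. This is only a sketch in Python: the erase and program times you feed in must come from the data sheet of your actual part, and the numbers in the usage note below are placeholders, not F103 specifications.

```python
def transfer_seconds(image_bytes, baud, bits_per_byte=10):
    """Time to move the image over a UART.

    bits_per_byte is 10 for 8N1 (start + 8 data + stop) and 11 with parity;
    protocol overhead (ACKs, headers, checksums) is ignored.
    """
    return image_bytes * bits_per_byte / baud


def flash_seconds(image_bytes, page_bytes, page_erase_s, unit_bytes, unit_program_s):
    """Time the flash itself needs: erase every touched page, program every unit."""
    pages = -(-image_bytes // page_bytes)   # ceiling division
    units = -(-image_bytes // unit_bytes)
    return pages * page_erase_s + units * unit_program_s


def total_seconds(image_bytes, baud, **flash):
    """Worst case where transfer and flash writes do not overlap at all."""
    return transfer_seconds(image_bytes, baud) + flash_seconds(image_bytes, **flash)
```

For example, `total_seconds(128 * 1024, 921600, page_bytes=1024, page_erase_s=0.03, unit_bytes=2, unit_program_s=50e-6)` compares a 128 KB transfer at 921600 baud with assumed erase and program times. With numbers in that range the flash term is larger than the transfer term, so a faster link only shrinks the smaller part of the total.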

I've looked through the data sheet http://www.keil.com/dd/docs/datashts/st/stm32f10xxx.pdf
and the flash manual provided by ST.
I didn't find any information about the time needed to program the flash.
But I always thought that I first fill a buffer, and for that only the data rate of the connection matters (be it CAN, UART or SWD/JTAG),
and then the processor writes the data from the buffer to flash, and this is the point where I need to consider the programming time of the processor.
Or am I wrong? Thanks for your help!

Hi John,
thanks for that information, I've found what you were referring to:
(see attachment)
What does this information tell me if I want to know the best possible programming procedure to use in production?
Is it correct that this time is independent of the data rate that UART/CAN/JTAG/SWD offers? I thought that the data first gets written into the buffer (at the data rate of the interface) and then the processor writes it to the flash memory in the time given in the table.
Or am I wrong?
Thanks!


It's device and implementation dependent. In some cases, a buffer may be available which is large enough to hold code to be programmed - generally, that's unlikely to be true for most devices. Some devices have separate banks of Flash memory and can execute from one while programming the other. You just have to read up on the device Data Sheets to find out more!

Your original post discussed programming and testing. Clearly, you cannot begin to test a device until after it's fully programmed - even if the program data had been buffered somewhere. This means that production throughput is dependent on the total erase (if required) and programming time of a device ... which depends on the code size you want to program and then on the test duration.
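
To turn that into a station count, a small Python sketch follows. The shift length and cycle times are assumptions you would replace with your own measurements; `cycle_s` means everything done on the fixture per unit (erase, program, test), and `handling_s` the operator's load/unload time.

```python
import math


def stations_needed(units_per_day, cycle_s, shift_hours=8.0, handling_s=0.0):
    """Number of parallel fixtures needed to reach the daily volume."""
    if cycle_s + handling_s <= 0:
        raise ValueError("cycle time must be positive")
    per_station = math.floor(shift_hours * 3600 / (cycle_s + handling_s))
    if per_station == 0:
        raise ValueError("one unit does not fit into a shift")
    return math.ceil(units_per_day / per_station)
```

With a 120 second cycle (30 units an hour) and an 8 hour shift, one fixture handles 240 units, so `stations_needed(1000, 120)` gives 5 and `stations_needed(100, 120)` gives 1.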

Clive may have more to say when the US wakes up later today. I don't think I can help further.

Hi John,
well, as far as I'm informed, our assembler tests our PCB, without the flash being programmed, via an in-circuit test and an optical inspection.
I'll check whether the STM32F103 µC has a buffer that is large enough to store the program.
Thanks!

The maximum USART1 speed of the STM32F103 should be 115.2KB/s (highest speed verified by the datasheet)

Um, no. The USART1 on a 72 MHz bus is pushing 4.5 Mbit, the loader runs initially at 24 MHz, which might permit 1.5 Mbit. Either way, I'd probably stage the flash loader, and push the bulk of the data with something like XMODEM-1K at higher rates. RM0008 should have a Table 192. Error calculation for programmed baud rates, with notes indicating the max rates, which should be 1/16 of the APB being used. These rates might not be viable, RS232 drivers typically filter at about 1 Mbit, but it's definitely a lot higher than 115.2 Kbit.
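
The 1/16 rule and the baud rate error can be checked with a few lines of Python. This sketch assumes the F1-style USART with fixed 16x oversampling, where the BRR register holds USARTDIV with a 4-bit fraction (so BRR is simply the clock divided by the baud rate, rounded); the function names are mine.

```python
def max_baud(pclk_hz):
    """Highest baud rate with 16x oversampling: peripheral clock / 16."""
    return pclk_hz // 16


def brr_and_error(pclk_hz, baud):
    """Return (BRR value, actual baud, error in percent) for a requested baud."""
    brr = round(pclk_hz / baud)      # equals USARTDIV * 16
    if brr < 16:
        raise ValueError("requested baud rate is above pclk/16")
    actual = pclk_hz / brr
    return brr, actual, (actual - baud) / baud * 100.0
```

`max_baud(72_000_000)` is 4500000 and `max_baud(24_000_000)` is 1500000, matching the 4.5 Mbit and 1.5 Mbit figures. At 72 MHz, 115200 baud divides exactly (BRR 625), while 921600 baud rounds to BRR 78 with an error of roughly 0.16 %.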

The erase time for a F4's 128K sector can take several seconds. At least according to the 405/407 Rev 2 manual. Perhaps 4 in particularly aberrant circumstances.

As Ron White observed about humans standing in hurricane force winds, it's not that the wind is blowing, it's what the wind is blowing.

What are the benefits of using XModem-1k for programming processors in your opinion?

If I was playing a raw speed game I'd probably pick an algorithm that streams better, but some of the advantages of XModem-1K go as follows :

a) Very robust, and self synchronizing.

b) Very simple, and small, i.e. I could write it from scratch and have it work in very short order. Its size makes it ideal for implementation in ROM; a lot of SoC designs use it. Sending hex files to a monitor is also very brief.

c) The 1K and CRC provide strong integrity, and a reasonable block size.

d) The ACK/NACK for each block provides a method of pacing the process, and of retrying failures in both data transmission and flash erase/write. Total failure could be signalled with CAN (cancel, 0x18) characters, not to be confused with the CAN bus.

The implementation could be modified to use larger block sizes if the link is viable. Or could just accumulate larger buffers, ie write 8K buffers to flash holding the eighth ACK, but rapidly ACKing the prior 7 valid blocks. You might also be able to bury the erase time, and background some of the writes.

I'd recommend computing the CRC as each byte arrives, so the ACK can have very low latency.
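
For reference, the per-byte CRC and the 1K block layout look roughly like this in Python (host side; the receiver on the micro would run the same update step in C as each byte arrives). The CRC is the usual XMODEM one (polynomial 0x1021, initial value 0). Padding short blocks with 0xFF instead of the traditional 0x1A is my own choice here, since 0xFF matches erased flash.

```python
SOH, STX, ACK, NAK, CAN = 0x01, 0x02, 0x06, 0x15, 0x18


def crc16_update(crc, byte):
    """Fold one received byte into the running CRC-16/XMODEM value."""
    crc ^= byte << 8
    for _ in range(8):
        crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
        crc &= 0xFFFF
    return crc


def crc16(data):
    """CRC over a whole buffer, built from the per-byte update."""
    crc = 0
    for b in data:
        crc = crc16_update(crc, b)
    return crc


def packet_1k(block_no, payload, pad=0xFF):
    """STX, block number, its complement, 1024 data bytes, CRC high, CRC low."""
    if len(payload) > 1024:
        raise ValueError("payload larger than one 1K block")
    data = payload.ljust(1024, bytes([pad]))
    n = block_no & 0xFF
    return bytes([STX, n, 0xFF - n]) + data + crc16(data).to_bytes(2, "big")
```

Because the CRC is sent high byte first, a receiver that keeps calling `crc16_update` over the data and then the two CRC bytes ends up with zero for a good block, so the ACK can go out right after the last byte.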

Hey guys,
Thanks, for the information.
I guess we are also looking for such a stand-alone programmer for firmware updates in the field, as an alternative to a firmware update over our bus network, yet I'm unsure about the quality of those...?
Are there any datasheets available for these 2 "Handy-Programmers"?

@clive :
I've thought about the programming cycle like this:
Our R&D team is currently working on its own bootloader that will work over CAN (not directly CAN, but a new bus network that runs over CAN and has some specifics).
This bootloader will be programmed at our assembler over JTAG or SWD (currently I prefer SWD since it's faster and requires fewer pins --> less board space).
The flash of the µC will be programmed via CAN at our production.
What do you think about this idea?

Another issue that came to my mind is software hacking. Shouldn't it be possible for the customer to hack our software via JTAG or SWD?
If yes, how can we protect our products?
Thanks!

I've had great success programming with the cheap little ST-LINK
(ideally version 2) and the Windows command line utility using SWD on a
functional test fixture that can run more than 30 units an hour. This
is for products built around the STM32F100 and STM32F105. Full flash
erase, program and lock cycle takes under 15 seconds. I've also had good
success with the same hardware and Linux command line utilities, which
I think have free source available.

It wouldn't take much to make a little target powered SWD programmer,
but these are also readily available for about the same cost as a
quality bench meter.