gcode multiline buffering, ack problem and proposal

Hi, I am Sam. Back in 2009 I was working on EMCRepRap (http://reprap.org/wiki/EMCRepRap); I finally gave up and got myself an Arduino Mega plus a hand-wrapped RAMPS with TB6560 driver, resulting in this [www.youtube.com]. However, I am quite unsatisfied with today's firmwares and with the processing power of the AVR, which limits the software architecture and what is possible. Hence I started a new firmware project targeting the Arduino Due (ARM) with ChibiOS (a real-time OS): https://github.com/sam0737/reprap-arduino-due. Since I still have a full-time job to take care of, progress has been shaky. Enough said...

TL;DR: I am working on a new firmware, and there is one particular issue with the host interaction where I see room for improvement.
When M116 (Wait for target temp) or G4 (Dwell) is issued, the host can't send an additional line, because it has no idea whether

(a) the command was well received and is just being processed,

(b) the firmware has buffer space for more,

(c) or, even worse, the newline was dropped, no "rs/resend" is being transmitted, and everything has just hung.

And at this stage, even M105 (Temp report) and M112 (Estop) can't go through because the host does not know whether it is safe to send. (Firmware like Marlin won't be able to react because it is in a tight loop, but Teacup, which uses the dda loop for temp checks, should be able to react - another reason why I am starting a new firmware project.)

The firmware should report whether its buffers are full by implementing GCODE_buffer_multiline_proposal. (I will stick with Q:1 in the examples for "buffer not empty"; I discuss the problem when multiple lines are transmitted below.) (Instead of using the bareword buffer, we had better stick with a key:value format for easy parsing. I don't want a machine named "buffer" causing any trouble in an M115 command.)

The firmware should send "ack N:*" for commands that were well received but could not be finished/buffered immediately, like G4. N:* refers to the line number being ack'ed.

The firmware should send "ok N:*" when the commands are processed ("ack" could be skipped for those commands immediately processed/buffered). N:* refers to the line number ok'ed.

The firmware should be able to process some commands out of order with immediate effect - say M105 (Get temp), M114 (Get pos), M112 (Estop), M226 (Pause). Perhaps the list should be reported in M115 (capabilities query), like "IMMEDIATE_COMMAND:M105,M114,M112,M226". These could also be used to get the status of the buffer.

The host should use line numbers for all gcode.

The host should not send any more lines except those with immediate effect when buffer is full. But it should keep sending M105 to get notified when the buffer is empty again.

The host should resend a line when no "ack/ok" is received in a reasonable time limit, like 500ms.
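
To show why the key:value format is easy to parse, here is a minimal Python sketch of how a host could split a reply such as "ok Q:1 N:10". The function name parse_response is made up for illustration and is not part of any existing host.

```python
def parse_response(line):
    """Split a reply such as 'ok Q:1 N:10' into its status word and key:value fields."""
    parts = line.strip().split()
    if not parts:
        return None, {}
    status, fields = parts[0], {}
    for token in parts[1:]:
        key, sep, value = token.partition(":")
        if sep:  # ignore barewords that are not key:value pairs
            fields[key] = value
    return status, fields


status, fields = parse_response("ok Q:1 N:10")
print(status, fields)
```

Because every field carries its own key, unknown fields can simply be ignored by older hosts.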

(Host keeps sending M105, and after so many lines the dwell is finally done)

Firmware: ok Q:1 N:10 (Here the firmware green-lights the G4)

(And everything just goes on)

Host: N112 G1 X50 Y50 F5000

Firmware: ok Q:1 N:112

Sample interaction - command is not transferred correctly.

Host: N10 G1 X50 Y50 F5000

(No response from the firmware, host should retry)

Host: N10 G1 X50 Y50 F5000

Firmware: ok Q:1 N:10

You can imagine the ack/ok response also being dropped, the host just repeats the command. Sounds like TCP.
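
The retry behaviour in this sample can be sketched in Python roughly as follows. LossyLink is a made-up stand-in for a serial port that drops lines, and send_reliably is a hypothetical helper; a real host would wait on the port with a timeout instead of calling a simulated link.

```python
class LossyLink:
    """Stand-in for a serial port that silently drops the first `drops` lines."""

    def __init__(self, drops):
        self.drops = drops
        self.log = []  # every line the host put on the wire

    def send_and_wait(self, line, timeout_s):
        # Returns the firmware reply, or None if nothing arrived within timeout_s.
        self.log.append(line)
        if self.drops > 0:
            self.drops -= 1
            return None
        number = line.split()[0][1:]  # "N10" -> "10"
        return f"ok Q:1 N:{number}"


def send_reliably(link, line, timeout_s=0.5, max_tries=5):
    """Resend the same numbered line until an ack/ok comes back."""
    for attempt in range(1, max_tries + 1):
        reply = link.send_and_wait(line, timeout_s)
        if reply is not None and reply.split()[0] in ("ack", "ok"):
            return reply, attempt
    raise TimeoutError(f"no reply to {line!r} after {max_tries} tries")


link = LossyLink(drops=1)
reply, tries = send_reliably(link, "N10 G1 X50 Y50 F5000")
print(reply, tries)
```

The line number is what makes the blind resend safe: the firmware can recognise the second copy as a duplicate.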

There are other situations in which a firmware could return ack instead of ok -
for example if printing is manually paused from the LCD, like M226 but initiated from the LCD. The firmware wants to hold the stream without stopping it, and the user could resume printing at any time.

So this scheme solves (a), (b) and (c).
The nicest part? It still retains backward compatibility - an old host would ignore the ack response and wait for the ok response.

Multiple lines
Things get slightly more complicated when it comes to multiple lines.

Multiple lines all transmitted ok, with a dwell

(Assume the last reply is ok Q:4 and the host's next three commands are G1, G4 and G1. The first G1 will be processed immediately but G4 needs to wait)

(After a little while, say 500ms, the host should retransmit starting from N11, because only N10 is ack'ed)

Host: N11 G4 S5000

Host: N12 G1 X100 Y100 F5000

(Firmware should ignore N11 because it is well received already, but it has to ack/ok accordingly)

Firmware: ack Q:2 N:11

So far everything else requires only simple state, like an integer storing the last received line number. This one might be non-trivial to implement: the firmware must look up the buffer and see whether N11 is processed (ok) or still waiting (ack).

Well, I am going to answer my own question. If the host doesn't hear "ok" for a command, it could just resend every second.
It's easy for the firmware to check whether it is a duplicated command because of the line number. Then it could act accordingly -
* Firmware would reply with ack if the command is indeed still in progress, or
* resend the ok, if the command was finished.
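
A rough Python sketch of that duplicate check, assuming commands finish strictly in line-number order so that two integers are enough state. The class and method names are hypothetical, and answering an out-of-sequence line with "rs" is my assumption here, not something fixed by the proposal.

```python
class FirmwareLineTracker:
    """Tracks which numbered lines were received and which have finished."""

    def __init__(self):
        self.last_received = 0  # highest line number accepted so far
        self.last_finished = 0  # highest line number whose command is done

    def on_line(self, number):
        if number <= self.last_finished:
            return f"ok N:{number}"   # duplicate of a finished command
        if number <= self.last_received:
            return f"ack N:{number}"  # duplicate, command still in progress
        if number == self.last_received + 1:
            self.last_received = number
            return f"ack N:{number}"  # new line, accepted
        return f"rs {self.last_received + 1}"  # gap: ask for the missing line

    def finish_next(self):
        """Called when the oldest unfinished command completes."""
        if self.last_finished < self.last_received:
            self.last_finished += 1
            return f"ok N:{self.last_finished}"
        return None


fw = FirmwareLineTracker()
print(fw.on_line(11 - 10))  # first line accepted
print(fw.on_line(1))        # resent while still running
print(fw.finish_next())     # command completes
print(fw.on_line(1))        # resent after completion
```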

I think that multiline proposal is complicated and anyway doesn't work because it still degenerates to the case where the buffer is full and the firmware can't accept a new command, so the host has to poll. The buffer size reporting adds complexity but doesn't add any value.

I have a much simpler proposal, which is that the firmware simply returns "BUSY" for any command it can't handle immediately while the buffer is full. If the host receives a BUSY, it must resend commands (it need not be the same command, so it can interject other non-buffered commands).

Either way, the host has to poll, and the firmware has to do some multitasking, so it would be best to make that as simple as possible.
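
As a toy illustration of the BUSY idea, here is a Python sketch with a simulated firmware that has a fixed number of queue slots. All names are made up, and the call to execute_one() inside the retry loop only stands in for time passing on a real printer.

```python
from collections import deque


class BusyFirmware:
    """Toy firmware with a fixed-size queue that answers BUSY when it is full."""

    def __init__(self, slots):
        self.queue = deque()
        self.slots = slots

    def receive(self, command):
        if command.startswith("M105"):
            return "ok T:200"  # non-buffered command, always answered
        if len(self.queue) >= self.slots:
            return "BUSY"
        self.queue.append(command)
        return "ok"

    def execute_one(self):
        if self.queue:
            self.queue.popleft()


def send_with_busy_retry(fw, command, max_polls=10):
    """Keep offering `command` until the firmware stops answering BUSY."""
    polls = 0
    while True:
        reply = fw.receive(command)
        if reply != "BUSY":
            return reply, polls
        if polls >= max_polls:
            raise TimeoutError(f"firmware still busy after {max_polls} polls")
        polls += 1
        fw.receive("M105")  # the host may interject a non-buffered command
        fw.execute_one()    # simulation only: one queued command finishes


fw = BusyFirmware(slots=1)
print(fw.receive("G1 X10"))
print(send_with_busy_retry(fw, "G1 X20"))
```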

I agree with you that the firmware must do multitasking/out-of-order handling to accept out-of-band commands like M112 (EStop) / M105 (Check temp), no matter what the solution is going to be.

Secondly, as the community is moving to faster CPUs (Due) with native high-speed USB, maybe the original argument for multiline buffering - speeding up the communication - no longer holds.

Maybe checksums and retries would be a waste too, because native USB handles that already.

Originally I thought "ok" must be tagged with the line number, but I don't think it is necessary. As out-of-band commands are processed fast, the host could always assume that any "ok" corresponds to the out-of-band commands first.

I am going to call the following commands "out-of-band commands": M105 (Get temp), M114 (Get pos), M112 (Estop), etc. Other normal commands would be in-band commands.
This is not the same as non-buffered commands. Non-buffered commands like G4 still have to be executed in order.

Problem to solve

Transmission corruption should be able to heal in all cases which means

Host should know when a command is well received

Host should know when the firmware has buffer space for a new line

In case there is no response and no "rs/resend" is received, things should also be able to heal instead of hanging

Out-of-band commands should be honored at any time

Compatible with old host software, or new host software with old firmware.

Design Proposal

Logic Flow - Host side

1. Host sends the Nth line.

2. If the host receives "rs XXX" (where XXX is the line to be resent), set N=XXX and go to step 1. Under this scheme, XXX must be equal to N.

If all you want is to fully utilize the link bandwidth (not limited by round-trip time), and the link is reliable, it is fairly straightforward to solve. At the beginning, have the host query the printer for the total RX buffer size in bytes. We assume that the RX buffer of the printer can be modeled as a simple character based ring buffer, and that each command consumes exactly its length including the newline (but see note at the end).

Then the usual protocol is followed where every line is acknowledged with an 'ok', but the host is generally allowed to send further commands before receiving an 'ok' for previous commands, subject to the following:

The host keeps a FIFO buffer of the lengths of commands that have been sent to the printer but which have not yet been acknowledged. Whenever it sends a command, it pushes the length of that command to the back of the buffer, and whenever it receives an 'ok', it pops an entry from the front.

The host will never send a command to the printer if that would result in the sum of all command lengths in said FIFO to exceed the reported RX buffer size of the printer. If the host has commands but can't send them, it needs to wait for enough 'ok' before sending another command.

This guarantees that the firmware's RX buffer never overruns. Suppose P is the number of pending bytes (sum of command lengths in host's FIFO), RXsize is RX buffer size, and RXused is the current number of bytes in RX buffer (if RXused>RXsize that means there was an overrun). Then:

RXused <= P (the bytes in the printer's RX buffer are a subset of the pending bytes).

P <= RXsize (the limitation on the host's behavior).

Therefore: RXused <= RXsize (the RX buffer is not overrun).

Observe how this shouldn't require changes in the firmware other than reporting the RX buffer size.
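
A minimal Python sketch of the host-side bookkeeping described above, assuming ASCII commands terminated by a single newline. The class name and the write callback are hypothetical; in a real host, write would be the serial port's write function.

```python
from collections import deque


class FlowControlledHost:
    """Host-side bookkeeping for the pending-bytes rule (names are hypothetical)."""

    def __init__(self, rx_size):
        self.rx_size = rx_size  # RX buffer size reported by the printer
        self.pending = deque()  # lengths of commands sent but not yet ok'ed
        self.pending_bytes = 0  # P in the argument above

    def try_send(self, command, write):
        """Send `command` if it fits; return False if the host must wait."""
        data = command.encode("ascii") + b"\n"
        if self.pending_bytes + len(data) > self.rx_size:
            return False
        write(data)
        self.pending.append(len(data))
        self.pending_bytes += len(data)
        return True

    def on_ok(self):
        """An 'ok' frees the space used by the oldest pending command."""
        self.pending_bytes -= self.pending.popleft()


wire = []  # stands in for the serial port
host = FlowControlledHost(rx_size=32)
results = [host.try_send("G1 X10 Y10", wire.append) for _ in range(3)]
host.on_ok()
retry = host.try_send("G1 X10 Y10", wire.append)
print(results, retry, host.pending_bytes)
```

Each command here takes 11 bytes with its newline, so the third one has to wait until the first 'ok' arrives.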

Here's an example of what the communication would look like, assuming M951 is the RX buffer query command.

This will keep the link fully utilized as long as the RX buffer of the printer is large enough. How large one might ask. Suppose that:

You want to achieve the transmission speed S (bytes/second).

The time from when a command is processed by the firmware to when the host receives its 'ok' is at most Tack (seconds).

The maximum command length is L (bytes).

Without justification, I claim the RX buffer size needs to be at least RXneeded=L+Tack*S.

For example, if L=64B, S=25000B/s, Tack=0.01s, then RXneeded=314B.
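
The same arithmetic as a small Python helper, in case someone wants to plug in their own numbers. The function name rx_needed is just for illustration.

```python
def rx_needed(max_command_len, ack_latency_s, speed_bytes_per_s):
    """RXneeded = L + Tack * S, all sizes in bytes."""
    return max_command_len + ack_latency_s * speed_bytes_per_s


# L = 64 B, Tack = 0.01 s, S = 25000 B/s: 64 + 0.01 * 25000 = 64 + 250
print(rx_needed(64, 0.01, 25000))
```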

Finally, some notes about RX buffer model:

In simple DMA-less implementations of UART in the firmware, it is easy to implement a byte-based ring buffer as assumed here.

On DMA based UART implementations, the buffer space may not be optimally used, so the simple model does not fit, due to e.g. memory alignment required by the DMA. In this case it may be useful for the host to not only limit the number of pending bytes, but also the number of pending commands.

When the communication link is USB CDC, the USB guarantees that no bytes will be lost if the printer is unable to accept them, even in the absence of such software based flow control algorithms. That is assuming the firmware is implemented right and starts refusing data with NAK responses instead of discarding it. But software flow control is still useful to make sure the link does not get saturated and can be used for other purposes than sending gcode. Most particularly this helps with the problem that some USB device implementations have a single RX buffer common to all receive endpoints, as opposed to the arrangement where each endpoint has its own RX buffer (and hence can do flow control independently).

On the other hand if we assume the link is not reliable, this will obviously break, and some more elaborate protocol needs to be used. I think the approach described above can be extended to provide some reliability. The host's FIFO could be made to hold actual commands that have been sent, not just their lengths, and these would serve for retransmissions, which would be done similarly to how TCP operates. But I think that before this can be done, a way to delimit commands needs to be devised which is reasonably resistant against transmission errors.

Commands could also be tagged with channel numbers, and the firmware could have multiple RX buffers, one for each channel. In particular there could be a data channel and a control channel, and this would ensure that control commands are responded to quickly even if the data channel buffer is full.
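
To make the channel idea concrete, here is a small Python sketch of a receiver with one bounded buffer per channel. The channel names and capacities are made up for the example.

```python
from collections import deque


class ChannelReceiver:
    """One bounded RX queue per channel, so a full channel cannot block another."""

    def __init__(self, capacities):
        self.capacities = dict(capacities)
        self.buffers = {name: deque() for name in capacities}

    def receive(self, channel, command):
        """Return True if the command was buffered, False if that channel is full."""
        buf = self.buffers[channel]
        if len(buf) >= self.capacities[channel]:
            return False
        buf.append(command)
        return True


rx = ChannelReceiver({"data": 2, "control": 1})
print(rx.receive("data", "G1 X10"))
print(rx.receive("data", "G4 S5000"))
print(rx.receive("data", "G1 X20"))   # data channel is full now
print(rx.receive("control", "M112"))  # control channel still has room
```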

I agree on data and control channel separation. That's something I desperately needed - I can't stand to have M105/M112 not responding while waiting for temperature to reach.

Your scheme doesn't address data corruption.
What if the command sent was corrupted? The host does not know whether the command was received and is being worked on (G4/M116 will take a long time), or the command was not received properly so that the firmware can't respond with an ok.
The further question is what happens if the firmware response is corrupted.

Quote sam0737:
> Your scheme doesn't address data corruption.
> What if the command sent was corrupted? The host does not know whether the command was received and is being worked on (G4/M116 will take a long time), or the command was not received properly so that the firmware can't respond with an ok.
> The further question is what happens if the firmware response is corrupted.

Yes, I said that my basic protocol idea doesn't address that. It's just a way to get the max speed out of an as-long-as-you-don't-overrun-it-reliable link with no inherent flow control.

The problem is that if you want to do proper flow control (something like I describe) in combination with error detection and correction, the protocol can get quite complex, more so than if you only want to solve one problem (my solution which addresses flow control and yours which addresses transmission errors). I think it is a bad idea to try to be compatible with the existing gcode protocols, if a highly reliable and extensible protocol is the goal.

I have some ideas for the goals a properly designed protocol should achieve. The protocol should assume only a non-reliable serial link, which can be a UART or USB CDC (even if CDC is kind of reliable).

I believe it is the right approach to design the protocol in layers, where each layer relies on some properties of the lower layers.

The protocol should be symmetric with respect to both ends of communication, on all layers; sending data from host to printer should be identical to sending from printer to host. That being said, the protocol should consist of three protocols:

Lower-layer protocol.

Defines how each packet is encoded into a stream of bytes, and allows the receiver to reliably distinguish individual packets in the received stream. The lower layer protocol should have the following properties:

Whenever a packet is transmitted without errors it should be recognized as such by the receiving side. This is absolute; it doesn't matter how much and what kind of corruption preceded that packet.

If a packet is transmitted with errors, the receiving side should detect the corruption with a high probability. That is, there should be a low probability that the receiver recognizes a corrupt packet as valid.

Middle-layer protocol.

Provides a stream-based interface on top of the lower layer protocol. It should have the following properties:

The protocol should support separate channels (like TCP ports).

Each channel should provide a reliable stream interface. Here, reliable means the middle-layer protocol can assume that each packet sent on the lower layer is either received correctly by the other side, or not at all, and that no packet reordering happens.

If one or more of the channels is saturated (the sender has data to send but the receiver doesn't want any data), any other channels should continue to operate normally.

The protocol should allow optimal utilization of the link speed (i.e. what my previous flow control suggestion provides).

Higher-layer protocol.

This runs on top of a channel of the middle-layer protocol. As inefficient as that would be, it can even be the existing de facto gcode protocol (with line numbers and checksums and ok), and if one implements the higher-layer protocol as a virtual serial port, it would work with unmodified host software. Or it can be simple newline-separated gcode without any error checking. It could also be my packed g-code encoding.

Now I want some suggestions for protocols satisfying these requirements. I also have some ideas of my own but I'll let others speak first.

I agree with layered approach. Or at least we should think in layers. Separated channels.

> Whenever a packet is transmitted without errors it should be recognized as such by the receiving side.
I don't agree. The entire thing could get lost - maybe the UART TTL line is simply disconnected, maybe the firmware hangs.
The host must be able to recognize this, maybe from the lack of response from the firmware, without confusing it with a lack of activity at the higher level (like waiting for G4).

About compatibility with existing protocols: Less code changes means adoption would be easier and faster. Though it's not a must of the design.

I skimmed through your packed gcode, which is fine and great for the highest layer (so is repetier's binary protocol).
Yet there is room for improvement in the lower and middle layers.

Quote sam0737:
> > Whenever a packet is transmitted without errors it should be recognized as such by the receiving side.
> I don't agree. The entire thing could get lost - maybe the UART TTL line is simply disconnected, maybe the firmware hangs.
> The host must be able to recognize this, maybe from the lack of response from the firmware, without confusing it with a lack of activity at the higher level (like waiting for G4).

You misunderstood that, and I worded it a bit wrong. I meant that when it is received without errors, it should be recognized as a valid packet. Contrast to the existing situation where a corrupted newline will lead to the next line being not understood as correct. Same goes for my other point about the lower layer protocol.

A simple way to ensure that is to reserve a special start of packet character that is guaranteed never to appear in the payload (e.g. by some sort of escaping). Possibly in combination with an end of packet character.
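
A sketch in Python of one way to do this, using byte stuffing in the style of HDLC-like framing. The byte values chosen for SOP, EOP and ESC are my own picks for the example, and there is deliberately no checksum here, so this only covers packet delimiting and resynchronisation, not corruption detection.

```python
SOP = 0x02  # start of packet (assumed value)
EOP = 0x03  # end of packet (assumed value)
ESC = 0x10  # escape marker (assumed value)
RESERVED = (SOP, EOP, ESC)


def encode(payload: bytes) -> bytes:
    """Frame a payload so that SOP and EOP never appear inside it."""
    out = bytearray([SOP])
    for b in payload:
        if b in RESERVED:
            out += bytes([ESC, b ^ 0x20])  # escaped byte is no longer reserved
        else:
            out.append(b)
    out.append(EOP)
    return bytes(out)


def decode_stream(data: bytes):
    """Return all complete packets found in a possibly noisy byte stream."""
    packets = []
    current = None  # None means we are hunting for the next SOP
    escaped = False
    for b in data:
        if b == SOP:
            current = bytearray()  # always resynchronise on SOP
            escaped = False
        elif current is None:
            continue  # garbage between packets is skipped
        elif b == EOP:
            if not escaped:  # ESC directly before EOP means a damaged packet
                packets.append(bytes(current))
            current = None
            escaped = False
        elif escaped:
            current.append(b ^ 0x20)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            current.append(b)
    return packets


noise = b"AB\x03\x10"        # leftovers of an earlier, damaged transmission
truncated = b"\x02N9 G"      # a packet whose tail was lost
stream = noise + truncated + encode(b"N10 G1 X50 Y50 F5000")
print(decode_stream(stream))
```

Whatever garbage came before, the decoder locks on again at the next SOP, which is the property asked for above.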

Quote sam0737:
> Yet there is room for improvement in the lower and middle layers.

I haven't yet said exactly what the protocols for these should be, I just made some suggestions about what they need to achieve.

Another note, when one has a reliable stream abstraction, the "ok" responses no longer serve any purpose for flow control. But instead of eliminating them, we could say that the printer can, optionally, send the "ok" responses in real time as commands are physically executed by the printer (or it can just keep the existing semantic, send ok when it's buffered). This would give a very accurate print progress in the host software.

Right. Currently the "ok" has dual functions - flow control and notification of command completion. In reality these two events don't happen together and don't depend on each other. This "ok" must be redesigned.

Developing an entirely new firmware is a last resort at best. There are so many details to get right that you don't really want this. It would take years to get on par with existing firmwares.

A port of Teacup to ChibiOS exists already, done by bobc: [github.com] . I didn't pick it up for general Teacup because, especially on ATmega-based CPUs, every single clock cycle is crucial and Teacup already has a sort of compile-time abstraction layer - see arduino.h.

It shouldn't be too difficult to implement G4 similarly to M116. If G4s go into the movement queue, the firmware continues to respond. Typically, a not responding G4 doesn't matter much, though, temperature control continues to work anyways.

If you consider the serial link to be reliable, the simplest way to achieve maximum throughput is to use the already implemented XON/XOFF protocol. The "ok" response is then redundant. Some people actually print by just sending the G-code file down the serial line, without a host.

Accepting non-movement commands like M105 while the queue is full can be handled on the host side only, because the firmware would have to trash a command when a movement command comes in while the queue is full. Without trashing, it would just delay the problem by one line of G-code.

I've put some thought into faster communications already. The result was having an "ok" along with a small bit of information about how much space is left in the movement queue. With an adjusted host, this would allow sending multiple commands at once and also keeping room for non-movement commands like M105: [reprap.org]