Thanks for the correction. I updated my post and struck through the faulty math.

Still, I like to think of it in time/char, as frame/time depends on the size of the frame whereas time/char does not.

Quote

There is also yet another I2C level optimization possible for string output. Currently the Cosa LCD driver implements only IOStream::putchar() and handles puts() and write() as a sequence of putchar(). The function writes the character to the LCD but also handles form-feed, carriage-return-line-feed and a few other control characters (something that LiquidCrystal does not). It is possible to write numbers directly as the string will not contain any control characters. The only issue is text clipping or wrapping. This implies that the whole string could be translated to a single larger I2C block and written as one transaction. This removes the I2C addressing per digit character. This is the same as the nibble optimization only on the next transaction level.

That is why I wrote LCDiSpeed to report timing 3 different ways:
- per byte, which is not dependent on frame size
- per frame, which is dependent on frame size (reported in both FPS and actual time)
- per "iFrame", which is what the frame time is on a 16x2 display regardless of the actual size of the display in use
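To make the relation between the three figures concrete, here is a rough sketch. It is not LCDiSpeed's actual code; the function names are made up, and it assumes a frame means writing every character cell once plus one cursor-positioning command per row, each costing one byte time.

```cpp
#include <cstdio>

// Assumption: a frame is every character cell written once plus one
// cursor-positioning command per row, each costing one byte time.
constexpr unsigned bytesPerFrame(unsigned cols, unsigned rows) {
    return cols * rows + rows;
}

// Time to write one full frame, in microseconds.
constexpr double frameTimeUs(unsigned cols, unsigned rows, double byteUs) {
    return bytesPerFrame(cols, rows) * byteUs;
}

// Frames per second for a display of the given geometry.
constexpr double framesPerSecond(unsigned cols, unsigned rows, double byteUs) {
    return 1e6 / frameTimeUs(cols, rows, byteUs);
}

// "iFrame" rate: the frame rate normalised to a 16x2 display,
// independent of the display that is actually attached.
constexpr double iFramesPerSecond(double byteUs) {
    return framesPerSecond(16, 2, byteUs);
}

// Report the same measurement three ways.
void printTimings(unsigned cols, unsigned rows, double byteUs) {
    std::printf("per byte  : %.1f us\n", byteUs);
    std::printf("per frame : %.2f ms (%.1f FPS) on %ux%u\n",
                frameTimeUs(cols, rows, byteUs) / 1000.0,
                framesPerSecond(cols, rows, byteUs), cols, rows);
    std::printf("per iFrame: %.1f FPS (16x2 equivalent)\n",
                iFramesPerSecond(byteUs));
}
```

Calling printTimings(20, 4, 92.0) would report the same per-byte and iFrame figures as a 16x2 display, while the per-frame figure drops because a 20x4 frame is larger.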

I recently updated the Cosa I2C driver and did a refactoring of the TWI::Slave class for ATtiny. As a spin-off I created a Virtual LCD class that sends "commands" via TWI to an ATtiny84 running the LCD driver. This allows reducing the number of bytes transmitted even further. From the original 4 transmissions with 2 bytes (address and port value), to the optimization for the IO expander with a single 5 byte message (address and four port values) and now down to a single 2 byte message (address and character to print on the LCD).
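As a back-of-the-envelope check on those three message layouts, the sketch below counts I2C clock periods per LCD byte. It relies only on the fact that every I2C byte takes 8 data bits plus an ACK bit; counting START and STOP as one clock period each is a simplification, and the function names are hypothetical.

```cpp
// Clock periods on the I2C bus for a number of messages.
// Each byte (address or data) takes 8 data bits plus 1 ACK bit.
// START and STOP are counted as one period each (a simplification).
constexpr unsigned busClocks(unsigned messages, unsigned bytesPerMessage) {
    return messages * (bytesPerMessage * 9 + 2);
}

// Wire time in microseconds at a given SCL frequency in kHz.
constexpr double wireTimeUs(unsigned clocks, unsigned sclKHz) {
    return clocks * 1000.0 / sclKHz;
}

// The three layouts described above, per byte sent to the LCD.
constexpr unsigned kOriginal = busClocks(4, 2);  // 4 messages: address + port value
constexpr unsigned kExpander = busClocks(1, 5);  // 1 message: address + 4 port values
constexpr unsigned kVirtual  = busClocks(1, 2);  // 1 message: address + character
```

These are wire-time lower bounds only (80, 47 and 20 clock periods). Measured rates will be lower, since driver overhead, the slave's processing and the LCD's own execution time are not included.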

Running the LCD driver on the ATtiny84 at 8 MHz in 4-bit parallel mode gives a frame rate of 413 fps. Running the I2C Slave Virtual LCD on the ATtiny84 gives approx. 72 fps. This includes the Cosa I2C driver ISR pushing an event and the dispatching of the event to the adapter. The current max with the I2C IO expander is 53 fps @ 100 kHz, so this is another 35+% improvement.

Further improvements are possible (when using an ATtiny as LCD slave) as the IOStream::Device functions puts() and write() can use single messages. Also number conversion could be moved to the slave by sending binary numbers instead of characters.

Cheers!

Below is the LCD/TWI slave sketch which is running on the ATtiny84. This is a simple command interpreter to handle the LCD operations. The design is event driven: the ISR pushes an event for incoming TWI requests, and these end up in the implementation of the method on_request().

If you are looking for a fast low pin count interface to an LCD (can't be lower than a single pin), you might be interested in this recent activity:
https://bitbucket.org/fmalpartida/new-liquidcrystal/pull-request/1/adding-an-optimized-implementation-of/diff#comment-366944
Although the interface uses a single pin, it can transfer bytes in 92us for a frame rate close to 320 FPS, which is about 3.6 times faster than the standard LiquidCrystal library using 6 pins! This is a great example of how inefficient the Arduino core routines like digitalWrite() are. It is about 6 times faster than the optimized i2c i/o expander interface.

While it needs more components and is a bit more complex than using something like a PCF8574 i/o expander chip, the total component cost should be lower, given that 595s can be had for about (USD) 20 cents, transistors are about 2-3 cents, and caps and resistors are about 1 cent, all at quantity 1 from places like tayda.

Hi Bill.

I have followed some of the development on the New LiquidCrystal library and the hardware support. Great job!! Very inspiring.

I thought of doing a version with a 595 connected to SPI. It would require two more pins, but at full speed the transfer rate could be 4 MHz, giving 4-5 us per byte. That is hard to beat in cost/performance. Using an ATtiny at a dollar is more expensive but gives a lot of interesting options. An interesting challenge.

The poor performance of Arduino/Wiring and the lack of abstraction/structure was actually what got me started on what became the Cosa project. By chance I stumbled upon Arduino last year during the summer vacation. The work with Cosa started in late November.

Anyway, the latest LCD slave is more a test run of the TWI slave, LCD driver and event framework on an ATtiny84. I needed a test example and pushing I2C further seemed like fun. Moving an interface between two micro-controllers is also an interesting challenge. I hope to add some tooling for this so that it becomes easier. Something along the lines of IDL/CORBA, etc.

I actually did this with 2 595's to be able to utilize all 8 bits of the LCD. Since the transfer rate is so high, a delay of about 30us must be added to the code. This resulted in an average write speed of about 38-40us per byte sent to the LCD, as I had trouble with missed letters when I tried to go down to the lowest spec of 37us delay (in total).

Here's my schematic (please ignore the resistor net as it wasn't tested. R3 resistor was also unnecessary as my LCD already has a 100 ohm built-in resistor):

Thanks. I got my 20x4 LCD thinking it supported SPI (as the ebay title said it did, and I still had no idea what's what). And I found that the LiquidCrystal_I2C library I downloaded was awfully slow: about 1ms to send a complete byte or a command, and that is after I optimized it a little bit by removing the unnecessary delays and an extra expander write which wasn't needed. Filling the screen with 80 chars takes about 78ms, that's insane. With the SPI method it takes just over 3ms for 80 chars, that's a huge improvement. It frees up a ton of processor time for other important tasks. I too want to build a small backpack for this LCD to go into the project I'm making right now. The only benefit to I2C I can think of right now is that it is probably less susceptible to interference and long wires than SPI.

The Cosa I2C slave LCD driver is now completed. The initial design has been refactored to a new Virtual LCD class (VLCD) which allows any Cosa LCD device driver to be connected (not just the HD44780 driver). The VLCD class contains two parts; 1) the client part acts as a LCD proxy, translating LCD API calls to I2C messages, 2) the server part acts as an adapter that decodes the I2C messages and calls the LCD implementation.
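The proxy/adapter split can be sketched roughly as below. This is not the Cosa VLCD source: the class names, the wire format (a one-byte message is a character, a three-byte message starting with a command code sets the cursor) and the direct call standing in for the I2C transfer are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical LCD interface; the real Cosa LCD class has a richer API.
struct Lcd {
    virtual ~Lcd() = default;
    virtual void putChar(char c) = 0;
    virtual void setCursor(uint8_t x, uint8_t y) = 0;
};

using Message = std::vector<uint8_t>;

// Hypothetical wire format: one byte = character to print,
// three bytes starting with kCmdSetCursor = cursor position.
constexpr uint8_t kCmdSetCursor = 0x01;

// Server part: decodes a message and calls the real LCD driver.
class LcdAdapter {
public:
    explicit LcdAdapter(Lcd& lcd) : lcd_(lcd) {}
    void onRequest(const Message& msg) {
        if (msg.size() == 1) {
            lcd_.putChar(static_cast<char>(msg[0]));
        } else if (msg.size() == 3 && msg[0] == kCmdSetCursor) {
            lcd_.setCursor(msg[1], msg[2]);
        }
    }
private:
    Lcd& lcd_;
};

// Client part: looks like an LCD but only encodes calls as messages.
// The direct call to onRequest() stands in for the I2C transfer.
class LcdProxy : public Lcd {
public:
    explicit LcdProxy(LcdAdapter& remote) : remote_(remote) {}
    void putChar(char c) override {
        remote_.onRequest(Message{static_cast<uint8_t>(c)});
    }
    void setCursor(uint8_t x, uint8_t y) override {
        remote_.onRequest(Message{kCmdSetCursor, x, y});
    }
private:
    LcdAdapter& remote_;
};

// Stand-in for a real driver: records what was drawn.
struct RecordingLcd : Lcd {
    std::string log;
    void putChar(char c) override { log += c; }
    void setCursor(uint8_t x, uint8_t y) override {
        log += "[" + std::to_string(x) + "," + std::to_string(y) + "]";
    }
};
```

Because the proxy implements the same interface as the driver behind the adapter, application code written against the LCD interface does not need to know which side of the bus it is running on.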

Below is the CosaLCDslave sketch. It uses the new Virtual LCD class and binding to the HD44780 driver with the 4-bit parallel port IO. This sketch is compiled for an ATtiny84 in the example above but may be compiled for any Cosa supported Arduino.

By implementing the IOStream::Device methods puts(), puts_P() and write() the performance can be boosted to 50-98% of the performance of the I2C IO expander at 400 kHz. Below are some results from the benchmarking. The first table shows the performance (operations per second/frames per second), and compares the 4-bit and I2C IO expander implementations (at 100 kHz and 400 kHz).

The above results are used as the baseline for the comparison with the second table below which is the ATtiny84 (internal clock 8 MHz) compiled version and the VLCD version. The comparison is between the 4-bit implementation and then the VLCD implementation (with optimizations).

VLCD may be viewed as a "template" for how to construct I2C slave devices. http://dl.dropboxusercontent.com/u/993383/Cosa/doc/html/d1/d1f/classVLCD.html

The next step is to implement a Cosa USI based TWI master for ATtiny and port the LCD support. Below is the LCD benchmark running on an LCD with I2C IO expander and an ATtiny85 (internal clock 8 MHz, internal pull-up).

The picture shows 39 operations per second (32 characters plus 2 set cursor per op). The result for standard Arduino (Uno, Nano, etc) is 53 fps.

1. Packaging I2C IO expander updates to a single TWI message for putchar(). To send a byte (data or command) to the LCD, four TWI messages (address and 1 byte data) were previously sent (LiquidCrystal_I2C). This is compressed to a single TWI message with the address and the four bytes needed to send the byte (via the 4-bit parallel interface) to the LCD.

2. Packaging multiple encoded bytes into a single message for puts(). This applies the first optimization to a sequence of characters sent to the I2C IO expander, which allows (again) the TWI address to be removed. The default internal buffer size is 32 bytes. This gives a 7-byte address reduction for an 8-byte string.
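The two optimizations can be sketched as follows. This is an illustration rather than the Cosa driver code: the expander pin mapping is an assumption (a common PCF8574 backpack layout, other boards differ) and the function names are hypothetical. Only the four-port-values-per-byte scheme and the 32-byte buffer come from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed expander pin mapping (common PCF8574 backpack; boards differ):
// P0 = RS, P2 = EN, P3 = backlight, P4..P7 = LCD D4..D7.
constexpr uint8_t RS = 0x01, EN = 0x04, BL = 0x08;

// Optimization 1: one LCD byte becomes four port values. Each nibble is
// written with EN high, then again with EN low to strobe it into the LCD.
void encodeByte(uint8_t value, bool isData, std::vector<uint8_t>& out) {
    const uint8_t ctrl = static_cast<uint8_t>(BL | (isData ? RS : 0));
    const uint8_t hi = static_cast<uint8_t>((value & 0xF0) | ctrl);
    const uint8_t lo = static_cast<uint8_t>(((value << 4) & 0xF0) | ctrl);
    out.push_back(static_cast<uint8_t>(hi | EN));
    out.push_back(hi);
    out.push_back(static_cast<uint8_t>(lo | EN));
    out.push_back(lo);
}

// Optimization 2: encode a whole string and split it into messages of at
// most bufferSize port values. Each message costs one address byte.
std::vector<std::vector<uint8_t>> encodeString(const char* s,
                                               std::size_t bufferSize = 32) {
    std::vector<std::vector<uint8_t>> messages;
    std::vector<uint8_t> current;
    for (; *s != '\0'; ++s) {
        // Flush when the next character (4 port values) would not fit.
        if (!current.empty() && current.size() + 4 > bufferSize) {
            messages.push_back(current);
            current.clear();
        }
        encodeByte(static_cast<uint8_t>(*s), true, current);
    }
    if (!current.empty()) messages.push_back(current);
    return messages;
}
```

With a 32-byte buffer an 8-character string fits in one message, so one address byte is sent instead of eight, which is the 7-byte reduction mentioned above.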

The second optimization shows up in the puts() to puts_P() ratio as program strings may contain control characters and are not compressed. Ratio 77/60 = 1.28X further improvement. This also shows up as an improvement when printing numbers (dec/bin in benchmark).

For ATmega with TWI hardware the processor will go into sleep mode during the wait for the completion of the I2C operation (write). A further optimization would be to allow the processor to continue and only sync when a new operation is issued. This would require some additional buffering. The Cosa TWI driver allows asynchronous calls but this feature is not yet used by the LCD driver. The current ATtiny USI based TWI is a bit-banging implementation with micro-second level delays. A redesign of the Cosa RTC (micro second level timer) for ATtiny is necessary to allow asynchronous TWI operation with ISR. This is due to a timer conflict.
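To get a feel for what deferring the sync could buy, here is a small timing model. It simulates abstract ticks only, touches no hardware, and the names are hypothetical. It is not the Cosa TWI API, just a comparison of "wait after every write" against "wait only before the next write".

```cpp
#include <cstdint>

// Total time for n rounds of "compute, then write" when every write
// blocks until the transfer has completed.
constexpr uint32_t blockingTotal(uint32_t n, uint32_t compute,
                                 uint32_t transfer) {
    return n * (compute + transfer);
}

// Same workload when the CPU continues after starting a transfer and
// only syncs right before issuing the next one.
constexpr uint32_t deferredTotal(uint32_t n, uint32_t compute,
                                 uint32_t transfer) {
    uint32_t now = 0;        // current CPU time
    uint32_t busyUntil = 0;  // when the running transfer completes
    for (uint32_t i = 0; i < n; ++i) {
        now += compute;                        // work overlaps the transfer
        if (now < busyUntil) now = busyUntil;  // sync: wait for previous one
        busyUntil = now + transfer;            // start next transfer, no wait
    }
    return busyUntil;  // the last transfer still has to finish
}
```

In this model the gain depends on how much useful work there is between writes: with 100 ticks of work and 200-tick transfers, three rounds take 700 ticks instead of 900.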

The last column in the table above contains the results when using an ATtiny84 as an I2C LCD adapter and reducing the I2C message communication even further. The improvement is then 2.3X.

Cool stuff. I'm curious what core and i2c library you are using for the attiny.

@bperrybap

Thanks for your interest in this project.

I use the MIT ATtiny core by David Mellis. It is more or less only for the compiler settings, fuse bits, and build in the Arduino IDE. All code is Cosa. None of the Arduino "core" code or libraries are used (except for main() and init() ;-). Same goes for "Mighty".

Cosa is an OO-framework. It supports the major Arduino ATmega/ATtiny within the framework itself with a Board abstraction. Cosa contains a newly written SPI and TWI class library. For ATtiny the implementation is USI based. It supports all SPI modes and both TWI master and slave devices. I find the standard Arduino/Wiring/dtools/AVR TWI a bit difficult to work with ;-) and too low level and slow. Cosa InputPin and OutputPin operations are between 3-5X faster than Arduino/Wiring. They are also object-oriented and symbolic which makes configuration and reuse much easier.

I post Cosa updates and improvements on http://forum.arduino.cc/index.php?topic=150299.0

Received a bunch of 74HC595's today (ebay: $2 for 10 pcs) so now I could add and benchmark a shift register based port version for the Cosa LCD support. It uses basically the same method as suggested by @Nadir and above by @TheCoolest. It uses three pins: data, clock and latch. The latch signal is also used for the LCD enable. Below is the 3-wire schematic from the arduinoshiftreglcd project page on Google Code, http://code.google.com/p/arduinoshiftreglcd/.

The port is used a bit differently to allow further optimization (later on ;-). Below is the updated LCD benchmark with the initial result for the SR3W support added.

The table values are operations per second. For putchar, puts and puts_P this corresponds to frames per second on a 16X2 LCD with two set_cursor. The uint16_t dec benchmark is a 4 digit decimal print plus set_cursor, in operations per second. The uint16_t bin benchmark is a 14 digit binary number print (total 16 characters with 0x-prefix) plus set_cursor, in operations per second.

This SR3W implementation uses the Cosa OutputPin serialization function and is "high-level" (i.e. not PORT direct) as the 4-bit parallel version optimization. SPI could be used to boost performance further.
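A bit-serialized write over data, clock and latch might look roughly like this. The 74HC595 is modelled in software so the sketch can run anywhere. The names are hypothetical and this is not the Cosa OutputPin code. How the eight outputs map to LCD pins is left out, since it depends on the wiring.

```cpp
#include <cstdint>

// Minimal model of a 74HC595: a bit is shifted in on each clock pulse and
// the shift stage is copied to the output pins on the latch pulse.
struct ShiftRegister595 {
    uint8_t shift = 0;         // internal shift stage
    uint8_t outputs = 0;       // latched output pins
    unsigned latchPulses = 0;  // number of latch pulses seen

    void clockIn(bool dataBit) {
        shift = static_cast<uint8_t>((shift << 1) | (dataBit ? 1 : 0));
    }
    void latch() {
        outputs = shift;
        ++latchPulses;
    }
};

// Three-wire write: shift the byte out MSB first on data/clock, then pulse
// the latch. In the wiring described above the latch line doubles as the
// LCD enable, so the same pulse also strobes the value into the LCD.
void writeByte(ShiftRegister595& sr, uint8_t value) {
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        sr.clockIn((value & mask) != 0);
    }
    sr.latch();
}
```

On real hardware each clockIn() is a data pin write followed by a clock pulse, which is why the speed of the pin operations dominates and why SPI hardware could do the shifting faster.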

Using the different LCD port adapters is easy. The LCD driver is a single source for all versions. It is only the port adapter that needs implementing. This is one of the great OOP design patterns: delegation. Below is a snippet from the LCD benchmark.

The HD44780 LCD device driver implements the abstract class LCD and can be replaced by any other Cosa LCD device driver implementations in the benchmark source code. Again by changing only a few lines. Below is yet another snippet:
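As a generic illustration of the two levels of substitution (port adapter under the driver, driver under the benchmark), here is a minimal sketch. All class and method names are hypothetical stand-ins, not the Cosa classes, and the adapters only record what they are asked to write.

```cpp
#include <cstdint>
#include <string>

// Hypothetical port-adapter interface: the only part that differs between
// e.g. 4-bit parallel, shift register and I2C expander wiring.
struct PortAdapter {
    virtual ~PortAdapter() = default;
    virtual void write8(uint8_t value, bool isData) = 0;
};

// Hypothetical abstract LCD, so benchmark code can take any driver.
struct Lcd {
    virtual ~Lcd() = default;
    virtual void print(const char* s) = 0;
};

// Driver: knows the controller protocol and delegates pin access.
class CharLcd : public Lcd {
public:
    explicit CharLcd(PortAdapter& io) : io_(io) {}
    void print(const char* s) override {
        for (; *s != '\0'; ++s) {
            io_.write8(static_cast<uint8_t>(*s), true);
        }
    }
private:
    PortAdapter& io_;
};

// Two stand-in adapters that record the data bytes they receive.
struct ParallelPort : PortAdapter {
    std::string sent;
    void write8(uint8_t value, bool isData) override {
        if (isData) sent += static_cast<char>(value);
    }
};

struct ShiftRegisterPort : PortAdapter {
    std::string sent;
    void write8(uint8_t value, bool isData) override {
        if (isData) sent += static_cast<char>(value);
    }
};

// Benchmark code depends on the abstract Lcd only.
void benchmark(Lcd& lcd) { lcd.print("Hi"); }
```

Switching wiring then means changing which adapter object is constructed and passed to the driver. The driver and the benchmark stay untouched.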

After benchmarking the different LCD port alternatives we can conclude that the Shift Register method has the best cost/performance and can match parallel access methods with a much lower pin count. It would be interesting to see this as part of future Arduino boards/shields.