digitalWrite(clock, LOW)
digitalRead(data)
digitalWrite(clock, HIGH)

There is a lot of overhead in the underlying C code... As this was some time ago, my memory might be imprecise. I shall repeat the measurement and report here again :-)

However, the ability to detect simultaneous presses is essential. 36 ms isn't very good, but it is much more acceptable than missing simultaneous presses. Counting the pedals and effects, it will sometimes be necessary to detect 5 or more simultaneous events.

Maybe the "PISO" method, even if a little bit "delayed", is the best bet in this case. Is it possible to detect simultaneous events this way?
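On the simultaneous-events question: a PISO chain is latched once per scan, so each scan is a snapshot of every key at the same instant. A rough sketch of how two snapshots could be compared is below. The names KeyEvent and diffSnapshots are made up for illustration, and the bit-to-key numbering is an assumption that depends on your wiring.

```cpp
#include <stddef.h>
#include <stdint.h>

// One key transition found by comparing two scans (hypothetical type).
struct KeyEvent {
    size_t key;     // bit index in the chain, 0 = bit 0 of the first byte
    bool pressed;   // true = key went down, false = key was released
};

// Compare the previous and the current snapshot of the whole chain.
// Every differing bit becomes one event, so any number of keys that
// changed between the two scans is reported from the same pass.
// Returns the number of events written to out (at most maxEvents).
size_t diffSnapshots(const uint8_t* prev, const uint8_t* curr,
                     size_t numBytes, KeyEvent* out, size_t maxEvents) {
    size_t count = 0;
    for (size_t i = 0; i < numBytes; ++i) {
        uint8_t changed = (uint8_t)(prev[i] ^ curr[i]);
        for (unsigned bit = 0; bit < 8 && count < maxEvents; ++bit) {
            if (changed & (1u << bit)) {
                out[count].key = i * 8 + bit;
                out[count].pressed = (curr[i] & (1u << bit)) != 0;
                ++count;
            }
        }
    }
    return count;
}
```

With this approach five keys going down between two scans simply show up as five events from one call, so the scan period only limits timing resolution, not how many presses can be seen at once.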

Also, I planned to do this while I'm on vacation, as a hobby. However, I don't know if 1 month will be enough to finish it, because it looks way harder than I thought.

Is there a noob tutorial teaching in detail how to do this (PISO) in practice?

All right, there is another, not so well known approach, using an analogue technique. You can connect each key through its own resistor to a common voltage, so that the currents through them add up. As you have to identify multiple keys at once, the resistor values have to be logarithmically spaced, which limits this approach to fewer than 10 keys.
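To make the "fewer than 10 keys" limit concrete, here is a minimal decoding sketch. It assumes an idealised summing circuit where key k adds 4 << k counts to a 10-bit ADC reading, so 8 keys fill the range and the two lowest bits are left as noise margin. The scaling, kMarginBits and decodeKeys are all assumptions for illustration; a real resistor network would need calibration.

```cpp
#include <stdint.h>

// Assumed scaling: key k contributes (4 << k) ADC counts, so the two
// lowest bits of the reading carry no key information (noise margin).
const unsigned kMarginBits = 2;

// Turn one 10-bit ADC reading (0..1023) into an 8-bit mask of pressed keys.
uint8_t decodeKeys(uint16_t adcCounts) {
    // Round to the nearest multiple of 4 before dropping the margin bits.
    uint16_t rounded = (uint16_t)(adcCounts + (1u << (kMarginBits - 1)));
    uint16_t mask = (uint16_t)(rounded >> kMarginBits);
    if (mask > 0xFF) {
        mask = 0xFF;   // clamp readings slightly above full scale
    }
    return (uint8_t)mask;
}
```

Each extra key doubles the range the ADC has to resolve, which is why a 10-bit converter runs out after roughly 8 to 10 keys per analogue input.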

This will take far too long: the Arduino only has a single ADC (which is already multiplexed), and it is a very slow ADC. It could work with banks of flash ADCs, but that would be expensive.

Latches and encoders could work; they would be much faster than shift registers anyway. Just tri-state all the latches apart from the one you want to read. The encoder isn't necessary but saves port space.

You could also use the Intel 8255, which gives you 3 GPIO ports from one, but it is rather an old chip, and I don't know if there is a more modern alternative.

I've found some nice documentation about parallel shifting, including a nice official example (http://www.arduino.cc/en/Tutorial/ShiftIn). However, I need to know if *any* 8-bit parallel-to-serial register will do the job, and if the procedure for adding more than 2 registers is simply to repeat the same arrangement as for the second register.
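On the software side, reading a longer chain is mostly a matter of looping over more bytes. The sketch below is not the code from the linked tutorial. It assumes a part that presents its first bit right after the latch pulse and shifts MSB first, and the Io type with latch(), readData() and pulseClock() is a hypothetical wrapper you would write around digitalWrite()/digitalRead() for your own pins.

```cpp
#include <stddef.h>
#include <stdint.h>

// Read numBytes bytes from a daisy-chained parallel-in/serial-out chain.
// Io is any type providing latch(), readData() and pulseClock().
template <typename Io>
void readChain(Io& io, uint8_t* out, size_t numBytes) {
    io.latch();   // capture all parallel inputs at the same instant
    for (size_t i = 0; i < numBytes; ++i) {
        uint8_t value = 0;
        for (int bit = 0; bit < 8; ++bit) {
            // The current bit is already on the data pin: read, then clock.
            value = (uint8_t)((value << 1) | (io.readData() ? 1 : 0));
            io.pulseClock();   // shift the next bit onto the data pin
        }
        out[i] = value;   // out[0] comes from the register nearest the Arduino
    }
}
```

Going from 2 registers to more only changes numBytes; the three control lines stay the same, with each register's serial output feeding the next one's serial input.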

I did a quick and only moderately representative test of digital reads vs analogue reads. The test is simplified in that neither case assembles readings into storable bytes, but the overhead would be minimal, and fairly constant in either case.

Wow... Am I wrong, or is this something like 3 milliseconds?? (for the digital input)

Even the analog input is fast enough for me... btw I just found an incredibly nice tutorial on how to use Python to talk to Arduino (http://www.stealthcopter.com/blog/2010/02/python-interfacing-with-an-arduino/) and I'm going to use Python instead of C++ to actually "play" the songs on the PC. Zynaddsubfx is 'massive'.

I wonder if the use of parallel-to-serial registers has some serious limitation, though.

Sorry for the delay... So I rechecked the reading of a bit from a shift register, and the scope showed pulses of 14 us. This is entirely consistent with AWOL's measurement and 10x faster than I had in mind. What I did formerly was read out the whole 8-bit register with some complex bit manipulations, which then altogether took 150 us...
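As a back-of-envelope check, the per-bit figure can be scaled up to a full scan. The helper below is only arithmetic; scanTimeMicros is a made-up name, and the 28-register chain is the size mentioned elsewhere in this thread.

```cpp
#include <stdint.h>

// Estimated time to clock out a whole chain, ignoring the latch pulse
// and any per-byte bookkeeping in the sketch.
uint32_t scanTimeMicros(uint32_t numRegisters, uint32_t microsPerBit) {
    return numRegisters * 8u * microsPerBit;
}
```

For 28 registers at 14 us per bit this gives 28 x 8 x 14 = 3136 us, i.e. a little over 3 ms per full scan with plain digitalRead()/digitalWrite().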

Analogue reading: don't compare apples and oranges! The analogue method will give you 8 readings at a time, so the time for I/O is absolutely comparable!

Remember that the main concern was the mass of chips involved in connecting 200+ lines. In that respect there is no difference between shift registers and multiplexers. Only the analogue method can reduce that hardware complexity...

The PISO SRs will work just fine, the only thing is the speed using user C code and that seems to be open to conjecture at present. Certainly if the function was written in assembler it would be fast enough.

Unfortunately the Arduino doesn't have a shiftIn() function (I think) to match shiftOut(); a library function like that could conceivably be written in assembly (OK, unlikely but possible).

So what about using the hardware SPI on the chip? It's little more than a shift register itself. If you organize the 28 external SRs as mentioned, you pulse one digital pin to latch data into all SRs, then send 28 bytes out over SPI. The data will appear on the MOSI line, but that is not even connected. However, the output of SR #28 is connected to MISO, so every byte you write causes a byte to be read, and you grab it before sending the next byte.

See page 167 of the ATmega328 datasheet for details.

This will run blindingly fast (up to 4 Mbps, but probably a bit slower if your lines are long) and should solve the "too slow" problem.
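A rough sketch of that SPI idea, untested on hardware: pulse the latch once, then send one dummy byte per register and keep whatever comes back. readChainSpi and its two callable parameters are hypothetical names, used so that the same loop can be tried on a PC; the SPI mode and the latch polarity are assumptions that depend on the shift register part.

```cpp
#include <stddef.h>
#include <stdint.h>

// pulseLatch: callable that pulses the shared latch/load pin once.
// transfer:   callable that sends one byte and returns the byte received.
template <typename Latch, typename Transfer>
void readChainSpi(Latch pulseLatch, Transfer transfer,
                  uint8_t* out, size_t numBytes) {
    pulseLatch();   // all registers capture their inputs together
    for (size_t i = 0; i < numBytes; ++i) {
        // The dummy byte goes out on the unconnected MOSI line; the byte
        // shifted in on MISO is the next register's data.
        out[i] = transfer(0x00);
    }
}
```

On an Arduino, after SPI.begin(), the transfer argument could be something like [](uint8_t b) { return SPI.transfer(b); }, and the first byte returned comes from the register wired to MISO (SR #28 in the description above).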