I am wondering about latency and time resolution. For example, at 25 frames per second you have 40 ms between input events, so there must be some very interesting interpolation (look-ahead) algorithm at work to trigger a note at the right position. How did you solve this?

The version of the system shown in the video uses cameras that update at 120 Hz (about 8 ms between events). The high update rate is the key to making it feel good, and it is the reason attempts at making similar systems with the Wiimote or Kinect have failed. That said, my new approach uses inexpensive commodity hardware and works just as well, but I don't want to give details about how it's done until I finish it.
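
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic relating update rate to the gap between input events. It is not code from the system itself. The function names `frame_interval_ms` and `average_sampling_delay_ms` are made up for illustration, and the average-delay figure assumes a gesture can happen at any moment between two samples with equal likelihood.

```python
def frame_interval_ms(rate_hz: float) -> float:
    """Time between two consecutive input samples, in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("update rate must be positive")
    return 1000.0 / rate_hz


def average_sampling_delay_ms(rate_hz: float) -> float:
    """Average wait until the next sample picks up an event.

    Assumption: the event is equally likely to occur at any moment
    between two samples, so on average it waits half an interval.
    """
    return frame_interval_ms(rate_hz) / 2.0


for rate in (25, 120):
    interval = frame_interval_ms(rate)
    delay = average_sampling_delay_ms(rate)
    print(f"{rate} Hz: {interval:.1f} ms between events, "
          f"{delay:.1f} ms average sampling delay")
```

This only covers the delay that comes from sampling. Processing time and audio output buffering add to the total latency and are not modeled here.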