Neural network describes video footage in real time

Shaky video footage of ordinary day-to-day things - a doorway, a boat docked in a canal, bicycles - is shown during a man's walk down an Amsterdam street. In the top left-hand corner of the screen, text appears describing the sidewalk sights.

The text in the video by U.S. artist and coder Kyle McDonald was generated in real time by a neural network. McDonald's network - which was based on a system called NeuralTalk, developed by Stanford Ph.D. student Andrej Karpathy - analyzes live webcam footage from his laptop and then transcribes what it's seeing in text form.
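The overall loop is straightforward: grab a frame from the camera, run it through a captioning model, and overlay the resulting sentence on the video. The sketch below illustrates that structure only; the function names (`caption_frame`, `caption_stream`) and the canned caption are hypothetical placeholders, not McDonald's actual code - a real NeuralTalk-style system would feed each frame through a convolutional network and a recurrent language model to produce the sentence.

```python
def caption_frame(frame):
    # Placeholder for a NeuralTalk-style captioning model.
    # A real implementation would extract CNN image features from
    # the frame and decode a sentence with an RNN; here we just
    # return a canned string to show the shape of the pipeline.
    return "a man riding a bicycle down a street"

def caption_stream(frames):
    """Caption each incoming frame, yielding (frame_index, text)."""
    for i, frame in enumerate(frames):
        yield i, caption_frame(frame)

# Simulated stream of three frames; real code would grab webcam
# images (e.g. with OpenCV) instead of these stand-in strings.
fake_frames = ["frame0", "frame1", "frame2"]
captions = list(caption_stream(fake_frames))
```

In the live version the loop runs continuously, so the on-screen text updates as the scene in front of the camera changes.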

Some of the computer system's descriptions are more accurate than others. For instance, the network described a man wearing a baseball hat and eating a hot dog as "a man in a suit and tie holding a drink."