Matt Richardson

Descriptive Camera

The Descriptive Camera outputs a text description of the captured scene instead of a photograph.

The Descriptive Camera works a lot like a regular camera: point it at a subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype outputs a text description of the scene. Modern digital cameras capture gobs of parsable metadata about photos, such as the camera's settings, the location, the date, and the time, but they don't output any information about the content of the photo. The Descriptive Camera outputs only metadata about the content.

As we amass ever larger numbers of photos, it becomes increasingly difficult to manage our collections. Imagine if descriptive metadata about each photo could be appended to the image on the fly. Information about who is in each photo, what they're doing, and their environment could be incredibly useful for searching, filtering, and cross-referencing our photo collections. Of course, we don't yet have the technology to make this practical, but the Descriptive Camera explores these possibilities.

Background
After the class readings and discussions about the parsability of space, most of my research went into using the BeagleBone and Amazon's Mechanical Turk service effectively.

Audience
The audience for this prototype is people who appreciate novel uses of technology. As for a practical application, I think similar methods could eventually help us search, filter, and cross-reference our photo collections.

User Scenario
To demonstrate the prototype during an exhibition, I would have the camera strapped around my neck. When someone approaches, I would very briefly explain what it is and ask if they want their picture taken. I would take their picture (or a picture of something in the room) and explain the technology behind it as we wait for the print to come out. I would let the person keep their print.

Implementation
At the core of the Descriptive Camera is a BeagleBone, which connects to the internet via Ethernet. When the shutter is pressed, a USB webcam attached to the BeagleBone captures an image, which is uploaded for processing by Amazon's Mechanical Turk service (or, in accomplice mode, sent as a link by instant message to a person I have chosen in advance). When the camera receives a response from the server, it prints the description on a built-in thermal printer.
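The capture, submit, wait, and print flow described above can be sketched in Python. This is a minimal sketch, not the camera's actual code: the function `take_descriptive_photo` and its `capture`, `submit`, `poll`, and `print_text` callbacks are hypothetical stand-ins for the webcam, the Mechanical Turk (or accomplice) backend, and the thermal printer, and the polling interval and timeout are assumed values.

```python
import time
from typing import Callable, Optional


def take_descriptive_photo(
    capture: Callable[[], bytes],              # hypothetical: returns JPEG bytes from the webcam
    submit: Callable[[bytes], str],            # hypothetical: uploads the image, returns a job id
    poll: Callable[[str], Optional[str]],      # hypothetical: returns the description, or None if not ready
    print_text: Callable[[str], None],         # hypothetical: sends text to the thermal printer
    poll_interval: float = 5.0,                # assumed seconds between checks
    timeout: float = 600.0,                    # assumed maximum wait in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Handle one shutter press: capture, submit, wait for a description, print it."""
    image = capture()
    job_id = submit(image)
    waited = 0.0
    while waited < timeout:
        description = poll(job_id)
        if description is not None:
            print_text(description)
            return description
        sleep(poll_interval)
        waited += poll_interval
    return None  # no answer arrived in time, so nothing is printed
```

Passing the hardware and network steps in as functions keeps the control flow testable without a camera attached; for example, `poll` can be a fake that returns `None` a few times before returning a description.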

Conclusion
I learned an incredible amount about working with the BeagleBone. Just getting a JPG from the webcam in Linux proved to be difficult; after trying lots of methods, I finally found one that worked. This was also my first opportunity to work with Python's serial modules, which worked well for me. Formatting the text for the printer solidified my text processing skills in Python. After learning how Mechanical Turk works (and that it's not always the best option when I want lots of results quickly), I turned to my own solution: I created my own interface for workers to submit descriptions and implemented a system that sends them an instant message as soon as a photo is ready to be described.
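Formatting text for a thermal printer mostly comes down to word-wrapping it to the printer's line width and writing the bytes to a serial port. A minimal sketch, assuming a 32-column printer (a common width for 58 mm thermal printers, not a figure from this project) and an ASCII-only character set; `format_for_printer` and `send_to_printer` are hypothetical names. `port` can be any object with a binary `write` method, such as an open pySerial `serial.Serial` instance.

```python
import textwrap
from typing import BinaryIO

PRINTER_COLUMNS = 32  # assumption: typical width of a 58 mm thermal printer


def format_for_printer(description: str, width: int = PRINTER_COLUMNS) -> str:
    """Word-wrap a description so that no printed line exceeds `width` characters."""
    lines = []
    for paragraph in description.strip().split("\n"):
        # textwrap.wrap returns [] for an empty paragraph, so keep a blank line instead
        lines.extend(textwrap.wrap(paragraph, width=width) or [""])
    # Trailing line feeds push the last line past the printer's tear bar
    return "\n".join(lines) + "\n\n\n"


def send_to_printer(port: BinaryIO, description: str) -> None:
    """Write the formatted text to a serial-like port, replacing non-ASCII characters with '?'."""
    port.write(format_for_printer(description).encode("ascii", errors="replace"))
```

On the BeagleBone, `port` might be opened with pySerial as `serial.Serial("/dev/ttyO1", 19200)`; the device path and baud rate there are assumptions that depend on the UART wiring and the printer model. Keeping the wrapping logic separate from the serial write lets it be checked against an in-memory buffer such as `io.BytesIO`.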