Preface

This is a blog post about a large design research project we completed in 2011 in close partnership with Google Creative Lab.

There wasn’t an opportunity for publication at the time, but it represented a large proportion of the studio’s efforts for that period – nearly everyone in the studio was involved at some point – so we’ve decided to document the work and its context here a year on.

I’m still really proud of it, and some of the end results the team produced are both thought-provoking and gorgeous.

We’ve been wanting to share it for a while.

It’s a long post covering a lot of different ideas, influences, side-projects and outputs, so I’ve broken it up into chapters… but I recommend you begin at the beginning…

Our project through the spring and summer of 2011 concentrated on making evidence around this – investigating computer vision and projection as ‘material’ for designing with, in partnership with Google Creative Lab.

Material Exploration

￼We find that treating ‘immaterial’ new technologies as if they were physical materials is useful in finding rules-of-thumb and exploring opportunities in their “grain”. We try as a studio to pursue this approach as much as someone trying to craft something from wood, stone, or metal.

I read “Mirror Worlds” while I was in architecture school in the 90s. Gelernter’s vision of shared social simulations based on real-world sensors, information feeds and software bots still seems incredibly prescient 20 years later.

Gelernter saw the power to simply build sophisticated, shared models of reality that all could see, use and improve as a potentially revolutionary technology.

What if Google’s mirror world were something out in the real world with us, that we could see, touch and play with together?

Seymour Papert – another incredibly influential computer science and education academic – also came to our minds. Not only did he maintain similar views about the importance of sharing and constructing our own models of reality, but was also a pioneer of computer vision. in 1966 he sent the ‘Summer Vision Memo‘ “Spend the summer linking a camera to a computer, and getting the computer to describe what it saw…”

Our thinking and discussion continued this line toward the cheapness and ubiquity of computer vision.

The $700 Lightbulb

Early on, Jack invoked the metaphor of a “$700 lightbulb”:

Lightbulbs and electric light went from a scientific curiosity to a cheap, accessible ubiquity in the late 19th and early 20th century.

What if lightbulbs were still $700?

We’d carry one around carefully in a case and screw it in when/where we needed light. They are not, so we leave them screwed in wherever we want, and just flip the switch when we need light. Connected computers with eyes cost $500, and so we carry them around in our pockets.

But – what if we had lots of cheap computer vision, processing, connectivity and display all around our environments – like light bulbs?

Ubiquitous Computing has of course been a long held vision in academia, which in some ways has been derailed by the popularity of the smartphone

But smartphones are getting cheaper, Android is embedding itself in new contexts, with other I/Os than a touchscreen, and increasingly we keep our data in the cloud rather than in dedicated devices at the edge.

Ubiquitous computing has been seen by many as in the past as a future of cheap, plentiful ‘throw-away’ I/O clients to the cloud.

“a paintable computer, a viscous medium with tiny silicon fragments that makes a pour-out computer, and if it’s not fast enough or doesn’t store enough, you put another few pounds or paint out another few square inches of computing.”

Professor Neil Gershenfeld of MIT

Updating this to the present-day, post web2.0 world – where if ‘it’s not fast enough or doesn’t store enough’ we request more resources from centralised, elastic compute-clouds.

“Clouds” that can see our context, our environment through sensors and computer vision, and have a picture of us built up through our continued interactions with it to deliver appropriate information on-demand.

To this we added speculation that not only computer-vision would be cheap and ubiquitous, but excellent quality projection would become as cheap and widespread as touch screens in the near-future.

This would mean that the cloud could act in the world with us, come out from behind the glass and relate what it sees to what we see.

In summary: computer vision, depth-sensing and projection can be combined as materials – so how can we use them to make Google services bubble through from the Mirror World into your lap?

How would that feel? How should it feel?

This is the question we took as our platform for design exploration.

“Lamps that see”

One of our first departure points was to fuse computer-vision and projection into one device – a lamp that saw.

Here’s a really early sketch of mine where we see a number of domestic lamps, that saw and understood their context, projecting and illuminating the surfaces around them with information and media in response.

We imagined that the type of lamp would inform the lamp’s behaviour – more static table lamps might be less curious or more introverted than a angle-poise for instance.

Jack took the idea of the angle-poise lamp on, thinking about how servo-motors might allow the lamp to move around within the degrees of freedom its arm gives it on a desk to inquire about its context with computer vision, track objects and people, and surfaces that it can ‘speak’ onto with projected light.

￼

Early sketches of “A lamp that sees” by Jack Schulze

￼

Early sketches of “A lamp that sees” by Timo Arnall

Of course, in the back of our minds was the awesome potential for injecting character and playfulness into the angle-poise as an object – familiar to all from the iconic Pixar animation Luxo Jr.

Alongside these sketching activities around proposed form and behaviour we started to pursue material exploration.

Sketching in Video, Code & Hardware

We’d been keenly following work by friends such as James George and Greg Borenstein in the space of projection and computer vision, and a number of projects in the domain emerged during the course of the project, but we wanted to understand it as ‘a material to design with’ from first principles.

Timo, Jack, Joe and Nick – with Chris Lauritzen (then of Google Creative Lab), and Elliot Woods of Kimchi & Chips, started a series of tests to investigate both the interactive and aesthetic qualities of the combination of projection and computervision – which we started to call “Smart Light” internally.

First of all, the team looked at the different qualities of projected light on materials, and in the world.

This took the form or a series of very quick experiments, looking for different ways in which light could act in inhabited spaces, on surfaces, interact with people and things.

In a lot of these ‘video sketches’ there was little technology beyond the projector and photoshop being used – but it enabled us to imagine what a computer-vision directed ‘smart light’ might behave like, look like and feel like at human scale very quickly.

One particularly compelling video sketch projected an image of a piece of media (in this case a copy of Wired magazine) back onto the media – the interaction and interference of one with the other is spellbinding at close-quarters, and we thought it could be used to great effect to direct the eye as part of an interaction.

Quickly, the team reached a point where more technical exploration was necessary and built a test-rig that could be used to prototype a “Smart Light Lamp” comprising a projector, a HD webcam, a PrimeSense / Asus depth camera and bespoke software.

Elliot Woods working on early software for Lamps

At the time of the project the Kinect SDK now ubiquitous in computer vision projects was not officially available. The team plumped for the component approach over the integration of the Kinect for a number of reasons, including wanting the possibility of using HD video in capture and projection.

Since we didn’t have much need for skeletal tracking in this project, we used very low-level access to the IR camera and depth sensor facilitated by various openFrameworks plugins. This approach gave us the correct correlation of 3D position, high definition colour image, and light projection to allow us to experiment with end-user applications in a unified, calibrated 3D space.

The proto rig became a great test bed for us to start to explore high-level behaviours of Smart Light – rules for interaction, animation and – for want of a better term – ‘personality’.

One phrase that came up again and again around this areas of the lamps behaviour was “Big Brain, Little Brain” i.e. the Smart Light companion would be the Little Brain, on your side, that understood you and the world immediately around you, and talked on your behalf to the Big Brain in ‘the cloud’.

This intentional division, this hopefully ‘beautiful seam’ would serve to emphasise your control over what you let the the Big Brain know in return for its knowledge and perspective, and also make evident the sense (or nonsense) that the Little Brain makes of your world before it communicates that to anyone else.

One illustration we made of this is the following sketch of a ‘Text Camera’

Text Camera is about making the inputs and inferences the phone sees around it to ask a series of friendly questions that help to make clearer what it can sense and interpret.

It reports back on what it sees in text, rather than through a video. Your smartphone camera has a bunch of software to interpret the light it’s seeing around you – in order to adjust the exposure automatically. So, we look to that and see if it’s reporting ‘tungsten light’ for instance, and can infer from that whether to ask the question “Am I indoors?”.

Through the dialog we feel the seams – the capabilities and affordances of the smartphone, and start to make a model of what it can do.

The Smart Light Companion in the Lamp could similarly create a dialog with its ‘owner’, so that the owner could start to build up a model of what its Little Brain could do, and where it had to refer to the Big Brain in the cloud to get the answers.

All of this serving to playfully, humanely build a literacy in how computer vision, context-sensing and machine learning interpret the world.

Rules for Smart Light

The team distilled all of the sketches, code experiments, workshop conversations and model-making into a few rules of thumb for designing with this new material – a platform for further experiments and invention we could use as we tried to imagine products and services that used Smart Light.

Reflecting our explorations, some of the rules-of-thumb are aesthetic, some are about context and behaviour, and some are about the detail of interaction.

We wrote the ‘rules’ initially as a list of patterns that we saw as fruitful in the experiments. Our ambition was to evolve this in the form of a speculative “HIG” or Human Interface Guideline – for an imagined near-future where Smart Light is as ubiquitous as the touchscreen is now…

Smart Light HIG

Projection is three-dimensional. We are used to projection turning a flat ‘screen’ into an image, but there is really a cone of light that intersects with the physical world all the way back to the projector lens. Projection is not the flatland display surfaces that we have become used to through cinema, tv and computers.

Projection is additive. Using a projector we can’t help but add light to the world. Projecting black means that a physical surface is unaffected, projecting white means that an object is fully illuminated up to the brightness of the projector.

Enchanted objects. Unless an object has been designed with blank spaces for projection, it should not have information projected onto it. Because augmenting objects with information is so problematic (clutter, space, text on text) objects can only be ‘spotlighted’, ‘highlighted’ or have their own image re-projected onto themselves.

Light feels touchable (but it’s not). Through phones and pads we are conditioned into interacting with bright surfaces. It feels intuitive to want to touch, grab, slide and scroll projected things around. However, it is difficult to make it interactive.

The new rules of depth. A lamp sees the world as a stream of images, but also as a three-dimensional space. There is no consistent interaction surface to interact with like in mouse or touch-based systems, light hits any and all surfaces and making them respond to ‘touch’ is difficult. This is due to finger-based interaction being very difficult to achieve with projection and computer vision. Tracking fingers is technically difficult, fingers are small, there is limited/no existing skeletal recognition software for detecting hands.

Smart light should be respectful. Projected light inhabits the world alongside us, it augments and affects the things we use every day. Unlike interfaces that are contained in screens, the boundaries of the lamps vision and projection are much more obscure. Lamps ‘look’ at the world through cameras, which mean that they should be trustworthy companions.

Next, we started to create some speculative products using these rules, particularly focussed around the idea of “Enchanted Objects”…

Smart Light, Dumb Products

These are a set of physical products based on digital media and services such as YouTube watching, Google calendar, music streaming that have no computation or electronics in them at all.

All of the interaction and media is served from a Smart Light Lamp that acts on the product surface to turn it from a block into an ‘enchanted object’.

Joe started with a further investigation of the aesthetic qualities of light on materials.

This lead to sketches exploring techniques of projection mapping on desktop scales. It’s something often seen at large scales, manipulating our perceptions of architectural facades with animated projected light, but we wanted to understand how it felt at more intimate human scale of projecting onto everyday objects.

In the final film you might notice some of the lovely effects this can create to attract attention to the surface of the object – guiding perhaps to notifications from a service in the cloud, or alerts in a UI.

Then some sketching in code – using computer vision to create optical switches – that make or break a recognizable optical marker depending on movement. In a final product these markers could be invisible to the human eye but observable by computer vision. Similarly – tracking markers to provide controls for video navigation, calendar alerts etc.

This sub-project was a fascinating test of our Smart Light HIG – which lead to more questions and opportunities.

For instance, one might imagine that the physical product – as well as housing dedicated and useful controls for the service it is matched to – could act as a ‘key’ to be recognised by computer vision to allow access to the service.

What if subscriptions to digital services were sold as beautiful robot-readable objects, each carved at point-of-purchase with a wonderful individually-generated pattern to unlock access?

What happened next: Universe-B

From the distance of a year since we finished this work, it’s interesting to compare its outlook to that of the much-more ambitious and fully-realised Google Glass project that was unveiled this summer.

As a technical challenge it’s been one that academics and engineers in industry have failed to make compelling to the general populace. The Google team’s achievement in realising this vision is undoubtedly impressive. I can’t wait to try them! (hint, hint!)

It’s also a vision that is personal and, one might argue, introverted – where the Big Brain is looking at the same things as you and trying to understand them, but the results are personal, never shared with the people you are with. The result could be an incredibly powerful, but subjective overlay on the world.

This brings with it advantages (collaboration, evidence) and disadvantages (privacy, physical constraints) – but perhaps consider it as a complementary alternative future… A Universe-B where Google broke out of the glass.

Postscript: the scenius of Lamps

No design happens in a vacuum, and culture has a way of bubbling up a lot of similar things all at the same time. While not an exhaustive list, we want to acknowledge that! Some of these projects are precedent to our work, and some emerged in the nine months of the project or since.

Huge thanks to Tom Uglow, Sara Rowghani, Chris Lauritzen, Ben Malbon, Chris Wiggins, Robert Wong, Andy Berndt and all those we worked with at Google Creative Lab for their collaboration and support throughout the project.

SubMap is a visualisation for time and location data on distorted maps. The example above is a point of view map — a projection on a sphere around a particular point.

Planetary Resources is uncloaked this week as a company set up to mine the asteroid belt of our solar system. Yeah. We live in the future.

Planetary Resources is backed by, amongst others, James Cameron, the film director, and Larry Page, CEO of Google. Personally I am somewhat keen on getting society into space. But I didn’t expect it to be done by the International Legion of Billionaires. Brilliant that they’re doing it. Thanks, billionaires!

Corner of a wood floored room with a tool chest, bike, stack of books, box leaning against the wall, an open door with a bag hanging off the doorknob, and a pair of closed double doors with cables hanging on the handles.

Gorgeous.

It gets better: the text descriptions are produced by anonymous individuals distributed around the world, and compensated for their work through Amazon Mechanical Turk. I love imagining the crowd of the internet all teeny tiny, all inside the camera.

Let’s wrap with a bit of self promotion.

The Lytro lightfield camera is one of those WHOA products — photos that you can refocus at any time. It’s magical. When I ran into them last week, they took a photo of Little Printer. Check it out! You can click to refocus (requires Flash).

“The things we are about to share our environment with are born themselves out of a domestication of inexpensive computation, the ‘Fractional AI’ and ‘Big Maths for trivial things’ that Matt Webb has spoken about.

and

‘Making Things See’ could be the the beginning of a ‘light-switch’ moment for everyday things with behaviour hacked-into them. For things with fractional AI, fractional agency – to be given a fractional sense of their environment.

This film uses found-footage from computer vision research to explore how machines are making sense of the world. And from a very high-level and non-expert viewing, it seems very true that machines have a tiny, fractional view of our environment, that sometimes echoes our own human vision, and sometimes doesn’t.

For a long time I have been struck by just how beautiful the visual expressions of machine vision can be. In many research papers and Siggraph experiments that float through our inboxes, there are moments with extraordinary visual qualities, probably quite separate from and unintended by the original research. Something about the crackly, jittery but yet often organic, insect-like or human quality of a robot’s interpetation of the world. It often looks unstable and unsure, and occasionally mechanically certain and accurate.

“Imagine it as, perhaps, the infant days of a young machine intelligence.”

The Robot-Readable World is pre-Cambrian at the moment, but machine vision is becoming a design material alongside metals, plastics and immaterials. It’s something we need to develop understandings and approaches to, as we begin to design, build and shape the senses of our new artificial companions.

As a sidenote, this has reminded me that I was long ago inspired by Paul Bush’s ‘Rumour of true things’ which is ‘constructed entirely from transient images – including computer games, weapons testing, production line monitoring and marriage agency tapes’ and a ”A remarkable anthropological portrait of a society obsessed with imaging itself.’. This found-footage tactic is fascinating: the process of gathering and selecting footage is an interesting R&D exercise, and cutting it all together reveals new meanings and concepts. Something to investigate, as a method of research and communication.

It is rearing its head in our work, and in work and writings by others – so thought I would give it another airing.

The talk at Glug London bounced through some of our work, and our collective obsession with Mary Poppins, so I’ll cut to the bit about the Robot-Readable World, and rather than try and reproduce the talk I’ll embed the images I showed that evening, but embellish and expand on what I was trying to point at.

￼

Robot-Readable World is a pot to put things in, something that I first started putting things in back in 2007 or so.

At Interesting back then, I drew a parallel between the Apple Newton’s sophisticated, complicated hand-writing recognition and the Palm Pilot’s approach of getting humans to learn a new way to write, i.e. Graffiti.

The connection I was trying to make was that there is a deliberate design approach that makes use of the plasticity and adaptability of humans to meet computers (more than) half way.

Connecting this to computer vision and robotics I said something like:

“What if, instead of designing computers and robots that relate to what we can see, we meet them half-way – covering our environment with markers, codes and RFIDs, making a robot-readable world”

“The Cambrian explosion was triggered by the sudden evolution of vision” in simple organisms… active predation became possible with the advent of vision, and prey species found themselves under extreme pressure to adapt in ways that would make them less likely to be spotted. New habitats opened as organisms were able to see their environment for the first time, and an enormous amount of specialization occurred as species differentiated.”

In this light (no pun intended) the “Robot-Readable World” imagines the evolutionary pressure of those three billion (and growing) linked, artificial eyes on our environment.

[it is an aesthetic…] Of computer-vision, of 3d-printing; of optimised, algorithmic sensor sweeps and compression artefacts. Of LIDAR and laser-speckle. Of the gaze of another nature on ours. There’s something in the kinect-hacked photography of NYC’s subways that we’ve linked to here before, that smacks of the viewpoint of that other next nature, the robot-readable world. The fascination we have with how bees see flowers, revealing animal link between senses and motives. That our environment is shared with things that see with motives we have intentionally or unintentionally programmed them with.

We’re in a present, after all, where a £100 point-and-shoot camera has the approximate empathic capabilities of a infant, recognising and modifying it’s behaviour based on facial recognition.

￼
And where the number one toy last Christmas is a computer-vision eye that can sense depth, movement, detect skeletons and is a direct descendent of techniques and technologies used for surveillance and monitoring.

As Matt Webb pointed out on twitter last year:

￼
Ten years of investment in security measures funded and inspired by the ‘War On Terror’ have lead us to this point, but what has been left behind by that tide is domestic, cheap and hackable.

Kinect hacking has become officially endorsed and, to my mind, the hacks are more fun than the games that have published for it.

Greg Borenstein, who scanned me with a Kinect at FooCamp is at the moment writing a book for O’Reilly called ‘Making Things See’.

￼

It is a companion in someways to Tom Igoe’s handbook to injecting behaviour into everyday things with Arduino and other hackable, programmable hardware called “Making Things Talk”.

“Making Things See” could be the the beginning of a ‘light-switch’ moment for everyday things with behaviour hacked-into them. For things with fractional AI, fractional agency – to be given a fractional sense of their environment.

The way the world is fractured from a different viewpoint, a different set of senses from a new set of sensors.

Perhaps it’s the suspicious look from the fella with the moustache that nails it.

And its a thought that was with me while I wrote that post that I want to pick at.

The fascination we have with how bees see flowers, revealing the animal link between senses and motives. That our environment is shared with things that see with motives we have intentionally or unintentionally programmed them with.

“What we see of the real world is not the unvarnished world but a model of the world, regulated and adjusted by sense data, but constructed so it’s useful for dealing with the real world.

The nature of the model depends on the kind of animal we are. A flying animal needs a different kind of model from a walking, climbing or swimming animal. A monkey’s brain must have software capable of simulating a three-dimensional world of branches and trunks. A mole’s software for constructing models of its world will be customized for underground use. A water strider’s brain doesn’t need 3D software at all, since it lives on the surface of the pond in an Edwin Abbott flatland.”

Middle World — the range of sizes and speeds which we have evolved to feel intuitively comfortable with –is a bit like the narrow range of the electromagnetic spectrum that we see as light of various colours. We’re blind to all frequencies outside that, unless we use instruments to help us. Middle World is the narrow range of reality which we judge to be normal, as opposed to the queerness of the very small, the very large and the very fast.”

At the Glug London talk, I showed a short clip of Dawkins’ 1991 RI Christmas Lecture “The Ultraviolet Garden”. The bit we’re interested in starts about 8 minutes in – but the whole thing is great.

In that bit he talks about how flowers have evolved to become attractive to bees, hummingbirds and humans – all occupying separate sensory worlds…

Which leads me back to…

￼
￼
What’s evolving to become ‘attractive’ and meaningful to both robot and human eyes?

Also – as Dawkins points out

The nature of the model depends on the kind of animal we are.

That is, to say ‘robot eyes’ is like saying ‘animal eyes’ – the breadth of speciation in the fourth kingdom will lead to a huge breadth of sensory worlds to design within.

One might look for signs in the world of motion-capture special effects, where Zoe Saldana’s chromakey acne and high-viz dreadlocks that transform here into an alien giantess in Avatar could morph into fashion statements alongside Beyoncé’s chromasocks…

￼

Or Takashi Murakami’s illustrative QR codes for Louis Vuitton.

￼
That a such a bluntly digital format such as a QR code can be appropriated by a luxury brand such as LV is notable by itself.

But, such overt signalling to distinct and separate senses of human and robots is perhaps too clean-cut an approach.

Computer vision is a deep, dark specialism with strange opportunities and constraints. The signals that we design towards robots might be both simpler and more sophisticated than QR codes or other 2d barcodes.

Those QR ‘illustrations’ are gaining attention because they are novel. They are cheap, early and ugly computer-readable illustration, one side of an evolutionary pressure towards a robot-readable world. In the other direction, images of paintings, faces, book covers and buildings are becoming ‘known’ through the internet and huge databases. Somewhere they may meet in the middle, and we may have beautiful hybrids such as http://www.mayalotan.com/urbanseeder-thesis/inside/

In our own work with Dentsu – the Suwappu characters are being designed to be attractive and cute to humans and meaningful to computer vision.

Their bodies are being deliberately gauged to register with a computer vision application, so that they can interact with imaginary storylines and environments generated by the smartphone.

Back to Dawkins.

Living in the middle means that our limited human sensoriums and their specialised, superhuman robotic senses will overlap, combine and contrast.

Wavelengths we can’t see can be overlaid on those we can – creating messages for both of us.

￼

SVK wasn’t created for robots to read, but it shows how UV wavelengths might be used to create an alternate hidden layer to be read by eyes that see the world in a wider range of wavelengths.

Timo and Jack call this “Antiflage” – a made-up word for something we’re just starting to play with.

It is the opposite of camouflage – the markings and shapes that attract and beguile robot eyes that see differently to us – just as Dawkins describes the strategies that flowers and plants have built up over evolutionary time to attract and beguile bees, hummingbirds – and exist in a layer of reality complimentary to that which we humans sense and are beguiled by.

And I guess that’s the recurring theme here – that these layers might not be hidden from us just by dint of their encoding, but by the fact that we don’t have the senses to detect them without technological-enhancement.

And while I present this as a phenomena, and dramatise it a little into being an emergent ‘force of nature’, let’s be clear that it is a phenomena to design for, and with. It’s something we will invent, within the frame of the cultural and technical pressures that force design to evolve.

That was the message I was trying to get across at Glug: we’re the ones making the robots, shaping their senses, and the objects and environments they relate to.

Hence we make a robot-readable world.

I closed my talk with this quote from my friend Chris Heathcote, which I thought goes to the heart of this responsibility.

There’s a whiff in the air that it’s not as far off as we might think.

The Robot-Readable World is pre-Cambrian at the moment, but perhaps in a blink of an eye it will be all around us.

Here are a few things of note which have been posted to the BERG studio mailing list this week.

Matt Jones linked to a lovely poster from Pop Chart Lab, which organises and visualises the taxonomy of super-hero and super-villan powers. For instance, Powers of the Body/Superhuman Ability/Super Strength shows itself to be a highly populous category, but Weapons Based/Powered Prostheses/Armored Suit/Armored Suit with Telescopic Legs less so, highlighting a possible Darwin Awards subtext to it all.

Alex linked to Pica-Pic, a Flash site which lovingly recreates vintage 80’s handheld electronic games from around the world.