Blind software developer Saqib Shaikh uses an artificial intelligence headset to capture images of the world around him and process them in order to understand what is happening. Image: Microsoft

Saqib Shaikh is a software developer from London, England, currently working on Microsoft's Bing search engine. Shaikh lost his sight when he was seven years old. In pursuit of the freedoms that sighted people take for granted every day, he has been personally involved in developing an application that combines artificial intelligence, cognitive computing, image recognition and mobile headset technologies.

The image analysis, cognitive reasoning and speech intelligence in the device Shaikh uses allow him to 'see' the world around him in a way that was considered science fiction as recently as a decade ago.

How does Saqib see?

In a video linked here, Shaikh explains how this specific confluence of technologies helps him. The intelligence comes from 'Seeing AI', a research project that helps blind and visually impaired people understand who and what is around them. The app is built using intelligence APIs from Microsoft Cognitive Services.

The app runs on smartphones and on Pivothead smartglasses. The glasses have a side button that the wearer presses to take a snapshot of the scene in front of them. The image capture and analysis software on the glasses (or smartphone) plugs into cloud-based services that help determine what the user is looking at.
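The snapshot-to-speech flow described above can be sketched roughly as follows. Every function here is a hypothetical stand-in for illustration only; the real Seeing AI app calls Microsoft Cognitive Services, whose actual API differs.

```python
# Rough sketch of the Seeing AI flow: snapshot on the glasses,
# analysis in the cloud, result spoken aloud. All names are
# illustrative assumptions, not Microsoft's actual API.

def capture_snapshot():
    """Stand-in for the side-button snapshot on the smartglasses."""
    return b"<jpeg bytes>"

def analyze_in_cloud(image_bytes):
    """Stand-in for a cloud vision call: returns a caption and a score."""
    return {"caption": "a boy riding a skateboard", "confidence": 0.85}

def speak(text):
    """Stand-in for the app's text-to-speech engine."""
    print(text)

def on_button_press():
    result = analyze_in_cloud(capture_snapshot())
    speak(result["caption"])
    return result

on_button_press()  # prints "a boy riding a skateboard"
```

The point of the sketch is the division of labour: the device only captures and speaks, while the heavy image analysis happens in cloud-based services.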

How clever are ‘seeing’ computers?

At the time of writing, image analysis software can distinguish between men and women, recognize the shapes of common objects (a desk, a building, a plate of food and so on), read facial expressions (happy, angry, confused and so on) and detect whether motion is occurring.

Imagine being unable to see and hearing something slide past you. Is it a boy on a skateboard, an industrial forklift truck or something else dangerous? The software can recognize the shape of a boy, the fact that his feet rest on a piece of wood with wheels, and that he is moving along; the answer is most likely a boy on a skateboard, so the app's speech engine says this out loud.

While it's far from perfect, this kind of intelligence is often delivered with a 'probability score' telling the user how likely it is that the computer has got the answer right. The app can also read out text, meaning that blind people can now read signs, food menus, travel information, directions and all sorts of other written material that was previously inaccessible to them.
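One natural use of such a probability score is to hedge the spoken answer according to how confident the system is. The function and thresholds below are made up for illustration; they are not how Seeing AI actually phrases its output.

```python
# Hypothetical example of turning a vision result's probability score
# into hedged spoken output. The thresholds are invented for illustration.

def spoken_description(caption, confidence):
    if confidence >= 0.9:
        return f"It's {caption}."
    if confidence >= 0.5:
        return f"I think it's {caption}."
    return f"I'm not sure, but it might be {caption}."

print(spoken_description("a boy on a skateboard", 0.85))
# prints "I think it's a boy on a skateboard."
```

A design like this lets the user weigh the answer themselves: a confident "It's a menu" invites trust, while "I'm not sure" prompts them to take another snapshot or ask for help.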

Redmond previously branded Microsoft Cognitive Services as Project Oxford. While most technical observers will now be asking whether Microsoft can catch IBM and its darling Watson division in the cognitive market, Microsoft is making a play for developers by providing 22 Application Programming Interfaces (APIs) on the firm's new cognitive portal. In the spirit of openness and technology development, Microsoft hopes developers will take the APIs and customize them for their own project needs, helping this technology base grow.

Talking about his own work at Microsoft, Shaikh notes the following, “I've spent the last three years working on Bing's backend algorithms; mining data to produce the information displayed on the search results page to complement the core results. For example, suggesting ‘related searches’ that may lead you to the information you seek, identifying ‘deep links’ to take you to common destinations within a search result, or determining the order in which elements should appear on the page.”

The future for enablement technologies

As Shaikh himself says, we would have considered this kind of thing the stuff of movies and science fiction until now. Shaikh is of course 'lucky' in that he was able to see until the age of seven; how we might further augment technology like this with extra contextual descriptors (or some other form of insight) for those who have been blind from birth is another level of innovation away.

As of 2016, we still consider computer-based electronic connections that 'project' information directly into our brains to be the work of science fiction.

I am a technology journalist with over two decades of press experience. Primarily I work as a news analysis writer dedicated to a software application development ‘beat’; but, in a fluid media world, I am also an analyst, technology evangelist and content consultant. As the...