Be an Optimist Prime in the world of Computer Vision


Where would a computer vision blog be without a post about the new cashier-less store recently opened to the public by Amazon? Absolutely nowhere.

But I don’t need additional motivation to write about Amazon Go (as the store is called) because I am, to put it simply, thrilled and excited about this new venture. This is innovation at its finest where computer vision is playing a central role.

In this post I would like to cover:

How it all works from a non-technical and (as much as is possible) technical perspective,

Some of the issues reported prior to the public opening,

Some of the in-store issues reported after the public opening, and

Some potential unfavourable implications cashier-less stores may have in the future (just to dampen the mood a little).

So, without further ado…

How it Works – Non-Technically & Technically

The store has a capacity of around 90 people – so it’s fairly small in size, like a convenience store. To enter it you first need to download the official Amazon app and connect it to your Amazon Prime account. You then walk up to a gate like you would at a metro/subway and scan a QR code from the app. The gate opens and your shopping experience begins.

Inside the store, if you wish to purchase something, you simply pick it up off the shelf and put it in your bag or pocket. Once you’re done, you walk out of the shop and a few minutes later you get emailed a receipt listing all your purchases. No cashiers. No digging around for money or cards. Easy as pie!

What happens on the technical side of things behind the scenes? Unfortunately, Amazon hasn’t disclosed much at all, which is a bit of a shame for nerds like me. But I shouldn’t complain too much, I guess.

What we do know is that sensor fusion is employed (sensor fusion is when data is combined from multiple sensors/sources to provide a higher degree of accuracy) along with deep learning.

Hundreds of cameras and depth sensors are attached to the ceiling around the store:

The cameras and depth sensors located on the ceiling of the store (image source)

These track you and your movements (using computer vision!) throughout your expedition. Weight sensors can also be found on the shelves to assist the cameras in discerning which products you have chosen to put into your shopping basket.

(Note: sensor fusion is also being employed in autonomous cars. Hopefully I’ll be writing about this as well soon.)
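Since Amazon hasn't disclosed how its fusion works, the following is only a toy sketch of the general idea, not Amazon's method. It assumes a camera classifier that can't decide between two products and a shelf weight sensor that measured how much weight left the shelf, and combines the two with Bayes' rule. The product names, weights, and noise level (`sigma_g`) are all made up for illustration.

```python
import math


def fuse_camera_and_weight(camera_probs, product_weights_g, measured_drop_g, sigma_g=5.0):
    """Combine camera probabilities with a shelf weight reading using Bayes' rule.

    camera_probs:      {product: probability from the (hypothetical) vision model}
    product_weights_g: {product: known weight of one item in grams}
    measured_drop_g:   weight that disappeared from the shelf, in grams
    sigma_g:           assumed standard deviation of the weight sensor noise
    """
    unnormalised = {}
    for product, prior in camera_probs.items():
        error = measured_drop_g - product_weights_g[product]
        # Gaussian likelihood of the weight reading if this product was taken
        likelihood = math.exp(-0.5 * (error / sigma_g) ** 2)
        unnormalised[product] = prior * likelihood

    total = sum(unnormalised.values())
    if total == 0.0:
        # The weight reading matches nothing: fall back to the camera alone
        return dict(camera_probs)
    return {product: value / total for product, value in unnormalised.items()}


# The camera alone is undecided between two similar-looking tubs...
camera_probs = {"yoghurt_small": 0.5, "yoghurt_large": 0.5}
product_weights_g = {"yoghurt_small": 150.0, "yoghurt_large": 500.0}

# ...but the shelf got 148 g lighter, which settles it.
fused = fuse_camera_and_weight(camera_probs, product_weights_g, measured_drop_g=148.0)
best = max(fused, key=fused.get)
print(best, fused)
```

The point is simply that two individually ambiguous or noisy sources can become decisive once combined.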

In 2015, Amazon filed a patent application for its cashier-less store in which it stated the use of RGB cameras (i.e. colour cameras) along with facial recognition. TechCrunch, however, has reported that the Vice President of Technology at Amazon Go told them that no facial recognition algorithms are currently being used.

In-Store Issues Prior to Public Opening

Although the store opened its doors to the public a few weeks ago, it has been open to employees since December 2016. Initially, Amazon expected the store to be ready for public use a few months after that, but the public opening was delayed by nearly a year due to “technical problems”.

We know what some of the dilemmas behind these “technical problems” were.

Firstly, Amazon had problems tracking more than 20 people in the store. If you’ve ever worked on person-tracking software, you’ll know how hard it is to track a crowd of people with similar body types and wearing similar clothes. But it looks like this has been resolved (to at least a satisfactory level for them). It’s a shame for us, though, to not be given more information on how Amazon managed to get this to work.

Funnily enough, some employees of Amazon knew about this problem and in November last year tried to see if a solution had been developed. Three employees dressed up in Pikachu costumes (as reported by Bloomberg here) while doing their round of shopping to attempt to fool the system. Amazon Go passed this thorough, systematic, and very scientific test. Too bad I couldn’t find any images or videos of this escapade!

We also know that initially engineers were assisting the computer vision system behind the scenes. The system would let these people know when it was losing confidence in its tracking results and would ask them to intervene, at least minimally. Supposedly, nobody is doing this task any more.

Lastly, I also found information stating that the system would run into trouble when products were taken off the shelf and placed back on a different shelf. This was reported to have occurred when employees brought their children into the store and they ran wild a little (as children do).

This also appears to have been taken care of because someone from the public attempted to do this on purpose last week (see this video), with no adverse effects, it would seem.

It’s interesting to see the growing pains that Amazon Go had to go through, isn’t it? How they needed an extra year to try to iron out all these creases. This is such a huge innovation. Makes you wonder what “creases” autonomous cars will have when they become more prominent!

In-Store Issues Post Public Opening

But, alas. It appears as though not all creases were ironed out to perfection. Since Amazon Go’s opening a few weeks ago, two issues have been written about.

The first is of Deirdre Bosa of CNBC not being charged for a small tub of yoghurt:

First and foremost, enjoy the yogurt on us. It happens so rarely that we didn’t even bother building in a feature for customers to tell us it happened. So thanks for being honest and telling us. I’ve been doing this a year and I have yet to get an error.

To which Deirdre responded: “Thanks Siggi’s! But I think it’s on Amazon :)”

LOL! 🙂

But as Amazon Go stated, it’s a rarity for these mistakes to happen. Or is that only the case until someone works out a flaw in the system?

Well, it seems as though someone has!

In this video, Tim Pool states that he managed to walk out of the Amazon Go store with a bag full of products and was only charged for one item. According to him it is “absurdly easy to take a bag full of things and not get charged”. That’s a little disconcerting. It’s one thing when the system makes a mistake every now and then. It’s another thing when someone has worked out how to break it entirely.

Tim Pool says he has contacted Amazon Go to let them know of the major flaw. Amazon confirmed with him that he did not commit a crime but “if used in practice what we did would in fact be shoplifting”.

Ouch. I bet engineers are working on this frantically as we speak.

One more issue worth mentioning that isn’t really a flaw but could also be abused is that at the moment you can request a refund on any item without returning it. No questions asked. Linus Tech Tips shows in this video how easily this can be done. Of course, since your Amazon Go account needs to be linked to your Amazon Prime account, if you do this too many times, Amazon will catch on and will probably take some form of preventative action against you or will even verify everything by looking back at past footage of you.

Cons of Amazon Go

Like I said earlier, I am really excited about Amazon Go. I always love it when computer vision spearheads innovation. But I also think it’s important in this post to talk about the potential unfavourable implications of a cashier-less store.

Potential Job Losses

The first and most obvious potential con of Amazon Go is the job losses that might ensue if this innovation catches on. Considering that 3.5 million people in the US are employed as cashiers (it’s the second-most common job in that country), this issue needs to be raised and discussed. Heck, there have already been protests in this respect outside of Amazon Go:

Bill Ingram, the organiser of the protest shown above, asks: “What will all the cashiers do once their jobs are automated?”

Amazon, not surprisingly, has issued statements on this topic. It has said that although some jobs may be taken by automation, people can be relocated to improve other areas of the store by, for example:

Working in the kitchen and the store, prepping ingredients, making breakfast, lunch and dinner items, greeting customers at the door, stocking shelves and helping customers

Let’s also not forget that new jobs have been created. For example, additional people need to be hired to manage the technological infrastructure behind this huge endeavour.

Personally, I’m not a pessimist about automation either. The industrial revolution that brought automation to so many walks of life was hard at first but society found ways to re-educate into other areas. The same will happen, I believe, if cashier-less stores become a prominent thing (and autonomous cars also, for that matter).

An Increase in Unhealthy Impulse Purchases

Manoj Thomas, a professor of marketing at Cornell University, has stated that our shopping behaviour will change around cashier-less stores:

[W]e know that when people use any abstract form of payment, they spend more. And the type of products they choose changes too.

What he’s saying is that psychological research has shown that the more distance we put between us and the “pain of paying”, the more discipline we need to avoid those pesky impulse purchases. Having cash physically in your hand means you can see what you’re doing with your money more easily. And that extra bit of time waiting in line at the cashier could be time enough to reconsider purchasing that chocolate and vanilla tub of ice cream :/

Even More Surveillance

And then we have the perennial question of surveillance. When is too much, too much? How much more data about us can be collected?

With such sophisticated surveillance in-store, companies are going to have access to even more behavioural data about us: which products I looked at for a long time; which products I picked up but put back on the shelf; my usual path around a store; which advertisements made me smile – the list goes on. Targeted advertising will become even more effective.

Indeed, Bill Ingram’s protest pictured above was also about this (hence why masks were worn to it). According to him, we’re heading in the wrong direction:

If people like that future, I guess they can jump into it. But to me, it seems pretty bleak.

Harsh, but there might be something to it.

Less Human Interaction

Albert Borgmann, a great philosopher on technology, coined the term device paradigm in his book “Technology and the Character of Contemporary Life” (1984). In a nutshell, the term is used to explain the hidden, detrimental nature and power of technology in our world (for a more in-depth explanation of the device paradigm, I highly recommend you read his philosophical works).

One of the things he laments is how we are increasingly losing daily human interactions due to the proliferation of technology. The sense of a community with the people around us is diminishing. Cashier-less stores are pushing this agenda further, it would seem. And considering, according to Aristotle anyway, that we are social creatures, the more we move away from human interaction, the more we act against our nature.

The Chicago Tribune wrote a little about this at the bottom of this article.

Is this something worth considering? Yes, definitely. But only in the bigger picture of things, I would say. At the moment, I don’t think accusing Amazon Go of trying to damage our human nature is the way to go.

Personally, I think this initiative is something to celebrate – albeit, perhaps, with just the faintest touch of reservation.

Summary

In this post I discussed the cashier-less store “Amazon Go” recently opened to the public. I looked at how the store works from a technical and non-technical point of view. Unfortunately, I couldn’t say much from a technical angle because Amazon has disclosed so little information. I also discussed some of the issues that the store has dealt with and is dealing with now. I mentioned, for example, that initially there were problems in trying to track more than 20 people in the store. But this appears to have been solved to a satisfactory level (for Amazon, at least). Finally, I dampened the mood a little by discussing the potential unfavourable implications that a proliferation of cashier-less stores may have on our societies. Some of the issues raised here are important but ultimately, in my humble opinion, this endeavour is something to celebrate – especially since computer vision is playing such a prominent role in it.


I was watching The Punisher on Netflix last week and there was a scene (no spoilers, promise) in which someone was recognised from CCTV footage by the way they were walking. “Surely, that’s another example of Hollywood BS”, I thought to myself – “there’s no way that’s even remotely possible”. So, I spent the last week researching this – and to my surprise it turns out that this is not a load of garbage after all! Gait Recognition is another legitimate form of biometric identification/verification.

In this post I’m going to present to you my past week’s research into gait recognition: what it is, what it typically entails, and what the current state-of-the-art is in this field. Let me just say that what scientists are able to do now in this respect surprised me immensely – I’m sure it’ll surprise you too!

Gait Recognition

In a nutshell, gait recognition aims to identify individuals by the way they walk. It turns out that our walking movements are quite unique, a little like our fingerprints and irises. Who knew, right!? Hence, there has been a lot of research in this field in the past two decades.

There are significant advantages to this form of identity verification. These include the fact that it can be performed from a distance (e.g. using CCTV footage), it is non-invasive (i.e. the person may not even know that they are being analysed), and it does not necessarily require high-resolution images to obtain good results.

The Framework for Automatic Gait Recognition

Trawling through the literature on the subject, I found that scientists have used various ways to capture people’s movements for analysis, e.g. using 3D depth sensors or even using pressure sensors on the floor. I want to focus on the use case shown in The Punisher where recognition was performed from a single, stationary security camera. I want to do this simply because CCTV footage is so ubiquitous today and because pure and neat Computer Vision techniques can be used on such footage.

In this context, gait recognition algorithms are typically composed of three steps:

Pre-processing to extract silhouettes

Feature extraction

Classification

Let’s take a look at these steps individually.

1. Silhouette extraction

Silhouette extraction of subjects is generally performed by subtracting the background image from each frame. Once the background is subtracted, you’re left with foreground objects. The pixels associated with these objects can be coloured white and then extracted.

Background subtraction is a heavily studied field and is by no means a solved problem in Computer Vision. OpenCV provides a few interesting implementations of background subtraction. For example, a background can be learned over time (i.e. you don’t have to manually provide it). Some implementations also allow for things like illumination changes (especially useful for outdoor scenes) and some can also deal with shadows. Which technique is used to subtract the background from frames is irrelevant as long as reasonable accuracy is obtained.

Example of silhouette extraction
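To make the silhouette extraction step a little more concrete, here is a minimal sketch that uses plain NumPy instead of OpenCV. It assumes a stationary camera and greyscale frames, estimates the background as the per-pixel median over the clip, and marks every pixel that differs from it by more than a threshold as foreground (white). The function name and the threshold value are my own choices for illustration. In practice you would more likely reach for something like OpenCV's `cv2.createBackgroundSubtractorMOG2`, which learns the background over time and can also flag shadows.

```python
import numpy as np


def extract_silhouettes(frames, threshold=30):
    """Return binary silhouette masks (0 or 255) for a stack of greyscale frames.

    frames: array of shape (T, H, W) from a stationary camera.
    The background is estimated as the per-pixel median over all T frames,
    which works when the subject covers each pixel for less than half the clip.
    """
    frames = np.asarray(frames, dtype=np.int16)   # avoid uint8 wrap-around
    background = np.median(frames, axis=0)        # estimated background image
    difference = np.abs(frames - background)      # per-pixel change vs background
    return np.where(difference > threshold, 255, 0).astype(np.uint8)


# Synthetic clip: dark background, bright 2x2 "subject" moving right each frame
frames = np.full((5, 6, 8), 20, dtype=np.uint8)
for t in range(5):
    frames[t, 2:4, t:t + 2] = 200

masks = extract_silhouettes(frames)
print(masks[0])
```

A median background is about the simplest model there is. It won't cope with illumination changes or shadows, which is exactly where the more sophisticated implementations earn their keep.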

2. Feature extraction

Various features can be extracted once we have the silhouettes of our subjects. Typically, a single gait period (a gait cycle) is first detected, which is the sequence of video showing you take one step with each of your feet. This is useful to do because your gait pattern repeats itself, so there’s no need to analyse anything more than one cycle.
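One common heuristic for finding the gait cycle (my own illustration, not necessarily what any particular paper does) is to measure something that oscillates as you walk, such as the width of the silhouette in each frame, and then look for the repeat interval using autocorrelation. In a side-on view the silhouette is widest once per step, so the sketch below assumes a full cycle (one step with each foot) lasts roughly twice the detected interval. The function names and the synthetic signal are made up.

```python
import numpy as np


def estimate_period(signal, min_lag=2):
    """Return the lag (in frames) at which the signal best matches itself."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Keep the non-negative lags only: index 0 corresponds to lag 0
    autocorr = np.correlate(x, x, mode="full")[len(x) - 1:]
    max_lag = len(x) // 2
    return int(np.argmax(autocorr[min_lag:max_lag + 1])) + min_lag


def estimate_gait_cycle_length(silhouette_widths):
    """Assume the width signal peaks once per step; a cycle is two steps."""
    return 2 * estimate_period(silhouette_widths)


# Synthetic silhouette widths: one step every 10 frames, 60 frames in total
t = np.arange(60)
widths = 30 + 10 * np.cos(2 * np.pi * t / 10)

step_frames = estimate_period(widths)
cycle_frames = estimate_gait_cycle_length(widths)
print(step_frames, cycle_frames)
```

Real width signals are much noisier than a cosine, so some smoothing beforehand is usually a good idea.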

Features from this gait cycle are then extracted. In this respect, algorithms can be divided into two groups: model-based and model-free.

Model-based methods of gait recognition take your gait period and attempt to build a model of your movements. These models, for example, can be constructed by representing the person as a stick-figure skeleton with joints or as being composed of cylinders. Then, numerous parameters are calculated to describe the model. For example, the method proposed in this publication from 2001 calculates the distances between the head and feet, the head and pelvis, and the feet and pelvis, along with the step length of a subject, to describe a simple model. Another model is depicted in the image below:

Model-free methods work on extracted features directly. Here, undoubtedly the most interesting and most widely used feature extracted from silhouettes is that of the Gait Energy Image (GEI). It was first proposed in 2006 in a paper entitled “Individual Recognition Using Gait Energy Image” (IEEE transactions on pattern analysis and machine intelligence 28, no. 2 (2006): 316-322).

Note: the Pattern Analysis and Machine Intelligence (PAMI) journal is one of the best in the world in the field. Publishing there is a feat worthy of praise.

The GEI is used in almost all of the top gait recognition algorithms because it is (perhaps surprisingly) intuitive, not too prone to noise, and simple to grasp and implement. To calculate it, frames from one gait cycle are superimposed on top of each other to give an “average” image of your gait. This calculation is depicted in the image below where the GEI for two people is shown in the last column.

The GEI can be regarded as a unique signature of your gait. And although it was first proposed way back in 2006, it is still widely used in state-of-the-art solutions today.

Examples of two calculated GEIs for two different people shown in the far right column. (image taken from the original publication)
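The averaging behind the GEI is short enough to sketch in a few lines of NumPy. The snippet assumes the silhouettes from one gait cycle have already been cropped, scaled to the same size, and aligned, so it only shows the averaging itself. The function name is my own.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Average the binary silhouettes of one gait cycle into a single image.

    silhouettes: array of shape (T, H, W); any non-zero pixel counts as foreground.
    Returns a float image of shape (H, W) with values in [0, 1]: bright pixels
    are covered by the body in most frames, dim pixels only occasionally.
    """
    stack = (np.asarray(silhouettes) > 0).astype(np.float64)
    return stack.mean(axis=0)


# Tiny example: 4 frames of 3x3 masks
cycle = np.zeros((4, 3, 3), dtype=np.uint8)
cycle[:, 1, 1] = 255    # "torso" pixel: foreground in every frame
cycle[:2, 2, 0] = 255   # "leg" pixel: foreground in half of the frames

gei = gait_energy_image(cycle)
print(gei)
```

Pixels that are always part of the body (e.g. the torso) come out bright, while pixels that are only sometimes covered (e.g. swinging arms and legs) come out grey. That is what gives the GEI its ghostly look.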

3. Classification

Once step 2 is complete, identification of subjects can take place. Standard classification techniques can be used here, such as k-nearest neighbour (KNN) and the support vector machine (SVM). These are common techniques that are used when one is dealing with features. They are not constrained to the use case of computer vision. Indeed, any other field that uses features to describe its data will also utilise these techniques to classify/identify that data. Hence, I will not dwell on this step any longer. I will, however, refer you to a state-of-the-art review of gait recognition from 2010 that lists some more of these common classification techniques.
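For completeness, here is what the simplest version of this step could look like: a 1-nearest-neighbour classifier (i.e. KNN with k = 1) that flattens each GEI into a vector and returns the label of the closest stored example. This is a bare-bones sketch in NumPy rather than any specific published method, and the tiny 2x2 "GEIs" and the names are invented for the example.

```python
import numpy as np


def nearest_neighbour_identify(gallery, labels, probe):
    """Return the label of the gallery GEI closest to the probe GEI.

    gallery: array of shape (N, H, W) with one GEI per known walking sequence
    labels:  list of N identities, one per gallery GEI
    probe:   array of shape (H, W), the GEI of the person to identify
    """
    gallery_vectors = np.asarray(gallery, dtype=float).reshape(len(gallery), -1)
    probe_vector = np.asarray(probe, dtype=float).reshape(-1)
    # Euclidean distance between the probe and every gallery entry
    distances = np.linalg.norm(gallery_vectors - probe_vector, axis=1)
    return labels[int(np.argmin(distances))]


# Two known people (hypothetical), each described by a tiny 2x2 "GEI"
gallery = np.array([
    [[1.0, 0.2], [0.0, 0.9]],   # alice
    [[0.1, 1.0], [0.8, 0.0]],   # bob
])
labels = ["alice", "bob"]

# A new recording that looks a lot like bob's
probe = np.array([[0.2, 0.9], [0.7, 0.1]])
identity = nearest_neighbour_identify(gallery, labels, probe)
print(identity)
```

Real systems typically reduce the dimensionality of the features first and use more robust classifiers, but the principle of comparing a probe against a gallery of known subjects stays the same.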

So, how good is gait recognition then?

We’ve briefly taken a look at how gait recognition algorithms work. Let’s now take a peek at how good they are at recognising people.

We’ll first turn to some recent news. Only 2 months ago (October, 2017) Chinese researchers announced that they have developed the best gait recognition algorithm to date. They claim that their system works with the subject being up to 50 metres away and that detection times have been reduced to just 200 milliseconds. If you read the article, you will notice that no data/results are presented so we can’t really investigate their claims. We have to turn to academia for hard evidence of what we’re seeking.

To test their algorithm, the authors used the CASIA-B dataset. This is one of the largest publicly available datasets for gait recognition. It contains video footage of 124 subjects walking across a room, captured at various angles including front-on, side-on, and top-down views. Not only this, but the same people repeat the walk while wearing a coat and then while wearing a backpack, which adds additional elements of difficulty to gait recognition. And the low resolution of the videos (320×240 – a decent resolution in 2005 when the dataset was released) makes them ideal for testing gait recognition algorithms, considering that CCTV footage is generally of low quality too.

Three example screenshots from the dataset are shown below. The frames are of the same person with a side-on view. The second and third images show the subject wearing a coat and a bag, respectively.

Example screenshots from the CASIA B dataset of the same person walking.

Recognition rates with front-on views with no bag or coat hover around 20%-40% (depending on the height of the camera). Rates then gradually increase as the angle nears the side-on view (which gives a clear silhouette). At the side-on view with no bag or coat, recognition rates reach an astounding 98.75%! Impressive and surprising.

When it comes to analysing the clips with the people carrying a bag and wearing a coat, results are summarised in one small table that shows only a few indicative averages. Here, recognition rates obviously drop but the top rates (obtained with side-on views) persist at around the 60% mark.

What can be deduced from these results is that if the camera distance and angle and other parameters are ideal (e.g. the subject is not wearing/carrying anything concealing), gait recognition works amazingly well for a reasonably sized subset of people. But once ideal conditions start to change, accuracy gradually decreases to (probably) inadequate levels.

And I will also mention (perhaps something you may have already gathered) that these algorithms only work if the subject is acting normally. That is, the algorithms work if the subject is not changing the way they usually walk, for example by walking faster (maybe as a result of stress) or by consciously trying to forestall gait recognition algorithms (like we saw in The Punisher!).

However, an accuracy rate of 98.75% with side-on views shows great potential for this form of identification and because of this, I am certain that more and more research will be devoted to this field. In this respect, I will keep you posted if I find anything new and interesting on this topic in the future!

Summary

Gait recognition is another form of biometric identification – a little like iris scanning and fingerprints. Interesting computer vision techniques are applied to single-camera footage to obtain recognition rates that sometimes reach 99%. These results depend on such things as camera angles and whether subjects are wearing concealing clothes or not. But much like other recognition techniques (e.g. face recognition), this is undoubtedly a field that will be further researched and improved in the future. Watch this space.


Zbigatron is a blog on the interesting things going on in the world of Computer Vision.

About Me

My name is Zbigniew ("Zig") Zdziarski. I have a PhD in Computer Vision and a passion for anything related to AI.

I also have a Master's in Theology and a Bachelor of Philosophy - because one should never stop learning! More about me.