Image Recognition of Basic Shapes | Google Vs Amazon

This article was written for individuals or companies looking to use image recognition of basic shapes in their operations.

The Problem

Bytelion was recently approached by a client seeking to identify basic shapes via image recognition. The premise: a computer takes a snapshot of a shape, identifies it, and reports how confident it is in that identification. There are many options and tools out there, so we wanted to see how Google and Amazon stacked up against each other.

How Does Image Recognition Work?

Image recognition is a form of machine learning designed to recognize patterns in data. Once an image is digitized, it is just another form of data. Attributes that recur across many images of the same thing tell the system which elements of those images remain consistent.

For instance, multiple pictures taken of a dog from the same angle but in different lighting conditions would still retain the basic ‘outline’ of the dog, if not the same coloring/shading etc. Labeling this set of graphics ‘dog’ informs the computer that these are images of dogs and that similar images presented are also ‘dogs’. The more images of dogs you provide initially, the better the computer is at guessing whether other images have dogs in them.
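The idea above can be sketched in a few lines of Python. This is a toy nearest-neighbor classifier over tiny made-up binary "images", purely to illustrate how labeled examples let a program guess labels for new images; it is not how either vendor's models actually work, and all images and labels here are invented for the example.

```python
# Toy illustration of label-based image recognition: a 1-nearest-neighbor
# classifier over tiny 3x3 binary "images". Training data is invented.

def pixel_distance(a, b):
    """Count the pixels on which two equally sized images differ."""
    return sum(p != q for p, q in zip(a, b))

def classify(image, training_set):
    """Return the label of the most similar training image."""
    return min(training_set, key=lambda ex: pixel_distance(image, ex[0]))[1]

# 3x3 "images" flattened to 9 pixels; several labeled examples per class,
# mimicking "multiple pictures of the same thing in different conditions".
training_set = [
    ((0,1,0, 1,1,1, 0,1,0), "plus"),
    ((0,1,0, 1,0,1, 0,1,0), "plus"),   # same outline, different "shading"
    ((1,0,1, 0,1,0, 1,0,1), "cross"),
    ((1,0,1, 0,0,0, 1,0,1), "cross"),
]

# A new, slightly noisy image is matched to the closest labeled example.
noisy_plus = (0,1,0, 1,1,1, 0,0,0)
print(classify(noisy_plus, training_set))  # -> plus
```

Adding more labeled examples per class makes the guess more robust, which is the intuition behind "the more images of dogs you provide initially, the better the computer is at guessing".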

Google vs Amazon

There are numerous services available for image recognition, but we decided to test the two leading options: Amazon Rekognition and Google's Cloud Vision API.

Note: Each service has its own pros and cons. It is best to fully flesh out your use cases before choosing which service to use.
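For reference, asking each service to label an image looks roughly like the sketch below. It assumes the boto3 and google-cloud-vision packages are installed and credentials are configured; the code is a minimal sketch of each label-detection call, not the exact scripts we used in our tests.

```python
# Hedged sketch: request labels for the same image bytes from each service.
# Requires configured AWS and Google Cloud credentials to actually run.

def rekognition_labels(image_bytes, min_confidence=50.0):
    """Return (label, confidence) pairs from Amazon Rekognition."""
    import boto3  # AWS SDK for Python
    client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"Bytes": image_bytes}, MinConfidence=min_confidence
    )
    return [(l["Name"], l["Confidence"]) for l in response["Labels"]]

def vision_labels(image_bytes):
    """Return (label, confidence-as-percent) pairs from Google Cloud Vision."""
    from google.cloud import vision  # Google Cloud client library
    client = vision.ImageAnnotatorClient()
    response = client.label_detection(image=vision.Image(content=image_bytes))
    return [(a.description, a.score * 100) for a in response.label_annotations]

def top_n(labels, n=5):
    """Sort (label, confidence) pairs by confidence, highest first."""
    return sorted(labels, key=lambda pair: -pair[1])[:n]
```

Both services return a ranked list of guesses with confidence scores, which is the raw material for the comparisons in the tables below; `top_n` is a small helper for trimming that list.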

Testing Conditions

Bytelion’s high-tech photography studio

We conducted rounds of testing in both Amazon and Google’s platforms using each of their default image comparison libraries. All of our records are stored in AWS in this s3 bucket.

For our ‘photo studio’ we mounted a camera above a stack of books (6 inches high), angled down at the subject. We took images of 7 different shapes: each shape was photographed twice against a blank background in different positions, and twice more against a mesh background. We kept light conditions consistent throughout the process (average Lux count of 360, using this meter).

Definitions:

Confidence: The percentage of probability the image recognition system associates with its result.

NA: The resulting output does not match the shape presented. That is, the input image did return other results, but none were relevant, so they have been omitted from this article.
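The scoring convention above can be made concrete with a small helper: given a service's ranked (label, confidence) guesses, report the confidence for the expected shape, or "NA" when the shape never appears in the list. The sample guesses below are invented for illustration, not actual output from either service.

```python
# Implements the Confidence/NA convention: look up the expected shape
# in a service's ranked guesses. Sample data is invented.

def score(guesses, expected_shape):
    """Return the confidence (as 'NN%') for expected_shape, else 'NA'."""
    for label, confidence in guesses:
        if label.lower() == expected_shape.lower():
            return f"{confidence:.0f}%"
    return "NA"

guesses = [("Sphere", 81.0), ("Circle", 72.0), ("Pattern", 55.0)]
print(score(guesses, "circle"))    # -> 72%
print(score(guesses, "triangle"))  # -> NA
```

Note that a shape can match without being the top guess, which is why the tables below record a confidence even when the correct shape was not the most likely result.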

Test Results:

Circle

4 variants of the same circle

| Image | Google Vision Confidence | Amazon Rekognition Confidence |
| --- | --- | --- |
| Circle Against White Background 1 | 72% | NA |
| Circle Against White Background 2 | 66% | NA |
| Circle Against Mesh Background 1 | 52% | NA |
| Circle Against Mesh Background 2 | NA | NA |

As the data above shows, Google Vision is the clear winner. While Google's confidence was not especially high, Amazon Rekognition was unable to determine that a circle was present in any of the images.

Note: NA means the results didn’t contain a circle in the listed guesses. Also, a match was not necessarily the only match, or the most likely one.

Triangle

4 variants of the same triangle

| Image | Google Vision Confidence | Amazon Rekognition Confidence |
| --- | --- | --- |
| Triangle Against White Background 1 | 69% | 63% |
| Triangle Against White Background 2 | 63% | 96% |
| Triangle Against Mesh Background 1 | NA | 95% |
| Triangle Against Mesh Background 2 | NA | NA |

This round went to Amazon Rekognition. Not only did it identify the triangle in one more image, it also detected a triangle with greater confidence.

All Other Shapes

The following shapes were also tested in both systems in the same format as above:

Oval

Ring

Semi-Circle

Octagon

Bracket

Unfortunately, neither platform was able to find comparable imagery within its libraries for these shapes. Results ranged from no matches at all to some matches, none of which were the correct shape.

Summary & Conclusion

We cannot say with much confidence that these image recognition services are suitable for detecting basic shapes with out-of-the-box configurations. Their default libraries have proven better at categorizing general imagery (e.g. scenery, mountains, animals, people) than at explicit identification.

Even with the best recognition models in imaging AI, at best there is a 96.54% chance of having a correct match in the top 5 guesses (ref: benchmarking). Maintaining this level of accuracy requires consistent training. Large data sets may require long training times as well as multi-thousand-dollar machines. Retraining the same base system for a specialized use (such as our case above) can be done in far less time with ordinary computing power.

Moving Forward

If you really want to identify basic shapes, you will need to conduct your own machine training. A five-minute video on how to do this is here. Note: it will take you a little longer than 5 minutes. :-)