Get Regular Updates

You have Successfully Subscribed!

A Comparison of Machine Learning Services on AWS, Azure, and Google Cloud

Artificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to believe that much of it will happen in the cloud.

The top cloud computing platforms are all betting big on democratizing artificial intelligence. Over the past three years, Amazon, Google, and Microsoft have made significant investments in artificial intelligence (AI) and machine learning, from rolling out new services to carrying out major reorganizations that place AI strategically in their organizational structures. Google CEO, Sundar Pichai, has even said that his company is shifting to an “AI-first” world.

So, if the cloud is the destination for your machine learning projects, how do you know which platform is right for you? In this post, we’ll explore the machine learning offerings from Amazon Web Services, Microsoft Azure, and Google Cloud Platform.

What are the Benefits of Machine Learning in the Cloud?

The cloud’s pay-per-use model is good for bursty AI or machine learning workloads.

The cloud makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand increases.

You don’t need to use a cloud provider to build a machine learning solution. After all, there are plenty of open source machine learning frameworks, such as TensorFlow, MXNet, and CNTK that companies can run on their own hardware. However, companies building sophisticated machine learning models in-house are likely to run into issues scaling their workloads, because training real-world models typically requires large compute clusters.

The barriers to entry for bringing machine learning capabilities to enterprise applications are high on many fronts. The specialized skills required to build, train, and deploy machine learning models and the computational and special-purpose hardware requirements add up to higher costs for labor, development, and infrastructure.

These are problems that cloud computing can solve and the leading public cloud platforms are on a mission to make it easier for companies to leverage machine learning capabilities to solve business problems without the full tech burden. As AWS CEO Andy Jassy highlighted in his 2017 re:Invent keynote, his company has to “solve the problem of accessibility of everyday developers and scientists” to enable AI and machine learning in the enterprise.

There are many good reasons for moving some, or all, of your machine learning projects to the cloud. The cloud’s pay-per-use model is good for bursty AI or machine learning workloads, and you can leverage the speed and power of GPUs for training without the hardware investment. The cloud also makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand for those features increases.

Perhaps even more importantly, the cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science—skills that are rare and in short supply. A survey by Tech Pro Research found that just 28% of companies have some experience with AI or machine learning, and 42% said their enterprise IT personnel don’t have the skills required to implement and support AI and machine learning.

AWS, Microsoft Azure, and Google Cloud Platform offer many options for implementing intelligent features in enterprise applications that don’t require deep knowledge of AI or machine learning theory or a team of data scientists.

The Spectrum of Cloud Machine Learning Services

It’s helpful to consider each provider’s offerings on the spectrum of general-purpose services with high flexibility at one end and special-purpose services with high ease-of-use at the other.

For example, Google Cloud ML Engine is a general-purpose service that requires you to write code using Python and the TensorFlow libraries, while Amazon Rekognition is a specialized image-recognition service that you can run with a single command. So, if you have a typical requirement, such as video analysis, then you should use a specialized service. If your requirement is outside the scope of specialized services, then you’ll have to write custom code and run it on a general-purpose service.

It’s worth noting that all three of the major cloud providers have also attempted to create general-purpose services that are relatively easy to use. Examples include the Google Prediction API, Amazon Machine Learning, and Azure Machine Learning Studio. They fall somewhere in the middle of the spectrum. At first, it might seem like this type of service would give you the best of both worlds, since you could create custom machine learning applications without having to write complex code. However, the cloud providers discovered that there isn’t a big market for simple, general-purpose machine learning. Why? They’re not flexible enough to handle most custom requirements and they’re more difficult to use than specialized services.

In fact, Google has discontinued its Prediction API and Amazon ML is no longer even listed on the “Machine Learning on AWS” web page. However, Azure Machine Learning Studio is still an interesting service in this category, because it’s a great way to learn how to build machine learning models for those who are new to the field. It has a drag-and-drop interface that doesn’t require any coding (although you can add code if you want to). It supports a wide variety of algorithms, including different types of regression, classification, and anomaly detection, as well as a clustering algorithm for unsupervised learning. Once you have a better understanding of machine learning, though, you’re probably better off using a tool like Azure Machine Learning Workbench, which is more difficult to use, but provides more flexibility.

What AI Tools Should I Use?

If you are implementing AI for the first time, then you should start with one of the specialized services. Designed as standalone applications or APIs on top of pre-trained models, each platform offers a range of specialty services that allow developers to add intelligent capabilities without training or deploying their own machine learning models. The main offerings in this category are primarily focused on some aspect of either image or language processing.

This list highlights Azure’s strategy of splitting products into separately branded, very specific AI tasks. Most of these features are also offered by Amazon and Google, but as part of broader APIs. As you can see in the chart, all three of the vendors offer essentially the same capabilities. Microsoft and Google do have a few unique offerings, though. For example, Azure Custom Decision Service helps personalize content and Google Cloud Talent Solution helps with the recruiting process.

Which General AI Offerings Should I Consider?

General-purpose machine learning offerings are used to train and deploy machine learning models. Since specialized AI services only cover a narrow subset of uses, such as image and language processing, you’ll need to use a general-purpose machine learning (ML) service for everything else. For example, many companies need product recommendation engines and fraud detection for their ecommerce sites. These applications require custom machine learning models.

Amazon SageMaker:

12 common machine learning algorithms

TensorFlow and MXNet pre-installed

Can use other ML frameworks

Google Cloud ML Engine:

Supports TensorFlow (as well as scikit-learn and XGBoost in beta)

Azure Machine Learning Workbench & Machine Learning Services:

Supports Python-based machine learning frameworks, such as TensorFlow or PyTorch

Amazon SageMaker is described by AWS as a “fully managed, end to end machine learning service” that is designed to be a fast and easy way to add machine learning capabilities. In addition to the AWS Gluon machine learning library, SageMaker supports TensorFlow, MXNet, and many other machine learning frameworks. It was launched in November 2017 at the annual AWS re:Invent conference.

Google released its Cloud ML Engine in 2016, making it easier for developers with some machine learning experience to train models. Google created the popular open-source TensorFlow machine learning framework, which is currently the only framework that Cloud ML Engine supports (although it now offers beta support for scikit-learn and XGBoost). Both Amazon and Azure support TensorFlow and several other machine learning frameworks.

In addition to its older Machine Learning Studio, Azure has two separate machine learning services. The Experimentation Service is designed for model training and deployment, while the Model Management Service provides a registry of model versions and makes it possible to deploy trained models as Docker containerized services. Machine Learning Workbench is a desktop-based frontend for these two services.

How is Hardware Impacted by Machine Learning Workloads?

Machine learning workloads require greater processing power

The amount of processing required could be expensive

GPUs are the processor of choice for many ML workloads because they significantly reduce processing time

Google and other companies are creating hardware that’s optimized for machine learning jobs

To help people get started with AI, Amazon offers a camera that can run deep learning models

Hardware is an important consideration when it comes to machine learning workloads. Training a model to recognize a pattern or understand speech requires major parallel computing resources, which could take days on traditional CPU-based processors. In comparison, powerful graphics processing units (GPUs) are the processor of choice for many AI and machine learning workloads because they significantly reduce processing time.

AWS, Azure, and Google Cloud all support using either regular CPUs or GPUs to train models. Google has a unique offering with its Cloud TPUs (Tensor Processing Units). These chips are designed to speed up machine learning tasks. Not surprisingly, they work with TensorFlow. Many other companies are now racing to catch up with Google and release their own ML-optimized hardware.

Outside of processing, AWS has several unique offerings in the hardware category. Its AWS DeepLens wireless video camera can run deep learning models on what it sees and perform image recognition in real time. Amazon seems to be promoting client-side processing as an easy way to get started learning about machine learning.

Although not strictly hardware, the AWS Greengrass ML Inference service allows you to perform machine learning inference processing on your own hardware that’s AWS Greengrass-enabled. Better still, you can keep using the extensive GPU compute power in the cloud to train your machine learning models, then deploy the outcomes to your own devices running AWS Greengrass ML Inference. Running ML Inference locally reduces the amount of device data to be transmitted to the cloud, and therefore reduces costs and latency of results.

What are the Open Source Standards Machine Learning Platforms Use?

Each platform’s deep learning offerings and their positions on wider industry-level machine learning initiatives, open standards, and so forth are a good indication of what the future holds. Deep learning offerings, in particular, highlight how the space has achieved a balance between competition and cooperation among providers.

Google created an open-sourced TensorFlow, which has become widely popular among machine learning enthusiasts. Despite its connection to Google, both Amazon and Microsoft support TensorFlow in their deep learning services as well.

Amazon has thrown its support behind Apache MXNet, advocating it as the company’s weapon of choice for machine learning and actively promoting it both internally and externally. MXNet underpins several of its machine learning and AI services.

Microsoft provides CNTK, otherwise known as the Microsoft Cognitive Toolkit, for deep learning at the commercial level.

AWS and Microsoft have jointly created the Gluon specification, which is a higher-level abstraction for developing machine learning models. Gluon currently supports MXNet and will soon be extended to CNTK. The Gluon interface simplifies the development experience and is aimed at winning over new developers early in their machine learning journey.

ONNX, the Open Neural Network Exchange from Facebook and Microsoft, is aimed at creating transferable machine learning models. With ONNX, you create your machine learning model in an open format that allows it to then be trained on supported machine learning frameworks. ONNX has the support of both AWS and Microsoft, but Google has yet to come on board.

Benefits of Machine Learning in the Cloud: Conclusion

Since Azure, Google Cloud, and AWS all provide good general-purpose and specialized machine learning services, you will probably want to choose the platform that you’ve already chosen for your other cloud services. However, to avoid vendor lock-in when using a general-purpose service, you may want to use an open-source machine learning framework that is supported by all three vendors. At the moment, the framework with the broadest support is TensorFlow, although the field is changing rapidly, so we expect cross-platform support for more frameworks soon. The main holdout is Google, which previously supported only TensorFlow, but even Google is now introducing support for scikit-learn and XGBoost.

The Cloud Academy library includes machine learning courses for all three platforms, most of which contain examples using TensorFlow or scikit-learn. Some of the learning paths on this subject include:

The AWS and Azure learning paths also include hands-on labs so you can practice your skills.

We’re regularly adding new machine learning content to our library, based on what our customers need, so try the learning paths above and then let us know what else you would like to see. Good luck with your machine learning efforts!

Guy has been helping people learn IT technologies for over 20 years. He is the Azure and Google Cloud Content Lead at Cloud Academy. Guy's passion is making complex technology easy to understand. Jeremy is currently employed as a Cloud Researcher and Trainer - and operates within Cloud Academy's content provider team authoring technical training documentation for both AWS and GCP cloud platforms.

Related Posts

Let’s look at what benefits these two new EC2 instance types offer and how these two new instances could be of benefit to you. Both of the new instance types are built on the AWS Nitro System. The AWS Nitro System improves the performance of processing in virtualized environments by...

Google Cloud Platform (GCP) has evolved from being a niche player to a serious competitor to Amazon Web Services and Microsoft Azure. In 2018, research firm Gartner placed Google in the Leaders quadrant in its Magic Quadrant for Cloud Infrastructure as a Service for the first time. In t...

Security in AWS is governed by a shared responsibility model where both vendor and subscriber have various operational responsibilities. AWS assumes responsibility for the underlying infrastructure, hardware, virtualization layer, facilities, and staff while the subscriber organization ...

Is it possible to create an S3 FTP file backup/transfer solution, minimizing associated file storage and capacity planning administration headache?FTP (File Transfer Protocol) is a fast and convenient way to transfer large files over the Internet. You might, at some point, have conf...

Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs).Microservices have become increasingly popular over the past few years. The modular architectural style,...

There are many use cases for tags, but what are the best practices for tagging AWS resources? In order for your organization to effectively manage resources (and your monthly AWS bill), you need to implement and adopt a thoughtful tagging strategy that makes sense for your business. The...

Amazon S3 is the most common storage options for many organizations, being object storage it is used for a wide variety of data types, from the smallest objects to huge datasets. All in all, Amazon S3 is a great service to store a wide scope of data types in a highly available and resil...

One of the main promises of cloud computing is access to nearly endless capacity. However, it doesn’t come cheap. With the introduction of Spot Instances for Amazon Web Services’ Elastic Compute Cloud (AWS EC2) in 2009, spot instances have been a way for major cloud providers to sell sp...

The AWS Command Line Interface (CLI) is for managing your AWS services from a terminal session on your own client, allowing you to control and configure multiple AWS services.So you’ve been using AWS for awhile and finally feel comfortable clicking your way through all the services....

Thousands of cloud practitioners descended on Chicago’s McCormick Place West last week to hear the latest updates around Amazon Web Services (AWS). While a typical hot and humid summer made its presence known outside, attendees inside basked in the comfort of air conditioning to hone th...

Containers can help fragment monoliths into logical, easier to use workloads. The AWS Summit New York was held on July 17 and Cloud Academy sponsored my trip to the event. As someone who covers enterprise cloud technologies and services, the recent Amazon Web Services event was an insig...

If you’re building applications on the AWS cloud or looking to get started in cloud computing, certification is a way to build deep knowledge in key services unique to the AWS platform. AWS currently offers nine certifications that cover the major cloud roles including Solutions Archite...