
It seems to me that machine learning (especially deep learning) can work with thousands (even millions) of different inputs. After training an ML model on the inputs, the computer has hopefully "learned" something about the connections between these millions of inputs, and can hopefully make predictions regarding new unseen inputs.

4 Answers

The following is how I understand the distinction; it's based on my own experience engaging with various communities over the last few years, as I've been teaching myself statistics/data science/ML.

The term "machine learning" comes out of a fairly coherent academic/research community, centered in computer science but with moderately strong connections to a few other fields, such as statistics, and weaker connections to fields like genomics. For one example, take a look at the research areas of the faculty in Carnegie Mellon's Machine Learning Department. For another example, all three of the authors of Elements of Statistical Learning — which seems to be one of the standard textbooks in machine learning — are statisticians.

On the other hand, the term "data analysis" is used in radically different ways in different sectors. Data analysts in established companies (especially 10+ years ago) might use Excel and Tableau to plot earning trends over time and do some simple financial modeling. The data analysts in a biology lab might be a couple postdocs and grad students who have taught themselves enough R or Python to run statistical tests and generate plots. Web- or app-based companies might have data analysts building predictive machine learning models of audience/consumer behavior, in order to find ways to marginally increase sales or ad revenue. Data journalists at news media outlets might focus more on building visualizations and interactives of data aggregates and summaries using D3, with little or no deep quantitative analysis (i.e., using neither classical statistics nor predictive models).

Because "data analysis" has different aims and goals in different sectors — and is done by people with very different kinds of formal training — different kinds of "data analysts" will use different software tools and different methods. Deep learning and other predictive models will be appropriate for some kinds of data analysis (web- and app-based companies; certain data-cleaning purposes in scientific research) but irrelevant for others (data journalism).

Data analysis, also known as analysis of data or data analytics, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.

Meaning, the focus is on deriving information, insight, or conclusions so that humans can act, understand, or decide better.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

Here, the emphasis is on programming agents that learn to do things—predict housing prices, transcribe text—from experience, i.e. data.
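Mitchell's definition can be made concrete with a toy sketch. Everything below is synthetic and purely illustrative: the "learner" is a nearest-centroid classifier, and the point is only that performance P (classification accuracy) on task T (labeling points) improves with experience E (more training examples).

```python
import random

random.seed(0)

def make_point(label):
    # Two overlapping 1-D clusters centered at 0.0 and 1.0 (synthetic toy data).
    return random.gauss(float(label), 0.5), label

def train(n_per_class):
    # "Experience E": n labeled examples per class -> one centroid per class.
    centroids = []
    for label in (0, 1):
        xs = [make_point(label)[0] for _ in range(n_per_class)]
        centroids.append(sum(xs) / n_per_class)
    return centroids

def accuracy(centroids, trials=2000):
    # "Performance P" on "task T": classify fresh points by nearest centroid.
    c0, c1 = centroids
    correct = 0
    for _ in range(trials):
        x, label = make_point(random.randint(0, 1))
        pred = 0 if abs(x - c0) < abs(x - c1) else 1
        correct += int(pred == label)
    return correct / trials

acc_little = accuracy(train(2))    # very little experience
acc_lots = accuracy(train(500))    # much more experience
```

With only two examples per class the centroids are noisy; with hundreds, they settle near the true cluster centers and accuracy approaches the best achievable for these overlapping clusters.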

This, I find, is a helpful distinction: instead of method, we can reason about goals. If we're analyzing Fourier coefficients of recorded speech to understand how the human brain parses sounds into vowels, that's data analysis; if we're aiming to write a program that automatically transcribes speech, that's machine learning.
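To make the "analysis" side of that example concrete, here is a minimal sketch. The signal is a synthetic 200 Hz tone standing in for a speech recording, and the naive DFT is just for illustration; a real pipeline would window the audio and use an FFT.

```python
import cmath
import math

# Synthetic "recording": a 200 Hz tone sampled at 8 kHz (stand-in for speech).
rate = 8000
n = 400
signal = [math.sin(2 * math.pi * 200 * t / rate) for t in range(n)]

def dft_magnitude(x, k):
    # Magnitude of the k-th Fourier coefficient (naive DFT, fine for a sketch).
    size = len(x)
    return abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / size)
                   for t in range(size)))

# The "analysis" step: inspect which frequency bin dominates.
peak_bin = max(range(n // 2), key=lambda k: dft_magnitude(signal, k))
peak_hz = peak_bin * rate / n   # bin width = rate / n = 20 Hz
```

The human then interprets the peak (here, at 200 Hz); in the machine-learning version, those same coefficients would instead be fed as features into a model that predicts the transcription.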

With data analysis, the question is: what is an insightful KPI to get out of our data? If we are an e-commerce retailer, for example, that might be:

What percentage of page visits were to pages whose items we still had in stock?

What percentage of orders are dispatched within 2 days?
Machine learning is a whole different game. You might re-state Q1 and Q2, for example:

Can we build a predictive algorithm that predicts which pages will go out of stock?

Can we build an optimizer which helps us to dispatch orders faster?
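A minimal sketch of the predictive re-statement of Q1. The features, their scaling, and the rule generating the synthetic labels are all invented for illustration; the model is a hand-rolled logistic regression rather than any particular library's implementation.

```python
import math
import random

random.seed(1)

# Synthetic training data: (daily_visits, units_in_stock) -> 1 if the page
# went out of stock, else 0. The labeling rule is made up for this sketch.
def make_example():
    visits = random.uniform(0, 100)
    stock = random.uniform(0, 50)
    out_of_stock = 1 if visits / (stock + 1) > 2 else 0
    return (visits / 100, stock / 50), out_of_stock   # scale features to ~[0, 1]

train_set = [make_example() for _ in range(1000)]

# Minimal logistic regression fit by stochastic gradient descent.
w = [0.0, 0.0]
b = 0.0
lr = 0.5
for _ in range(100):
    for (x1, x2), y in train_set:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        g = p - y                      # gradient of log loss w.r.t. the logit
        w[0] -= lr * g * x1
        w[1] -= lr * g * x2
        b -= lr * g

def predict(visits, stock):
    # True if the model flags the page as likely to go out of stock.
    x1, x2 = visits / 100, stock / 50
    p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
    return p > 0.5
```

The crucial difference from the KPI above: nobody hand-writes the stock-out rule into `predict`; the model infers it from labeled examples, so high-traffic, low-stock pages get flagged.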

At a deeper level, in data analysis you are typically using the machine to automate some fairly trivial tasks. With machine learning, you let the machine automate much of the learning process itself, where, as a data analyst, you would typically have supplied that knowledge yourself.

Data analysis is the process of understanding data: finding patterns and trying to draw inferences that explain the patterns you observe.

Machine learning is when you train a system to learn those patterns and predict the ones to come.

For example,

Think of Amazon.com as a supermarket, and one of its employees as a machine. The employee has access to your data. While exploring it, the employee sees that you buy chocolate 80% of the time you enter the supermarket, and that you mostly visit the store on Sundays. The employee also finds 50 other people who follow a similar pattern, with a higher or lower probability of buying chocolate.
The employee then decides to move all the chocolate displays to the supermarket entrance in order to increase that probability and attract more customers to buy chocolate.

Here, observing the customers' shopping patterns and behavior is data analysis; learning similar patterns and moving the displays to increase those probabilities is the machine learning process.
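The supermarket story boils down to a few lines of code. The customer names and visit logs below are invented, and the "model" is deliberately the simplest possible one: a historical purchase rate used as a predictor.

```python
# Toy visit logs: 1 = bought chocolate on that visit, 0 = did not.
# All names and data are made up to mirror the supermarket story.
visits = {
    "you":  [1, 1, 1, 1, 0],   # buys chocolate on 80% of visits
    "anna": [1, 1, 0, 1, 1],
    "ben":  [0, 0, 1, 0, 0],
}

# "Data analysis": describe the observed pattern for each customer.
rates = {name: sum(log) / len(log) for name, log in visits.items()}

# "Machine learning" (in its simplest form): use the learned rate to predict
# the next visit -- predict "buys chocolate" if the historical rate exceeds 0.5.
def predict_next(name):
    return rates[name] > 0.5
```

The first step only summarizes what happened; the second turns that summary into a prediction about what will happen, which is the line this answer draws between the two.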