Notes on Gartner’s 2018 Data Science and Machine Learning MQ

Gartner just released the 2018 Magic Quadrant for Data Science and Machine Learning Platforms. You can get a copy directly from Gartner if you’re a client, or you can get one here, courtesy of RapidMiner.

(1) Vendors come, vendors go.

— In November, TIBCO acquired Alpine in a fire sale. How do we know it was a fire sale?

Alpine’s last venture round dates waaaaaaay back to 2013.

Alpine had to tap a convertible note last May.

Terms of the deal were not disclosed.

TIBCO is a bottom feeder.

TIBCO moshed Alpine together with Statistica, which it acquired last May, and wound up with a Gartner rating below what Statistica previously achieved on its own. So much for synergy.

Here’s a bye-ku for Alpine:

Alpine placed its bets

On Greenplum, that aging horse.

Sadly, TIBCO calls

— After lagging behind the market on “Completeness of Vision” for several years, FICO finally dropped off the MQ completely. Nobody will ever accuse FICO of too much vision, but the company’s customer franchise in risk and fraud is a license to print money.

Two new vendors made it in:

— Anaconda had a busy 2017. A new CEO turned over most of the management team, rebranded the company (previously Continuum Analytics), closed what looks like a down round, and prettied up the website. Anaconda’s free distribution has a lock on Pythonistas, but monetization seems to be an issue. The company debuts as a Niche Vendor, where vendors go to die.

— Databricks, aka “the people who invented Apache Spark,” debuts in the Visionaries quadrant. As Spark matures, “we really know this shit” becomes less valuable as a tag. If Spark is stable, do you really need Matei Zaharia to check your code? Anyway, Databricks offers a cloud-based Spark service that works well for data engineering and application development. For data science, it’s as good as Spark ML — in other words, meh.

Congrats to Alteryx and H2O.ai, who moved up to the Leader quadrant.

IBM dropped out of the Leaders quadrant. For more on that train wreck, see (4) below.

(2) Open source eats the MQ.

Commercially licensed proprietary software used to dominate the MQ, but no more. Take a squint at the 2018 Leaders:

H2O.ai: primarily open source distribution, with a few proprietary bits.

KNIME: primarily open source distribution, with a few proprietary bits.

RapidMiner: open source distribution, with a commercially licensed enterprise version.

Alteryx: leverages R for advanced analytics.

SAS is the outlier in the quadrant. However, the company is changing its tune. SAS used to laugh and point fingers at open source software, muttering darkly about bugs and viruses, until someone pointed out that the company bundles Apache Tomcat and uses Apache HTTP Server for security.

Among the Visionaries, the same pattern holds. Databricks, Dataiku, and Domino provide enterprise provisioning services for data scientists who use open source tooling. IBM and Microsoft embrace open source tooling on their cloud platforms.

On the left side of the MQ, data scientists love Anaconda’s open source distribution, even if Gartner isn’t wild about the company’s commercial offering. The rest of the lot (Angoss, MathWorks, SAP, Teradata, and TIBCO) all sell commercially licensed proprietary software.

The lesson is clear. Vendors who embrace open source will do well. Everyone else is roadkill.

(3) Real data science tools get traction.

Gartner used to embrace “citizen data science” the way Tristan embraces Isolde. Tools that data scientists actually use got spanked for being “hard to use.”

With a fresh group of analysts driving the MQ, Gartner now seems to recognize that enterprises employ real data scientists, and real data scientists drive value. For hard money use cases, like fraud prevention or algorithmic trading, you don’t assign Mr. Freshface, who doesn’t really know how to do data science but would like to try his hand at it anyway. You assign Ms. Badass, who thinks in seven programming languages, wrote her dissertation on adversarial convolutional deep neural nets, and contributes to Apache MXNet in her spare time.

Take another look at the 2018 MQ. With programming APIs, H2O.ai appeals exclusively to a data science user persona. So do Anaconda, Databricks, Dataiku, and Domino.

Alteryx, KNIME, and RapidMiner provide visual interfaces but enable users to embed Python or R scripts. Many users of these products treat the workflow tool as a kind of shell to organize a project and manage data, but write code for the computational heavy lifting.
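Here’s a minimal sketch of the kind of Python a data scientist might drop into one of those script nodes. The variable names (`df_in`, `df_out`) and the fabricated fraud data are placeholders of my own invention; each tool has its own convention for handing tables in and out of a code node.

```python
# Hypothetical code-node body: the workflow shell handles data plumbing;
# the embedded script does the computational heavy lifting.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# In a real workflow, df_in arrives from the upstream node.
# Here we fabricate a toy fraud dataset so the sketch is self-contained.
df_in = pd.DataFrame({
    "amount":       [12.0, 950.0, 33.5, 1200.0, 18.0, 880.0],
    "n_prior_txns": [40,   2,     55,   1,      38,   3],
    "is_fraud":     [0,    1,     0,    1,      0,    1],
})

X = df_in[["amount", "n_prior_txns"]]
y = df_in["is_fraud"]

# The heavy lifting: fit a gradient-boosted classifier in code,
# rather than wiring up a chain of visual modeling nodes.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Hand the scored table back to the workflow for downstream nodes.
df_out = df_in.assign(fraud_score=model.predict_proba(X)[:, 1])
```

The workflow still earns its keep for data access, scheduling, and documentation; the node is just where the actual data science happens.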

That’s true for SAS as well. SAS recently hosted analysts at the Ritz-Carlton in Naples, Florida, where they trotted out a new suite of graphical tools that nobody will use. Experienced SAS users prefer writing SAS code in Display Manager or SAS Studio. If they use a visual tool like Enterprise Guide or Enterprise Miner, they write code and drop it into a code node.

Imagine that. To succeed in data science, you have to write code.

(4) Gartner no longer has a warm and fuzzy for IBM.

As long as I’ve tracked this MQ — as long as Gartner has published it — no vendor has ever fallen out of the Leaders quadrant. Except for Dell, when they abandoned the business. If you’re familiar with what Dell offered, you can understand why they unloaded it.

IBM managed to pull off the stunt in the 2018 MQ. Gartner dinged the company on “Ability to Execute,” a measure driven largely by a survey of reference customers. A poor rating on “Ability to Execute” usually means that your customers think you suck.

IBM can’t say it wasn’t warned. Consider these gems from last year’s report:

Customers are often confused by mismatches between [IBM’s] marketing messages and actual, purchasable products.

Translation: IBM’s marketing is bullshit.

Reference customers expressed dissatisfaction with IBM’s support and bureaucracy; they reported difficulties finding the right liaisons and technical help, despite high maintenance fees.

Translation: outsourcing support wasn’t the smartest move.

But as you’ve heard, IBM is now a young, hip, and agile company. Armed with Gartner’s 2017 report, company executives rolled up their sleeves and tackled the issues.

Oh, wait. Look at this from the 2018 report:

Feedback from reference customers on their customer experience with IBM was unfavorable, including low scores for inclusion of enhancements/requests into subsequent releases, overall rating of product capabilities and business value delivered. IBM’s operations also scored poorly, with low scores for documentation, customer support and analytic support.

But surely IBM has its branding under control.

There remains confusion in the market about IBM’s Watson branding. By now, we would have expected IBM SPSS Modeler to be fully integrated into the IBM Watson ecosystem. In addition, there is confusion about the relationship of, and distinction between, SPSS and DSX. The general availability in August 2017 of the Watson Machine Learning service, which assists operationalization, model management and workflow automation, exacerbates the confusion.

Translation: you can slap the Watson brand on it, but it’s still SPSS.

DSx is likely to be one of the most attractive platforms in the future — modern, open, flexible and suitable for a range of users, from expert data scientists to business people.

Which makes this interesting:

Data Science Experience (DSX)…did not meet our criteria for evaluation on the Ability to Execute axis.

Translation: two years after launch, IBM can’t produce paid reference customers for DSX.

A reader wanted me to know that DSX now runs on IBM mainframes. I didn’t have the heart to tell him how dumb that sounds.

(5) This graphic still works pretty well.

Last year, I made up this “Magic Quadrant” based on the tools data scientists actually use, as a joke. It went viral. Here’s an update:

H2O moves up from last year, and so does XGBoost; Szilard Pafka will be pleased.

While Apache Spark remains the go-to tool for data engineering and application development, interest among data scientists peaked a year or so ago. TensorFlow is now the cool kid on the block. We’re also seeing renewed interest in Caffe/Caffe2, due to the hot market for image classification and recognition.

Yeah, I know. I forgot PyTorch.

Apache Flink has solid use cases in stream processing, but its champions no longer bother to say it’s a tool for machine learning. Here’s a bye-ku for Flink:

Ten guys in Berlin

Thought Flink would eat the world, but

Budding users yawned

We can also drop Mahout and Pig from the chart. And now that Neo4j has a Spark backend, you can stick a fork in GraphX. Please.

16 comments

From my experience and from anecdotal evidence I’ve seen that it will take a team of your best IT experts more than 2 months to stand up DSX if you are foolish enough to try it. Puts a whole different spin on “Ability to Execute”.

“…where they trotted out a new suite of graphical tools that nobody will use. Experienced SAS users prefer writing SAS code in Display Manager, or SAS Studio. If they use a visual tool like Enterprise Guide or Enterprise Miner, they write code and drop it into a code node.”

If you are referring to the tools I think you are, there is the capability to add a code node in that flow. Now whether that code node will produce the same result with the same data in the same suite of products is another story. Needless to say, they have their TOP PEOPLE working on that.

Only if (a) 100% of the data you will ever need is already in Teradata; (b) the Teradata software is functionally equivalent to competitive data science and machine learning platforms; and (c) the database administrators have a clue about the needs of data scientists and don’t lock down the platform in such a way that data scientists can’t use it.

In actual practice, (a) is never true, (b) is not true, and (c) is often not true.

Teradata has been pitching embedded analytics since 1989. It’s a long trail of tears.