Machine learning. Artificial Intelligence

Forrester’s 2018 PAML “Waves”

Forrester just published two “Wave” reports for predictive analytics and machine learning. The first, covering “multi-modal” solutions, is available here for free. A second report, covering notebook-based solutions, is available here (registration required).

Kudos to Forrester for understanding the diversity of the data science tools market. Software with a visual interface does not compete with code-centric software — it appeals to a different class of users. Instead of trashing code-based tools as “too hard to use,” Forrester recognizes that they belong in a separate category.

Let’s take a quick look at how vendors fared in each report.

Multimodal Predictive Analytics and Machine Learning Platforms

Here’s the “Wave”:

My comments:

— SAS did well. Forrester gives a glowing review to SAS Visual Data Mining and Machine Learning. That squares with what I hear from the few customers willing to pay for it. “Use a wizard to automatically train a model” is a bit of a stretch, though. VDM/ML supports automated parameter tuning, but data engineering, feature engineering, experiment management, model evaluation, and model selection are all manual tasks. Oh, and for model management, you need to license another SAS product.

— Forrester’s assessment of IBM makes less sense to me. Watson Studio is a quodlibet of previously available services, cobbled together and pushed out the door just in time for analyst review season. Those “SPSS-inspired” workflows look an awful lot like — wait for it — SPSS, which IBM did not submit for review because it’s so done. IBM Watson Studio is only available on IBM Cloud, everyone’s fifth choice in cloud platforms, which makes it seem more like a niche product. Does anyone actually pay for Watson Studio? I’ve only run into it when some Blue customer gets free credits with an IBM enterprise agreement.

— Forrester notes that RapidMiner helps 380,000 users. If only more of them paid for the privilege.

— Angoss (Datawatch), FICO, KNIME, and SAP all fell out of the “Leaders” category, which was getting pretty crowded in last year’s report. All fell victim to Forrester’s changing metrics.

— TIBCO remains in the “Strong Performers” category, but Forrester rates its current offering much lower than it rated Alpine and Statistica, which TIBCO acquired last year. This demonstrates the maxim that in business, one plus two doesn’t always add up to three.

— Dataiku scored about the same this year as last.

— Microsoft took a big hit, falling from “Strong Performer” to “Contender,” with markedly lower ratings on both dimensions. Bit of a puzzler, IMHO; the MSFT offering seems better than that.

— MathWorks joins the Wave this year and lands about where you would expect.

— World Programming and Salford Systems trail the pack. SAS has not yet litigated the former out of business. Minitab acquired Salford last year. I can remember using Minitab back in the 1970s. Yeah, I’m that old.

Forrester did not rate Alteryx.

Notebook-Based Predictive Analytics and Machine Learning Solutions

Here’s the “Wave”:

Note: this is the updated “Wave” published by Forrester on September 7.

Most of the vendors in this wave are new to Forrester. My comments:

— Domino Data Lab leads the pack, and rightly so. Domino invented this category and leads in every respect.

— Forrester’s assessment of Oracle as a leader seems, well, aspirational. Customer adoption, per the detailed tables, is zero. Insiders from DataScience.com, which Oracle acquired recently, throw shade at the product’s stability and maturity. Presumably, Oracle has the deep pockets to fix the product and make it work. Even so, it’s not nearly as good as Domino; I’d share a detailed feature/function analysis, but it would take more than a paragraph. Oracle lacks Domino’s street cred with the data science community, and the folks in Oracle Cloud who drove the acquisition don’t talk to the folks in Oracle Data Mining, who have actual customers and experience in the field.

— For this wave, Forrester did not evaluate H2O.ai’s Driverless AI. Forrester wasn’t impressed with Sparkling Water and Flow UI: “Enterprises looking for a notebook-based PaML solution will find better solutions from the other vendors in this evaluation.” Ooh, burn.

— My former colleagues at Cloudera should be pleased with their positioning in the middle of the pack. Databricks did well, too. Forrester dings Cloudera and Databricks for using proprietary IDEs instead of Jupyter. I’m sorry, but the folks at Cloudera and Databricks aren’t stupid — they understand that Jupyter isn’t suitable for production software development. Don’t @ me.

— Civis Analytics’ main asset is its founders’ political connections. Forrester rightly notes that Civis “is not yet the platform for everyone, though, as it is currently cloud only, it doesn’t support many machine learning frameworks out of the box, and Spark is still in the pipeline.” That’s like saying dinner is ready, but we have no rolls or salad and the roast is still in the oven.

— It must sting Anaconda to score below OpenText, but there it is.

— Google brings up the rear with Cloud DataLab which, according to Forrester, “does little to improve data scientist productivity, such as through project capabilities, team collaboration features, and other modeling tools that are important criteria in this evaluation.” Yeah, that’s about right.

Your assessment of IBM Watson Studio is incorrect: it’s not based on any services available beforehand and it’s certainly not cobbled together. Development started at square one, i.e., with no code refactoring, and it was coded directly to run cloud native (Kubernetes).

And indeed IBM Cloud may not be everyone’s favourite, but shouldn’t a data scientist or company choose the best product for their needs, independent of where it runs? These days people buy services, not products. I also haven’t a clue how my 4G works and where it runs. I just happen to have chosen that provider because it fits my needs as a consumer.

Note: I work for IBM and I know you have a history with the company. I love your directness, but please also be an independent consultant when it concerns ‘big blue’.

Breaking down IBM Watson Studio:
— IBM previously sold the data prep module as Data Refinery.
— IBM previously sold the data science module as IBM Data Science Experience. (It still markets this module, with some differences, for on-premises implementation.)
— IBM previously sold the Spark and SPSS flows branded as IBM Watson Machine Learning.
— The Deep Learning capabilities appear to be a new front end for existing APIs. (There’s nothing wrong with that, of course. UIs add value.)

So, did IBM design the product from scratch and bring modules to market separately? I would believe that argument if there were a coherent UI across the modules and no overlapping functionality. As it is, the impression the product leaves with me is exactly as I described: it’s a collection of different services bundled together and rebranded as “Watson.”

Data scientists don’t get to choose a cloud platform in a vacuum. It’s increasingly rare for a data science team to choose a cloud platform that differs from the organization standard. Most data scientists will simply rule out IBM Watson Studio when they learn it’s only available in IBM Cloud.

FWIW, yes, I worked for IBM in 2011 and 2012, when IBM acquired Netezza. My experience with IBM was entirely positive and I left on my own. If IBM delivered an attractive product, I would say so.

Ah, Netezza, those were the days. One of the most beautiful pieces of technology ever made!

But anyway,
We live in the age of microservices. Just because they ARE separate services and CAN be sold separately (which IBM indeed did) doesn’t mean there is no technical master plan behind it.

The 3 main ML/AI services today that DO share a unified interface are Watson Studio (design tools), Watson Machine Learning (deployment) and Watson Knowledge Catalog (asset management). The latter service was also put in the top right corner of The Forrester Wave: Machine Learning Data Catalogs, Q2 2018. Those 3, together with the GUIs on top of the Watson APIs, all join hands in the Watson Studio UI.

That companies discard the IBM cloud, even if a better service is provided over there, because, according to IT, it’s not their standard, says more about the company than about IBM. Not allowing the business side to freely choose its tools suggests they have forgotten who is making the money.

So, you concede that IBM designed and marketed the components of IBM Watson Studio separately? Good, we have common ground there.

IBM Watson Studio has four main modules: data prep, data science, machine learning, and data catalog. The data prep module (previously branded as Data Refinery) has a consistent UI for all of its functions. So does the Data Catalog.

The Data Science module is a mosh of DSX and SPSS.

The ML+AI module is a mosh of all sorts of things. Some of these have code-driven APIs and others use the workflows ported from SPSS. The UI is neither internally consistent nor consistent with the Data Refinery and Data Catalog UIs.

Arguing that 3 out of some 20 submodules share the same UI is missing the point. In a well-designed product, all of the modules share a common UI.

It’s extremely difficult for IBM to argue that IBM Watson Studio offers benefits to the data scientist that warrant deviating from a company standard. (Which is why IBM resorts to giving it away.) Most of the data science platforms on the market today are available on all three of the leading cloud platforms. Moreover, the native tools available from AWS, Azure, and Google are increasingly competitive. In certain areas, such as deep learning and AI, most data scientists prefer to work with AWS or Google, so there’s no real reason to consider the IBM offering.

Neither AWS nor Google plays in the PAML field, and IBM is not part of the Notebook wave, so it seems Forrester does not see them as competitors. Azure is way down in the PAML wave. None of AWS, Google, or Azure is present on the Forrester Wave for machine learning data catalogs.

I’m not working for Forrester, but I’m pretty sure they have their reasons for this.

I don’t know precisely what changed. Mike Gualtieri at Forrester told me that their approach changed from 2017 to 2018. One would have to examine the 2017 report to figure it out, and I simply haven’t had time to do that.

The 50% weighting applies to the Strategy dimension only. Here’s what Forrester says in the report:

Placement on the horizontal axis indicates the strength of the vendors’ strategies. Given the rapidly evolving nature of this segment, we considered the vendor’s solution roadmap to be most important, followed by the ability of its partnerships to propel its growth.