Dataware for data-driven transformation

MapR Data Science Refinery

With the MapR Data Science Refinery, MapR provides businesses with a suite of data science tools to help them distill insights from their data and turn those insights into operational next-gen applications.

MapR has recognized the need for agile, containerized solutions that can scale to fit the needs of all types of data science teams. The MapR Platform supports popular open source tools in a preconfigured offering that can be distributed to many data science teams across a multitenant environment.

The MapR Data Science Refinery is an easy-to-deploy and scalable data science toolkit with native access to all platform assets and superior out-of-the-box security.

The MapR Data Science Refinery offers:

Access to All Platform Assets - The MapR FUSE-based POSIX Client allows app servers, web servers, and other client nodes and apps to read and write data directly and securely on a MapR cluster, as they would on a Linux filesystem. In addition, Apache Spark connectors are provided for interacting with both MapR-DB and MapR-ES.

Superior Security - The MapR Platform is secure by default. Apache Zeppelin on MapR integrates with this security layer through the built-in capabilities of the MapR Persistent Application Client Container (PACC).
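Because the POSIX client presents cluster data like a Linux filesystem, ordinary file I/O should be enough to read and write it, with no special SDK. The sketch below is a minimal illustration of that idea, not MapR sample code. The helper names (`write_readings`, `read_readings`), the directory layout, and the mount point mentioned in the comment are assumptions for illustration only, and a temporary directory stands in for the mounted cluster so the example runs anywhere.

```python
import csv
import tempfile
from pathlib import Path


def write_readings(root: Path, rows: list[tuple[str, float]]) -> Path:
    """Write (sensor_id, value) rows as CSV under root using plain file I/O."""
    # Hypothetical directory layout, chosen only for this example.
    target = root / "projects" / "demo" / "readings.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return target


def read_readings(path: Path) -> list[tuple[str, float]]:
    """Read the CSV back into (sensor_id, value) tuples."""
    with path.open(newline="") as handle:
        return [(sensor, float(value)) for sensor, value in csv.reader(handle)]


if __name__ == "__main__":
    # Assumption: on a client node with the POSIX client installed, root would
    # be the cluster mount point, for example Path("/mapr/my.cluster.com")
    # (hypothetical cluster name). A temporary directory stands in for it here.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_readings(Path(tmp), [("s1", 20.5), ("s2", 21.0)])
        print(read_readings(path))
```

The same functions would work unchanged against a mounted cluster path, which is the practical benefit of POSIX access: existing tools and scripts need no modification.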

The first big data-scale streaming system built into a converged data platform

The only big data streaming system to support global event replication reliably at IoT scale

Create Real-Time Machine Learning Pipelines

A core component of the MapR Platform, MapR-ES is a global publish-subscribe event streaming system for big data. With native integration between MapR-ES and machine learning libraries, organizations can now create real-time machine learning pipelines, allowing them to apply ML models to real-time data.
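The idea of applying a model to events as they arrive can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the MapR-ES API: an in-memory list stands in for a stream topic, `threshold_model` is a toy stand-in for a trained model, and the names `Event`, `Scored`, and `score_stream` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    sensor_id: str
    temperature: float


@dataclass(frozen=True)
class Scored:
    event: Event
    anomaly: bool


def threshold_model(limit: float) -> Callable[[Event], bool]:
    """Toy stand-in for a trained model: flags readings above limit."""
    return lambda event: event.temperature > limit


def score_stream(
    events: Iterable[Event], model: Callable[[Event], bool]
) -> Iterator[Scored]:
    """Lazily apply the model to each event as it is consumed."""
    for event in events:
        yield Scored(event, model(event))


if __name__ == "__main__":
    # Assumption: in a real pipeline this list would be replaced by a
    # consumer reading from a stream topic.
    stream = [Event("s1", 21.0), Event("s2", 95.5), Event("s1", 22.3)]
    for result in score_stream(stream, threshold_model(limit=80.0)):
        print(result.event.sensor_id, result.anomaly)
```

Because `score_stream` is a generator, each event is scored as soon as it is read rather than after the whole batch is collected, which is the property a real-time pipeline depends on.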

Increase Data Science Productivity with Broad Language and Library Support

The MapR Data Science Refinery includes the Apache Zeppelin data science notebook, which lets data scientists work across many engines in one visual space:

Distributed Compute and ML programming with Apache Spark & Python

Batch and Interactive SQL with Apache Hive and Drill

Scripting support for Apache Pig

Shell access to MapR-FS

Programmatic access to MapR-DB and MapR-ES, using Spark

Easy Deployment with Persistent and Stateful Containers

Easy To Deploy

A Docker image is available on Docker Hub.

The image includes everything required, and nothing more, to use MapR as a persistent data store for your containerized applications.

Secure

Authentication occurs at the container level, so containerized applications can access only the data for which they are authorized.

Communications are encrypted to ensure privacy when accessing data in MapR.

Extensible

It's easy to install deep learning libraries in the container or to add further tools to support your specific application needs.

Enable Notebook/Model Collaboration, Sharing, and Mirroring

The MapR Converged Data Platform is ideal for storing model and notebook repositories. Organizations can leverage the MapR Platform’s global namespace and superior replication capability. The MapR Platform also offers immutable snapshots to persist and deploy various versions of the same model, making it possible for data scientists to compare the performance and accuracy of each version of the model.

How Your Business Benefits from the MapR Data Science Refinery

Higher Accuracy for Business Predictions

Machine learning models are only as good as the data they are trained on. With the MapR Data Science Refinery, data scientists can train on all of the data on the platform rather than a subset, which helps improve the accuracy of their models.

Higher Data Scientist Productivity

MapR Data Science Refinery provides access to a broad range of popular data science tools and libraries, making it easy for data scientists to select the tool of their choice. As a result, data scientists are more productive.

Lower TCO

The MapR Data Science Refinery is easy to deploy and manage. It also provides access to data in place, removing the need for additional hardware to hold copies of the data. As a result, the MapR Data Science Refinery has a lower total cost of ownership (TCO) than other data science offerings.

Visualize Your Business

The MapR Data Science Refinery provides pluggable and broad visualization support, helping business leaders and decision makers to visualize the business as it happens.

Refinery Partners

ML is an active area of research and market innovation. Game-changing ML companies are investing to improve data science productivity and to build domain-specific machine learning solutions. As a data platform company, we want to be open and give our customers the flexibility to use these solutions on the petabytes of business data they rely on MapR to store and manage. MapR has a robust Converged Partner program, and we’re extending it with selected Refinery partnerships as a holistic approach to enabling the MapR Platform for all types of data science teams.