Greenplum Accelerates Your Digital Transformation

Data is at the center of digital transformation, driving how transformation happens. But data is messy, and it’s everywhere: in the cloud and on-premises, and in different types and formats.

Even several years after the introduction of data lake solutions, enterprises continue to struggle with applying analytics to disjointed data types and silos. Integrating structured with unstructured data is a major issue, and traditional enterprise data architectures fail to support real-time insights. Data volume growth puts pressure on infrastructure and resources.

Greenplum is built on a massively parallel processing (MPP) architecture. Each server node owns and manages a distinct portion of the overall data, and the system automatically distributes data and parallelizes query workloads across all available hardware.
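The idea of each node owning a distinct slice of the data can be illustrated with a small sketch. This is not Greenplum's implementation: the segment count, the CRC32 hash, the `orders` rows, and all function names are assumptions chosen for illustration, and threads stand in for separate server nodes.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to one segment using a stable hash (CRC32 is a stand-in)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Place every row on exactly one segment, chosen by its distribution key."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(rows, value_field):
    """Work done by a single segment on the rows it owns."""
    return sum(row[value_field] for row in rows)


def parallel_sum(segments, value_field):
    """Run the local aggregate on all segments at once, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = pool.map(lambda rows: local_sum(rows, value_field), segments.values())
        return sum(partials)


# Hypothetical data: 100 orders spread over 7 customers.
orders = [{"customer_id": i % 7, "amount": i} for i in range(1, 101)]
segments = distribute(orders, "customer_id")
total = parallel_sum(segments, "amount")
```

Because the segment is derived from the key alone, all rows for one customer land on the same segment, which is what lets each node work on its share without coordinating with the others until the final combine step.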

A 2017 Gartner survey suggests it takes an average of 52 days to build a predictive model. Speed of model development is therefore a top concern in choosing a data platform. By embedding machine learning in an MPP platform, Pivotal Greenplum can help analysts and data scientists run more models in less time.

Extend SQL with graph analytics and machine learning
Greenplum supports Apache MADlib, an open-source library of distributed, in-database analytical methods. These are implemented as user-defined functions that can be invoked with standard SQL. Nearly 60 graph, statistical, and machine-learning functions are supported.

Add geospatial and text data for complex use cases
Greenplum also supports PostGIS, a spatial database extension for PostgreSQL that allows geographic information system (GIS) objects to be stored and processed in the database. Pivotal GPText, based on Apache SolrCloud, enables the processing of raw text data (including email and social media feeds) with an easy-to-use SQL interface.

Support for Python and R analytical libraries through procedural language extensions (PL/X)
Greenplum allows users to write user-defined functions (UDFs) in a wide range of languages including SQL, Perl, Python, R, C, and Java, and supports distributed execution of UDFs. Furthermore, Greenplum users can leverage functions from any of the add-on packages of these languages (e.g., TensorFlow for Python, rstan for R) in their UDFs. Greenplum 5 also provides easy-to-use installers for the most popular add-on libraries for Python and R.

Run your analytics anywhere you need them. Pivotal Greenplum is a portable, 100% infrastructure-agnostic software solution. Deploy on bare-metal servers, on private cloud (both OpenStack and VMware vSphere are supported), and on public IaaS (AWS, Azure, and now on the Google Cloud Platform). Ubuntu users can use native commands to install Greenplum with ease from the Personal Package Archive that contains the compiled releases.

CONNECT TO HADOOP AND PUBLIC CLOUD REPOSITORIES

Using external tables, Pivotal Greenplum can query data that is natively stored in AWS S3, along with data stored in the Greenplum cluster. This means that a single analytical query can be segmented and distributed to several environments.

For users who have (or are considering) a data lake, the Platform eXtension Framework (PXF) combines the cost and storage advantages of the data lake with the performance of the Greenplum MPP query engine. With PXF, Greenplum users can federate queries across internal tables and external Hadoop sources, such as HDFS, HBase, and Hive. PXF is a REST API abstraction layer that enables Pivotal Greenplum to query Hadoop data in a highly parallel way. It also includes a plugin for JSON files, and users can create custom connectors to access other data stores, processing engines, or file and storage formats via framework APIs.
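A federated query of this kind can be pictured as reading an external source in parallel fragments and joining the rows against an internal table. The sketch below is a conceptual illustration only and does not use the PXF API: the fragment layout, the in-memory `INTERNAL_CUSTOMERS` table, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical external data, already split into fragments (for example, file blocks).
EXTERNAL_FRAGMENTS = [
    [{"customer_id": 1, "clicks": 10}, {"customer_id": 2, "clicks": 5}],
    [{"customer_id": 3, "clicks": 7}],
    [{"customer_id": 1, "clicks": 2}],
]

# Hypothetical internal table, keyed by customer_id.
INTERNAL_CUSTOMERS = {1: "Acme", 2: "Globex", 3: "Initech"}


def read_fragment(fragment_id):
    """Stand-in for one worker fetching one fragment from the external store."""
    return list(EXTERNAL_FRAGMENTS[fragment_id])


def federated_clicks_by_customer():
    """Read all external fragments in parallel, then join them to the internal table."""
    with ThreadPoolExecutor(max_workers=len(EXTERNAL_FRAGMENTS)) as pool:
        fragments = list(pool.map(read_fragment, range(len(EXTERNAL_FRAGMENTS))))

    totals = {}
    for rows in fragments:
        for row in rows:
            name = INTERNAL_CUSTOMERS[row["customer_id"]]  # join on customer_id
            totals[name] = totals.get(name, 0) + row["clicks"]
    return totals


clicks = federated_clicks_by_customer()
```

The point of the sketch is the shape of the work: the external data never has to be loaded into the database first, and the read is split so that several workers can fetch at the same time.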

STABILITY AND SCALABILITY WITH NEW CONTAINERIZATION FEATURES

To provide enhanced resource isolation and elasticity for multitenant and mixed loads, Greenplum now provides containerization features for SQL and trusted languages.

SQL containerization
Greenplum Resource Groups provide resource isolation for query multi-tenancy and mixed workloads. SQL containerization groups together CPU and memory resources, along with concurrent transactions, so that each resource group is guaranteed a predetermined amount. Resource groups implement transaction-based concurrency management. This allows the DBA to manage the level of concurrency, and it creates an orderly queue for queries waiting to enter the system.
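Transaction-based concurrency management with an orderly queue can be sketched as a simple admission controller. The `ResourceGroup` class below is a hypothetical model and not Greenplum's interface; it covers only the concurrency limit and the waiting queue, and leaves out CPU and memory accounting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Toy model: a group admits at most `concurrency` transactions at a time."""
    name: str
    concurrency: int
    running: set = field(default_factory=set)
    waiting: deque = field(default_factory=deque)

    def submit(self, txn_id):
        """Admit the transaction if a slot is free, otherwise queue it in arrival order."""
        if len(self.running) < self.concurrency:
            self.running.add(txn_id)
            return "running"
        self.waiting.append(txn_id)
        return "queued"

    def finish(self, txn_id):
        """Release a slot and promote the oldest waiting transaction, if there is one."""
        self.running.remove(txn_id)
        if self.waiting:
            promoted = self.waiting.popleft()
            self.running.add(promoted)
            return promoted
        return None


# Hypothetical group with two slots and four incoming transactions.
group = ResourceGroup(name="reporting", concurrency=2)
states = [group.submit(txn) for txn in ("t1", "t2", "t3", "t4")]
promoted = group.finish("t1")
```

Here the first two transactions run immediately, the next two wait, and finishing `t1` lets `t3` in ahead of `t4`, which is the first-in, first-out behavior the queue is meant to give.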

Trusted language containerization
PL/Container is an implementation of a trusted language execution engine capable of bringing up Docker containers to isolate the execution of PL/R and PL/Python from a Greenplum database host. The server-side code running inside Greenplum communicates with the container using an RPC protocol.
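The isolation pattern described here, where calling code hands a request to a separate execution environment and reads back a reply, can be approximated in a few lines. This sketch is an analogy and not PL/Container itself: a child Python process stands in for the Docker container, newline-delimited JSON stands in for the RPC protocol, and the averaging function and all names are assumptions.

```python
import json
import subprocess
import sys

# Source of the isolated worker. It reads one JSON request per line and
# writes one JSON reply per line; the "user function" here is a plain mean.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    values = request["args"]
    result = sum(values) / len(values)
    sys.stdout.write(json.dumps({"id": request["id"], "result": result}) + "\n")
    sys.stdout.flush()
"""


def call_isolated(args):
    """Send one request to a separate worker process and return its result."""
    with subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        proc.stdin.write(json.dumps({"id": 1, "args": args}) + "\n")
        proc.stdin.flush()
        reply = json.loads(proc.stdout.readline())
    # Leaving the "with" block closes the pipes, which ends the worker loop.
    return reply["result"]


average = call_isolated([2, 4, 6])
```

The caller never runs the user code in its own process; it only serializes arguments and deserializes the reply, so a crash or runaway computation in the worker stays outside the calling process.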

SUMMARY
Greenplum is an open-source data analytics platform that provides powerful and rapid analytics on very large volumes of data. Uniquely geared toward machine learning and advanced data science, Greenplum delivers unmatched analytical query performance on large data volumes and tight integration with leading analytical libraries and software stacks. Additional details on Greenplum can be found in the product and documentation pages. An open-source version of Greenplum (Greenplum Database) is also available for download at greenplum.org.
