SAS Shifts Data Prep Into High Gear

SAS has launched a self-service data prep tool that enables analysts to blend, shape and cleanse data stored in a variety of systems in real time.

The less time spent on data prep, the more time there generally is to analyze the data. With that simple goal in mind, SAS Institute has launched a self-service data preparation tool that enables analysts to blend, shape and cleanse data in real time from a variety of systems, including relational databases, Apache Hadoop, SAS data sets, CSV files and social media streams.

SAS Data Preparation is a cloud-based service specifically designed to enable analysts to, for example, profile, standardize, parse and match data without any intervention from the internal IT department, says Ron Agresta, director, project management at SAS. That capability will enable analysts and so-called citizen data scientists to identify errors before data gets fed into the pipeline of analytics applications, adds Agresta.
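To make the four operations named above more concrete, here is a minimal sketch in plain Python of what profiling, standardizing, parsing and matching can look like on a handful of records. It is an illustration only and does not use or represent the SAS Data Preparation service; the sample records and the function names (profile, standardize, parse_name, match) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample records; not taken from any SAS product or data set.
RECORDS = [
    {"name": "Smith, Jane ", "email": "Jane.Smith@Example.com"},
    {"name": "jane  smith", "email": "jane.smith@example.com "},
    {"name": "Doe, John", "email": ""},
]


def profile(records, field):
    """Profile: count rows, missing values and distinct values in one field."""
    values = [r.get(field, "").strip() for r in records]
    present = [v for v in values if v]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def standardize(text):
    """Standardize: trim, lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def parse_name(raw):
    """Parse: split 'Last, First' or 'First Last' into (first, last)."""
    clean = standardize(raw)
    if "," in clean:
        last, first = [part.strip() for part in clean.split(",", 1)]
    else:
        first, _, last = clean.partition(" ")
    return first, last


def match(records):
    """Match: group row indexes whose parsed names are identical."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[parse_name(record["name"])].append(index)
    # Only groups with more than one row are candidate duplicates.
    return [rows for rows in groups.values() if len(rows) > 1]
```

Calling profile(RECORDS, "email") reports one missing value, and match(RECORDS) flags the first two rows as the same person once their names are standardized and parsed. This is the kind of error an analyst would want to catch before the data reaches an analytics pipeline.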

The service is based on an in-memory analytics engine that SAS makes available via a cloud service, and the goal, Agresta says, is to automate as much of the data preparation process as possible.

“It’s all about making faster, better decisions,” says Agresta.

The SAS Data Preparation service is the latest in a series of offerings that are transforming the role of IT in analytics.

Historically, many IT organizations were called upon both to set up the data warehouse and to run reports, usually at regularly scheduled intervals. Modern analytics applications now enable end users to prepare and interrogate data on their own without any help from the IT staff. The IT department is still needed to set up the data analytics platforms. But the days when IT organizations needed to allocate staff to execute queries on behalf of end users are coming to an end.

Easing data prep makes digital transformation easier

In fact, that new capability is having a profound impact on digital business transformation initiatives as organizations attempt to eliminate the lag time between when an event takes place and when the organization reacts. Reducing the time and effort it takes to prepare data for analysis makes it possible for data scientists to feed more data sooner into the various data pipelines and models being created to drive digital business applications. Data scientists and analysts simply no longer have the time or patience to wait for IT staff to execute a query on their behalf.

Agresta notes that the amount of data being fed into those pipelines is now pushing analytics into the cloud, where the compute resources required to analyze data are inexpensive compared with on-premises IT environments. Much of that data is now being streamed directly into a cloud or analyzed locally in real time at, for example, the edge of the network where Internet of Things (IoT) devices are running, adds Agresta. That eliminates much of the effort that was previously allocated to extract, transform and load (ETL) processes, he says.

With each passing day, more organizations are realizing the data they collect is a strategic asset. The challenge and opportunity they face is finding ways to turn all that data into actionable intelligence. The first step on that journey almost always involves implementing a consistent set of data preparation processes to make sure the organization doesn’t wind up making bad decisions faster rather than good ones.