There is no question that managing Big Data can be a challenging task.

In the interview below, Tapan Patel from SAS offers expert advice for organizations looking to effectively manage their structured and unstructured data to maximize their business value and see positive results.

Q. What key changes have you noticed over the last ten years within data management and analytics?

A. Significant market, customer and technology changes have taken place over the last 10 years. Data management, analytics and business intelligence technologies, or a combination of those, were not even in the top 10 CIO priority lists in surveys conducted 10 years ago. Economic downturn or not, organizations are constantly pressured to focus on better decision making and to use analytics to reduce costs or increase top-line revenue. Hence it comes as no surprise that investments in business analytics continue and that these technologies are regularly included among the top 10 priorities for CIOs.

Even though the business analytics market has matured over the years, many trends (mobile devices, social, software-as-a-service (SaaS), cloud, open source, hardware improvements, consumerization, and big data) have the potential to disrupt it.

In order to support a variety of BI and analytic scenarios and workloads, the selection of integration technologies has gone beyond traditional Extract-Transform-Load (ETL) to include master data management, change data capture, data federation and other techniques.

As data volumes have steadily increased over the last decade, interest in data variety (such as unstructured data) and velocity (how fast data is produced) has put the focus on how to parse, transform and analyze the new data streams and combine them with traditional ones, such as structured data, for more value. There is also more emphasis on how to treat time-sensitive, interactive and context-“aware” data.

Data quality will gain more prominence in big data evaluations. The approach might depend on where the data came from, how it will be used, whether every record needs to be cleansed (since that might affect analytical value) and the nature of the application: for example, whether it is mission critical, governance oriented or customer oriented.

New computing models, economics and demand for higher scalability will shift data management and analytic processing to data residing in main memory rather than a physical disk drive. It will make data latency negligible and thus suitable for big data.

Many customer and technology trends have also affected the analytics market segment, such as:

- The industry is undergoing a gradual shift from a centralized to a self-service model

- Analytics are becoming more integrated with operational systems to make accurate decisions

- More analytical insights will be pushed out on mobile devices and platforms

- Analytical capabilities are designed to be more approachable for business users

- The ability to explore and visualize data without preconceived notions, questions or pre-built hierarchies is now a key requirement

- Cloud-based BI and analytics, while still immature, are slowly gaining interest and implementations are more common

- There is a shift from reactive and descriptive BI capabilities towards more proactive predictive analytic capabilities

Q. What advice can you offer to businesses trying to manage the increased speed, volume, and variety of data that Big Data brings to the table?

A. First, it is important to understand that any type of organization can run into big data problems, albeit at varying degrees of scale. Secondly, amid all of the hype, the primary focus should be on properly defining the problems and metrics specific to the business unit that big data will help. Organizations should take an incremental approach to data management, analytics and infrastructure needs for big data. Start with a well-defined use case that meets business unit goals, delivers return on investment and uses appropriate infrastructure elements.

Of course, customers should look closely at data integration challenges to avoid poor analytical outcomes. One of the unique capabilities SAS offers is applying analytics to the data preparation process itself. In the case of big data it is not always feasible to “store then score,” that is, to store data to disk before the analysis is done. In addition to this widely used pattern, SAS supports a “stream it, score it and store it” approach that leverages analytics as part of the data preparation stage. This approach leads to real-time analytics and can also be used to help make decisions on how to treat the data as it arrives at a rapid pace.
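To make the contrast with “store then score” concrete, here is a minimal Python sketch of the “stream it, score it and store it” pattern. It is illustrative only and does not use any SAS API: the `score` rule, its weights, the field names `amount` and `velocity`, and the threshold are all hypothetical assumptions chosen for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def score(record: Record) -> float:
    """Hypothetical scoring rule with made-up weights (an assumption)."""
    return 0.7 * record["amount"] / 1000.0 + 0.3 * record["velocity"]


def stream_score_store(
    stream: Iterable[Record],
    scorer: Callable[[Record], float],
    threshold: float,
    store: List[Record],
) -> Iterator[Record]:
    """Score each record as it arrives, store it, and yield those needing action."""
    for record in stream:
        scored = {**record, "score": scorer(record)}  # score before storing
        store.append(scored)                          # then store the scored record
        if scored["score"] >= threshold:
            yield scored                              # surface it immediately


# Hypothetical incoming events; in practice this would be a live feed.
events = [
    {"amount": 100.0, "velocity": 0.1},
    {"amount": 2000.0, "velocity": 0.9},
    {"amount": 500.0, "velocity": 0.2},
]
stored: List[Record] = []
alerts = list(stream_score_store(iter(events), score, threshold=1.0, store=stored))
```

Because the score is attached before the record is written, the stored data already carries the analytic result, and high-scoring records can be acted on without waiting for a batch job.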

Big data will force businesses to take a close look at their analytics infrastructure. SAS offers a Business Analytics Maturity Assessment service to profile current system and asset usage, identify information infrastructure and architecture design gaps, characterize workloads, and define the desired information delivery environment. It also includes short-, intermediate- and long-term road maps focusing on analytics deployment and infrastructure options that will provide flexibility and scalability to meet current and future big data requirements.

Organizations will have to justify investments in big data analytics projects and avoid project, technology and business risks. SAS’ leadership in big data analytics offerings, industry domain expertise, professional services and technical support help to guarantee success in a cost-effective manner. We continue to focus on high-performance analytics solutions to solve complex analytical problems associated with big data.

Finally, organizations will have to assess their competencies, skills and training gaps as they evaluate big data opportunities. Some new skills to look out for will be management of unstructured data, business domain expertise, new analytics techniques and integrating analytics with operational systems.

Q. How can organizations begin to effectively manage their structured and unstructured data to maximize their business value and see the most positive results?

A. As organizations seek to benefit from improved decision-making by analyzing new data assets or big data sources, they sometimes hit a brick wall. Existing data warehouse architectures may not be flexible enough to serve cleaned, transformed data quickly for diverse analytical and self-service BI needs. Some of the new big data sources, including mobile and social platforms, have context and location awareness, which helps to understand behaviors at an individual level. The prevalence of distributed data across different functional areas makes data integration laborious, unproductive and time consuming. On top of that, analytical models and metrics need to be revised in order to reflect changes in the marketplace.

In addition to traditional data consolidation and management approaches offered by data warehouses and/or data marts, a complementary concept of data virtualization is also gaining acceptance in a few organizations. It creates multiple virtual in-memory views of data from several underlying data sources to present data as if it were integrated. When queried by analytical applications and tools, data virtualization software integrates the necessary data ‘on-the-fly’ at run-time to serve up integrated data on demand.
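The idea of integrating data at query time can be sketched in a few lines of Python. This is a toy illustration, not a description of any particular data virtualization product: the `VirtualView` class, the `customer_id` key and the two sample sources are hypothetical, and a real system would push queries down to the underlying databases rather than read whole tables.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class VirtualView:
    """Joins rows from several sources on a shared key at query time."""

    def __init__(self, key: str, sources: List[Callable[[], Iterable[Row]]]):
        self.key = key
        self.sources = sources  # callables, so each query reads current data

    def query(self, predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        merged: Dict[object, Row] = {}
        for fetch in self.sources:
            for row in fetch():
                # Combine fields from every source under the same key value.
                merged.setdefault(row[self.key], {}).update(row)
        return [row for row in merged.values() if predicate(row)]


# Two hypothetical underlying sources that are never physically consolidated.
crm = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Grace"}]
billing = [{"customer_id": 1, "balance": 120.0}, {"customer_id": 2, "balance": 0.0}]

view = VirtualView("customer_id", [lambda: crm, lambda: billing])
owing = view.query(lambda row: row.get("balance", 0) > 0)
```

Nothing is copied into a warehouse here; if a source changes, the next call to `view.query` reflects the change because the integration happens on each request.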

At a broader level, CIOs need to use multiple approaches to tackle big data. A clear starting point is to leverage existing investments. For example, in-database processing can scale up processing of large volumes of structured data and leverage existing database appliance investments. Growth and interest in using unstructured data increase complexity, and IT organizations will have to evaluate scalable options like Hadoop for storage and parallelized processing, and high-performance analytics architecture using in-memory computing.

Depending on the business need, organizations must then define and choose the most relevant technology architecture. A strong information management platform and consistent data governance practices are also required for integrating structured and unstructured data.

Q. What steps should organizations be taking to help prepare their IT departments for managing and processing Big Data?

A. Big data analytics helps organizations to gain a competitive advantage by spurring growth and innovation. IT will have to lead with an “enabling” role in processing, analyzing and managing big data and help the organization to respond quickly to changing business needs. IT needs to design and manage analytics infrastructure to leverage big data with the same degree of rigor that is applied to other operational applications and enterprise architecture.

IT should also be involved in designing and selecting a next-generation architecture that supports the entire analytics lifecycle – data preparation, data exploration, model development, and operationalizing analytics. The role, relevance and suitability of different types of architecture options, such as in-database processing, in-memory analytics, event stream processing, data virtualization and Hadoop, should be a priority for IT. For example, does the architecture support future real-time analytical processing needs or satisfy the diverse user requirements when it comes to data exploration, modeling and scoring?

Big Data analytics should be embedded directly within operational applications such as credit application, fraud detection, and call center interactions, and IT should be at the forefront in providing the infrastructure. For example, an auto insurer’s claims-handling process will get value if integrated with models predicting the likelihood of fraudulent claims. Taking it to the next level, insights from telematics data (driver’s speed, braking, acceleration, etc.) combined with claims data will help to assign a risk profile to each driver, predict the likelihood of accidents and customize premiums accordingly.

Another important area on which IT will have to put emphasis is selecting, training and retaining people with traditional and non-traditional skills to tackle big data. IT will need data management professionals who understand how to integrate unstructured data, and it should expose IT professionals to business issues so that they gain knowledge of the business domains. People with non-traditional skills, such as behavioral scientists, linguists, designers and others, will add a lot of value in interpreting insights from big data and help decision makers to reach quick conclusions.

Q. How do SAS In-Memory Analytics and Hadoop work together to help organizations use Big Data analytics to solve business challenges faster than before?

A. Technology advancements, such as cheaper memory resources and the ability to manage large amounts of data at an affordable price, have created an environment for in-memory computing technology to further evolve.

When large amounts of memory are available in distributed systems, the task is to parallelize the work, take advantage of local resources for loading and transforming the data, perform complex analytical computations and scale it to solve problems of any size. Our approach to high-performance analytics uses these principles to solve the depth and breadth of complex analytical problems that a business demands.
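The principle of partitioning the data, computing locally and combining the partial results can be sketched as follows. This is a generic illustration in Python, not the SAS implementation: the function names are hypothetical, threads stand in for the nodes of a distributed system (so CPython will not show a real speed-up here), and the input is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(data: Sequence[float], parts: int) -> List[Sequence[float]]:
    """Split data into roughly equal contiguous chunks, one per worker."""
    size = -(-len(data) // parts)  # ceiling division; assumes non-empty data
    return [data[i:i + size] for i in range(0, len(data), size)]


def local_stats(chunk: Sequence[float]) -> Tuple[int, float, float]:
    """Compute count, sum and sum of squares on one worker's chunk."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def parallel_mean_variance(data: Sequence[float], workers: int = 4) -> Tuple[float, float]:
    """Combine per-chunk partial results into a global mean and population variance."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_stats, partition(data, workers)))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    return mean, total_sq / n - mean * mean


mean, variance = parallel_mean_variance([float(x) for x in range(1, 101)])
```

Each worker touches only its own chunk and returns three numbers, so the amount of data moved between workers stays small no matter how large the input grows.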

The SAS LASR Analytic Server is the centerpiece of the SAS In-Memory Analytics strategy. SAS LASR Analytic Server is an in-memory analytic platform, not an in-memory database, that provides a secure, multi-user environment for concurrent access to data in memory. It handles data big and small, and can process requests at great speed due to its high-performance, multi-threaded, and grid structure.

SAS LASR Analytic Server supports several distributed data providers, including Hadoop on commodity blade systems and commercial database appliance providers like Teradata and EMC Greenplum. SAS LASR Analytic Server integrates directly with the Hadoop Distributed File System (HDFS) using a SAS file format, which was designed specifically to support in-memory analytics processing. The SAS LASR Analytic Server does not use MapReduce to access and process data in HDFS.

While the SAS file format in HDFS is our preferred route, we also support other file types in HDFS. For example, support for CSV files will also give users the ability to perform analytics (such as data and text mining, statistical modeling, operations research, and forecasting) on data stored in Hadoop using the same SAS Analytics client interfaces.

SAS In-Memory Analytics supports two different processing approaches to fit customers’ needs, usage patterns and problems. In one scenario, data is rapidly loaded in-memory (in SAS LASR Analytic Server) and persisted there for as long as the user chooses, to perform data exploration, interactive visualization and reporting (using SAS Visual Analytics). In another scenario, data is rapidly loaded in-memory for execution of high-end predictive modeling and text mining tasks (using SAS High-Performance Analytics Server); once results are available, the memory is released until a new job request arrives.
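The difference between the two approaches comes down to how long data stays resident in memory. The toy Python sketch below models that difference only; it is not the SAS LASR Analytic Server or SAS High-Performance Analytics Server API, and the `InMemoryStore` class, its methods and the sample table are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

Table = List[Dict[str, float]]


class InMemoryStore:
    """Toy stand-in for an in-memory server; tracks which tables are loaded."""

    def __init__(self) -> None:
        self.tables: Dict[str, Table] = {}

    def load(self, name: str, loader: Callable[[], Table]) -> Table:
        # Persistent pattern: load once, keep until explicitly dropped.
        if name not in self.tables:
            self.tables[name] = loader()
        return self.tables[name]

    def drop(self, name: str) -> None:
        self.tables.pop(name, None)

    @contextmanager
    def job(self, name: str, loader: Callable[[], Table]) -> Iterator[Table]:
        # Job pattern: load for one run, release memory when the run ends.
        self.tables[name] = loader()
        try:
            yield self.tables[name]
        finally:
            self.drop(name)


def load_sales() -> Table:
    """Hypothetical loader; a real one would read from HDFS or a database."""
    return [{"amount": 10.0}, {"amount": 30.0}]


store = InMemoryStore()
sales = store.load("sales", load_sales)  # stays resident for repeated exploration
total = sum(row["amount"] for row in sales)

with store.job("scoring_input", load_sales) as batch:
    job_mean = sum(row["amount"] for row in batch) / len(batch)
# "scoring_input" has been released here; "sales" is still loaded.
```

The persistent pattern suits interactive work where many queries hit the same table, while the job pattern suits heavy one-off computations that should not hold memory between runs.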

Tapan Patel

Global Product Marketing Manager: High-Performance Analytics, SAS

Tapan Patel is Global Product Marketing Manager at SAS. With more than 13 years in the enterprise software market, Patel leads product marketing for Predictive Analytics and Data Mining as well as High-Performance Analytics, In-Memory Analytics and In-Database Processing. He works closely with customers, partners, industry analysts, press and media, and thought leaders to ensure that SAS continues to deliver high-value solutions to meet customer needs worldwide.

Prior to his role at SAS, Patel worked as a Sr. Product Manager and Market Research Analyst at HAHT Commerce Inc. Prior to that, he worked in Product and Project Management roles at Core Healthcare Limited.

Patel is a graduate of the Jenkins Graduate School of Management at North Carolina State University, where he concentrated on Information Technology Management and earned an MBA degree.

ITBriefcase brought to you by: Virtual Star Media. Copyright by IT Briefcase. IT Briefcase is a targeted online publication that attracts qualified business and IT professionals who are actively researching business integration solutions. Some of the topics we cover include BI, BPM, Cloud Computing, Data Storage, Health IT and Open Source. A full list of the topics we cover can be found on the right hand side of our website.