Vamsi Chemitiganti's weekly musings on applying Big Data, Cloud, & Middleware technology to solving industry challenges. Published every Friday or Sunday (if I'm very busy). All opinions are entirely my own. I write this blog so my readers don't have to spend money on expensive consultants.

The Big Data Landscape – My Predictions for 2018…

In 2018 we are rapidly entering what I would like to call ‘Big Data 3.0’. This is the age of ‘Converged Big Data’ where its various complementary technologies – Data Science, DevOps, Business Automation begin to all come together to solve complex industry challenges in areas as diverse as Manufacturing, Insurance, IoT, Smart Cities and Banking.

(Image Credit – Simplilearn)

First, we had Big Data 1.0…

In the first pass of Big Data era, Hadoop was the low-cost storage solution. Companies saved tens of billions of dollars from costly and inflexible enterprise data warehouse (EDW) projects. Nearly every large organization has begun deploying Hadoop as an Enterprise Landing Zone (ELZ) to augment an EDW. The early corporate movers working with the leading vendors more or less figured out the kinks in the technology as applied to their business challenges.

Trend #1 Big Data 3.0 – where Data fuels Digital Transformation…

Fortune 5000 process large amounts of customer information daily. This is especially true in areas touched by IoT – power and utilities, manufacturing and connected car. However, they have been sorely lacking in their capacity to interpret this in a form that is meaningful to their customers and their business. In areas such as Banking & Insurance, this can greatly help arrive at a real-time understanding of not just the risks posed by a customer/partner relationship (from a credit risk/AML standpoint) but also an ability to increase the returns per client relationship. Digital Transformation can only be fueled by data assets. In 2018, more companies will tie these trends together moving projects from POC to production.

I have written extensively about efforts to infuse business processes with machine learning. Predictive analytics have typically resembled a line of business project or initiative. The benefits of the learning from localized application initiatives are largely lost to the larger organization if one doesn’t allow multiple applications and business initiatives to access the models built. In 2018, machine learning expands across more usecases from the mundane (fraud detection, customer churn prediction to customer journey) to the new age (virtual reality, conversational interfaces, chatbots, customer behavior analysis, video/facial recognition) etc. Demand for data scientists will increase.

In areas around Industrie 4.0, Oil & Energy, Utilities – billions of endpoints will send data over to edge nodes and backend AI services which will lead to better business planning, real-time decisions and a higher degree of automation & efficiency across a range of processes. The underpinning data capability around these will be a Data Lake.

This is an area both Big Data and AI have begun to influence in a huge way. 2018 will be the year in which every large and medium-sized company will have an AI strategy built on Big Data techniques. Companies will begin exposing their AI models over the cloud using APIs as shown above using a Models as a Service architecture.

Infrastructure vendors have been aiming to first augment and then replace EDW systems. As the ability of projects that perform SQL-on-Hadoop, data governance and audit matures, Hadoop will slowly begin replacing EDW footprint. The key capabilities that Data Lakes usually lack from an EDW standpoint – around OLAP, performance reporting will be augmented by niche technology partners. While this is a change that will easily take years, 2018 is when it begins. Expect migrations where clients have not really been using the full power of EDWs beyond simple relational schemas and log data etc to be the first candidates for this migration.

Trend #4 Cybersecurity pivots into Big Data…

Big Data is now the standard by which forward-looking companies will perform their Cybersecurity and threat modeling. Let us take an example to understand what this means from an industry standpoint. For instance, in Banking, in addition to general network level security, we can categorize business level security considerations into four specific buckets – general fraud, credit card fraud, AML compliance, and cybersecurity. The current best practice in the banking industry is to encourage a certain amount of convergence in the back-end data silos/infrastructure across all of the fraud types – literally in the tens. Forward-looking enterprises are now building cybersecurity data lakes to aggregate & consolidate all digital banking information, wire data, payment data, credit card swipes, other telemetry data (ATM & POS) etc in one place to do security analytics. This pivot to a Data Lake & Big Data can pay off in a big way.

The reason this convergence is helpful is that across all of these different fraud types, the common thread is that the fraud is increasingly digital (or internet based) and they fraudster rings are becoming more sophisticated every day. To detect these infinitesimally small patterns, an analytic approach beyond the existing rules-based approach is key to understand for instance – location-based patterns in terms of where transactions took place, Social Graph-based patterns and Patterns which can commingle real-time & historical data to derive insights. This capability is only possible via a Big Data-enabled stack.

Trend #5 Regulators Demand Big Data – PSD2,GPDR et al…

The common thread across virtually a range of business processes in verticals such as Banking, Insurance, and Retail is the fact that they are regulated by a national or supranational authority. In Banking, across the front, mid and back office, processes ranging from risk data aggregation/reporting, customer onboarding, loan approvals, financial crimes compliance (AML, KYC, CRS & FATCA), enterprise financial reporting& Cyber Security etc – all need to produce verifiable, high fidelity and auditable reports. Regulators have woken up to the fact that all of these areas can benefit from universal access to accurate, cleansed and well-governed cross-organization data from a range of Book Of Record systems.

Further, applying techniques for data processing such as in-memory processing, the process of scenario analysis, computing, & reporting on this data (reg reports/risk scorecards/dashboards etc) can be vastly enhanced. They can be made more real time in response to data about using market movements to understand granular risk concentrations. Finally, model management techniques can be clearly defined and standardized across a large organization. RegTechs or startups focused on the risk and compliance space are already leveraging these techniques across a host of areas identified above.

Trend #6 Data Monetization begins to take off…

The simplest and easiest way to monetize data is to begin collecting disparate data generated during the course of regular operations. An example in Retail Banking is to collect data on customer branch visits, online banking usage logs, clickstreams etc. Once collected, the newer data needs to be fused with existing Book of Record Transaction (BORT) data to then obtain added intelligence on branch utilization, branch design & optimization, customer service improvements etc. It is very important to ensure that the right business metrics are agreed upon and tracked across the monetization journey. Expect Data Monetization projects to take off in 2018 with verticals like Telecom, Banking, and Insurance to take the lead on these initiatives.

Most Cloud Native Architectures are designed in response to Digital Business initiatives – where it is important to personalize and to track minute customer interactions. The main components of a Cloud Native Platform are shown below and the vast majority of these leverage a microservices based design. Given all this, it is important to note that a Big Data stack based on Hadoop (Gen 2) is not just a data processing platform. It has multiple personas – a real-time, streaming data, interactive platform that can perform any kind of data processing (batch, analytical, in memory & graph based) while providing search, messaging & governance capabilities. Thus, Hadoop provides not just massive data storage capabilities but also provides multiple frameworks to process the data resulting in response times of milliseconds with the utmost reliability whether that be real-time data or historical processing of backend data. My bet on 2018 is that these capabilities will increasingly be harnessed as part of a DevOps process to develop a microservices based deployment.

Conclusion…

Big Data will continue to expand exponentially across global businesses in 2018. As with most disruptive innovation, it will also create layers of complexity and opportunity for Enterprise IT. Whatever be the kind of business model – tracking user behavior or location sensitive pricing or business process automation etc – the end goal of IT architecture should be to create enterprise business applications that are heavily data insight and analytics-driven.