A little over a decade ago I had the great pleasure of hearing the commencement speech at my son’s graduation. Eric Lander, a leader of the Human Genome Project, described his journey through science and life. He shared that he had no clear direction as a mathematics major at Princeton, and described the labyrinth of decisions that eventually led him into genomics. He also described how science progresses and how its discoveries are adopted into the culture and economy of the modern world. His punchline was that it takes a generation to understand and incorporate scientific discoveries into the economy and culture, something my father had once told me.

I won big at a recent “casino night” event by betting all my chips and hitting blackjack on the last hand. After showering me with adulation for my courage and awarding me a small prize (we weren’t playing for money), my peers asked why I risked everything on one bet. “There was nothing at stake,” I replied.

The same isn’t true for large businesses planning their migration to the cloud. The promise of on-demand capacity, low-cost storage, and a rich ecosystem of open-source and commercial tools is compelling. But the stakes are real, especially when it comes to migrating data. As hundreds of companies have now demonstrated, a single data breach can cause long-term economic, legal, and brand damage. Beyond data protection, simply managing data in the cloud is different, and if it’s not done right, the cost, complexity, and risk can bring down the house.

Why would a bank acquire an AI software company? Last week, TD Bank announced the acquisition of Layer 6, an AI startup. [Full disclosure: TD Bank uses our software to manage and provision its enterprise data.] The rapid emergence of finserv startups, which leverage the speed, scale, and cost of an all-digital infrastructure, is putting pressure on established institutions. Most important, these startups are replacing human-intensive processes with a fully automated data and analytics ecosystem that delivers the competitive products and on-demand services that millennials want.

For decades, I have seen corporate data strategy swing back and forth like a digital pendulum. Centralize – decentralize. Consolidate – federate. Inmon – Kimball. ERP – EUC (end-user computing/desktop). Master data management – analytical sandboxes. Data center – cloud. Each swing is a multi-year, multi-million-dollar migration, after which the limits of the new approach often drive the company to reverse direction again.

The forces driving the pendulum are two very real, and seemingly conflicting, business needs. On one hand, businesses need data agility to respond to rapidly evolving opportunities and threats. On the other hand, large enterprises need data at scale to deliver secure, consistent, high-quality information to automated systems and business teams. These needs have traditionally been framed as a choice: agility or scale—pick one. That is because the tools and methodologies that deliver agility are very different from those that deliver scale.

Last month I had the pleasure of hearing one of our customers, an analytics leader at a large pharmaceutical company, present their achievements in the emerging field of Real World Evidence (RWE). In contrast to controlled clinical trials, RWE looks to measure and predict health outcomes and understand underlying factors using data about everyday patients and their environment—i.e., the real world. As you would expect, this is an incredibly challenging task: creating a valid cohort from millions of patients, capturing and aligning electronic medical records and claims, and accounting for exogenous factors such as the weather and economic conditions, to name a few.

When compared to data warehousing, the data lake paradigm is incredibly appealing. Load all the raw data, model just what you need, when you need it, check the quality and content on the fly, and voilà! Cut the bureaucratic red tape and get to serving up the answers the business demands.
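
In practice, that “schema-on-read” workflow looks something like the following minimal PySpark sketch; the bucket path and field names are hypothetical, stand-ins for whatever your raw zone actually holds:

```python
# A minimal schema-on-read sketch (hypothetical paths and fields).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

# Load all the raw data: no upfront modeling, just land the JSON as-is.
raw = spark.read.json("s3://my-lake/raw/orders/")  # hypothetical location

# Model just what you need, when you need it: project and type only the
# fields this analysis requires.
orders = (
    raw.select(
        F.col("order_id").cast("string"),
        F.col("amount").cast("double"),
        F.to_date("order_ts").alias("order_date"),
    )
    # Check quality and content on the fly: drop malformed rows inline.
    .filter(F.col("order_id").isNotNull() & (F.col("amount") > 0))
)

orders.groupBy("order_date").sum("amount").show()
```

The appeal is obvious: nobody had to negotiate a warehouse schema before the first query ran.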

While the new approach has delivered real value, many data lakes have a critical blind spot: exposing sensitive data. Until you start working with the data, you don’t know what you don’t know. And that can be fatal.
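
One way to shrink that blind spot is to profile raw files for sensitive patterns before anyone is given access. Here is a minimal sketch, assuming a simple regex scan over a hypothetical /lake/raw directory; real scanners use far richer detection than these illustrative patterns:

```python
# Scan raw lake files for likely PII using regular expressions.
# The patterns and path are illustrative, not exhaustive.
import re
from pathlib import Path

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_file(path: Path) -> dict:
    """Count how many times each PII pattern appears in one raw file."""
    text = path.read_text(errors="ignore")
    return {name: len(rx.findall(text)) for name, rx in PII_PATTERNS.items()}

# Hypothetical usage: profile every raw file before opening the lake up.
for f in Path("/lake/raw").glob("**/*.csv"):
    hits = scan_file(f)
    if any(hits.values()):
        print(f, hits)
```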

Take personally identifiable information (PII), such as Tax IDs, email addresses, and credit card numbers. Many industries have established regulations on how to protect this information to safeguard their customers (and themselves) from fraud. Elaborate encryption and obfuscation methods have been developed to hide sensitive information yet still enable the business to use it for automated processes and analytic insights.
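
One widely used obfuscation technique is deterministic pseudonymization: hash each sensitive value with a secret key so the original is hidden, yet the same input always yields the same token, preserving joins and aggregations. A minimal sketch, assuming the key lives in a vault rather than in code:

```python
# Deterministic pseudonymization via keyed hashing (HMAC-SHA256).
# The hard-coded key below is a placeholder; fetch yours from a KMS/vault.
import hashlib
import hmac

SECRET_KEY = b"replace-with-key-from-your-vault"  # placeholder only

def pseudonymize(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible token."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# The same customer still joins across datasets on the token...
assert pseudonymize("jane@example.com") == pseudonymize("jane@example.com")
# ...but the token itself reveals nothing about the original value.
print(pseudonymize("jane@example.com")[:16])
```

Because the mapping is deterministic, analysts can still count distinct customers or join records across systems without ever seeing the raw identifiers.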

MIT is not regarded as a football powerhouse, but the play-by-play at last month’s gathering of global chief data officers sounded like the Big 10. “We’re moving from defense to offense,” said Joan dal Bianco, TD Bank’s U.S. CDO. “After years of focusing on regulatory and compliance reporting, we are pivoting to use our data assets for innovation.” I’ve presented at the MIT CDOIQ conference over its 11-year history, and the change in strategy was palpable.

For the first time in seven years, all 35 banks passed the Fed’s Comprehensive Capital Analysis and Review (CCAR). In the past, many banks struggled with the regulation — not because of how they managed capital, but because of the difficulty of proving the accuracy and provenance of the data they reported. This struggle led most banks to create the CDO role, along with massive investments in data management infrastructure.

Last week, I had the pleasure of meeting with the Podium Data Executive Advisory Council (EAC). The EAC comprises senior executives and advisors from TD Bank, Cigna, the International Institute for Analytics (IIA), and several other companies. These companies are on the leading edge of transforming their businesses with agile analytics and self-serve data marketplaces. One attendee described the group as “trailblazers”; I could not agree more.

Yet they are not trailblazers because they are leading their peers in accelerating analytics and democratizing data (which they are), or because they are successfully putting big data technology into production (which many firms are not). They are trailblazers because they have developed a completely new way to turn their data into a high-value business asset, one that is incredibly disruptive to the status quo. They are in truly uncharted territory, where traditional organizational boundaries are being breached, long-standing customs and practices are changing, and well-accepted roles and responsibilities are being redefined.

For decades, data scientists (née statisticians) have had sandboxes in which to explore data and find valuable insights. In what seemed like a happy compromise, analysts could quickly load, manipulate, and combine enterprise and industry data in search of new insights and predictions, without worrying that they would compromise sensitive data or production workflows. While this accelerated the creation of new insights, putting them into production was a nightmare. A bevy of custom code and data created in an ungoverned environment needed to be converted, quality-controlled, and optimized before deployment. It often took the better part of a year for a business to get value from an insight gleaned in a few weeks.

It’s heady times for data. Big data, data lakes, data-as-a-service, data breaches—a week can’t go by without a headline mentioning data. Certainly, businesses are aggressively investing in harnessing their data as an asset to drive strategic insights, automate complex processes, and personalize customer experiences. Companies are doubling down on data security, and at the same time starting to package insights as information products.

All this energy is turning up the pressure on data management in these organizations, and cracks are rapidly appearing. The status quo of highly engineered data warehouse supply chains, surrounded by pockets of specialized analytical sandboxes, can’t keep up with business demand for data. Tensions are rising between the groups tasked with locking data down and those trying to set it free.