How Zero-Copy Data Virtualization Can Help Enterprises Become AI-Ready

Molecula is a Data Virtualization Platform that creates instantaneous access to huge, disparate, and geographically distributed datasets to support machine learning and analytical workloads at a much smaller cost than traditional approaches.

The volume of data distributed across the modern business keeps growing, in multiple formats, types, and locations. About three quintillion bytes of data are generated every day, and according to estimates about 20% of that data is structured and available for processing. Even though enterprises are leveraging AI and advanced analytics to gain insights, they still face a bottleneck in accessing all of that data easily. Processes like batching, indexing, federation, aggregation, sampling and caching create long information request cycles, which are at odds with making critical business decisions in real time. This is a key reason why 80% of all AI and analytics projects fail to deliver what was originally planned.

To solve this issue, Molecula, an Austin, Texas-based data virtualisation startup, recently closed a $6 million seed round to bring its AI-accelerating software to the enterprise world. The funding round was led by The Seraph Group, Lontra Ventures, Velar Capital, Capital Factory, Andrew Busey and Jason Dorsey, and will help Molecula scale its technology, known as “zero-copy data virtualisation,” which makes data from various sources and locations available for complex analysis in real time through a virtualised access layer. The company was founded in 2017, when it spun out of Umbel, another Austin-based data analytics company. Molecula also announced that its clients and partners include some of the largest brands in the media, entertainment, technology and healthcare sectors. Key solutions deployed include real-time customer segmentation, real-time security and fraud detection, and the acceleration of business intelligence and machine learning projects. The startup also joined the Oracle Global Startup Ecosystem in October 2018.

Zero-Copy Data Virtualisation: The Innovation

Molecula is a Data Virtualisation Platform that creates instantaneous, secure access to huge, disparate, and geographically distributed datasets to support machine learning and analytical workloads at a much smaller cost than traditional approaches. Typically, enterprises make many copies of their data before they can draw insights from it. In comparison, Molecula’s zero-copy data virtualisation takes users from data to decision without the aggregation, federation or other techniques traditionally deployed, and instead gives real-time virtualised access to all data in memory. Instead of creating copies of data points and values like a typical index, the technology converts them into a knowledge representation of the data and distributes it across multiple machines. This avoids the need for data movement and provides a layer of abstraction above the physical implementation of the data, irrespective of its source, how it is formatted and where it is physically located.
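One way to picture a representation that is spread across machines rather than copied is a set of record IDs split into fixed-width shards. The sketch below is purely illustrative: the shard width, the node names and the round-robin placement are assumptions made for the example, not details of Molecula's implementation.

```python
from collections import defaultdict

SHARD_WIDTH = 4  # hypothetical; a real system would use a far larger width
NODES = ["node-a", "node-b", "node-c"]  # hypothetical machine names


def shard_record_ids(record_ids):
    """Group record IDs by shard number (record_id // SHARD_WIDTH)."""
    shards = defaultdict(set)
    for rid in record_ids:
        shards[rid // SHARD_WIDTH].add(rid)
    return dict(shards)


def assign_shards(shards):
    """Place each shard on a node, round-robin by shard number."""
    placement = defaultdict(dict)
    for shard_no, ids in shards.items():
        node = NODES[shard_no % len(NODES)]
        placement[node][shard_no] = ids
    return dict(placement)


def distributed_count(placement):
    """Each node counts its own shards; only the small counts are combined."""
    return sum(len(ids) for shards in placement.values() for ids in shards.values())


# Record IDs of customers matching some attribute, e.g. "region = EU".
matching = [0, 1, 5, 9, 10, 14]
placement = assign_shards(shard_record_ids(matching))
total = distributed_count(placement)
```

In this toy setup a count query never moves the underlying records: each node reports a number for the shards it holds, and only those numbers travel over the network.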

Zero-copy data virtualisation can abstract 100% of data across RDBMSs, data lakes, data warehouses and event streams. These representations are a fraction of the size of the source data, but contain all the information a machine learning system needs to process and analyse a request. Molecula says its goal is to empower organisations to gain better insights by making all their data AI-ready and reducing the time it takes to prepare data for machine learning efforts. Virtualised abstractions can speed up analytics, machine learning and remote decision making, which can be incredibly powerful for gaining insights from Internet of Things (IoT) devices. “We’re a thousand times faster for machine learning models versus the traditional models,” Molecula has reported.

Molecula is built on Pilosa, the open source technology used by over 1,800 organisations globally to make data AI-ready. Pilosa is an open source, high-performance bitmap index that was created after years of research by the Molecula team to tackle massive amounts of high-velocity data that need indexing and analysis. Pilosa stores a virtual representation of the underlying data in memory, making it orders of magnitude smaller and much faster to query. Because it is agnostic to data repository, cloud and operating system, Pilosa acts as an additional indexing system that can be applied to an existing data repository. Users can have Pilosa index across data repositories to link disparate data sources together and execute very fast queries against all of them at once, reportedly reducing query time from 20 seconds to 20 milliseconds.
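To illustrate the general idea behind a bitmap index, here is a minimal Python sketch. It is not Pilosa's API or storage format: the `BitmapIndex` class, the field names and the sample records are all hypothetical, and plain Python integers stand in for the compressed bitmaps a production system would use.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer bitmask per (field, value) pair."""

    def __init__(self):
        self.bitmaps = defaultdict(int)

    def add(self, record_id, field, value):
        # Set bit number `record_id` in the bitmap for this field/value.
        self.bitmaps[(field, value)] |= 1 << record_id

    def row(self, field, value):
        # Return the bitmap for a field/value, or 0 if nothing was indexed.
        return self.bitmaps.get((field, value), 0)


def ids(bitmap):
    """Return the record IDs whose bits are set, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


index = BitmapIndex()

# Hypothetical records from two separate sources that share record IDs.
crm = {0: "gold", 1: "silver", 2: "gold", 3: "gold"}
events = {0: "churn_risk", 2: "churn_risk", 3: "active"}

for rid, tier in crm.items():
    index.add(rid, "tier", tier)
for rid, status in events.items():
    index.add(rid, "status", status)

# Segment: gold customers flagged as churn risks (bitwise AND = intersection).
segment = index.row("tier", "gold") & index.row("status", "churn_risk")
segment_ids = ids(segment)
segment_size = bin(segment).count("1")
```

The index holds one bit per record for each attribute value rather than a copy of the record itself, and a question that spans both sources reduces to a single bitwise operation. That is the property that makes this family of indexes compact and fast to query.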

Overview

Reports have revealed that business executives need data faster in order to keep up with customers, competitors and partners, and to prevent productivity from declining. This is one reason the enterprise data management market is expected to grow by over $57 billion during 2019-2023. The prevalent process depends on creating full copies of data files to index, cache and process them, which takes up terabytes of space and is very slow; as a result, machine learning models are often unable to access the full datasets. Here, a startup like Molecula can gain considerable traction owing to its innovative open source technology.


Vishal Chawla is a senior technology journalist at Analytics India Magazine, and writes on the latest trends in the world of analytics, AI and other digital technologies. Prior to Analytics India Magazine, he was a senior correspondent for IDG ComputerWorld and CIO India. Vishal can be reached at vishal.chawla@analyticsindiamag.com