5 key data predictions for 2018

Emergence of decentralized immutable mechanisms for managing data is one of the key predictions highlighted by Mark Bregman, CTO, NetApp

Mark Bregman, CTO of NetApp, outlines his five key predictions for 2018.

1. Data becomes self-aware

Today, we have processes that act on data and determine how it’s moved, managed and protected. But what if the data defined itself instead?

As data becomes self-aware and even more diverse than it is today, the metadata will make it possible for the data to proactively transport, categorize, analyze and protect itself. The flow between data, applications and storage elements will be mapped in real time as the data delivers the exact information a user needs at the exact time they need it. This also introduces the ability for data to self-govern. The data itself will determine who has the right to access, share and use it, which could have wider implications for external data protection, privacy, governance and sovereignty.

For example, if you are in a car accident, a number of different groups may want or demand access to the data from your car. A judge or insurance company may need it to determine liability, while an auto manufacturer may want it to optimize the performance of the brakes or other mechanical systems. When data is self-aware, it can be tagged so that it controls who sees which parts of it and when, without additional time-consuming and potentially error-prone human intervention to subdivide, approve and disseminate the valuable data.
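
The car-accident scenario can be illustrated with a minimal Python sketch in which the access tags travel with the data. The class name TaggedRecord, the field names, the sample values and the role names are all hypothetical, chosen only for illustration. A real self-governing data format would also need authentication and tamper protection, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedRecord:
    """Data bundled with metadata that says which roles may read each field."""
    values: dict
    # Hypothetical policy: field name -> set of roles allowed to read it.
    readers: dict = field(default_factory=dict)

    def view(self, role: str) -> dict:
        """Return only the fields whose tags admit the given role."""
        return {
            name: value
            for name, value in self.values.items()
            if role in self.readers.get(name, set())
        }


# Illustrative crash data; the numbers are made up.
crash = TaggedRecord(
    values={"speed_kmh": 62, "brake_pressure": 0.81, "gps": (37.4, -122.0)},
    readers={
        "speed_kmh": {"judge", "insurer"},
        "brake_pressure": {"judge", "insurer", "manufacturer"},
        "gps": {"judge"},
    },
)

print(crash.view("manufacturer"))  # {'brake_pressure': 0.81}
```

Here the manufacturer sees only the brake data, while the judge would see every field, and nobody has to split the record by hand.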

2. Virtual machines become “rideshare” machines

It will be faster, cheaper and more convenient to manage increasingly distributed data using virtual machines, provisioned on webscale infrastructure, than it will be on real machines.

This can be thought of in terms of buying a car versus leasing one or using a rideshare service like Uber or Lyft. If you are someone who hauls heavy loads every day, it would make sense for you to buy a truck. However, someone else may only need a certain kind of vehicle for a set period of time, making it more practical to lease. And then, there are those who only need a vehicle to get them from point A to point B, one time only: the type of vehicle doesn’t matter, just speed and convenience, so a rideshare service is the best option.

This same thinking applies in the context of virtual versus physical machine instances. Custom hardware can be expensive, but for consistent, intensive workloads, it might make more sense to invest in the physical infrastructure. A virtual machine instance in the cloud supporting variable workloads would be like leasing: users can access the virtual machine without owning it or needing to know any details about it. And, at the end of the “lease,” it’s gone. Virtual machines provisioned on webscale infrastructure (that is, serverless computing) are like the rideshare service of computing where the user simply specifies the task that needs to be done. They leave the rest of the details for the cloud provider to sort out, making it more convenient and easier to use than traditional models for certain types of workloads.

3. Data will grow faster than the ability to transport it...and that’s ok!

It’s no secret that data has become incredibly dynamic and is being generated at an unprecedented rate that will greatly exceed the ability to transport it. However, instead of moving the data, the applications and resources needed to process it will be moved to the data, and that has implications for new architectures like edge, core, and cloud. In the future, the amount of data ingested in the core will always be less than the amount generated at the edge, but this won’t happen by accident. It must be enabled very deliberately to ensure that the right data is being retained for later decision making.

For example, autonomous car manufacturers are adding sensors that will generate so much data that there's no network fast enough between the car and data centers to move it. Historically, devices at the edge haven’t created a lot of data, but now with sensors in everything from cars to thermostats to wearables, edge data is growing so fast it will exceed the capacity of the network connections to the core. Autonomous cars and other edge devices require real-time analysis at the edge in order to make critical in-the-moment decisions. As a result, we will move the applications to the data.
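
One way to picture moving the application to the data is a small Python sketch of edge-side reduction: the device summarizes its raw readings locally and forwards only the summary plus any unusual samples to the core. The function name reduce_at_edge, the threshold and the sample readings are assumptions made for illustration, not part of any NetApp product or of the original prediction.

```python
from statistics import mean


def reduce_at_edge(readings, threshold):
    """Summarize raw readings locally; keep only out-of-range samples in full."""
    # Samples whose magnitude exceeds the (hypothetical) threshold are kept.
    anomalies = [r for r in readings if abs(r) > threshold]
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }
    # Only this small payload would cross the network to the core.
    return {"summary": summary, "anomalies": anomalies}


# Illustrative sensor readings; the numbers are made up.
raw = [0.1, 0.2, 0.1, 3.5, 0.2, 0.1, 0.3, 4.1]
payload = reduce_at_edge(raw, threshold=1.0)

print(payload["anomalies"])         # [3.5, 4.1]
print(payload["summary"]["count"])  # 8
```

Deciding what goes into the summary and what counts as an anomaly is exactly the deliberate choice the prediction calls for, because anything dropped at the edge is unavailable for later decision making.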

4. Evolving from “Big Data” to “Huge Data” will demand new solid state-driven architectures

As the demand to analyze enormous sets of data ever more rapidly increases, we need to move the data closer to the compute resource. Persistent memory is what will allow ultra-low latency computing without data loss, and these latency demands will finally force software architectures to change and create new data-driven opportunities for businesses. Flash technology has been a hot topic in the industry; however, the software being run on it didn’t really change, it just got faster.

This is being driven by the evolution of IT’s role in an organization. In the past, IT’s primary function would have been to automate and optimize processes like ordering, billing, accounts receivable and others. Today, IT is integral to enriching customer relationships by offering always-on services, mobile apps and rich web experiences. The next step will be to monetize the data being collected through various sensors and devices to create new business opportunities and it’s this step that will require new application architectures supported by technology like persistent memory.

5. Emergence of decentralized immutable mechanisms for managing data

Mechanisms to manage data in a trustworthy, immutable and truly distributed way (meaning no central authority) will emerge and have a profound impact on the datacenter. Blockchain is a prime example of this.

Decentralized mechanisms like blockchain challenge the traditional sense of data protection and management. Because there is no central point of control, such as a centralized server, it is practically infeasible to change or delete information recorded on a blockchain, and transactions are effectively irreversible.
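
The immutability property rests largely on hash chaining, which a short Python sketch can show. This is a single-machine toy, not a blockchain: it has no network, no consensus and no replication, and the helper names block_hash, append and is_valid and the ledger entries are invented for illustration. It demonstrates only that altering an earlier record invalidates the chain.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, data):
    """Add a block that commits to the hash of the block before it."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
for entry in ["A pays B 5", "B pays C 2", "C pays A 1"]:
    append(ledger, entry)

print(is_valid(ledger))            # True
ledger[1]["data"] = "B pays C 200" # tamper with an earlier record
print(is_valid(ledger))            # False
```

In a real decentralized system many independent participants hold copies of the chain and run this kind of check, which is why a tampered copy is rejected without any central authority.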

Think of it as a biological system. You have a host of small organisms and they each know what they're supposed to do without having to communicate with anything else or be told what to do. Then you throw in a bunch of nutrients: in this case, data. The nutrients know what to do and it all starts operating in a cooperative manner, without any central control. Like a coral reef.

Current datacenters and applications operate like commercially managed farms, with a central point of control (the farmer) managing the surrounding environment. The decentralized immutable mechanisms for managing data will offer microservices that the data can use to perform necessary functions. The microservices and data will work cooperatively, without overall centrally managed control.