Welcome to our blog on cognitive data management at DMI. This is intended to become a forum for the community of data managers who are interested in simplifying, streamlining and automating the data management workload through the application of cognitive computing technology.

"Cognitive" sounds so trendy. What "cognitive" is varies depending on who you ask.

In some cases, cognitive computing is metaphorical. It refers to a fairly common software engine that simply executes predefined instructions written in any number of scripting or programming languages.

In other cases, cognitive computing refers to the application of algorithms to data in order to discern and respond to recognizable patterns.

In still other cases, cognitive refers to machine learning: a set of sophisticated programs that evaluate collected data, compare them to data management policies (criteria, standards, etc.) and determine what, if any, actions to take.
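To make the "compare data to policy and decide on an action" step concrete, here is a minimal sketch in Python. It uses fixed rules rather than machine learning, so it illustrates only the policy-matching step. The `FileRecord` and `Policy` types, the field names, and the action strings are our own assumptions for illustration, not part of any vendor product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileRecord:
    # Hypothetical metadata collected about one file.
    path: str
    days_since_access: int


@dataclass
class Policy:
    # Hypothetical policy: act once a file has been idle this long.
    name: str
    min_idle_days: int
    action: str


def decide_action(record: FileRecord, policies: List[Policy]) -> Optional[str]:
    """Return the action of the strictest policy the record satisfies, or None."""
    # Check the longest idle threshold first so the strictest match wins.
    for policy in sorted(policies, key=lambda p: p.min_idle_days, reverse=True):
        if record.days_since_access >= policy.min_idle_days:
            return policy.action
    return None  # no policy applies, so take no action


policies = [
    Policy("archive-after-1-year", 365, "move_to_archive"),
    Policy("delete-after-7-years", 2555, "delete"),
]
print(decide_action(FileRecord("/data/report.doc", 400), policies))
```

A learning-based system would presumably replace the hard-coded thresholds with criteria inferred from observed data, but the final step of mapping a record to an action would look much the same.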

This blog provides a location to learn more about the theory of CDM and the capabilities of the current generation of vendor products purporting to provide cognitive data management services. Ultimately, we agree that the volume of data amassing in most organizations already exceeds the capability of human administrators to manage; automated tools are needed to support the effort.

Let's learn more about CDM and share our experiences with data management generally using this forum.

Some view archiving as a practice aimed at preserving data for appropriate periods of time (as defined by business and/or regulatory requirements) in a resilient and cost-efficient way.

Others see it as a huge bother. They note the absence of uniform technology standards or agreement between platform vendors, the infighting between vendors of different storage media types (and their paid analyst mouthpieces), the conflation of archive concepts and processes with those of data backup and data protection, and a myriad of other issues that make doing archiving no fun at all.

The purpose of this blog and the community that it supports is to improve the general appreciation of the value of archiving and to evaluate some of the component technologies that are useful to archivists who are interested in managing data throughout its lifecycle.

Archive should be a retention pool for data that isn't accessed or modified very frequently. In addition to providing a reasonably-priced location for storing inactive bits, an archive should also provide the means for rapid search and retrieval of data as well as appropriate data protection and security/privacy services.
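As a rough illustration of finding data that "isn't accessed or modified very frequently," the sketch below walks a directory tree and lists files whose last-access time is older than a threshold. The function name `find_archive_candidates` and the threshold are our own assumptions. Note also that last-access times are only as reliable as the file system's settings (some volumes are mounted so that access times are rarely or never updated), so treat this as a starting point rather than a finished tool.

```python
import os
import time


def find_archive_candidates(root, min_idle_days, now=None):
    """Yield (path, size_bytes) for files under root idle longer than min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_atime < cutoff:
                yield path, info.st_size


if __name__ == "__main__":
    # Hypothetical usage: report files under the current directory idle for a year.
    total = 0
    for path, size in find_archive_candidates(".", 365):
        total += size
        print(path, size)
    print("bytes eligible for archive:", total)
```

A list like this only answers the "what is inactive" question. Search, retrieval, protection and privacy services still have to be provided by the archive platform itself.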

Archiving requires professional discipline if it is to be done correctly. From the decisions regarding what types of technologies to use to create the archival platform, to the choices around who should define policies for data preservation, to the best practices for managing archives over time, controlling access to data, limiting data editing, and many other issues, archive is not simple or easy. Vendors claiming to offer an out-of-the-box plug-and-play solution are pulling our collective leg.

Still, given the current rates of data growth and the ongoing evolution of standards on data stewardship, data preservation and archiving best practices need to be defined and vendor solutions need to be evaluated. Here is where the community of interest in archive at DMI will take on the challenge. Please offer your insights and comment (constructively) on the insights posted by others. Together, we can improve the collective wisdom around archiving.

Welcome to the Storage Technology and Storage Management Blog at DMI. This is the centerpiece of a community dedicated to the discussion of all things data storage.

We are hoping to use this space to discuss storage technology and architecture. We also plan to review specific products and services that are being delivered to market by the storage industry, evaluating the business value and actual performance that they deliver over time.

We are also passionate about keeping the record straight and separating the marketecture from the architecture in the discussion of storage itself. Here are a few examples:

Storage is part of the original von Neumann machine design (an early architecture for computers). Storage is often conflated with memory, which is a kind of storage but not one designed for the same purpose. Memories were originally part of the central processing system, providing a temporary workspace or scratchpad where CPUs could temporarily store and access data being used by application workload. The actual storage components were used to store data that would serve as inputs to applications and outputs from applications. So, the von Neumann machine is being relegated to the dustbins of history as vendors push DRAM and NVRAM based storage products. Those who say storage is dead, killed by NVMe or other memory-based storage approaches, are missing this distinction.

Storage currently comes in the form of paper, magnetic, optical and silicon devices. Some of us are old enough to remember punch cards and punch tape. These media gave rise to magnetic tape, then hard disk drives of many types, then optical and silicon-based random-access media. Readers will note that we do not count "cloud" as a type of storage media. Cloud is shorthand for a service delivery model -- and an incomplete one, at that (according to the National Institute of Standards and Technology). Cloud vendors use the same storage media as everyone else, so listing cloud as a form of storage is incorrect.

Software-defined storage is the "latest thing" -- at least in vendor marketing. Truth be told, we "old timers" were doing software-defined storage on mainframes in 1993, using IBM's System Managed Storage facility (SMS). All storage is software-defined, when you get right down to it. With monolithic storage arrays, the software resides on the array controller. With SMS and current software-defined storage stacks, the functionality lives on a server and is parceled out to commodity storage kit. There is no right or wrong approach, but functionality should be provided where it makes sense to do so and where you can spare the cycles to perform the work. The sad truth is that we have not seen a thorough discussion of the appropriate place to host individual storage functions. Clearly, some functionality should probably be placed close to the storage devices on which it is acting, but some storage services can be hosted virtually anywhere.

The above point leads to another. It is generally not a good thing to be locked in to any particular vendor's technology, as this limits options for solving technical problems or for deriving business advantage from technology generally. There is a misperception that moving from proprietary storage hardware to commodity hardware with proprietary software-defined storage software eliminates lock-in. If this is true, why can't you store data from your VMware environment on a different hypervisor vendor's storage stack? A VMware VSAN and a Microsoft Hyper-V Storage Spaces environment are both software-defined storage platforms, but you cannot place Hyper-V workload data on the VSAN or vice versa because each vendor wishes to be the dominant hypervisor in your environment. How is this different from, say, the tricks that EMC used to prevent interoperability between their kit and that of competitors, all of whom were, like EMC, selling boxes of Seagate hard disks?

These are just a few examples of the misapprehensions about storage, cultivated by certain vendor marketing departments and their paid henchmen in the PR and industry analyst communities. We want to use this community to keep things straight so that intelligent decision-making can happen.

Our biggest concern, frankly, isn't whose gear wins the day in the marketplace, or whose software you elect to use. Our biggest concern is that the lack of intelligent debate and discussion is preventing us from advancing the ball, from deploying and using the right storage technology in the best possible way to solve our critical data hosting challenges.

It is interesting that the very vendors who dismissed software-defined storage when DataCore Software and a handful of others championed it in the late 1990s are now the big evangelists. Yet they still have not embraced storage virtualization as part of the software-defined storage stack. Instead, they dismiss it out of hand because such a beastie would prevent them from using software-defined storage to lock in their customers and lock out their competition, just as the hardware vendors they delight in villainizing did before them.

This isn't just philosophical. There is growing evidence that siloing storage behind proprietary hypervisor-controlled SDS stacks is leading to the decline of capacity allocation efficiency on an infrastructure-wide basis. This story is underreported, or is certainly dwarfed by heavy coverage of vendor-sponsored reports portraying SDS as a godsend in terms of storage cost of ownership reduction. While it may be true that managing a storage silo with a set of tools provided by the hypervisor vendor enables greater efficiency and control of the siloed storage, it also prevents sane and rational management of capacity amongst and between storage silos created by different hypervisor vendors.
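A small worked example may help show how silos can strand capacity. The sketch below is purely illustrative: the silo names, the terabyte figures, and the function names are hypothetical, and "shortfall" here is simply our own shorthand for new demand that cannot be placed. It compares the case where each silo may serve only its own hypervisor's workloads with the case where free capacity can be shared.

```python
def overall_utilization(silos):
    """silos maps name -> (used_tb, capacity_tb); returns total used / total capacity."""
    used = sum(u for u, _ in silos.values())
    capacity = sum(c for _, c in silos.values())
    return used / capacity


def siloed_shortfall(silos, demand):
    """TB of new demand left unplaced when each silo serves only its own workloads."""
    shortfall = 0
    for name, need_tb in demand.items():
        used, capacity = silos[name]
        shortfall += max(0, need_tb - (capacity - used))
    return shortfall


def pooled_shortfall(silos, demand):
    """TB of new demand left unplaced when all free capacity can be shared."""
    free = sum(c - u for u, c in silos.values())
    return max(0, sum(demand.values()) - free)


# Hypothetical figures: (used TB, capacity TB) per silo, and new demand in TB.
silos = {"hypervisor_a": (90, 100), "hypervisor_b": (30, 100)}
demand = {"hypervisor_a": 25, "hypervisor_b": 5}

print(overall_utilization(silos))       # 0.6
print(siloed_shortfall(silos, demand))  # 15
print(pooled_shortfall(silos, demand))  # 0
```

In this made-up case the infrastructure as a whole is only 60 percent full, yet the first silo cannot absorb its new workload and more capacity would have to be purchased, while 70 TB sits idle in the second silo.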

There are no easy answers to some of these issues. But we want to peel the onion in this blog and hopefully to define some best practices that will benefit our community. Welcome to the party!