Caringo FileFly Breathes New Life Into NAS Storage

Network attached storage (NAS) remains very popular, but success brings challenges. Many enterprises would love to extend and expand their use of NAS but often run into trouble because of unnecessarily high primary storage costs. Primary storage, i.e., the disk storage that serves the first-use needs of newly created or acquired files in active production, is bloated by the large number of files that reside there unnecessarily. That data should be moved to secondary storage, i.e., less expensive disk storage, where it can still serve a useful role, but that is easier said than done. That is where Caringo comes in.

But first we need to understand why so much data can safely be moved to secondary storage.

Secondary Storage Can Be Home to Data That Has Fulfilled Its First Use

Applications and users naturally create data, and that data serves useful business purposes. After its initial purpose is served, data may still have some value (for historical reporting, as part of a big data analysis, for reference, or for resale, as with a video recording), but much of it may never be retrieved or used again in any way. And even data that is retrieved typically does not need the performance characteristics of data that is still in first use.

A study conducted by the Compliance, Governance, and Oversight Council Summit showed that only 25% of data has current business value. A portion of that data can be called active production or “hot” data. In the filer world, those are files that are still likely to be accessed for their original business process. The rest of the data can technically be migrated (i.e., archived) to secondary storage, as long as applications and users can transparently access the files as if they were still in primary storage.

That transparency is important because some files will need to be retrieved over time, often unpredictably, to serve a secondary business use, following what is called a “long-tail” distribution. Files might also serve as part of an analytics or big data project.

Let’s see how secondary storage can be implemented to create a more cost-effective NAS solution overall, using Caringo software as an example.

Introducing Caringo FileFly with Caringo Swarm

In essence, Caringo software enables the effective and efficient use of secondary storage for file servers at a fraction of the cost of primary storage. That enables the original NAS system to take on even more responsibility, providing added value to the enterprise. That is important but is not especially easy to do.

Caringo’s Swarm solution marries the concept of software-defined storage to object storage. Object storage is a very effective way of storing data at scale (think Amazon Web Services). Software-defined storage decouples storage controller software from the storage hardware itself, thus enabling the use of low cost commodity storage components without that storage controller overhead. The result is a pool of cost-effective, object-based commodity storage which in a NAS world can be used as secondary storage.

That pool is divided into tiers. Think of primary storage as “hot” storage: the files stored there should be those that are likely to require read or write access and high performance. The operative words are “should be.” As discussed, a lot of files on primary storage do not need to be there. With Caringo, files can be migrated to one of two disk-based secondary storage tiers. The first is the “warm” tier, which is really an active archive. Warm means that there is a reasonable chance that the files will be accessed at some time, so these files may have ongoing business value (as noted, for revenue, reference, and/or analysis purposes). The other is the “cold” tier, which might also be called deep storage or a deep archive, where the expectation is that the data will never have to be accessed again but has to be kept for reasons such as regulatory and other legal compliance requirements.
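
As a rough illustration of the hot, warm, and cold split described above, a minimal Python sketch might classify a file by how long it has gone unaccessed. The thresholds and the function name are assumptions invented for this example; they are not Caringo settings, and a real deployment would also weigh business value and retention requirements rather than age alone.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration (not Caringo defaults).
WARM_AFTER = timedelta(days=90)        # idle this long: candidate for the warm tier
COLD_AFTER = timedelta(days=3 * 365)   # idle this long: candidate for the cold tier


def classify_tier(last_access: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    idle = now - last_access
    if idle >= COLD_AFTER:
        return "cold"
    if idle >= WARM_AFTER:
        return "warm"
    return "hot"
```

Under these assumed thresholds, a file last read a week ago stays hot, one untouched for six months becomes a warm-tier candidate, and one untouched for five years becomes a cold-tier candidate.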

Migrating Files to a Caringo Swarm Managed Data Lake

However, something has to enable production NAS filers to work with Caringo Swarm to migrate selected data to the Swarm-managed secondary storage pool (or data lake, as some would call it). That something is Caringo FileFly software, which allows Windows and NetApp filers to work in conjunction with Caringo Swarm. Together, FileFly and Swarm bridge the gap between the NAS protocols of Windows and NetApp filers, notably CIFS/NFS, and the object world.

Caringo FileFly uses file-level policy automation to migrate files from primary NAS storage arrays to object-based storage systems. Policies may be set on one or a combination of file attributes, including but not limited to: last access time, date created, user, file type, and pattern matching on files, including wildcards. Multiple policies can run simultaneously at any time. Note that this is a much more diverse set of functions than hierarchical storage management (HSM), which was limited to time-based migration of files.
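
To make the idea of attribute-based policies concrete, here is a minimal Python sketch of how a policy combining several file attributes might be evaluated. It is not the FileFly policy engine or its configuration format; the `FileInfo` and `Policy` classes, their field names, and `select_for_migration` are all hypothetical names invented for this illustration.

```python
import fnmatch
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class FileInfo:
    """Hypothetical record of the attributes a policy can inspect."""
    path: str
    owner: str
    created: datetime
    last_access: datetime


@dataclass
class Policy:
    """Hypothetical policy: every criterion that is set must match."""
    name: str
    min_idle: Optional[timedelta] = None       # minimum time since last access
    created_before: Optional[datetime] = None  # creation date cutoff
    owner: Optional[str] = None                # file owner (user)
    pattern: Optional[str] = None              # wildcard pattern on the path

    def matches(self, f: FileInfo, now: datetime) -> bool:
        if self.min_idle is not None and now - f.last_access < self.min_idle:
            return False
        if self.created_before is not None and f.created >= self.created_before:
            return False
        if self.owner is not None and f.owner != self.owner:
            return False
        if self.pattern is not None and not fnmatch.fnmatchcase(f.path, self.pattern):
            return False
        return True


def select_for_migration(files: List[FileInfo], policies: List[Policy],
                         now: datetime) -> List[str]:
    """Return the paths of files matched by at least one policy."""
    return [f.path for f in files if any(p.matches(f, now) for p in policies)]
```

A policy such as `Policy("old-video", min_idle=timedelta(days=180), pattern="*.mp4")` would then select only video files that have not been accessed for roughly six months, which is the kind of combined criterion that purely time-based migration cannot express.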

Caringo policy management is much more flexible than HSM, and migration is performed transparently to both end users and NAS administrators. This means that there are no changes to applications or mount points. That capability is critical, as a lack of transparency could easily involve extra work that would be unacceptable to all parties and offset Caringo’s cost benefits.

The Tiered Data Lake Provides a Rich File Storage Environment

Caringo Swarm-managed object storage environments bring a number of data-lake-related benefits to the storage under their management, some of which are:

Global file repository and access — many filers can store in a single Swarm cluster; one of the many reasons that this is important is that analytic tools can access all the data they need

Searchable — this is a key capability for easily retrieving specific files

Always Protected — Once files are in Swarm they are protected through replication or erasure coding, eliminating the risk of data loss

These benefit the files being managed in the data lake, but primary storage benefits as well; for example, the smaller amount of data remaining there lowers the backup/restore burden.
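
The two protection methods named in the list above, replication and erasure coding, differ in how much raw capacity they consume. The following Python sketch shows the standard arithmetic; the function names and the example parameters (three replicas, five data segments plus two parity segments) are assumptions for illustration and are not stated Swarm defaults.

```python
def raw_capacity_replication(logical_tb: float, replicas: int) -> float:
    """Raw capacity needed when every object is stored as `replicas` full copies."""
    return logical_tb * replicas


def raw_capacity_erasure(logical_tb: float, data_segments: int,
                         parity_segments: int) -> float:
    """Raw capacity needed when each object is split into data segments
    and protected by additional parity segments."""
    return logical_tb * (data_segments + parity_segments) / data_segments
```

With these example parameters, 100 TB of files would need 300 TB of raw disk under three-way replication but 140 TB under a 5+2 erasure coding scheme, which is one reason the choice of protection scheme feeds directly into the cost of secondary storage.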

The Whole Caringo-Enabled NAS Solution Ecosystem Benefits

There are a number of benefits to the overall solution, but three stand out:

Lower Cost — the sticker price is always the red flag in front of the IT budget bull; Caringo claims cost reductions of 4X or more, which is significant, though IT would have to verify that claim through its own due diligence with Caringo.

Scalability — being able to grow the Caringo environment as needed and to be able to manage that growth effectively (including cost) is essential and has been an increasing challenge to existing NAS installations

Investment protection — not only the physical investment in an existing NAS solution that can now accommodate much more growth, but also the investment in existing applications and in users, who do not have to change their daily habits or learn something new.

Mesabi Musings

Network-attached storage has proven to be very beneficial for enterprises for handling files. However, success has had its price in that NAS filers are having trouble cost effectively keeping up with the growth of information. Caringo deploys FileFly software on NAS servers to remove files that unnecessarily reside on primary (and thus more expensive) storage and migrate them to more cost-effective object-managed secondary storage. Not only is the Caringo-enabled solution more cost effective but it is also more scalable.

Moreover, the Caringo object storage data lake enables an enterprise to more easily take advantage of any ongoing value of the stored data, such as using rich metadata for analysis purposes. To sum it up, Caringo promises to reinvigorate existing NAS installations, helping them become more productive and valuable.