Whitepaper

Executive Overview

Solid State Drives (SSDs), with their inherent high performance, high capacity and support for enterprise workload acceleration, have become a driving force in the enterprise – and they are viewed as critical to the future of the data center and the storage industry.

Traditionally, companies have populated their data centers with Hard Disk Drives (HDDs) using Storage Area Network (SAN) and Network Attached Storage (NAS) technologies. These HDDs tackle day-to-day processing and storage requirements, and have a complementary place in data centers, alongside higher performance all-flash storage arrays with racks of SSDs.1

However, traditional HDD storage using spinning magnetic media is increasingly proving to be less than ideal for critical business applications that have ultra-high-performance needs. Examples include: time-sensitive Big Data analytics, Virtual Desktop Infrastructure (VDI), and Electronic Discovery (eDiscovery) with large volumes of email.

Latency, performance and mean time between failures (MTBF) all improve when SSDs are used for the same tasks. All-flash storage arrays with racks of enterprise-grade SSDs are used for high-performance, high-throughput processing and storage of essential business data, with near-instantaneous access.

The pervasiveness of cloud computing has only exacerbated the contrast between traditional HDDs and higher-performance flash storage because of the urgent need for people to access, download or analyze vast amounts of information in milliseconds (ms), anytime and anywhere. As a result, SSDs and the all-flash storage arrays in which they reside are increasingly taking on the tasks of the highest-performance HDDs in the data center...”2

The Business Need for Speed

Flash storage is an enterprise game-changer. SSDs offer a powerful combination of high performance, capacity, endurance and reliability. These attributes are extremely important for certain critical business applications, such as rapid and cost-effective indexing, sorting and storing of huge volumes of email and other critical data. SanDisk, a global leader in flash-storage solutions, faced exactly this business need.

Implementing an eDiscovery Solution at SanDisk®

SanDisk IT decided to implement an eDiscovery solution to meet the complex requirements from the company’s Legal department, which must comply with Records and Information Management (RIM) Governance, case management and litigation hold requirements. The solution required an innovative use of SanDisk SSD technology deployed in an all-flash storage array.

After careful consideration of all of the IT and business factors, SanDisk decided to deploy an HP Autonomy solution for eDiscovery of all its legal documents – and to install a Kaminario all-flash storage array to retain the digitized data for faster access to that data. SanDisk decided to leverage flash SSDs because they offer a powerful combination of high performance, capacity, endurance and superior reliability.

Before the Deployment

In any company, compliance with retention-of-data regulations is critical because it helps keep information safe, lowers the risk of civil or criminal penalties, saves potentially millions of dollars in unnecessary litigation costs, and reduces storage and backup costs.

This led SanDisk to establish a single global email archive solution, accessible from the web and from Microsoft Outlook, to replace EVault and to eliminate local PST file archiving.

To put all of this in perspective, consider that SanDisk employees around the world send and receive between 500,000 and 1 million emails per day, according to data compiled by SanDisk’s IT organization. To store all that data, SanDisk IT maintains a corporate email archive that holds over 21 terabytes (TB) of data. At 25,000 pages per gigabyte (GB), that is more than 500 million pages of documents residing in SanDisk’s email archive. The cost of searching and reviewing an email archive of that size is prohibitive.
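The page estimate above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1,000 GB), which the text does not specify, and the function name is hypothetical.

```python
# Figures quoted in the text.
ARCHIVE_TB = 21          # size of the corporate email archive
PAGES_PER_GB = 25_000    # pages of documents per gigabyte

# Assumption: decimal units (1 TB = 1,000 GB); the text does not state a convention.
GB_PER_TB = 1_000


def archive_pages(archive_tb: int, pages_per_gb: int, gb_per_tb: int = GB_PER_TB) -> int:
    """Estimate the number of pages held in an archive of the given size."""
    return archive_tb * gb_per_tb * pages_per_gb


pages = archive_pages(ARCHIVE_TB, PAGES_PER_GB)
print(f"{pages:,} pages")  # 525,000,000 pages
```

With binary units (1 TB = 1,024 GB) the estimate is slightly higher, so the "more than 500 million pages" figure holds under either convention.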

Addressing the Pain-points in the Legacy eDiscovery Solution

These indexing, sorting, and archiving processes are recognized as time-consuming and expensive pain points, which is why SanDisk’s legal department adopted an eDiscovery platform to serve as the company’s single, global archive solution. Legal will now retain emails and attachments of identified business or legal importance in the eDiscovery application, in line with SanDisk Record Retention Policies and litigation hold requirements. The goal is to provide case management and search capabilities for the legal team so that they can effectively manage and contain litigation.

SanDisk is rolling out the HP Autonomy application to provide powerful email search and archiving capabilities to all 5,000-plus employees worldwide. When complete, this effort will allow employees to search, sort, and retain emails in an online global archiving solution, which will put substantially less strain on the IT Exchange infrastructure. This initiative will also improve the performance of employee computers by reducing the amount of local storage required on their machines and, more importantly, give the legal department the ability to search, process, and manage the retention of digital communications more effectively.

SanDisk Legal and eDiscovery Economics

When a litigation case is presented at SanDisk, the company’s attorneys first need to scope and size all aspects of the case. Strict deadlines drive the process, and case information is needed immediately. Any delay can impede a case and drive legal fees to extraordinarily high levels. The strategy is therefore essentially one of cost avoidance, achieved by eliminating irrelevant emails and documents.

A simplified look at eDiscovery economics quickly demonstrates the importance of accurately scoping the parameters of a case and accessing information as quickly as possible. An attorney generally takes 167 hours to read one GB of email (about 20,000 emails).
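To make the economics concrete, the review rate quoted above can be turned into a small estimator. This is a minimal sketch, not a costing model: the two case sizes are hypothetical values chosen for illustration, and the function name is an assumption.

```python
# Figures quoted in the text.
HOURS_PER_GB = 167       # attorney reading time for one GB of email
EMAILS_PER_GB = 20_000   # approximate number of emails in one GB


def review_hours(emails: int) -> float:
    """Estimate attorney hours needed to read the given number of emails."""
    return emails / EMAILS_PER_GB * HOURS_PER_GB


# Hypothetical case scopes, for illustration only.
broad_scope = review_hours(1_000_000)   # poorly scoped case
narrow_scope = review_hours(50_000)     # same case after irrelevant mail is eliminated

print(f"broad scope:  {broad_scope:,.1f} hours")   # 8,350.0 hours
print(f"narrow scope: {narrow_scope:,.1f} hours")  # 417.5 hours
```

Because review time scales linearly with the volume of email, every message removed from scope before review reduces attorney hours in direct proportion.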

With this high cost in mind, even a short delay is unacceptable. Imagine a pianist practicing a piece but hearing the sound only 6 to 12 hours later. It would take multiple iterations over an extraordinarily long period of time to learn the piece. This simply would not work. Likewise, in the real world of eDiscovery and litigation case management, the legal team needs to be able to run repeated data searches, refining and reducing the search results for presentation at trial. This demanding real-world application requires high-performance storage capable of significant continuous data ingestion, deduplication, and complex indexing and tagging, with Tier-1 Service Level Agreements (SLAs) on a 24/7 basis, 365 days a year.

Enter SanDisk IT

To implement the HP Autonomy eDiscovery solution, SanDisk’s attorneys turned to SanDisk IT to migrate the data from Microsoft Exchange and the legacy EVault archiving platforms into HP Autonomy. This involved downloading and processing emails and their attachments, and eliminating all duplicates so that only a single copy of each email message is stored. Finally, and most importantly, it involved tagging messages and attachments with metadata and building a powerful index to meet the demanding search requirements of SanDisk Legal.

Based on trial runs, SanDisk IT estimated that loading, indexing and archiving this quarter of a billion emails (plus attachments) into traditional HDD storage would take a minimum of 50 days at a processing rate of 2 million emails per day. This was considered an unacceptable outcome because it would jeopardize the release of the eDiscovery capabilities to the legal team, and eventually the release of the powerful email search and archive functionality to the rest of the company.

In a bid to accelerate the timetable, SanDisk’s CIO assigned a team of engineers to evaluate the performance of several all-flash storage arrays that would serve as an alternative to HDDs for this particular ultra-high performance application. The hypothesis that drove this decision was that low latency (<0.5ms) flash storage would virtually eliminate the immense email indexing challenge.

SanDisk Flash Memory and Enterprise Storage: The Data Dilemma

SanDisk IT embarked on lab evaluations using HP Autonomy software to test all-flash storage array offerings, since the existing traditional HDD storage had reached its available capacity. Of all the systems tested, an all-flash storage array from Kaminario, a flash storage company, met SanDisk’s rigorous specifications for endurance, performance and reliability.

At the heart of the all-flash storage array is the 800GB capacity Optimus Extreme™ solid-state drive (SSD) from SanDisk. This enterprise-class SSD offers predictable, sustained performance, superior reliability, and high endurance, while also lowering the cost of enterprise-grade solid-state storage. The Kaminario architecture is scalable and features proprietary technology, including N + 1 redundancy to ensure the high resilience and uptime of SSDs (See Figure 1).

The Optimus Extreme™ SAS SSD Advantage

Interface: SAS 6Gb/s

24nm MLC Capacity*: 800GB

Sustained Read/Write** up to: 500/500 MB/s; 1 GB/s Wide-Port

Random Read/Write up to: 95K/40K IOPS

Mean Time Between Failures (MTBF)***: 2.5 Million Hours

Endurance: Drive Writes per Day (random/sequential)****: 50/50 DWPD
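Two of the ratings above can be translated into more tangible figures. The sketch below is a rough illustration only: it assumes decimal units and a constant failure rate (the usual exponential model for relating MTBF to an annualized failure rate), neither of which is stated in the specification, and the conditions behind the asterisked footnotes may qualify the results. The function names are hypothetical.

```python
import math

# Ratings quoted in the specification list.
CAPACITY_GB = 800          # drive capacity
DWPD = 50                  # drive writes per day
MTBF_HOURS = 2_500_000     # mean time between failures

HOURS_PER_YEAR = 8_760     # 365 days x 24 hours


def daily_write_budget_tb(capacity_gb: int, dwpd: int) -> float:
    """Data that may be written per day at the rated endurance (decimal TB)."""
    return capacity_gb * dwpd / 1_000


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability of failure within one year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


print(f"{daily_write_budget_tb(CAPACITY_GB, DWPD):.0f} TB written per day")  # 40 TB written per day
print(f"{annualized_failure_rate(MTBF_HOURS):.2%} annualized failure rate")  # 0.35% annualized failure rate
```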

Figure 1: SanDisk IT Test Lab Performance

With the SanDisk/Kaminario all-flash solution, the data dilemma was solved. Against a production workload, HP Autonomy indexed 100 million emails and attachments in just 12 days, averaging 7.2 million messages per day – a result that surpassed even the most optimistic estimates and was a fraction of the 50 days (two million emails a day) that it would take conventional HDDs to complete the same task. The all-flash array storage solution achieved ultra-high performance, high IOPS, low latency and high resilience (See Figure 2). The architecture was optimized for data protection and proved to have high reliability and availability.
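The comparison can be expressed as two simple ratios. The sketch below uses only the figures quoted above. The elapsed-time ratio and the daily-rate ratio come out slightly different because the text quotes the day counts and the daily averages separately, and the sketch reports both rather than reconciling them. Variable names are illustrative.

```python
# Figures quoted in the text.
HDD_DAYS = 50                     # estimated minimum duration on traditional HDD storage
HDD_EMAILS_PER_DAY = 2_000_000    # estimated HDD processing rate
FLASH_DAYS = 12                   # measured duration on the all-flash array
FLASH_EMAILS_PER_DAY = 7_200_000  # quoted average rate on the all-flash array

# Workload implied by the HDD estimate: 50 days at 2 million emails per day.
hdd_workload = HDD_DAYS * HDD_EMAILS_PER_DAY

# Two ways of expressing the improvement.
elapsed_speedup = HDD_DAYS / FLASH_DAYS
rate_speedup = FLASH_EMAILS_PER_DAY / HDD_EMAILS_PER_DAY

print(f"implied workload: {hdd_workload:,} emails")    # 100,000,000 emails
print(f"elapsed-time speedup: {elapsed_speedup:.1f}x")  # 4.2x
print(f"daily-rate speedup: {rate_speedup:.1f}x")       # 3.6x
```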

Figure 2: SanDisk IT Production Performance

Data Center Efficiencies: Reducing Operational Costs

Traditional HDD storage can be less expensive in the short run than flash storage, but the price gap is narrowing and the cost of all-flash solid-state storage arrays is declining rapidly (See Figure 3). In the long run, the Total Cost of Ownership (TCO) of SSDs makes them a highly desirable, cost-effective solution in the high-density data center. The TCO of SSDs must also take into account considerations beyond the cost per gigabyte. These include the much higher performance of SSDs, lower power consumption, reduced cooling costs, fewer chassis racks, and a smaller data center footprint.3

Another benefit related to SSDs is their ability, as compared to HDDs, to dramatically reduce system latency and vastly improve the utilization of Central Processing Units (CPUs) in a highly-virtualized data center environment. In some cases, the rate at which CPUs can “crunch” data when coupled with SSDs increases by an order of magnitude (10x),4 enhancing the performance of the entire infrastructure – including storage, servers and networking equipment.

CIOs, therefore, have the opportunity to leverage investments in applications, hardware and software. In addition, they can reduce and/or defer the need to make financial outlays to buy additional equipment to upgrade data center infrastructure to overcome the latency shortcomings of traditional storage.

Market research has come to a similar conclusion about the disparity in performance between traditional storage (HDDs) and server-based CPUs. It has been noted that this performance disparity has created a huge market opportunity for SSDs to meet the ongoing data center challenges related to latency and input/output operations per second (IOPS).5

SUMMARY: The Future of SSDs in the Enterprise

With billions of emails being generated each year worldwide, CIOs constantly wrestle with the problem of efficiently storing that information and making it immediately accessible to the business.

Executives around the world in data-intensive industries are demanding near-instantaneous response times. Because of their high performance and low latency attributes, SSDs are being broadly adopted by CIOs in various industries including legal, finance, banking, high technology, airlines, automotive, manufacturing and many more. SSDs are ideal for a number of common data-intensive storage applications, including online transaction processing (OLTP), high performance computing (HPC), data warehousing, Big Data real-time analytics and cloud computing.6

As a result, SSDs are transforming the data center. The day is not far off when all-flash storage arrays will become ubiquitous in corporate and government data centers, as well as those managed by third-party service providers.

SanDisk IT, recognizing these trends, set out to prove the sustained performance advantages of SSDs in SanDisk’s real-time enterprise operating environment. HP Autonomy software, running on SanDisk Optimus Extreme SSDs and Kaminario's advanced storage architecture, survived numerous regression lab tests and also demonstrated world-class resilience in a demanding production environment. Based on the phenomenal results, SanDisk IT foresees the SanDisk SSD/Kaminario combination as a focused high-value solution for HP Autonomy software and for select high performance applications in a broad cross-section of industries.

The Benefits of SSDs in the Data Center

Thin, dense, lightweight semiconductor storage devices

Extremely high performance with low read/write latency

Very low power consumption and low heat dissipation

Small physical footprint with fewer storage racks and chassis

High endurance to process high-performance workloads

High resistance to shock and vibration

Generates substantially less noise than a traditional HDD

Low failure rates and low operating maintenance

Ideal in virtualized data center environments

Kaminario Technical Advantages

Kaminario K-Block has N+1 redundancy. Any failure of a component inside a K-Node (SSD, PCB, etc.) will cause the K-Node to fail over to a management node.

For a single node failure in a K-Block, the N+1 architecture tolerates the failure without any significant performance degradation.

For a multi-node failure in a K-Block before remediation has completed, the N+1 architecture will shut down the storage array. In this scenario, any data written after the first node failure is at risk of loss.
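The failure behavior described above can be summarized as a small decision function. This is a conceptual sketch only: the state names and the function are hypothetical, are not part of any Kaminario software or API, and model nothing beyond the three cases stated in the text.

```python
from enum import Enum


class KBlockState(Enum):
    """Hypothetical states for a K-Block, named for illustration only."""
    HEALTHY = "all K-Nodes operational"
    FAILED_OVER = "one K-Node failed over; no significant performance degradation"
    SHUTDOWN = "array shut down; data written after the first failure is at risk"


def k_block_state(unremediated_failures: int) -> KBlockState:
    """Map the number of failed, not yet remediated K-Nodes to the resulting state.

    N+1 redundancy tolerates exactly one outstanding node failure.
    """
    if unremediated_failures < 0:
        raise ValueError("failure count cannot be negative")
    if unremediated_failures == 0:
        return KBlockState.HEALTHY
    if unremediated_failures == 1:
        return KBlockState.FAILED_OVER
    return KBlockState.SHUTDOWN


for failures in range(3):
    print(failures, "->", k_block_state(failures).name)
```

Once the first failed node has been remediated, the outstanding failure count returns to zero, so a later single failure is again tolerated.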