Six Questions About Big Data Cyber Risk Answered

One of the hottest topics for both Dell EMC and Hortonworks today is how to protect big data repositories, such as data lakes, from the emerging breed of cyber-attacks. We sat down to discuss the topic and answer some of the common questions we’ve faced, and we would love to hear your thoughts and contributions. Our thanks also to Simon Elliston Ball for his contributions to the discussion.

Photo by Markus Spiske on Unsplash

What new threats specifically target big data environments?

The threats to big data environments fall into a few broad areas.

First, these are ‘target rich’ environments. Years of consolidating data, to simplify management and deliver value to data science and data analytics, make them an appealing destination for cyber attackers. They will be subject to many ‘advanced persistent threats’ (APTs): attackers and organisations using extremely focused, targeted techniques, ranging from spear-phishing to DDoS attacks, to gain access to or exploit your big data platforms in some way.

Second, they are powerful computational environments, so encryption attacks, if they are ever unleashed on big data operating environments, could spread very rapidly.

Third, big data repositories are often accessible to many employees internally. In general, this is a good thing; how else could organisations tap into the potential value of big data? But a comprehensive framework to monitor and manage data access and security is required to protect against possible abuse or exploits.

What is it about big data environments that makes them more or less vulnerable to threats like WannaCry and other ransomware?

The good news is that WannaCry and other ransomware variants currently in the field don’t really target the operating systems on which big data platforms run. The bad news is, it’s probably just a matter of time before they do. And the fact that these environments are very capable computational resources means that these sorts of exploits could spread fast, if steps aren’t taken to protect them.

What are some best practices to limit the possible spread of malware like WannaCry?

There’s a lot about the way big data platforms are architected that could help protect against these forms of malware, assuming the right steps are taken. Here are some suggestions:

First, practise basic big data hygiene. Many organisations have historically perceived big data environments, such as Apache Hadoop clusters, as internal-only resources protected by the network firewall. This may well be the case (to a point), but the nature of advanced persistent threats means that if a resource is there, attackers will find a way to reach it. If you’ve left default passwords in place or haven’t set sensible access restrictions for employees (governed and audited by tools like Apache Ranger), get that all done! Access controls will also limit the spread of any encrypting malware to the data sets that each compromised user or set of credentials can access.
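To make the ‘blast radius’ point concrete, here is a minimal Python sketch that computes which data sets a single compromised credential could overwrite, and therefore encrypt. The `Policy` class and `writable_datasets` function are hypothetical and deliberately simplified; they are not the Apache Ranger policy model or API, only an illustration of path-based grants to users and groups.

```python
from dataclasses import dataclass


# Hypothetical, simplified policy model (NOT the Apache Ranger API):
# each policy grants users and/or groups a set of permissions on one path.
@dataclass(frozen=True)
class Policy:
    path: str
    users: frozenset = frozenset()
    groups: frozenset = frozenset()
    permissions: frozenset = frozenset()  # e.g. {"read", "write"}


def writable_datasets(user, user_groups, policies):
    """Return the data set paths this credential could overwrite.

    Encrypting ransomware needs write access, so this set approximates
    the damage possible if this user's credentials were compromised.
    """
    reachable = set()
    for p in policies:
        granted = user in p.users or bool(p.groups & set(user_groups))
        if granted and "write" in p.permissions:
            reachable.add(p.path)
    return reachable


# Illustrative policies for three hypothetical data sets.
policies = [
    Policy("/data/finance", groups=frozenset({"finance"}),
           permissions=frozenset({"read", "write"})),
    Policy("/data/marketing", groups=frozenset({"marketing"}),
           permissions=frozenset({"read", "write"})),
    Policy("/data/shared", groups=frozenset({"finance", "marketing"}),
           permissions=frozenset({"read"})),
]

# A compromised finance account can only encrypt the finance data set;
# the shared data set is read-only for it.
print(sorted(writable_datasets("alice", {"finance"}, policies)))  # → ['/data/finance']
```

Running the same check for every service account is one way to spot credentials whose write access is much broader than their job requires.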

Deploy behavioural security to protect your environment. The industry guesstimate is that 300 million new viruses and malware variants arrive each year. Signature-based security will fail against ‘zero-day’ threats, so behavioural analytics is essential to monitor activity across the environment and to detect, as well as protect against, potential infections. If a system notices large-scale read/write activity typical of an encryption attack (but VERY unusual for a normal data lake), it can shut that activity down dynamically by policy.
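The kind of behavioural rule described above can be sketched as a sliding-window rate check over file-system audit events. This is a minimal illustration under stated assumptions: the event shape (timestamp, user, operation), the 60-second window and the 500-operation limit are hypothetical placeholders, and the action taken on a `True` result (for example, suspending the user’s session) is left out. A real deployment would derive its limits from the environment’s own baseline rather than a fixed constant.

```python
from collections import defaultdict, deque

# Hypothetical defaults; real values would come from a cluster's own baseline.
WINDOW_SECONDS = 60
MAX_MODIFICATIONS_PER_WINDOW = 500

# Operations an encrypting attack typically generates in bulk.
SUSPICIOUS_OPS = {"write", "rename", "delete"}


class MassModificationDetector:
    """Flags users whose file-modification rate exceeds a per-window limit."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_MODIFICATIONS_PER_WINDOW):
        self.window = window
        self.limit = limit
        # user -> timestamps of that user's recent modifying operations
        self.events = defaultdict(deque)

    def observe(self, timestamp, user, op):
        """Record one audit event; return True if the user should be blocked."""
        if op not in SUSPICIOUS_OPS:
            return False
        recent = self.events[user]
        recent.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window:
            recent.popleft()
        return len(recent) > self.limit


detector = MassModificationDetector(window=60, limit=100)
# Simulate a burst of 150 writes in 15 seconds by one hypothetical account.
flags = [detector.observe(i * 0.1, "svc_etl", "write") for i in range(150)]
print(flags.index(True))  # → 100, the first event over the limit
```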

Set a sensible snapshot policy to allow for ‘rollbacks’ at levels that meet the recovery point objectives and recovery time objectives set for key data sets. This won’t necessarily mean creating daily snapshots of a multi-petabyte data lake, but it might mean that certain critical data is snapshotted more frequently than less critical data. You can, of course, set these tiers in policy, given the right resources. Snapshots are a massive boon for data held in the Hadoop Distributed File System (HDFS).
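One way to turn per-tier objectives into a concrete schedule is sketched below. It assumes, hypothetically, that each tier is described by an RPO and a retention window: taking a snapshot at least once per RPO interval limits the data lost in a rollback to at most one interval, and the retention window determines how many snapshots to keep. The tier names and values are illustrative only. On HDFS, a schedule like this could drive `hdfs dfs -createSnapshot` against directories made snapshottable with `hdfs dfsadmin -allowSnapshot`.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SnapshotTier:
    name: str             # hypothetical tier label
    rpo: timedelta        # maximum acceptable data loss for this tier
    retention: timedelta  # how far back rollbacks must be possible


def snapshot_plan(tier):
    """Return (snapshot interval, number of snapshots to retain) for a tier.

    Snapshotting once per RPO is the least frequent schedule that still
    meets the RPO; the count covers the full retention window.
    """
    interval = tier.rpo
    if interval <= timedelta(0):
        raise ValueError("RPO must be positive")
    # Ceiling division: timedelta // timedelta returns an int.
    keep = -(-tier.retention // interval)
    return interval, keep


tiers = [
    SnapshotTier("critical", rpo=timedelta(hours=4), retention=timedelta(days=7)),
    SnapshotTier("bulk", rpo=timedelta(days=1), retention=timedelta(days=30)),
]
for t in tiers:
    interval, keep = snapshot_plan(t)
    print(f"{t.name}: every {interval}, keep {keep}")
# → critical: every 4:00:00, keep 42
# → bulk: every 1 day, 0:00:00, keep 30
```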

Do IT organisations know how to set Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) for big data environments?

One of the most common misunderstandings in deploying big data environments is that you can still set RTOs and RPOs for the infrastructure as a whole. You can’t: it’s too large! You’d have to build in so much redundancy that the whole thing would become commercially impossible. Instead, you need to set RTOs and RPOs for individual data sets or storage tiers within the environment. In this context, you need to leave enough slack in your resources for the right number of snapshots of key data sets to be in place to insulate you from risk. This might mean anything from 30 to 50 percent unused capacity in a given storage tier, made available for snapshots, though the upper end of that range would be verging on overkill in most cases.
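The slack needed for snapshots can be estimated with some rough arithmetic. The sketch below assumes, hypothetically, that a fixed fraction of a tier’s used data is overwritten or deleted each day and that every old version stays pinned for the whole retention window. Actual snapshot consumption depends on how the specific platform shares unchanged blocks, so treat the result as a planning estimate, not a guarantee.

```python
def snapshot_reserve_tb(used_tb, daily_change_rate, retention_days):
    """Rough estimate of the space snapshots will pin in one storage tier.

    Assumption: each day, `daily_change_rate` of the used data is
    overwritten or deleted, and the old versions stay pinned until the
    oldest snapshot in the retention window expires.
    """
    return used_tb * daily_change_rate * retention_days


def headroom_ok(capacity_tb, used_tb, reserve_tb):
    """True if the tier's unused capacity can absorb the snapshot reserve."""
    return capacity_tb - used_tb >= reserve_tb


# Illustrative figures for one hypothetical storage tier.
capacity, used = 1000, 600  # TB
reserve = snapshot_reserve_tb(used, daily_change_rate=0.05, retention_days=7)
print(f"reserve ≈ {reserve:.0f} TB ({reserve / capacity:.0%} of the tier)")
print("fits in unused capacity:", headroom_ok(capacity, used, reserve))
```

With these illustrative figures the estimate is about 210 TB, or 21 percent of the tier, which fits inside the 400 TB of unused capacity. Raising the daily change rate or the retention window scales the reserve linearly.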

What about tackling the employee challenge to big data security?

Educating employees is a critical part of protecting any environment, as they are more likely to be the first point of entry into an organisation than anything else. That means raising employee awareness of the dangers of spear-phishing, modern malware attacks and more. The standard tricks of redirecting people to websites and downloads, or sending dubious email attachments, have become much more sophisticated.

Attackers targeting a Hadoop cluster might start by hitting a system administrator with a ServiceNow helpdesk request… This camouflage makes the attack difficult to spot. It’s important to remember that the people coming after these resources are good: not script kiddies or mass-market ransomware opportunists, but people intent on causing serious damage, for either ideological or commercial reasons.

Even with training, people will remain a weak link. Another guesstimate puts the “per event” reputational and regulatory cost of a breach at up to two percent of market capitalisation. Given that a breach is eventually inevitable, having good remediation policies, processes and technologies in place is key.

How do these security practices tie into wider security, risk and compliance objectives for a business?

The critical component here is audit, given the need to know exactly where your data is being stored, controlled and processed, and what it’s being used for, in an evolving regulatory context. Auditing is something you apply to your own use of big data, but it is also something big data enables you to achieve for other systems. The audit and exfiltration monitoring tools you build in as part of your big data hygiene planning are a case in point, but their logs are of no use without analytics and without the ability to cross-reference and cross-check other data resources. For example, if a piece of personal information has been accessed on one system, does it also exist on others? And should it therefore have been deleted from all of them?
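The cross-referencing question above can be illustrated with a small Python sketch: given an inventory of which data subjects each system holds, report where else a subject’s data lives whenever an access to it is audited. The system names, the subject ID format and the in-memory `inventory` are hypothetical; in practice the inventory would come from data discovery and classification tooling, and the audit events from the platform’s own logs.

```python
# Hypothetical inventory: which data subject IDs each system currently holds.
inventory = {
    "data_lake": {"cust-001", "cust-002", "cust-003"},
    "crm": {"cust-001", "cust-004"},
    "marketing_db": {"cust-001", "cust-003"},
}


def other_copies(subject_id, accessed_on, inventory):
    """List the other systems that also hold data for this subject."""
    return sorted(
        system
        for system, subject_ids in inventory.items()
        if system != accessed_on and subject_id in subject_ids
    )


# Hypothetical audit events: (system accessed, data subject ID).
audit_events = [("crm", "cust-001"), ("crm", "cust-004")]
for system, subject in audit_events:
    # Each non-empty result is a candidate for follow-up, e.g. an erasure
    # request that must be honoured on every listed system.
    print(subject, "accessed on", system, "also held in:",
          other_copies(subject, system, inventory))
```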

The rising volume of unstructured data represents a huge number of unknowns. As such, we are going to see a major opportunity around digital transformation. Organisations will be forced to assess how they handle data and make significant improvements to the structure of their environments, their ability to run those analytics, their ability to retrieve information quickly, and so on. Otherwise, they may face regulatory investigation or enforcement for failing to embed appropriate data governance and data security.

For those interested in practical ways to tackle these problems, Dell EMC Isilon has built-in tools that aid recovery from a ransomware attack; however, detection and prevention are a much better alternative. Fortunately, Dell EMC partners with Superna and Varonis to offer ideal solutions.

If you’re interested in how Dell EMC Isilon and Hortonworks customers tackle other challenges around gaining value from their big data, join our upcoming webinar on “Batch + real-time analytics convergence” in late November.
