Big Data & The Security Skills Shortage

Finding a security analyst with the data discovery experience to combat modern threats is like searching for the mythical unicorn. The person does not exist.

Recent high-profile, high-impact data breaches across industries, including financial, healthcare, and retail, prove that today’s cybercriminals are adept at finding and fully exploiting even the smallest security gaps. Detection of their malicious activity often comes much too late – and at great cost for companies and their customers.

Not surprisingly, business leaders are starting to ask more of chief information security officers (CISOs) and other security operations personnel. They want assurance that the organization and its assets are protected. And they are, increasingly, looking for ways to leverage security analytics to strengthen cybersecurity. To do that, security analysts will need to bring three diverse skill sets to the table:

Security domain expertise: The security analyst needs deep expertise in the security domain itself.

Data science expertise: The security analyst needs advanced analytical skills, like using machine learning and predictive analytics algorithms, and should know how to prepare data for analysis.

MapReduce/Spark/Storm/Hive/Pig expertise: The security analyst must be able to write code for a number of big data technologies designed to optimize analysis of petabytes of data.

While it would be ideal to find a security analyst who is proficient in all of these areas, I can tell you with confidence that, like the unicorn, this person does not exist. In fact, in the security space, it's next to impossible to find security professionals with just one of these specialized, essential skill sets.

The challenges ahead

Because stealthy cyber-attacks can operate in networks undetected for weeks, months, or even longer, security analysts must be able to identify and analyze patterns that span lengthy time periods. They also must be able to visualize all of this data in a way that helps them “connect the dots” and identify activity that’s out of the norm.

Another issue is the hundreds or thousands of security incident alerts organizations receive every day -- the vast majority of which are not malicious activity or targeted attacks. Differentiating between true, targeted attacks and non-malicious incidents is extremely difficult unless security analysts are armed with the skills and tools they need to make them entry-level data scientists.

When security analysts ask the right questions of big data, they can discover attack sequences and better understand the business impact of these events. To maintain that focus, security analytics supporting these investigative workflows must handle #2 and #3 on the list of criteria for the security analyst/unicorn: data science expertise and MapReduce/Spark/Storm/Hive/Pig expertise.

Security analysts equipped with these tools can better harness their security domain expertise. They can analyze security incidents, detect root causes, and unearth larger attacks before adversaries can exfiltrate high-value data. As a result, analysts looking to combat modern threats by taking advantage of big data and data discovery will need to acquire or hone the following skills:

Identify the sequence of an attack: Security analysts need to analyze data surrounding incidents to identify anomalies and patterns that are not “normal.” By factoring in IT, user, and business application data as context, they can reach a conclusion on the impact of a security incident.

Ask many questions and get fast answers: Security analysts must conduct security investigations based on hypothesis and suspicion. They must ask as many questions as needed, receive fast responses, and quickly pivot their investigations based on those responses.

Derive insights on petabytes of data: In many organizations, security events and audit logs from IT, user, and business applications can amount to 10 terabytes (TB) of data per day. Given that a typical data breach timeline is 243 days, security analysts need to detect anomalies and patterns going back as far as 12 months – which requires them to analyze petabytes of data.

Translate security incidents into business impact: Security analysts need a centralized view of IT, user, business application data and security event data. Multi-structured data must coexist in a single repository, and be transformed and correlated so that the outcome of security investigations is about business impact.
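As a rough check on the "petabytes of data" claim above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the 10 TB per day, 243-day, and 12-month figures come from the text, while the function name `volume_pb` and the use of decimal units (1 PB = 1,000 TB) are assumptions made for illustration. It ignores compression, replication, and indexing overhead.

```python
# Back-of-the-envelope estimate of the log volume an analyst must search.
# The daily volume and time windows are the figures quoted in the article;
# decimal units (1 PB = 1,000 TB) are an assumption.

TB_PER_DAY = 10              # security events and audit logs per day
BREACH_TIMELINE_DAYS = 243   # typical data breach timeline
LOOKBACK_DAYS = 365          # "as far as 12 months"
TB_PER_PB = 1000


def volume_pb(days: int, tb_per_day: float = TB_PER_DAY) -> float:
    """Return the raw data volume, in petabytes, accumulated over `days`."""
    return days * tb_per_day / TB_PER_PB


if __name__ == "__main__":
    print(f"{BREACH_TIMELINE_DAYS} days: {volume_pb(BREACH_TIMELINE_DAYS):.2f} PB")
    print(f"{LOOKBACK_DAYS} days: {volume_pb(LOOKBACK_DAYS):.2f} PB")
```

At 10 TB per day, the 243-day breach window alone covers about 2.43 PB of raw data, and a full 12-month lookback about 3.65 PB.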

The unicorn is a mythical creature, but a security analyst with deep security domain expertise is certainly not. When you support a skilled security analyst with security analytics on big data, your organization will be able to gain a complete picture of network and data security risks, and more quickly detect and mitigate advanced cyber-attacks.

An early member of the Platfora team, Peter leads the charge to put big data in users' hands in a beautiful and easy way. Peter Schlampp remembers playing with the keys of a TRS-80 when he was seven years old - and since then, he's been destined to build category-defining ...

Christian – Thanks for the response. I agree with your assessment: it is a dream job, but I fear that it's a dream job for only a handful of people. I will borrow from the well-known quote: "A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician." You can expand that quote to say, "A security analyst needs to be a better security expert than a data scientist and software engineer, and a better data scientist than a security expert and software engineer," and you get the rest. My point is that while there will be smart people who work in all three domains and can acquire the right skill set, the industry needs people with these skills in vast numbers, not just a handful. As a security professional with faith in human ingenuity, I would rather have security analysts than machines combat cybercriminals and defend my organization. As a vendor, what we can do is provide the right tools and data insights to security analysts so that they spend all their time making decisions as opposed to collecting and preparing data for analysis.

While I agree that this analyst probably doesn't exist in the formal InfoSec organization, I'd argue that there are probably hackers out there who actually have the needed skillset but haven't touched on every area you've noted with full expertise. What I think you are describing, however, is a dream job. I think there is an opportunity here for the InfoSec industry to build out the skillset requirements and the education needed to hone these skills into a certifiable InfoSec career role. Understand, the result might have to be a whole new collection of tools; I've read several books on data science and have been wowed by the Python and R code that some data scientists are using to work with data on the scale you describe. Marry that to either several years' experience in the underground, or working as a white hat in corporate environments, and you have your unicorn.

I think if this could become a certification track, not only would the InfoSec sector be the better for it, but, damn, would the work-day get that much more interesting and enjoyable for some lucky geeks :-)

Peter, great article. We are seeing an increase in businesses seeking specialized skills to help address challenges that arose with the era of big data. The open source HPCC Systems platform from LexisNexis helps to fill this gap by allowing data analysts themselves to own the complete data lifecycle. Designed by data scientists, the programming language called ECL is declarative and expresses data algorithms across the entire HPCC platform. Its built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. HPCC Systems provides proven solutions to handle what are now called Big Data problems, and has been doing so for more than a decade.

I agree that "in the security space, it's next to impossible to find security professionals with just one of these specialized, essential skill sets." I'm also concerned about all the sensitive business data that we are collecting in Big Data.

I think that Big Data is changing the way we are dealing with data. Unfortunately, many organizations have rushed into Big Data focused solely on ROI, and privacy is an afterthought. Many companies are now collecting data files into Big Data environments without fully understanding what specific sensitive information is hidden in those files.

Since there is a shortage in Big Data skills and an industry-wide shortage in data security personnel, many organizations don't even know they are doing anything wrong from a security perspective. In many cases they do not have the resources to analyze before collecting huge volumes of data files.

I think that many organizations shortly will be struggling with a major big data barrier:

1. I think a big data security crisis is likely to occur very soon and few organizations have the ability to deal with it.
2. We have little knowledge about data loss or theft in big data environments.
3. I imagine it is happening today but has not been disclosed to the public.

I noted that companies are starting to follow these guidelines. For example, the Hortonworks Hadoop distribution for Big Data recently released the types of features that Gartner is recommending, including data tokenization (on the node!), advanced HDFS encryption, key management, and auditing.
