7 Surprising Facts about AI and Big Data in Cybersecurity

In this special guest feature, Kumar Saurabh, CEO and co-founder of LogicHub, observes how the correlation between data volumes and IT security seems straightforward, but in reality it’s complex and at times paradoxical. He provides 7 surprising facts about big data and artificial intelligence (AI) as they are used in cybersecurity. Kumar has 15 years of experience in the enterprise security and log management space, leading product development efforts at ArcSight and Sumo Logic. He has a passion for helping organizations improve the efficacy of their security operations, and has personally witnessed the limitations of existing solutions in helping SOC analysts detect threats buried deep within mountains of alerts and events. This frustration led him to co-found LogicHub™ to empower cyber analysts by building intelligence automation, not just analytics. Most recently Kumar was Co-founder and Vice President of Engineering at Sumo Logic. Kumar earned his M.S. in Computer Science from Columbia University and B.S. in Computer Science from IIT Kharagpur.

In mature organizations, IT security depends on big data. Security analysts and IT administrators rely on vast amounts of security data to detect and characterize threats. More often than not, it is through big data that attacks are stopped and vulnerabilities eliminated.

A Security Information and Event Management (SIEM) platform running in an enterprise typically collects millions or even billions of events from every possible system, endpoint, network and security tool. Within this vast trove of data are clues about security threats. Human security analysts must make sense of this data quickly to discover attack patterns without being overwhelmed by SIEM alerts that turn out to be false positives.

The correlation between data volumes and IT security seems straightforward, but in reality it’s complex and at times paradoxical. Here are 7 surprising facts about big data and artificial intelligence (AI) as they are used in cybersecurity.

Attacks can lurk on enterprise networks for months – The average “dwell time” (the length of time an attack remains undetected) on a network is almost 7 months. You might think that collecting more event data would lead to faster attack detection and shorter dwell times, but the opposite has happened: dwell times have risen in step with the amount of event data being collected. Today, attackers have far too much time to explore networks and systems, discover vulnerabilities, install malware, and exfiltrate data. Security teams need to reduce dwell times by an order of magnitude to reduce risk, and they are struggling to do so.

Despite having vast amounts of SIEM data, security buyers are feeling desperate – They’re unable to keep up with the pace and variety of attacks, and security teams are woefully understaffed. According to one estimate, 209,000 IT security jobs went unfilled in the U.S. in 2015. A growing workload and a shortage of qualified staff are leading many buyers to seek automated solutions.

Complex AI cybersecurity products make big promises about analyzing SIEM data but have failed to deliver – Many IT security vendors are responding to this big data overload by offering AI systems that promise to magically parse the data and discover attacks. Unfortunately, most of these systems have failed to live up to their promises. They were designed around academic theories that have yet to prove themselves in real-world conditions. CISOs and security teams should be wary of technical name-dropping, i.e., product descriptions that stress Bayesian models, Markov models, and similar terms.

AI requires training data, and in most cases that labeled training data is not available – An AI solution is only as good as the data it was trained on. Most organizations lack the required volumes of security data labeled to indicate where attacks are really taking place.

AI cybersecurity products tend to generate more false positives – New cybersecurity products should reduce the number of false positive alerts swamping Security Operations Centers (SOCs). Instead, because of a lack of good training data and context, many AI products generate even more false positives than before. Their analysis turns out to rely on detecting anomalies from a baseline, but not all anomalies are threats.
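
To make the baseline problem concrete, here is a minimal Python sketch of one common form of baseline anomaly detection, a z-score test. It is an illustration only, not a description of any vendor's product: the login counts, the event labels, and the threshold of 3.0 are all invented for the example.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, threshold=3.0):
    """Flag observations whose z-score against the baseline exceeds the threshold.

    `baseline` is a list of historical values; `observed` is a list of
    (label, value) pairs. The label is ground truth for the reader only;
    the detector never uses it.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for label, value in observed:
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((label, value, round(z, 1)))
    return flagged


# Hypothetical daily login counts for one account over two weeks.
baseline_logins = [20, 22, 19, 21, 23, 20, 18, 22, 21, 19, 20, 22, 21, 20]

# Hypothetical new observations (all values are made up for illustration).
observed = [
    ("normal workday", 21),
    ("quarter-end reporting crunch (benign)", 45),
    ("credential-stuffing attempt (malicious)", 60),
]

alerts = flag_anomalies(baseline_logins, observed)
for label, value, z in alerts:
    print(f"ALERT: {value} logins (z={z}) -> {label}")
```

Both the benign reporting crunch and the malicious attempt are flagged, because the detector sees only a deviation from the baseline and has no context about why the deviation happened. The benign alert is exactly the kind of false positive that ends up in an analyst's queue.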

AI products cannot explain themselves – If an AI cybersecurity product takes an action, it typically can’t explain why it did so. Human SOC team members are held accountable for their decisions and have to be able to explain them. Without knowing why an AI system took an action, it becomes impossible to tune the system and make it more effective.
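
For contrast, the sketch below shows what an explainable decision could look like in its simplest form: an additive risk score that returns the contribution of each signal alongside the total. This is a hypothetical illustration in Python, not any product's method; the feature names and weights are assumptions made up for the example.

```python
# Hypothetical weights; in practice these would be set and tuned by analysts.
WEIGHTS = {
    "off_hours": 2.0,      # 1 if the login happened outside business hours
    "new_device": 1.5,     # 1 if the device has not been seen before
    "failed_logins": 0.5,  # applied per failed attempt before success
}


def score_event(event):
    """Return (total_score, contributions) for a single event.

    `contributions` maps each feature that fired to the amount it added,
    so an analyst can see why the score is what it is.
    """
    contributions = {}
    for feature, weight in WEIGHTS.items():
        value = float(event.get(feature, 0))
        if value:
            contributions[feature] = weight * value
    return sum(contributions.values()), contributions


# Hypothetical event: an off-hours login after six failed attempts.
event = {"off_hours": 1, "new_device": 0, "failed_logins": 6}
score, reasons = score_event(event)

print(f"risk score: {score}")
for feature, amount in sorted(reasons.items(), key=lambda kv: -kv[1]):
    print(f"  {feature}: +{amount}")
```

Because every point of the score traces back to a named signal, an analyst who disagrees with a decision knows which weight to adjust. A system that returns only the final verdict offers no such handle.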

Effective AI products require a human feedback loop – AI will never be effective operating in any domain by itself. It will always require training data, context, and feedback from humans. To tackle the problem of big data in cybersecurity, an AI solution will need to feature a feedback mechanism so it can be trained and refined by input from SOC team members.
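
One simple shape such a feedback mechanism could take is sketched below in Python. Analysts record a verdict on each alert they review, and alert types that have proven to be mostly noise stop being escalated. The class name, the alert type names, and the thresholds are all hypothetical; a real system would feed verdicts back into model training rather than a simple counter.

```python
from collections import defaultdict


class FeedbackTriage:
    """Track analyst verdicts per alert type and suppress proven noise."""

    def __init__(self, min_verdicts=5, min_precision=0.2):
        # Both thresholds are illustrative assumptions, not recommendations.
        self.min_verdicts = min_verdicts
        self.min_precision = min_precision
        self.counts = defaultdict(lambda: {"true": 0, "false": 0})

    def record_verdict(self, alert_type, is_real_threat):
        """Store one analyst decision about a reviewed alert."""
        key = "true" if is_real_threat else "false"
        self.counts[alert_type][key] += 1

    def should_escalate(self, alert_type):
        """Escalate unless analysts have shown this alert type is mostly noise."""
        c = self.counts[alert_type]
        total = c["true"] + c["false"]
        if total < self.min_verdicts:
            return True  # not enough feedback yet, so keep a human in the loop
        return c["true"] / total >= self.min_precision


triage = FeedbackTriage()

# Hypothetical review history: six false positives for one alert type...
for _ in range(6):
    triage.record_verdict("off_hours_login", is_real_threat=False)

# ...and a mixed record for another.
for verdict in [True, True, False, True, False]:
    triage.record_verdict("new_admin_account", is_real_threat=verdict)

for alert_type in ["off_hours_login", "new_admin_account", "never_reviewed"]:
    print(alert_type, "->", "escalate" if triage.should_escalate(alert_type) else "suppress")
```

The design choice worth noting is the default: an alert type with too little feedback is always escalated, so the system earns the right to suppress alerts only after analysts have reviewed enough of them.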

The big data problem in cybersecurity isn’t going to go away. AI promises a way to accelerate analysis and reduce attack dwell times, but SOC teams should keep their eyes open about the true capabilities of the AI solutions offered by vendors.
