Is Machine Learning Secure? | Lucideus Research

Introduction
Machine Learning, in simple words, is the science of getting computers to learn without being explicitly programmed. It has emerged as an indispensable tool, and data is the key to any machine learning algorithm. Machine Learning techniques are being applied in all sorts of fields, including finance, e-commerce, healthcare, computer vision, robotics and now even cybersecurity.

The most prevalent use of machine learning in cybersecurity is classification of various kinds, which falls under the supervised learning paradigm. Network traffic data can be analysed effectively through machine learning. It can be used to build Intrusion Detection Systems that detect both network and computer intrusions and misuse. During training, the model is taught what constitutes a normal behaviour profile. During testing, the current data is compared against the learned profile to check for anomalies. Such a system can classify different network attacks like scanning, spoofing, etc. The same classification concept extends to the endpoint layer, where programs can be classified as malware, ransomware or spyware, and to the user level, where it is employed in anti-phishing solutions, spam filters, etc. Support Vector Machines, Random Forests, Artificial Neural Networks and Convolutional Neural Networks are some common algorithms used for this purpose.
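To make the idea concrete, here is a minimal sketch of supervised classification on toy network-flow features. The feature values, labels and nearest-centroid rule below are illustrative assumptions, not part of any production IDS:

```python
import math

def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(samples):
    """samples maps label -> list of feature vectors; returns label -> centroid."""
    return {label: centroid(vecs) for label, vecs in samples.items()}

def classify(model, x):
    """Assign x the label of the nearest class centroid."""
    return min(model, key=lambda label: distance(model[label], x))

# Toy flow features: (packets per second, mean packet size in bytes)
training = {
    "normal": [[10, 500], [12, 480], [9, 520]],
    "scan":   [[200, 60], [180, 64], [220, 58]],
}
model = train(training)
print(classify(model, [190, 62]))  # prints "scan"
```

A real IDS would use far richer features and a stronger learner, but the train-then-compare structure described above is the same.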

The problem arises when a malicious adversary uses intelligent, adaptive methods to manipulate the input data, exploiting specific weaknesses of the learning algorithm and thereby compromising the entire system. The factors responsible for this issue and its implications have led to the creation of a new subfield called “Adversarial Machine Learning”, which lies at the intersection of machine learning and computer security.

Attacks
A taxonomy of different kinds of attacks on the learning algorithms is given below:

Influence
- Causative: They manipulate the training data.
- Exploratory: They probe the learning algorithm to discover weaknesses.

Specificity
- Targeted: They focus on a specific sample or a small set of samples.
- Indiscriminate: They are more flexible and focus on a generic set of samples.

Security Violation
- Integrity: They aim to get a malicious sample misclassified as legitimate.
- Availability: They increase the misclassification error rate to make the system unusable (e.g. Denial of Service).
- Confidentiality: They aim to retrieve information from the learner, compromising the privacy of the system users.

Defences
Following are some of the ways to defend against the attacks on the learning algorithms:

One possible way is to use the statistical technique of regularization, in which a penalty term is added to the objective function being optimized. This makes the decision boundary smoother and removes complications that can be exploited. Besides this, prior information can be encoded into the model to reduce overfitting and the dependency on data, which helps in defending against causative attacks.
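As a small illustration (a sketch with made-up data, not taken from any particular system), consider one-dimensional ridge regression: an L2 penalty on the weight shrinks it toward zero, smoothing the fitted model. Minimizing sum((y - w*x)^2) + lam * w^2 has the closed form below:

```python
def ridge_slope(xs, ys, lam):
    """Closed-form minimiser of sum((y - w*x)^2) + lam * w^2 (1-D ridge)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]
print(ridge_slope(xs, ys, 0.0))   # unregularised least-squares fit
print(ridge_slope(xs, ys, 10.0))  # the penalty shrinks the weight toward zero
```

The larger `lam` is, the less the fitted weight can chase individual training points, which is exactly what limits an adversary's leverage over the decision boundary.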

A test dataset can be created containing different variants of intrusions and some random points on which we can evaluate our learning algorithm after it has been trained. Clustering can also be used on the data classified by the learner to detect compromises.
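A hypothetical sketch of such an evaluation, using an assumed threshold detector and made-up held-out points, might look like this:

```python
def detect(x, threshold=100.0):
    """Toy detector: flag a flow as an intrusion if its rate exceeds a threshold."""
    return x > threshold

# Held-out test set: known intrusion variants plus some benign random points
intrusions = [150.0, 210.0, 95.0, 300.0]
benign = [10.0, 40.0, 75.0, 20.0]

detection_rate = sum(detect(x) for x in intrusions) / len(intrusions)
false_positive_rate = sum(detect(x) for x in benign) / len(benign)
print(detection_rate, false_positive_rate)  # prints "0.75 0.0"
```

A drop in the detection rate on such a held-out set after retraining is one signal that the training data may have been poisoned.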

The learning algorithm can make the adversary believe that it misclassifies a specific intrusion. This essentially acts as a honeypot and an increase in the number of instances of a particular attack reveals the existence of an adversary.

Randomization can be incorporated in the way the decision boundary is created to increase the amount of effort required to include a targeted point into the set of points that are considered “good” by the learning algorithm, also called the support of the algorithm.
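One simple way to realise this, sketched below with an assumed one-dimensional detector, is to jitter the decision threshold randomly on every query, so an adversary probing near the boundary gets inconsistent answers:

```python
import random

def make_randomized_detector(threshold, jitter, seed=None):
    """Return a detector whose decision boundary is perturbed by a random
    offset on every call, so a point just at the nominal boundary is not
    reliably accepted or rejected."""
    rng = random.Random(seed)
    def detect(x):
        return x > threshold + rng.uniform(-jitter, jitter)
    return detect

detect = make_randomized_detector(threshold=100.0, jitter=5.0, seed=0)
# A point sitting exactly on the nominal boundary is only sometimes flagged
results = {detect(100.0) for _ in range(1000)}
print(results)  # both True and False occur
```

Points well inside or well outside the jitter band are still classified deterministically, so the randomization mainly taxes adversaries probing the boundary.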

Data can be mapped to an abstract space by the learning algorithm and then classification can be performed in this new space. This increases the computational cost of generating points that lead to false negatives even if the attacker has some information about the decision boundary.
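Here is a minimal sketch of this idea, assuming a secret random linear projection whose seed plays the role of a key the attacker does not know:

```python
import random

def make_projection(in_dim, out_dim, seed):
    """Secret random linear map; the seed acts as a key unknown to the attacker."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(matrix, x):
    """Map feature vector x into the abstract space via the secret matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

P = make_projection(in_dim=4, out_dim=2, seed=42)
x = [1.0, 0.5, -0.2, 3.0]
print(project(P, x))  # the learner classifies this projected point, not x
```

Without knowledge of the seed, an attacker who knows the decision boundary in the original space still has to search blindly for inputs that land on the wrong side in the projected space.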

Model for Attack Strategy

A simple outlier detection framework is considered here, a technique widely used in numerous applications like intrusion detection, virus detection, etc. The circle shown in the figure depicts the hypersphere containing the points considered to be “good” or “normal” by the learning algorithm; hence, it is the support of our algorithm. The points outside the support are treated as outliers or anomalies. The centre of the hypersphere is the mean of all points in the support.

In each iteration of the algorithm, as new data is received, the hypersphere moves a little and the support gets updated. Points G and G’ shown in the figure are the target points that the adversary wants classified as normal; initially, they are classified as outliers. The strategy for the adversary is to consider the line joining the centre of the hypersphere and the target G. The point where this line intersects the edge of the hypersphere is the desired attack location. The adversary places a certain number of points at this location, which moves the support a little closer to the target G in every iteration. It can take several iterations to include both G and G’ in the support. The effort of the adversary is measured by the total number of points placed across all the iterations.

There are two trade-offs involved in the attack strategy. The more points placed in an iteration, the greater the displacement of the hypersphere. However, if too many points are placed very early, the hypersphere becomes difficult to move later, because the centre is the mean over all points seen so far, and each new point shifts it less as the dataset grows. The other trade-off is between the number of iterations and the total effort: placing fewer points per iteration costs less per round but requires more rounds to reach the target.
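The one-dimensional analogue of this attack can be simulated in a few lines (an illustrative sketch with made-up numbers, not the original authors' code): the support is an interval of fixed radius around the running mean, and the adversary repeatedly drops a batch of points at the boundary nearest the target:

```python
def attack_iterations(points, radius, target, per_round):
    """Simulate the 1-D hypersphere attack: each round the adversary places
    `per_round` points at the support boundary nearest the target, dragging
    the mean until the target falls inside the support. Returns the number
    of rounds needed."""
    data = list(points)
    rounds = 0
    while True:
        center = sum(data) / len(data)
        if abs(target - center) <= radius:
            return rounds
        attack_point = center + radius  # boundary point toward the target
        data.extend([attack_point] * per_round)
        rounds += 1

# Benign data clustered near 0; the target sits far outside the support
benign = [0.0] * 50
print(attack_iterations(benign, radius=1.0, target=5.0, per_round=5))
print(attack_iterations(benign, radius=1.0, target=5.0, per_round=20))
```

Running this shows the first trade-off directly: larger batches reach the target in fewer rounds, but every batch enlarges the dataset and damps all subsequent displacements of the mean.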

Conclusion
As the penetration of machine learning techniques into a variety of cybersecurity applications increases, so does the significance of the field of adversarial machine learning. In an adversarial environment, the key is to anticipate the ways an attacker may try to disrupt our machine learning algorithm. The online nature of the algorithms and the lack of stationarity enable attackers to exploit the learning system through various methods.

Lucideus is an Enterprise Cyber Security platforms company incubated from IIT Bombay and backed by Cisco's former Chairman and CEO John Chambers. It protects multiple Fortune 500 companies and governments around the world. The name Lucideus is derived from Lucifer (Satan) and Deus (God) as they are in the business of hacking for good.