Abstract

With the popularization of the Internet, attack cases are increasing and attack methods change daily, so information security has become a significant issue worldwide. There is now an urgent need to detect, identify, and block such attacks effectively. This research compares the efficiency of machine learning methods in intrusion detection systems, including classification trees and support vector machines, in the hope of providing a reference for building intrusion detection systems in the future.
Compared with other related work on data mining-based intrusion detectors, we propose calculating the mean value by sampling different ratios of normal data for each measurement, which leads to a better accuracy rate for real-world observation data. We compare the accuracy, detection rate, and false alarm rate for four attack types. Moreover, the approach shows better performance than the KDD Winner, especially for U2R and R2L attacks.

Introduction

In recent years, as the Internet and personal computers have become widespread, Internet usage has kept increasing. It is gradually changing people's lives, and most people now study, entertain themselves, communicate, and shop online. Beyond individuals, enterprise structures and business models are also being transformed by the Internet, and large enterprises and government organizations, in order to achieve their operational goals and efficiency, develop many Internet-based applications and services; this is an irresistible trend of the new era.
However, although the Internet brings convenience and immediacy, it also brings information security problems: for example, servers are attacked and paralyzed, and internal data and information are stolen. Such incidents can cause heavy losses in money and business reputation. For example, in 2000 Yahoo in the United States was subjected to a DDoS attack; its servers were paralyzed for approximately 3 hours, 1 million users were affected, and the losses were too large to calculate. Other well-known commercial sites, such as CNN, eBay, Amazon.com, and Buy.com, also suffered such attacks.
Because of the convenience of the Internet, attack knowledge and methods are easy to obtain. Hackers no longer need extensive specialized knowledge, and the number of Internet attacks is rising sharply every year. According to statistics from the American Computer Emergency Response Team/Coordination Center (CERT/CC) (http://www.cert.org/), the annual number of network attacks has grown exponentially in recent years. According to a report in Information Security (http://www.isecutech.com.tw/), Internet attacks have become a new weapon of world war; the report said that Chinese military hackers had drawn up a plan to attack an American aircraft carrier battle group over the Internet so that it would lose its fighting capacity. Such information shows that there is an urgent need to identify and block Internet attacks effectively.
Enterprises commonly adopt a firewall as the first line of defense for network security, but the main function of a firewall is to monitor network access behavior, and its capacity to detect attacks is limited. Therefore, an intrusion detection system (IDS) is usually applied to inspect network packets and improve the protection of network security.
An IDS acts like a network monitoring and alarm device: it observes and analyzes whether attacks may be occurring, raises an alarm in time before the attacks cause harm, executes corresponding response measures, and reduces the chance of greater losses. Some IDS technologies are based on pattern matching and have a low misjudgment rate, but the patterns must be updated regularly, and such technologies lack sufficient detection capacity for unknown and updated attack methods. Recently, many studies have applied data mining and machine learning techniques, which can analyze large volumes of data and have better detection capacity for unknown attacks. Although some results have been achieved, there is still much room for development.
Under largely identical conditions, how efficient are different machine learning methods when applied to intrusion detection? Besides these methods, what others are there? This research therefore compares the efficiency of different machine learning methods applied to intrusion detection, including classification trees and support vector machines, in the hope of offering suggestions for improvement as a reference for building intrusion detection systems.
The research process is shown in Fig. 1.

Conclusion

The research compares accuracy, detection rate, false alarm rate,
and the accuracy for the other attack types under different
proportions of normal data. The KDD Cup 99 dataset is the current
benchmark dataset in intrusion detection; however, its data are not
evenly distributed, so errors may occur if only one set is used.
Therefore, in the comparison, the research applies different
proportions of normal data for training and testing and then takes
the average, in the hope of obtaining more objective results.
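
The averaging procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `ids_metrics` and `average_metrics` are hypothetical, the labels are toy data, and the metric formulas are the conventional definitions (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)), which the text does not spell out and which are assumed here.

```python
from statistics import mean


def ids_metrics(y_true, y_pred):
    """Return (accuracy, detection_rate, false_alarm_rate).

    Labels are assumed to be 1 for attack and 0 for normal traffic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, detection_rate, false_alarm_rate


def average_metrics(runs):
    """Average the three metrics over several runs.

    Each run is a (y_true, y_pred) pair, e.g. one run per proportion
    of normal data used for training and testing.
    """
    per_run = [ids_metrics(y_true, y_pred) for y_true, y_pred in runs]
    return tuple(mean(column) for column in zip(*per_run))


# Toy example: two runs standing in for two normal-data proportions.
runs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
]
avg_accuracy, avg_detection_rate, avg_false_alarm_rate = average_metrics(runs)
```

Averaging over several proportions keeps a single, possibly unrepresentative split from dominating the reported figures, which is the motivation the text gives for the procedure.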
Comparing C4.5 and SVM, we find that C4.5 is superior to SVM in
accuracy and detection rate; in accuracy for Probe, DoS, and U2R
attacks, C4.5 is also better than SVM; but in false alarm rate,
SVM is better.
The KDD Cup 99 dataset applied in the research is widely used in
current intrusion detection systems; however, it dates from 1999,
and network technology and attack methods have changed greatly
since then, so it cannot reflect today's real network situation.
Therefore, if newer data are obtained and the tests and comparisons
are repeated, the results can reflect the current network situation
more accurately.
Through testing and comparison, the accuracy and detection rate of
C4.5 are higher than those of SVM, but the false alarm rate of SVM
is better; if the two methods are combined, overall accuracy could
be increased greatly.
In sampling, the research assumes that the attack data (that is,
the data other than normal data) are evenly distributed, which is
not guaranteed to give optimal results; this should be improved
and validated in future work.
The C4.5 parameters set in the research are not optimal, so future
work should tune the C4.5 parameters for the different training
datasets.
The SVM applied in the research uses its built-in grid.py to
optimize its parameters, and the search takes approximately 2 hours
for 10,000 data records in the research; this is unsuitable,
because an intrusion detection system must operate in real time.
Future research should aim at optimizing the parameters rapidly.
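
To illustrate why an exhaustive parameter search is slow, the sketch below runs a plain grid search over exponentially spaced values of C and gamma. It is an illustration only and does not reproduce grid.py itself: the ranges are assumed to resemble that tool's defaults, `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for the cross-validated accuracy that a real search would obtain by training an SVM for every parameter pair.

```python
from itertools import product
from math import log2


def grid_search(score_fn, log2c_values, log2g_values):
    """Evaluate score_fn(C, gamma) on every grid point and keep the best.

    Returns ((best_score, best_C, best_gamma), number_of_evaluations).
    """
    best = None
    evaluations = 0
    for log2c, log2g in product(log2c_values, log2g_values):
        c, gamma = 2.0 ** log2c, 2.0 ** log2g
        score = score_fn(c, gamma)  # in practice: one cross-validation run
        evaluations += 1
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best, evaluations


def toy_score(c, gamma):
    """Hypothetical score surface that peaks at C = 8, gamma = 0.5."""
    return -((log2(c) - 3) ** 2 + (log2(gamma) + 1) ** 2)


# Assumed ranges: log2(C) from -5 to 15 and log2(gamma) from 3 to -15,
# both in steps of 2.
best, evaluations = grid_search(toy_score, range(-5, 16, 2), range(3, -16, -2))
```

With these assumed ranges the grid has 11 x 10 = 110 points, and each point costs a full cross-validation, so the total time grows with both the grid size and the training-set size. A faster search would have to cut the number of points evaluated or the cost of each evaluation.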