Machine learning (ML) has proven particularly useful in malware detection. However, because malware evolves very quickly, the stability of the features extracted from malware is a critical issue in malware detection. The recent success of deep learning in image recognition, natural language processing, and machine translation suggests a potential way to stabilize malware detection effectiveness. We present a color-inspired convolutional neural network-based Android malware detection system, R2-D2, which can detect malware without extracting pre-selected features (e.g., the control flow of opcodes, classes, methods of functions, and the timing of their invocation) from Android apps. In particular, we develop a color representation that translates Android apps into RGB color codes and transforms them into a fixed-size encoded image. The encoded image is then fed to a convolutional neural network for automatic feature extraction and learning, reducing the need for expert intervention. We have run our system over 800k malware samples and 800k benign samples through our back-end (60 million monthly active users and 10k new malware samples per day), showing that R2-D2 can effectively detect malware. Furthermore, we will post any updates to our research results at http://R2D2.TWMAN.ORG.

Ransomware such as WannaCrypt and Petya has caused significant financial loss and has even endangered human life (e.g., the ransomware attack on UK hospitals). Ransomware on the desktop has gained much attention from academia and industry. However, the amount of ransomware on Android phones keeps increasing steadily while receiving much less attention. Because Android is the most popular smartphone OS and a substantial number of credentials are kept only on smartphones, the resulting data loss causes serious inconvenience and damage. Here, we present our deep learning-based ransomware detection system, coloR-inspired convolutional neuRal network-based androiD ransomware Detection (R2D2). R2D2 was originally developed to detect malware in general, but we found it particularly useful in detecting ransomware. A unique feature is its end-to-end training, without human intervention. Such end-to-end training points to a direction in which a tedious search for robust ransomware features is no longer needed for detection. Most importantly, based on R2D2, we develop techniques to encode ransomware as a so-called ransomware image, such that ransomware from the same family exhibits the same pattern and even non-experts can detect ransomware and determine its family with the naked eye.

Nowadays, the smartphone has become a daily necessity. In the smartphone market, Android is the most commonly used operating system (OS), and its market share is still expanding. According to a 2016 report by International Data Corporation (IDC), Android's share of the smartphone market increased from 84.3% in 2015 Q2 to 86.8% in 2016 Q3. Android is characterized by its openness; users can choose to download apps from Google Play or from third-party marketplaces. However, due to this popularity and openness, Android has attracted attackers' attention. According to data provided by our research partner Leopard Mobile Inc. (Cheetah Mobile Taiwan Agency), collected from its core products Security Master and Clean Master, the number of Android malware samples increased sharply from 1 million in 2012 to 2.8 million in 2014. In 2015, the number of Android malware samples detected was three times the 2014 figure, at over 9.5 million. In 2016, the number exceeded 17 million. In the first half of 2017, more than 10 million Android malware samples were found by Security Master and Clean Master.

As mentioned above, the majority of Android malware detection still relies on static analysis of source code and on dynamic analysis that monitors the execution of malware. However, these approaches are known to achieve good detection accuracy only for malware of the same family. They therefore require extensive reverse engineering for feature engineering to find the parts of an app best suited to identifying Android malware. These methods rely on looking for API invocations in the classes.dex of Android apps and at app permissions. In reality, Android malware is growing dramatically in number, mutates quickly, and employs various anti-analysis techniques. All of these characteristics make accurate detection extremely difficult. In particular, in our samples we found that the number of unknown samples (called gray samples) that cannot be detected by current methods is twice the number of samples that can be identified. If gray samples can be detected fast enough, we can reduce the likelihood that Android devices are attacked.

We attempt to uncover the hidden relationship between the program execution logic and the order of function calls behind the malware by taking advantage of the CNN, in order to accurately detect known and even unknown malware. Since both the bytecode in classes.dex and RGB color codes are expressed in hexadecimal, we propose a transformation algorithm that converts the bytecode into RGB color codes and then into a color image (called an Android color image). We then apply a CNN to the Android color images for Android malware detection. Our proposed system (R2-D2: coloR-inspired convolutional neuRal network-based androiD malware Detection) works by only decompressing the Android app (APK) and then translating the Dalvik executable classes.dex into Android color images. R2-D2 has the following advantages:

1. R2-D2 translates classes.dex, the core of the execution logic of Android apps, into RGB color images, without modifying the original Android apps and without manually extracting features from the apps in advance. It can complete the translation from executable code to image within 0.4 seconds. A further benefit of this translation is that more of the complex information in an Android app can be preserved in a color image with 16,777,216 colors (24 bits per pixel) than in a grayscale image with only 256 levels (8 bits per pixel).

2. Although a fully connected deep neural network (DNN) can cope with fast-changing malware thanks to its large number of parameters, the local receptive fields and shared weights of a CNN make it better suited to more complex structure. A CNN not only reduces the number of parameters but also reflects the complexity of Android malware, saving the heavy computation required by current methods.
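
To make the translation in the first advantage concrete, the following is a minimal Python sketch of one way to map raw bytes to RGB pixels. It is an illustration under stated assumptions, not the R2-D2 algorithm itself: the function name bytes_to_rgb_rows, the zero padding, the row-major layout, and the width parameter are all hypothetical choices, and the resizing to a fixed-size image is not shown.

```python
def bytes_to_rgb_rows(data, width):
    """Pack raw bytes into (R, G, B) pixels and lay them out in rows of `width`.

    Hypothetical helper for illustration; padding and layout are assumptions.
    """
    if width < 1:
        raise ValueError("width must be positive")
    # One pixel consumes three bytes; zero-pad the tail to a multiple of 3.
    padded = data + b"\x00" * (-len(data) % 3)
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    # Fill the last row with black pixels so every row has the same width.
    pixels += [(0, 0, 0)] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# The 8-byte magic of a DEX file in format version 035: "dex\n035\0".
header = bytes.fromhex("6465780a30333500")
rows = bytes_to_rgb_rows(header, width=2)
for row in rows:
    print(row)

# 24 bits per pixel give 2**24 distinct colors; 8 bits give 2**8 gray levels.
print(2 ** 24, 2 ** 8)
```

In practice the input would be the full contents of classes.dex read in binary mode, and the resulting rows could be handed to an image library to produce the picture that the CNN consumes.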

Moreover, we ran an image distance test on our Android malware images. Samples of the same Android malware family share similar visual patterns and are close to each other in terms of Levenshtein distance, root-mean-square (RMS) deviation, and mean squared error (MSE) image distance, which supports the suitability of our proposed Android color image algorithm for classification with a CNN (e.g., 00ee9561c5830690661467cc90b116de, labelled Android:Jisut-JY [Trj] by AVAST, and cdaf446c3e1076ab48540e02283595ac, labelled Android.Trojan.SLocker.IS by BitDefender, are 54.59% similar; 0cc0908c2fbd8f9b31da3afe05cb3427, labelled Trojan-Ransom.AndroidOS.Congur.aa by Kaspersky, and ce2fedd0ca9327b8d52388c11ff3b4ca, labelled Android.Trojan.SLocker.IS by BitDefender, are 54.94% similar). Although this similarity is not accurate at a fine-grained level, the visual patterns of Android malware images help us classify Android malware quickly, greatly reducing labor costs.
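
The three distance measures named above can be sketched in plain Python as follows. This is a minimal illustration, assuming each image has been flattened into a sequence of channel values of equal length; the function names are hypothetical, and how the source turns these distances into the percentage similarities it reports is not stated, so no such conversion is shown.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of channel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rms_deviation(a, b):
    """Root-mean-square deviation, i.e. the square root of the MSE."""
    return math.sqrt(mse(a, b))


def levenshtein(s, t):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete from s
                           cur[j - 1] + 1,              # insert into s
                           prev[j - 1] + (cs != ct)))   # substitute or match
        prev = cur
    return prev[-1]


# Two tiny flattened "images" that differ in a single channel value.
img_a = [100, 101, 120, 10, 48, 51]
img_b = [100, 101, 120, 10, 48, 57]
print(mse(img_a, img_b), rms_deviation(img_a, img_b), levenshtein(img_a, img_b))
```

MSE and RMS deviation compare values position by position, so they require images of the same size, whereas Levenshtein distance also tolerates sequences of different lengths at a higher computational cost.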

We collected our samples through the core products of our research partner (Leopard Mobile Inc., Cheetah Mobile Taiwan Agency): Security Master and Clean Master, which had reached 3,810 million installations globally with 623 million monthly active users by December 2016. On average, we can collect 10k benign and 10k malicious samples per day. The data was collected from January to May 2017, yielding approximately 1.5 million benign and malicious Android apps for our experiments. We keep our research results and data on the website http://R2D2.TWMAN.ORG (72 GB). So far, we have accumulated 2 million usable samples, and we continue to train and adjust our model.

This research adopts deep learning to construct an end-to-end learning-based Android malware detection system and proposes a color-inspired convolutional neural network (CNN)-based Android malware detection system, named R2-D2. The results show that our detection system works well in detecting known and even unknown Android malware. We have released the system in our core product to provide convenient usage scenarios for end users and enterprises. Future work is to simplify the task and train for higher performance against Android malware while avoiding a heavy computational burden. The experimental materials and research results, along with any updates, are available at http://R2D2.TWMAN.ORG.