
Google Summer of Code 2015 Project Ideas

This page contains a list of potential project ideas that we are keen to develop during GSoC 2015 (we also have additional project ideas currently undergoing internal review, which will be added here once project deliverables and available mentors have been confirmed).

You can view our previous GSoC 2009, GSoC 2010, GSoC 2011, GSoC 2012, GSoC 2013 and GSoC 2014 project ideas pages if you are looking for other inspiration, or you might like to work on one of our existing tools rather than on something new. We are also always interested in hearing ideas for additional relevant computer security and honeynet-related R&D projects (although remember that to qualify for GSoC funding from Google, your project deliverables need to fit into GSoC's 3-month project timescale!). If you have a suitable and interesting project, we will always try to find the right resources to mentor it and support you.

Please note - even if you aren't an eligible GSoC student, we are also always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.

Each sponsored GSoC 2015 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We'll also provide supporting hosted svn/trac/git/redmine/mailman/IRC/etc project infrastructure, if required.

So, unsurprisingly, a number of our suggested potential project ideas fall into these research areas. However, we are also interested in receiving project proposals and tool updates/new tool developments outside these research focus areas. Hopefully this provides potential students with a wide variety of exciting topics to contribute to and be engaged with once again this summer.

(more project ideas and mentors to follow, once internal review is complete)

GSoC 2015 Project Ideas

Droidbox Projects:

Droidbox [1] is _the_ open source sandbox for Android app analysis. It was developed during Google Summer of Code 2012 by Patrik Lantz and has continued to evolve ever since. Several academic, open source, and even commercial projects are based on it [2-4]. Since Android is evolving, we also want to evolve Droidbox to keep up with new technologies (such as ART), make it easier to use, and provide even better integration with other sandboxing frameworks (such as the leading open source Cuckoo sandbox system, which was also developed during previous GSoCs).

Since Android 4.4, the Android system has included a new runtime called ART [1][2] alongside Dalvik, and users can switch between the two runtimes. Since Android 5.0, however, Google has abandoned Dalvik entirely, so ART is the only runtime. Current dynamic analysis systems such as DroidBox, TaintDroid and DroidScope are built on the Dalvik VM, and porting them to ART seems impossible because they depend heavily on the DVM.

The goal of this project is to build a dynamic malware analysis system on ART, which allows users to monitor the execution of potentially malicious apps. This includes the following sub-goals:

• Monitoring function calls
• Modifying parameters/return value before/after function's execution
• Dumping objects' contents
• Reporting layer that is compatible with existing systems
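As a language-neutral illustration of the first two sub-goals, the sketch below wraps a function so that each call is logged and its parameters and return value can be rewritten. It is plain Python and says nothing about how hooks would actually be installed inside ART; `hook`, `call_log` and `read_imei` are hypothetical names chosen for this example.

```python
import functools

call_log = []  # hypothetical in-memory report of observed calls


def hook(before=None, after=None):
    """Wrap a function so calls are logged and optionally rewritten.

    before(args, kwargs) -> (args, kwargs) may modify the parameters.
    after(result) -> result may modify the return value.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if before is not None:
                args, kwargs = before(args, kwargs)
            result = func(*args, **kwargs)
            if after is not None:
                result = after(result)
            # Record what the caller actually saw, for the reporting layer.
            call_log.append({"name": func.__name__, "args": args,
                             "kwargs": kwargs, "result": result})
            return result
        return wrapper
    return decorator


# Hypothetical monitored function standing in for a sensitive API.
@hook(after=lambda imei: "000000000000000")
def read_imei():
    return "356938035643809"
```

Calling `read_imei()` returns the replaced value, and `call_log` holds one entry describing the call.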

The solution of this project should guarantee two points: low performance overhead and easy maintainability of the analysis environment for future Android versions.

Cuckoo Sandbox (developed during GSoC 2010 in The Honeynet Project [1]) has evolved into the de facto open-source standard for malware analysis systems. It offers capabilities for analyzing malware in various Windows environments, a clean architecture, and an easy-to-use UI. It is used by many open source and commercial sandboxing efforts, including Google's own VirusTotal infrastructure. Similarly, Droidbox [2] has evolved into a core technology for various other projects.

The goal of this project is to integrate Droidbox into Cuckoo in order to evolve Cuckoo into a powerful malware analysis framework that can analyze Android malware just as well as Windows malware, with a single look & feel.

A fork of Cuckoo Sandbox [3] currently uses the stock Android ARM emulator inside a Linux virtual machine in order to execute APKs and open URLs. The execution is monitored, and the results are stored inside Cuckoo Sandbox’s database and displayed using its web interface. We would like to use a different approach and achieve comparable results by using Droidbox/Taintdroid [4] to run the APK samples.

As described in the introduction, Droidbox is used in a range of other projects. Even though these projects collect a lot of data, only a few of them are available and the data is not always shared. The main goal of this project is therefore to design and implement both an easy-to-use web frontend and the backend needed to automatically analyze samples and retrieve the combined results of several Android reverse engineering tools.

The online Android sandbox, which will be hosted in the Honeynet Project’s cloud, will initially combine powerful tools developed as part of previous GSoCs, like Androguard [1] for static analysis and Droidbox [2] for dynamic analysis, but it is to be designed as an extensible platform that allows for the inclusion of new tools and techniques. The frontend will allow any user to submit an APK and visualize the results from the static and dynamic analysis as soon as the execution of the sample is completed. As implemented in malwr.com [3], the online sandbox for x86 binaries based on Cuckoo, the dynamic analysis will be queued and run in the background, and an email will be sent to the user when the results are ready.

If successful, this GSoC project will guarantee constant testing and feedback for the Androguard and Droidbox codebases, which will surely lead to further and faster improvements. Moreover, it will provide a constant stream of suspicious samples and a platform to test experimental techniques developed within the Honeynet Project.

A lot of Android malware relies on social engineering in order to infect devices. Since user interaction is required for installation, a large amount of Android malware verifies that a real user is present (e.g. by waiting for a button click) before starting its malicious actions. Similarly, some malware requires specific stimuli to verify that it is running on a real phone (e.g. changing GPS coordinates). Other malware checks whether it is running in an analysis environment by testing whether there are at least 15 contacts on the phone.

The goal of this project is to provide the most realistic-looking environment possible for malware, in order to trigger all of its malicious actions.

One subgoal is to populate existing images dynamically so that each analysis looks like a different phone (e.g. different contacts in the address book). In addition, certain stimuli should be created so that they trigger the required actions in the malicious app. Last but not least, the project includes adding a simulated user that behaves as much like a human as possible.
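The address book example can be sketched as a seeded generator, so that every analysis run gets a different but reproducible set of contacts. This is only an illustration: the name lists, the phone number format and the function name `generate_contacts` are assumptions, and pushing the contacts into an Android image is not shown.

```python
import random

# Hypothetical name pools; a real deployment would use much larger lists.
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Dana", "Emil", "Fatima", "Greta", "Hugo"]
LAST_NAMES = ["Keller", "Lopez", "Moreau", "Novak", "Okafor", "Petrov", "Rossi", "Sato"]


def generate_contacts(seed, count=20):
    """Return `count` fake contacts derived deterministically from `seed`.

    The default of 20 stays above the 15-contact threshold mentioned above.
    """
    rng = random.Random(seed)  # private generator, so runs are reproducible
    contacts = []
    for _ in range(count):
        name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        number = "+1555" + "".join(rng.choice("0123456789") for _ in range(7))
        contacts.append({"name": name, "number": number})
    return contacts
```

Using the analysis ID as the seed would give each run its own address book while keeping the run reproducible for later debugging.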

MITMPROXY Projects:

mitmproxy is a man-in-the-middle HTTPS proxy. It is an interactive console program written in Python that allows HTTP network traffic flows to be inspected and edited on the fly. mitmproxy has more than 100,000 downloads per year, more than 3,000 stars on GitHub and more than 50 contributors, so you’re not going to work on an academic prototype but on a project with a large community instead. :-)

At the end of 2014, we started working on our web front-end “mitmweb”, which will bring a nicer UI, better UX and Windows support to mitmproxy. Our long-term goal is to achieve feature parity between the web interface and the console application in most areas. The goal of this project is to accelerate that process by adding new features to the web interface and improving the existing application functionality. We’re using a great modern webapp technology stack (React.js (Flux), Bootstrap, Gulp, ...), so you can work with the latest technologies and focus on good code rather than IE support.

In its current form, mitmproxy is primarily designed to be an HTTP proxy. However, the web is slowly catching up with newer protocols and we want to add support for these in mitmproxy as well! Your task would be to implement basic support for the HTTP2 and WebSocket protocols in Python and wire them up to mitmdump, our non-interactive command line client. We expect you to learn the HTTP2 and WebSocket protocols during the project; knowing their individual details is not a prerequisite. You should, however, be familiar with HTTP, as this will make learning HTTP2 significantly easier.
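To give a feel for the kind of low-level work involved, here is a small standard-library sketch of two building blocks: computing the `Sec-WebSocket-Accept` value for the WebSocket opening handshake, and splitting the fixed 9-byte HTTP/2 frame header into its fields. The function names are made up for this example and are not part of mitmproxy.

```python
import base64
import hashlib
import struct

# Fixed GUID defined by the WebSocket specification (RFC 6455).
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Compute the Sec-WebSocket-Accept value for a client's handshake key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def parse_http2_frame_header(data):
    """Split the fixed 9-byte HTTP/2 frame header into its fields."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    # 24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream ID.
    length_high, length_low, frame_type, flags, stream_id = struct.unpack(
        "!BHBBI", data[:9])
    return {
        "length": (length_high << 16) | length_low,
        "type": frame_type,
        "flags": flags,
        "stream_id": stream_id & 0x7FFFFFFF,  # mask out the reserved bit
    }
```

A proxy needs considerably more than this (masking, fragmentation, HPACK, flow control), but both protocols start from simple, well-specified byte layouts like these.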

Project Name: Project 7 - Rumāl (or just Rumal)
Mentor: Pietro Delsante (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, Django + TastyPie, HTML/JavaScript, MongoDB
Project type: Improve existing tool
Project goal: Provide a web GUI for Thug, designed as a sort of social network where data can be enriched with metadata coming from various sources, and where users can share results, settings, analyses and whatever else.
Description:

Thug is a client honeypot developed during previous GSoC years that is used to analyse potentially malicious websites. Now that Thug is pretty stable and in general use, this project aims to be Thug's dress - providing a convenient web GUI - but also its weapon, as it should provide a set of tools that enrich Thug's output with new metadata and allow for correlation of results. While it is perfectly possible to use it as a simple web GUI for Thug on your own computer, with you as the only user, Rumāl has been designed to support multi-user environments, much like a social network, allowing you to share your results and your settings with other users and groups.

The first version of Rumāl interfaces with the results that Thug already saves in MongoDB in its default configuration, and provides a convenient way to display the results and to perform cross-analysis searches and correlations. Future releases should also enrich the analysis results with metadata (e.g. WHOIS for domains and IP addresses, connectors with Cuckoo, VirusTotal, comments and votes from users, and so on).

Rumāl is written in pure Python, using Django for the web server and Django-Tastypie for the APIs; the HTML/JavaScript part is made with standard libraries like Bootstrap 3, jQuery, jQuery UI, DataTables and so on.

Malcom (https://github.com/tomchop/malcom) is a tool that leverages network forensics analysis and threat intelligence to identify and counter malware-related threats. Its objectives are twofold: collect network artifacts from active sniffing sessions when running malware in a sandbox, and collect and analyze data from public and private data feeds, to be able to correlate observed network artifacts with a large databank (individual or collective) of “known-bad” artifacts.

The goal of this project is to extend Malcom’s capacities to break malware’s encryption keys and protocols, by leveraging known campaign keys and trying them out on network comms (major feature).
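The "try known campaign keys" idea can be sketched as a loop that decrypts a captured payload with each candidate key and accepts the first result that starts with a recognisable plaintext marker. Repeating-key XOR is used here purely as a stand-in cipher; the key names, the markers and the function names are assumptions and not Malcom code.

```python
from itertools import cycle


def xor_decrypt(data, key):
    """Repeating-key XOR, used here only as a stand-in for a real cipher."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def find_campaign_key(payload, known_keys, markers=(b"GET ", b"POST ", b"MZ")):
    """Try each known key and return (key_name, plaintext) on the first match.

    `known_keys` maps a campaign label to its key bytes. A match means the
    decrypted payload starts with one of the expected plaintext markers.
    Returns None if no key produces a plausible plaintext.
    """
    for name, key in known_keys.items():
        plaintext = xor_decrypt(payload, key)
        if plaintext.startswith(tuple(markers)):
            return name, plaintext
    return None
```

In practice the decryption routine and the plausibility check would be chosen per malware family, with the same loop structure around them.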

A secondary goal would be to work on interoperability between different Malcom, CRITS, or MISP instances to enable better intel sharing.

Students would be working with the mentors on extending the existing Malcom code base to add these valuable new features.

Conpot Projects:

Conpot is a low-interaction, server-side Industrial Control Systems honeypot designed to be easy to deploy, modify and extend. By providing a range of common industrial control protocols, we created the basics needed to build your own system, capable of emulating complex infrastructures to convince an adversary that he has just found a huge industrial complex. To improve the deceptive capabilities, we also provide the possibility to serve a custom human machine interface, which increases the honeypot's attack surface. The response times of the services can be artificially delayed to mimic the behaviour of a system under constant load. Because we provide complete stacks of the protocols, Conpot can be accessed with production HMIs or extended with real hardware.

Conpot provides a variety of common protocols: Modbus, S7Comm, SNMP, HTTP and Kamstrup. We are always working on getting additional protocols supported. This is a rather complicated task, as many protocols don't have an open source implementation and their documentation is either rather complex or simply not available. One of the protocols we are interested in is DNP3 (Distributed Network Protocol), which is similar to IEC 60870-5 and often used for communication between control centers, RTUs (Remote Terminal Units) and IEDs (Intelligent Electronic Devices). Conpot has a feature which we call the Proxy Module. This allows us to proxy incoming requests through Conpot to a service and back to the client. When we implement a new protocol in Conpot, we set up an instance with this proxy module and tunnel all requests from the client to e.g. a real device or a service with that protocol running on another host. Then, piece by piece, we decode the messages in Conpot as they pass through, so we gain insight into the intention of each request. Right now we have a very basic decoder for the DNP3 protocol which we would like to extend.
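As a rough idea of what such a pass-through decoder starts with, the sketch below parses the 10-byte DNP3 data link header of a proxied frame. It is not Conpot's existing decoder: the function name and the returned field names are assumptions, and the header CRC is returned as raw bytes without being verified.

```python
import struct

DNP3_START = b"\x05\x64"  # every DNP3 link-layer frame begins with these bytes


def parse_dnp3_link_header(frame):
    """Decode the 10-byte DNP3 data link header (the CRC is not checked)."""
    if len(frame) < 10:
        raise ValueError("a DNP3 link header is 10 bytes long")
    # start (2), length (1), control (1), destination (2, LE), source (2, LE), CRC (2)
    start, length, control, destination, source, crc = struct.unpack(
        "<2sBBHH2s", frame[:10])
    if start != DNP3_START:
        raise ValueError("missing 0x05 0x64 start bytes")
    return {
        "length": length,
        "dir": bool(control & 0x80),      # set when the master is sending
        "prm": bool(control & 0x40),      # set for primary (initiating) frames
        "function_code": control & 0x0F,  # link-layer function code
        "destination": destination,
        "source": source,
        "crc": crc,
    }
```

Logging these fields for every proxied frame already shows which addresses are being probed; decoding the transport and application layers would build on top of this.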

Conpot provides a variety of common protocols: Modbus, S7Comm, SNMP, HTTP and Kamstrup. We are always working on getting additional protocols supported. This is a rather complicated task, as many protocols don't have an open source implementation and their documentation is either rather complex or simply not available. One of the protocols we are interested in is BACnet (Building Automation and Control Networks), which is defined in the standard ISO 16484-5. BACnet is used for communication between building systems such as heating, ventilation and lighting. There are a couple of decent open source implementations which we highly recommend as inspiration for this project.
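For BACnet/IP, the first thing a honeypot sees in each UDP datagram is the 4-byte BVLC header. The sketch below decodes it using only the standard library. The function name and returned fields are assumptions, only two common function codes are named, and the NPDU/APDU layers that follow are left undecoded.

```python
import struct

BVLC_TYPE_BACNET_IP = 0x81
# Only two common function codes are named here; the rest report "unknown".
BVLC_FUNCTIONS = {
    0x0A: "Original-Unicast-NPDU",
    0x0B: "Original-Broadcast-NPDU",
}


def parse_bvlc_header(datagram):
    """Decode the 4-byte BVLC header of a BACnet/IP datagram."""
    if len(datagram) < 4:
        raise ValueError("a BVLC header is 4 bytes long")
    bvlc_type, function, length = struct.unpack("!BBH", datagram[:4])
    if bvlc_type != BVLC_TYPE_BACNET_IP:
        raise ValueError("not a BACnet/IP datagram")
    return {
        "function": function,
        "function_name": BVLC_FUNCTIONS.get(function, "unknown"),
        "length": length,                 # total length, header included
        "payload": datagram[4:length],    # NPDU and APDU, still undecoded
    }
```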

Conpot supports the protocols that a common PLC provides, but not the functionality of a PLC. This means that, apart from some randomized values and linearly incrementing values like uptime, the data in the honeypot is static. In order to appear more realistic and handle input values properly, we would like to support a PLC simulator. A good candidate is Awlsim (http://bues.ch/cms/hacking/awlsim.html): Awlsim is a free Step 7 compatible AWL/STL Soft-PLC written in Python. Awlsim provides an interface for virtual hardware connection modules (currently available are PROFIBUS-DP and LINUX-CNC). This interface could be used to connect Awlsim to Conpot.

PEEPDF Projects:

Introduction: peepdf is a Python tool for exploring PDF files in order to find out whether a file could be harmful. The aim of this tool is to provide all the components that a security researcher could need for PDF analysis, without having to use 3 or 4 different tools to perform all the tasks. With peepdf it's possible to see all the objects in the document, with the suspicious elements highlighted. It supports the most commonly used filters and encodings, and it can parse different versions of a file, object streams and encrypted files. With PyV8 and Pylibemu installed, it provides Javascript and shellcode analysis wrappers too. Apart from this, it is able to create new PDF files and to modify/obfuscate existing ones.

PDF filters are algorithms used in the PDF format to compress and encode stream objects. Unfortunately, they are also used to hide malicious code and thereby bypass antivirus detection. Currently, there are 9 different filters in the PDF specification. peepdf supports 6 different PDF filters for decoding and 3 for encoding. The first idea for this project is to improve the implementation of some existing filters, finish the encoding implementation of others and add support for new ones (/JBIG2Decode, /DCTDecode and /JPXDecode). There are already some proofs of concept that embed Javascript code in images, for instance, so the main goal will be to improve peepdf so that it can detect these new threats. This is an example: https://www.virusbtn.com/virusbulletin/archive/2015/03/vb201503-lossy
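To illustrate how stacked filters hide content, here is a minimal standard-library sketch that decodes a stream through a chain of two of the simpler filters, /ASCIIHexDecode and /FlateDecode. It is not peepdf's implementation: the function names are made up, and decode parameters such as predictors are ignored.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    """Decode /ASCIIHexDecode data: hex digits, whitespace ignored, '>' ends it."""
    data = data.split(b">", 1)[0]          # drop the end-of-data marker
    hex_digits = b"".join(data.split())    # remove all whitespace
    if len(hex_digits) % 2:
        hex_digits += b"0"                 # an odd final digit is padded with 0
    return binascii.unhexlify(hex_digits)


def flate_decode(data):
    """Decode /FlateDecode data (zlib/deflate); predictors are not handled."""
    return zlib.decompress(data)


DECODERS = {
    "/ASCIIHexDecode": ascii_hex_decode,
    "/FlateDecode": flate_decode,
}


def decode_stream(data, filters):
    """Apply the filters in the order they are listed in the /Filter array."""
    for name in filters:
        decoder = DECODERS.get(name)
        if decoder is None:
            raise NotImplementedError(f"unsupported filter {name}")
        data = decoder(data)
    return data
```

For example, `decode_stream(stream, ["/ASCIIHexDecode", "/FlateDecode"])` first undoes the hex encoding and then inflates the result, which is how a script hidden behind two layers becomes visible again.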

Currently, it is possible to identify the suspicious elements in a PDF file because they are shown in a different color (yellow). While this helps experienced analysts, or users with some experience of the PDF format and/or threat analysis, it can be difficult for less skilled users to understand. The first step in this task would be to identify the elements that make it possible to tell whether a PDF file is malicious, such as Javascript code, lonely objects, huge gaps between objects, detected vulnerabilities, etc. The next step would be to create a system that derives a score from these elements, and to test it with a large collection of malicious and non-malicious PDF files in order to tweak it.
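One simple shape such a scoring system could take is a weighted sum of the indicators found in a file, capped at a maximum. The indicator names and weights below are placeholders invented for this sketch; finding sensible values is exactly the tuning work described above.

```python
# Hypothetical indicator weights; real values would come from tuning the
# score against labelled malicious and benign samples.
INDICATOR_WEIGHTS = {
    "javascript": 3,
    "lonely_objects": 1,
    "large_gaps": 1,
    "known_vulnerability": 5,
}


def score_pdf(indicators, weights=INDICATOR_WEIGHTS, max_score=10):
    """Sum the weights of the indicators found in a file, capped at max_score.

    `indicators` is any iterable of indicator names; unknown names count as 0.
    """
    raw = sum(weights.get(name, 0) for name in indicators)
    return min(raw, max_score)
```

A file with Javascript and a known vulnerability would score 8 out of 10 under these placeholder weights, while a file with no indicators scores 0.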

The peepdf interactive console is a really powerful tool for analyzing PDF documents. However, some users are more used to GUIs and web interfaces. With these potential users in mind, the idea behind this project is to create a web interface for peepdf with the same functionality as the command line: emulating Javascript code and shellcode, showing specific objects, visualizing the logical and physical structure of the PDF document, extracting the desired content, etc. The idea is to run it locally, but depending on the result of the project this could be the beginning of a new online tool for analyzing PDF documents.

CWMP is a text-based application layer protocol for the remote management of end-user devices. Commands sent between the device (CPE) and the auto configuration server (ACS) are transported over HTTP (or more frequently HTTPS). At the HTTP level, the CPE device acts as the client and the ACS acts as the HTTP server. This essentially means that control over the flow of the provisioning session is the sole responsibility of the device.

The goal of this project is to build a first proof of concept honeypot which would emulate the vulnerability described in the talk above, with the purpose of trying to determine the volume and sophistication of these kinds of attackers. With the rise of the Internet of Things, this protocol is gaining more and more importance and attention as an attack surface, so we would like to carry out more exploratory research in this important emerging field.

The concept is to implement a TR069 server that emulates common device behaviours in order to dump attackers' payloads. If possible, a TR069 scanner would also be developed to help in creating the signatures.
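Since CWMP messages are SOAP envelopes carried in HTTP bodies, the payload-dumping step could begin with something like the sketch below, which stores every raw body and tries to name the RPC it contains. The function names and the log structure are assumptions made for this example; the HTTP server itself is not shown, and a real honeypot should also harden XML parsing against hostile input such as entity expansion.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def cwmp_rpc_name(body):
    """Return the local name of the first element in the SOAP Body, or None.

    Malformed XML yields None, because a honeypot must expect garbage input.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    soap_body = root.find(f"{{{SOAP_ENV}}}Body")
    if soap_body is None or len(soap_body) == 0:
        return None
    tag = soap_body[0].tag            # looks like '{namespace}Inform'
    return tag.rsplit("}", 1)[-1]


def record_request(body, log):
    """Append the raw payload and the RPC it carries to a hypothetical log list."""
    entry = {"rpc": cwmp_rpc_name(body), "raw": body}
    log.append(entry)
    return entry
```

Keeping the raw bytes next to the parsed RPC name means that payloads which fail to parse are still captured for later analysis.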

The implementation approach could be based on extending existing industry standard low-interaction honeypot solutions such as Dionaea, or it could be a standalone system. As this is a new tool, we are open to the student suggesting appropriate technologies and programming languages.

There are a couple of tools out there for collecting Passive DNS data (e.g. passivedns by gamelinux and pdnsd), but they only work by sniffing authoritative DNS answers inside network traffic and storing them. There is a huge number of other sources that could be used to collect Passive DNS data: for example, almost every organization has a web proxy, and its logs almost always contain a domain name, an IP address and a timestamp. The same data set can be extracted from other textual logs from DNS servers (Bind, Microsoft DNS, etc), web servers, IDS/IPS, and even sandboxes (Cuckoo) and honeypots (Thug) or other Passive DNS databases (VirusTotal, DNSDB, etc). YAPDNS should provide an interface (e.g. a Syslog-NG local destination) to collect basic associations between an IP address and a domain name, along with the first and last time the association was seen. Other data can be added for specific log sources (e.g. DNS logs also contain TTL, record type, etc), or gathered from external repositories (e.g. association with malware in VirusTotal’s database, etc).
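The core record described here, a domain/IP association with first-seen and last-seen times, can be sketched as a small in-memory store. The class and method names are hypothetical, timestamps are plain numbers, and a real tool would need persistent storage and log parsers in front of it.

```python
class PassiveDNSStore:
    """In-memory sketch of domain/IP associations with first and last seen times."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _normalize(domain):
        # Treat 'Example.com.' and 'example.com' as the same name.
        return domain.lower().rstrip(".")

    def observe(self, domain, ip, timestamp):
        """Record one sighting of `domain` resolving to `ip` at `timestamp`."""
        key = (self._normalize(domain), ip)
        record = self._records.get(key)
        if record is None:
            self._records[key] = {"first_seen": timestamp,
                                  "last_seen": timestamp, "count": 1}
        else:
            # Logs may arrive out of order, so update both bounds.
            record["first_seen"] = min(record["first_seen"], timestamp)
            record["last_seen"] = max(record["last_seen"], timestamp)
            record["count"] += 1

    def lookup_domain(self, domain):
        """Return {ip: record} for every IP address seen with `domain`."""
        wanted = self._normalize(domain)
        return {ip: rec for (name, ip), rec in self._records.items()
                if name == wanted}
```

Each log source (proxy, DNS server, sandbox) would only need to extract the three fields and call `observe`, which keeps the collectors independent of the storage.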

YAPDNS should also provide an interface with a search engine, a set of dashboards and some correlation rules (e.g. track by ASN, geolocation, fast-flux behaviour, etc). The tool should also provide some REST-like APIs to facilitate integration with other tools. YAPDNS should also use HPFriends to facilitate data sharing between various trusted entities.

Communication with other projects and software may use the Common Output Format proposed by this draft on IETF.

Description: dpkt [1] is a Python library that helps with "fast, simple packet creation/parsing, with definitions for the basic TCP/IP protocols". It supports a lot of protocols (currently about 63) and has been increasingly used in network security projects. It is 44x faster than scapy, and 5x faster than impacket. With Scapy no longer in development, dpkt is the only packet creation/parsing library for Python that is active. While dpkt is a really powerful library, it needs some improvements. Late last year, we started work on cleaning up the project and started fixing bugs [2]. We intend to do a 2.0 release [4] later this year.

Here is a list of goals for the 2.0 release:

Make dpkt Python 3 compliant. This is a lot more work than it seems

Clear the issues queue (about 77 as of 2/19/15)

Build a test suite and start a pcap corpus for tests

This short list by itself would take most of the summer. However, if the student finishes early and has time, we would also like to:

Start publishing dpkt documentation on readthedocs.org

Add examples to the project wiki. This has been asked for a lot over the years.

Cuckoo Sandbox (developed during GSoC 2010 in The Honeynet Project [1]) has evolved into the de facto open-source standard for malware analysis systems. It offers capabilities for analyzing malware in various Windows environments, a clean architecture, and an easy-to-use UI. It is used by many open source and commercial sandboxing efforts, including Google's own VirusTotal infrastructure.

This summer we would like to add a number of important new features to Cuckoo, to make it even more powerful and feature complete:

We would like to expand Cuckoo to support execution of Linux malware. Developing this feature requires designing and writing a custom Python analyzer (a little engine with modules) that follows Cuckoo's existing win32 architecture: it runs the malware inside a Linux virtual machine, instruments and records the malware's behavior, and then returns the execution analysis information to Cuckoo's existing reporting components.

We would like to expand Cuckoo to support execution of Mac OS X malware. Developing this feature requires designing and writing a custom Python analyzer (a little engine with modules) that follows Cuckoo's existing win32 architecture: it runs the malware inside a Linux or OS X virtual machine, instruments and records the malware's behavior, and then returns the execution analysis information to Cuckoo's existing reporting components.