
Google Summer of Code 2014 Project Ideas

This page contains a list of potential project ideas that we are keen to develop during GSoC 2014 (we also have additional project ideas currently undergoing internal review, which will be added here once project deliverables and available mentors have been confirmed). You can view our previous GSoC 2009, GSoC 2010, GSoC 2011, GSoC 2012 and GSoC 2013 project ideas pages if you are looking for inspiration, or you might like to work on one of our existing tools rather than on something new.

We are also always interested in hearing any ideas for additional relevant computer security and honeynet-related R&D projects (although remember that to qualify for GSoC funding from Google, your project deliverables need to fit into GSoC's 3-month project timescales!). If you have a suitable and interesting project, we'll always try to find the right resources to mentor it and support you. Please note - even if you aren't an eligible GSoC student, we are always looking for general volunteers who are enthusiastic and interested in getting involved in honeynet R&D.

Each sponsored GSoC 2014 project will have one or more mentors available to provide a guaranteed contact point to students, plus one or more technical advisors to help applicants with the technical direction and delivery of the project (often the original author of a tool or its current maintainer, and usually someone recognised as an international expert in their particular field). Our Google Summer of Code organisational administrators will also be available to all sponsored GSoC students for general advice and logistical support. We'll also provide supporting hosted svn/trac/git/redmine/mailman/IRC/etc project infrastructure, if required.

Unsurprisingly, a number of our suggested potential project ideas fall into these research areas. However, we are also interested in receiving project proposals and tool updates/new tool developments outside these research focus areas, so hopefully this provides potential students with a wide variety of exciting topics to contribute to and be engaged with once again this summer.

(more project ideas and mentors to follow, once internal review is complete)

GSoC 2014 Project Ideas

Name: Project 1 - Wire’n’Sics Plugins (aka: WireShnork reloaded)
Mentor: Guillaume Arcas (FR)
Backup mentor: Sébastien Larinier (FR)
Skills required: TCP/IP, C, Python, Lua
Project type: Re-engineering old tools
Project goal: Extending Wireshark network forensics capabilities with plugins
Description: Wireshark (and the CLI tshark utility) is a great network analyzer that is often useful during network security and forensics operations. A previous successful GSoC 2011 project was dedicated to extending and enhancing Wireshark with some additional plugins such as WireShnork. Since then, Wireshark internals have evolved, so the plugins written for GSoC 2011 are now deprecated.

The goal of this project is to re-engineer some of these plugins and make them compatible with current and future Wireshark releases. This can be done:
- by integrating the plugins into the Wireshark core engine or by using the Lua scripting language.
- by taking advantage of the PCAPNG format, which supports adding tags directly to a packet capture file that will later be opened with Wireshark. For example, PCAPNG support could be added to Snort so that Snort writes its output as tags directly in the capture file.
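
To make the PCAPNG tagging idea more concrete, here is a minimal Python sketch that builds a single Enhanced Packet Block carrying a comment option, which is the mechanism pcapng offers for attaching text to an individual packet. The function names, the little-endian byte order and the alert text are illustrative assumptions, not part of any existing plugin or of Snort. A complete capture file would also need a Section Header Block and an Interface Description Block before this block.

```python
import struct

EPB_BLOCK_TYPE = 0x00000006  # Enhanced Packet Block
OPT_ENDOFOPT = 0             # end-of-options marker
OPT_COMMENT = 1              # UTF-8 comment option


def pad4(data: bytes) -> bytes:
    """Pad data with zero bytes to a 32-bit boundary, as pcapng requires."""
    return data + b"\x00" * (-len(data) % 4)


def build_commented_epb(packet: bytes, comment: str,
                        timestamp: int = 0, interface_id: int = 0) -> bytes:
    """Return one little-endian Enhanced Packet Block carrying a comment.

    Hypothetical helper for illustration only.
    """
    text = comment.encode("utf-8")
    options = struct.pack("<HH", OPT_COMMENT, len(text)) + pad4(text)
    options += struct.pack("<HH", OPT_ENDOFOPT, 0)
    body = struct.pack(
        "<IIIII",
        interface_id,
        (timestamp >> 32) & 0xFFFFFFFF,  # timestamp, upper 32 bits
        timestamp & 0xFFFFFFFF,          # timestamp, lower 32 bits
        len(packet),                     # captured length
        len(packet),                     # original length
    )
    body += pad4(packet) + options
    total_length = 12 + len(body)  # block type + two length fields + body
    header = struct.pack("<II", EPB_BLOCK_TYPE, total_length)
    return header + body + struct.pack("<I", total_length)
```

An IDS-side writer could call `build_commented_epb(raw_packet, "snort: example alert")` for each packet that triggered a rule, so that the alert text travels with the packet inside the capture file.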

Name: Project 2 - Droidbox native introspection
Mentor: Patrik Lantz (SE)
Backup mentor: Felix Leder (DE/NO)
Skills required: C, Java internals/JNI knowledge, Android malware
Project type: Improve existing tool
Project goal: The goal of the project is to add native introspection and monitoring capabilities to Droidbox. With these techniques, it is possible to generically monitor malware across Android versions without having to patch the actual APKs.
Description: Droidbox is a dynamic malware analysis system (a.k.a. sandbox) for Android-based malware that was originally developed during GSoC 2011. One of the initial challenges of Droidbox has been that it required adjustments for each new Android OS version released. A later GSoC project added a degree of portability by patching the Android apps before they are run in Droidbox. This approach is also followed in other Android sandbox solutions. However, patching Android apps can be detected and used as an evasion technique by attackers.

One solution that both keeps compatibility and avoids detection is to move the introspection functionality further down into the virtual machine. The Dalvik VM exposes most of its functionality via the “JNI” interfaces. In this project we want to “harden” Droidbox with such introspection, hopefully making it more difficult for attackers to prevent malicious Android applications from being monitored.

Name: Project 3 - Droid-BOT
Mentor: Felix Leder (DE/NO)
Backup mentor: Patrik Lantz (SE)
Skills required: Python, C
Project type: New Tool to Improve existing tool
Project goal: The Droid-BOT is a virtual user for Android devices. The user’s goal is to interact well enough with potentially malicious apps so that they show their “real face”.
Description: Malicious Android apps are more and more often hiding their malicious payload behind user actions. These can be simple “OK” dialogues or fake video players. The motivation behind this is that the malicious apps want to avoid detection in sandbox/malware analysis environments that traditionally only provide passive observation and instrumentation, but not real user activity. By checking for simple user behavior, a malicious Android app can tell the difference between interacting with a human and idling inside a sandbox. That potentially allows it to alter its behaviour and avoid detection.
With the Droid-BOT project, we want to create a virtual user that interacts with malicious Android apps, so that the apps are encouraged to start executing their true malicious behavior. The initial goal is to implement this for Droidbox, a dynamic malware analysis system (a.k.a. sandbox) for Android-based malware that was originally developed during GSoC 2011 and updated in GSoC 2012. By designing Droid-BOT as a framework, it should also be possible to use this approach in other analysis environments.
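
As a rough illustration of the simplest possible "virtual user", the sketch below prepares `adb shell input tap` commands for random screen positions. It is only a starting point under stated assumptions: the function name, the screen size and the purely random strategy are hypothetical, and a real Droid-BOT would presumably inspect the UI and react to dialogues instead of tapping blindly.

```python
import random


def random_tap_commands(width, height, count, seed=None):
    """Build adb argument lists for `count` random screen taps.

    Hypothetical helper: it only prepares the commands and does not run them.
    """
    rng = random.Random(seed)  # a seed makes an analysis run reproducible
    commands = []
    for _ in range(count):
        x = rng.randrange(width)
        y = rng.randrange(height)
        commands.append(["adb", "shell", "input", "tap", str(x), str(y)])
    return commands


if __name__ == "__main__":
    # Print the commands for an assumed 480x800 emulator screen.
    for argv in random_tap_commands(480, 800, count=3, seed=1):
        print(" ".join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` on a machine where `adb` is connected to the analysis emulator.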

mitmproxy is a man-in-the-middle, SSL-capable HTTP proxy. It is an interactive console program written in Python that allows HTTP network traffic flows to be inspected and edited on the fly. With its next release, mitmproxy is going to gain a web interface that was originally developed as a GSoC project in 2012 and 2013. Our long-term goal is to achieve feature parity between the web interface and the console application in most areas. The goal of this project is to accelerate the process by adding new features to the web interface and improving the existing application functionality.

Name: Project 5 - Conpot: ICS/SCADA honeypot
Mentor: Lukas Rist (DE)
Backup mentor: Johnny Vestergaard (DK)
Skills required: Python, TCP (HTTP, FTP, Modbus, SNMP, DNP3 and IEC 60870 an advantage)
Project type: Improve existing tool
Project goal: Conpot is an ICS honeypot whose goal is to collect intelligence about the motives and methods of adversaries targeting industrial control systems. In this project we want to add additional protocols and improve the existing protocols, data logging, system and vulnerability emulation and overall infrastructure virtualization.
Description: Until now, setting up an Industrial Control System (ICS) honeypot required substantial manual work, real physical systems (which are usually either inaccessible or expensive) and learning quite tedious protocol specifications. By implementing a master server for a larger set of common industrial communication protocols and virtual slaves which are easy to configure, we provide an easy entry into the analysis of security threats against industrial infrastructures and control systems.
A student applying for this project has to be open to learning new network protocols and to adopting and then modifying existing implementations. This also includes automated testing and continuous integration, management of sensor deployments and data analysis. As this field is quite young and unexplored, it will provide a large variety of challenges to solve.
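
To give a flavour of the protocol work involved, the following sketch builds and parses a Modbus/TCP "read holding registers" request using only the Python standard library. It is not taken from Conpot; the function names are hypothetical and the example is limited to a single function code.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # Modbus function code


def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus/TCP frame: 7-byte MBAP header followed by the PDU."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP: transaction id, protocol id (0 for Modbus),
    # length of the remaining bytes (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_request(frame):
    """Split a request frame into the fields a honeypot might want to log."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": frame[7],
        "data": frame[8:],
    }
```

For example, `build_read_request(1, 1, 0, 2)` asks unit 1 for two registers starting at address 0, and `parse_request` recovers the same fields on the receiving side.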

Name: Project 6 - YAPDNS (Yet Another Passive DNS)
Mentor: Pietro Delsante (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, Django, HTML/JavaScript, PostgreSQL/MySQL
Project type: New tool
Project goal: Collect Passive DNS data from various sources; display, correlate and analyze them.
Description: There are a number of existing tools to collect Passive DNS data (e.g. passivedns by gamelinux and pdnsd), but these tools generally only work by sniffing authoritative DNS answers within network traffic and storing them. There is a huge number of additional sources that could be used to collect Passive DNS data: for example, almost every organization has a web proxy, and its logs almost always contain a domain name, an IP address and a timestamp. The same data set can be extracted from other textual logs from DNS servers (Bind, Microsoft DNS, etc), web servers, IDS/IPS, and even sandboxes (Cuckoo) and honeypots (Thug) or other Passive DNS databases (VirusTotal, DNSDB, etc).

YAPDNS should provide an interface (e.g. a Syslog-NG local destination) to collect basic associations between an IP address and a domain name, along with the first and last time the association was seen. Other data can be added for specific log sources (e.g. DNS logs also contain TTL, record type, etc), or gathered from external repositories (e.g. association with malware in VirusTotal’s database, etc).
YAPDNS should also provide an interface with a search engine, a set of dashboards and some correlation rules (e.g. track by ASN, geolocation, fast-flux behaviour, etc). The tool should also provide some REST-like APIs to facilitate integration with other tools.
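
The core record described above (an IP address and a domain name with first-seen and last-seen times) could be modelled roughly as follows. This is an in-memory sketch under stated assumptions: the class and method names are hypothetical, and a real implementation would presumably persist the data in PostgreSQL/MySQL via Django models.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Association:
    """One domain/IP pair with the time window in which it was observed."""
    domain: str
    ip: str
    first_seen: datetime
    last_seen: datetime
    count: int = 1


class PassiveDnsStore:
    """Hypothetical in-memory store keyed by (domain, ip)."""

    def __init__(self):
        self._records = {}

    def observe(self, domain, ip, seen_at):
        # Normalise the name so "Example.com." and "example.com" match.
        key = (domain.lower().rstrip("."), ip)
        record = self._records.get(key)
        if record is None:
            record = Association(key[0], ip, seen_at, seen_at)
            self._records[key] = record
        else:
            # Log sources may arrive out of order, so widen the window both ways.
            record.first_seen = min(record.first_seen, seen_at)
            record.last_seen = max(record.last_seen, seen_at)
            record.count += 1
        return record

    def lookup_domain(self, domain):
        name = domain.lower().rstrip(".")
        return [r for (d, _), r in self._records.items() if d == name]
```

Every log source (proxy, DNS server, sandbox report) would then only need a small parser that calls `observe(domain, ip, timestamp)`.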

YAPDNS should also use the Honeynet Project's existing HPFeeds and HPFriends systems to facilitate easy data sharing between various trusted entities.

Name: Project 7 - Exploit Kit Forensics Framework
Mentor: Pietro Delsante (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, Django, HTML/JavaScript, PostgreSQL/MySQL
Project type: New tool
Project goal: Build a framework to facilitate forensic analysis of infections caused by exploit kits.
Description: Imagine you have a web proxy (or something able to reconstruct and log HTTP requests from live network traffic, such as Bro IDS) generating logs. You want to be able to understand when one of your internal client systems gets compromised by an exploit kit.

Simply looking for PE executables and trying to download them from their original URL won’t always work, for example because the exploit kit requires the user to go through a complete chain of redirections before serving its payload.

The idea behind the Exploit Kit Forensics Framework is to create an automated process that monitors proxy logs and detects when a dangerous file has been downloaded (e.g. by looking at the content type, or by correlating information from an IDS such as Snort). The process would then analyze the HTTP Referer field from the logs to “rewind the tape” up to the exploit kit's entry point, which could then be passed to a client honeypot such as Thug. This would then "replay the tape" and attempt to download the suspicious file, so that it can be sent to a sandbox such as Cuckoo, VirusTotal or any other tool for analysis.
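
The "rewind the tape" step can be sketched as a walk backwards over Referer values. The function below is a minimal illustration, assuming the proxy log has already been reduced to `(url, referer)` pairs; the function name and the `max_hops` limit are hypothetical, and real logs would also need per-client and per-time-window filtering.

```python
def rewind_referer_chain(log_entries, download_url, max_hops=20):
    """Walk Referer values backwards from download_url to the entry point.

    log_entries is an iterable of (url, referer) pairs; referer may be None.
    Returns the URLs ordered from the entry point to the download.
    """
    referer_of = {}
    for url, referer in log_entries:
        referer_of.setdefault(url, referer)  # keep the first sighting of a URL

    chain = [download_url]
    seen = {download_url}
    current = download_url
    for _ in range(max_hops):
        previous = referer_of.get(current)
        if not previous or previous in seen:  # no referer, or a loop
            break
        chain.append(previous)
        seen.add(previous)
        current = previous
    chain.reverse()
    return chain
```

The first element of the returned list is the candidate entry point that would be handed to the client honeypot for replay.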

The Framework should also include a Web GUI to display the analyzed events and the output of the analysis process, along with some dashboards. To help integration with SIEMs and other systems, the Framework should also provide a remote logging mechanism (e.g. syslog) and some REST-like APIs.

Implement the authentication part of the RDP protocol (client and honeypot side).

Description: Beeswarm is an active intrusion detection system (IDS) with a focus on ease of use. After development during GSoC 2013, the system currently consists of three parts: a management interface, Honeypots and Clients. The active part of the system is the Clients, which generate semi-realistic bait traffic on the network designed to tempt the attacker to dump credentials and reuse them on the Honeypots.

This year during GSoC we would like to develop an algorithm that automatically generates configurations and deployment plans for Beeswarm honeypots/clients. Another thing that is currently missing from the system is the emails that are supposed to be transmitted between Beeswarm clients and honeypots. It could be interesting to extract spam mails from one of our mail honeypots (such as GSoC 2013's Shiva spampot) or to develop an algorithm that embeds bait (honeytokens) in the generated mails.

We already have pretty good coverage of common network protocols (ssh, vnc, smtp, pop3, pop3s, http, https) but we would also like to have support for the RDP protocol. This does not need to be a complete implementation, just the authentication part plus a dummy RDP traffic generator to make the interaction traffic look semi-legitimate.

2) collect and analyze data from public and private data feeds, to be able to correlate observed network artifacts with a large databank (individual or collective) of “known-bad” artifacts.

The goal of this project is to extend Malcom’s capabilities to break encryption used within malware, by leveraging known malicious campaign encryption keys and trying them out on network communications (major feature). Another aspect of the project would be to work on existing Malcom features to increase stability and performance (minor features).
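
The "try known keys on captured traffic" idea can be illustrated with a deliberately simple stand-in. The sketch below uses repeating-key XOR purely as an example cipher and a couple of plaintext markers as the success test; both are assumptions made for illustration, the function names are hypothetical, and none of this reflects how Malcom is implemented or which ciphers real campaigns use.

```python
from itertools import cycle


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR, used here only as an illustrative stand-in cipher."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def find_matching_key(payload, known_keys, markers=(b"HTTP/1.", b"MZ")):
    """Try each known campaign key and return the first plausible decryption.

    A decryption counts as plausible when it starts with one of the expected
    plaintext markers (assumed here: an HTTP status line or a PE header).
    """
    for key in known_keys:
        plaintext = xor_decrypt(payload, key)
        if any(plaintext.startswith(marker) for marker in markers):
            return key, plaintext
    return None
```

In practice the marker test would be replaced by whatever structure the decrypted protocol is expected to have, and the key list would come from the "known-bad" data feeds.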

Name: Project 10 - String deobfuscator for Android
Mentor: TBC (TBC)
Backup mentor: TBC (TBC)
Skills required: Android RE, Python, Java
Project type: Improve existing tool | New tool
Project goal: Extract the strings that are obfuscated or encrypted from APKs
Description: This could be an extension for Androguard, or a new tool. There are cases where it would be useful to easily extract the obfuscated strings inside the dex code of the APK, because they point to dynamic code, create URLs on the fly or create other instructions.
The approach could be dynamic taint analysis or source code analysis. The second could be less expensive.
Extracting these strings will help to improve the analysis of Android malware.

Name: Project 11 - Cuckoo Sandbox
Mentors: Mark Schloesser, Jurriaan Bremer, Claudio Guarnieri
Skills required: Python, C, OS X/Linux internals
Project type: Improve existing tool
Project goal: Extend Cuckoo Sandbox to support Mac OS X and/or Linux
Description: Since the beginning, we have designed Cuckoo Sandbox with the intent of eventually supporting multiple platforms. Since it started in GSoC 2010, Cuckoo has grown into a mature project with thousands of users and an active development community which is bringing remarkable improvements to the sandbox. Our Windows analyzer is improving fast and will improve even more in the upcoming months.

The goal of this project is to start experimenting with preferably Mac OS X or alternatively Linux, as threats for such platforms are on the rise. The student will have to research the most suitable process tracking techniques for the chosen operating system, implement a functional analyzer and integrate it into the overall execution flow of Cuckoo Sandbox. We explored this idea last year but unfortunately did not find a student able to take on the challenge. Hopefully this year we will.

Name: Project 12 - Thug: Phishing sites identification
Mentor: Angelo Dell'Aera (IT)
Backup mentor: Andrea De Pasquale (IT)
Skills required: Python, HTML/JavaScript
Project type: Improve existing tool
Project goal: Build a new feature in Thug to allow identification of phishing sites.
Description: The aim of the project is to extend Thug so that it can fingerprint phishing pages. URLs fed into Thug often lead to phishing sites rather than to drive-by download exploit pages.
The idea behind this project is to build some heuristics (which could include looking for form submissions, how many domains are used, misspelled words, and URL blacklist checks like PhishTank or others) and integrate them into Thug.
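
A couple of the heuristics mentioned above (form submissions and the number of domains used) could be prototyped with the standard library alone, as in the sketch below. The class name, the feature names and the choice of features are assumptions made for illustration; this does not use or reflect Thug's internal API, and turning the counts into a verdict would need thresholds or a classifier.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PhishingFeatureParser(HTMLParser):
    """Collect simple page features that may hint at a phishing form."""

    def __init__(self):
        super().__init__()
        self.password_fields = 0
        self.form_actions = []
        self.domains = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and (attributes.get("type") or "").lower() == "password":
            self.password_fields += 1
        if tag == "form":
            self.form_actions.append(attributes.get("action") or "")
        # Record every host referenced by links, resources and form targets.
        for name in ("href", "src", "action"):
            host = urlparse(attributes.get(name) or "").hostname
            if host:
                self.domains.add(host)


def extract_features(html, page_url):
    """Return raw counts; scoring them is left to the caller."""
    parser = PhishingFeatureParser()
    parser.feed(html)
    page_host = urlparse(page_url).hostname
    external_posts = [action for action in parser.form_actions
                      if urlparse(action).hostname not in (None, page_host)]
    return {
        "password_fields": parser.password_fields,
        "external_form_posts": len(external_posts),
        "distinct_domains": len(parser.domains),
    }
```

A page that asks for a password and posts it to a host other than the one serving the page is a typical candidate for closer inspection.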