This is a multi-part blog post about creating my own hacking game to teach other people the excitement of exploiting vulnerabilities. To try it out, just connect to ssh [email protected] with password level0. You only need a little bit of Linux command line knowledge. And get used to googling a lot ;)

On Linux or Mac just open a terminal and type that command in. If you are on Windows you can use PuTTY.

Prelude

In 2012 I came across Stripe's Capture the Flag. At that point I knew a little bit of assembler - I roughly knew how the stack works and I kinda knew what a buffer overflow is. But I had never seen or exploited one myself. The CTF hooked me and I was so eager to solve those challenges. With a lot of time and googling I was able to solve the levels and got a T-shirt that I wear proudly to this day. As Stripe's blog post mentions, they were inspired by io.smashthestack.org. So I moved on to io, and to this day I haven't finished all the levels. I believe I'm stuck on level 17 - but I always come back to it and realize that I have learned more and can solve the next level.

Motivation

I have this character flaw that I get super obsessed with stuff. And I can never understand why other people are not interested in something I'm so enthusiastic about. So I guess, in an attempt to get more people into this field that gives me so much excitement, I wanted to create my own game - a game dedicated to beginners, with a gentle learning curve, so they don't get frustrated too quickly (though frustration is part of the fun).

For an overview of how the game works, here is the README that you can access when you log in as level0:

[email protected]:~$ cat README
┌───────────────────────────────────────────────────────────────────────────┐
│ How it works... │
├──────────────────────────────────────────────────────────────────────────┬┘
│ This is a hacking game. The goal is to hack from level to level. │
│ │
│ You are currently level0. The password of your current level can be │
│ found in ~/.pass │
│ + run `id` to display your current user id │
│ + display your current password `cat /home/level0/.pass` │
│ │
│ So your goal is to find the password for the next level (level1). With │
│ the password you can then connect to the next level │
│ + `ssh [email protected]` to login with the found password │
│ │
│ The level relevant files can be found under /matrix │
│ + display the files for level0 `ls /matrix/level0/` │
│ │
│ A good point to start is to read the "story" in your home folder. It │
│ will give some motivation for the current level, it will tell you what │
│ files are necessary and maybe give additional info. │
│ + display current story `cat ~/story` │
│ │
│ Sometimes there is a story recap available, which contains additional │
│ information about the challenge that you just solved. Usually this means │
│ you will discover new tools or techniques how to solve a challenge. If │
│ you have a particular nice solution that you would like to share, │
│ contact me, and I might add it. │
│ + display the demo recap `cat ~/recap` │
│ + the recap for level0 is in `cat /home/level1/recap` │
│ (you need to get access to level1 before you can read it) │
│ │
│ To show people that you made it to a particular level, you can add your │
│ nickname, messages and secrets to the "iwashere" file. You can only read │
│ and append something to the file. │
│ + show the world that you found this game: │
│ `echo "I made this. ~samuirai" >> ~/iwashere` │
│ + look at who was in level0 `less ~/iwashere` or `cat ~/iwashere` │
│ │
│ Most important point. Have fun. The worst thing that can happen is, that │
│ you accidentally learn something. │
└──────────────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────────┐
│ Rules and System Info │
├──────────────────────────────────────────────────────────────────────────┬┘
│ 1. Do not DoS this or any other system. Don't be a kiddy! │
│ 2. Do not connect to remote systems from this. │
│ 3. Do not use too many resources. This is a very small server. │
│ 4. Do not spoil challenges (no writeups!), but helping newbs good. │
│ 5. Be excellent. │
├──────────────────────────────────────────────────────────────────────────┤
│ - levels can be found under /matrix │
│ - You can only write to /tmp. │
│ - Unused files and folders in /tmp are deleted after a few hours. │
│ - If you want to have a specific tool installed, contact me. │
│ - If you find bugs, please contact me. │
└──────────────────────────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────────────────────────┐
│ Start │
├──────────────────────────────────────────────────────────────────────────┬┘
│ 1. read the story for your current level │
│ `less ~/story` │
│ 2. find the files in `ls /matrix/level0` │
│ 3. create a working directory in /tmp to develop scripts and tools │
│ 4. solve the challenge and get the password │
│ 5. login as level1 │
│ 6. read the recap for this level │
│ `cat ~/recap` │
│ 7. read the story for level1 and solve the next challenge │
└──────────────────────────────────────────────────────────────────────────┘

It all started in November 2014 with an online qualification on hacking-lab.com, where we had to solve different challenges and collect points. The challenges were different from the typical jeopardy-style CTF, where you only have to find a flag. At hacking-lab you always have to write a report, which is reviewed by people. Also, because the competition was intended not only for experienced university students but also for much younger high school students (<18 y/o), many of the challenges were not that difficult - in the end, all qualified university students had solved all challenges. I'm also quite resentful that there was not a single pwning (binary exploitation) challenge, that the only reverse engineering challenge was a very simple XOR encryption, and that we had FOUR captcha-breaking challenges :P But it was fun nonetheless.

Once the deadline was over, the 20 best students were invited to Berlin to participate in a live hacking competition at the beginning of February 2015. We were divided randomly into 4 teams of 5 people each: two university student (Studenten) teams and two high school student (Schüler) teams. I got very lucky with my team, because we complemented each other very well. easysurfer, for example, has a lot of experience with reversing on Windows and was able to solve a mean challenge really fast, and EPG, who has experience with patching Java bytecode in Android apps, solved a game hacking task super fast. So we were able to solve a lot of tasks pretty quickly.

In my opinion the challenges of the live hacking competition were a lot more fun than the ones from the qualification round, especially because we had a bunch of very cool reversing challenges. Unfortunately I can't do any writeups, because the challenges may get reused for other events :(

Here is a picture of our Team Orange in the middle of the competition:

Once a team solved a challenge and got awarded the points, the other teams had only 1h left to solve it. When the deadline passed, the other teams had to give up those points and move on to another challenge - so it was quite strategic where you spent your time. Because we were so quick with some of the tasks, we were able to establish quite a lead :P

This is a picture of me trying to explain "Port Knocking" to a Secretary of State. I failed.
http://t.co/zHVOOelCgM

When I talked to many of the students who participated, it turned out that they actually hadn't done much, or anything at all, regarding hacking. So this event motivated many of them to look into security and discover a new passion - which is pretty awesome.

In the end, our Team Orange won the Cyber Security Challenge Germany 2014 and we all got a new ThinkPad T440p - which I gave to my significant other, because she always supports me and has to tolerate the countless hours in which I neglect her to pursue my dreams. Thank you!

I believe that, as in any field of science, we need to have a discussion about published research, especially when we think there is something wrong with the "experiments" and the resulting conclusions. Maybe I'm completely overlooking something, but at this point I don't even understand how this talk got accepted to a renowned conference like Black Hat.

First I want to give a quick summary of what Ashar Javed claims. Then I want to talk about what I thought is the consensus of the security community regarding XSS. And at the end I want to evaluate his conclusion/solution. Unfortunately I haven't seen his talk, so I can only read his paper and guess what he said during those 168 slides.

Research Summary

Basically he claims that he found Cross-site Scripting exploits in the top 25 online WYSIWYG editors.

But what exactly did he exploit? He says,

The third-party WYSIWYG editors are normally available in the form of client-side JavaScript library, PHP or ASP based sever-side component and Rails gem.

So we already have multiple components - the client-side editor and sometimes a server-side script that handles the input. And he is not clear about which component he exploited.

This is an example data flow. A user creates a post using a WYSIWYG editor and sends the post to the server, where it gets stored in a database. When another user wants to read the post, the server purifies/encodes the post properly, so it can be safely rendered in the user's browser:

Let's go over each possible exploitation scenario:

1. Exploiting the JavaScript Editor

1.1 Edit/Quoting Functionality

Let's assume there is a sanitized, completely safe forum post like this:

But if another user wants to [Quote] your post, which automatically copies the string into his WYSIWYG editor, and the editor then executes the JavaScript, we have a minor XSS issue.

Is this one of his attacks? I'm not sure.

1.2 Self-XSS

Self-XSS means a user gets tricked into hacking himself. This works against really naive, unsuspecting people and is, for example, an issue for Facebook - I could tell a 14 y/o kid that he can get free FarmVille credits if he presses F12 to open the "cheat console" (Developer Toolbar: F12, cmd+opt+I) and pastes this snippet:

Of course the developer console is very obvious. So with a WYSIWYG editor you can maybe exploit some functionality that causes JavaScript to be executed. For example, if I can enter javascript:alert(1) as a URL and it gets rendered (e.g. in the preview) as <a href="javascript:alert(1)">link</a>, I can trick a user into executing that.

[!] Note: this only affects the current page. We haven't saved our text to the server, and we don't know yet what the real output looks like. It's possible that the output is properly sanitized/purified when we submit this link as a post. This means I can only trick somebody into a self-XSS if I can make them follow those steps.
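A javascript: link like the one above only works if something accepts the javascript: scheme in the first place. As a minimal Python sketch, assuming that only http, https, mailto and relative URLs should ever become links (is_safe_link is a made-up name, not an API of any of the editors mentioned here), a scheme allowlist could look like this. It first strips the characters browsers ignore, so variants like "  JaVaScRiPt:" or "java<TAB>script:" are caught too.

```python
import re

# Hypothetical allowlist: everything else (javascript:, data:, vbscript:, ...) is rejected.
SAFE_SCHEMES = {"http", "https", "mailto"}

# Browsers ignore leading/trailing C0 control characters and spaces (U+0000..U+0020)
# and drop tabs and newlines anywhere in a URL, so normalize the same way first.
C0_AND_SPACE = "".join(chr(c) for c in range(0x21))

# The scheme is whatever comes before the first ":" that appears before any "/", "?" or "#".
SCHEME_RE = re.compile(r"^([^:/?#]*):")


def is_safe_link(url: str) -> bool:
    cleaned = url.strip(C0_AND_SPACE)
    cleaned = "".join(ch for ch in cleaned if ch not in "\t\n\r")
    match = SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme at all: a relative URL such as "/forum/post/1"
    return match.group(1).lower() in SAFE_SCHEMES


for candidate in ["https://example.com", "/relative/path",
                  "javascript:alert(1)", "  JaVaScRiPt:alert(1)", "java\tscript:alert(1)"]:
    print(repr(candidate), is_safe_link(candidate))
```

Keep in mind that such a check only counts if the server runs it again, and an allowed URL still has to be attribute-encoded when it is written into href.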

Compare it to the original data flow. This XSS never leaves the first user:

And I believe that most of Ashar Javed's XSS are exactly this. For example, his TinyMCE writeup sounds exactly like that:

Yes, it can become a problem (see Facebook), but in general I consider it a very, very minor issue. Not even worth reporting.

2. Exploiting the server-side script

This time we have the full flow and we store the post on the server. But it doesn't get properly purified/encoded/sanitized before it is rendered in the user's browser.

2.1 Output not properly sanitized

Here is a perfect example by @StackSmashing. ProtonMail uses a WYSIWYG editor (but this fact doesn't really matter). @StackSmashing simply edits the editor's generated HTML code and sends it to the server. Instead of being properly sanitized, the code gets embedded in an email and then executed.

[!] Note: this works with any WYSIWYG editor when the output is not sanitized. Actually, it's wrong to say that this is an issue of the editor, because whatever the editor may disallow/purify/encode/sanitize, an attacker can always send whatever he wants to the server. The output needs to be safe.
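To make that concrete, here is a small Python sketch (render_post_unsafe and render_post_escaped are hypothetical names). The attacker never touches the editor and submits raw HTML directly, so only the rendering step on the server decides whether it executes. html.escape from the standard library encodes &, <, > and quotes, so the payload is displayed as text.

```python
import html


def render_post_unsafe(stored_body: str) -> str:
    # Trusts whatever reached the server: from the editor, or from an attacker using curl.
    return f'<div class="post">{stored_body}</div>'


def render_post_escaped(stored_body: str) -> str:
    # Encodes on output: markup in the stored body is shown as text and never executed.
    return f'<div class="post">{html.escape(stored_body)}</div>'


# The attacker skips the editor entirely and submits raw HTML.
payload = "<img src=x onerror=alert(1)>"
print(render_post_unsafe(payload))   # <div class="post"><img src=x onerror=alert(1)></div>
print(render_post_escaped(payload))  # <div class="post">&lt;img src=x onerror=alert(1)&gt;</div>
```

Plain escaping also removes every formatting tag, which is exactly what a WYSIWYG site doesn't want, so in practice the output step needs an allowlisting sanitizer such as HTML Purifier instead.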

2.2 BB-Code parser

Here is where stuff actually becomes very interesting and fun :3

Some WYSIWYG editors create BBCode rather than HTML. But somewhere this BBCode has to be parsed and translated into HTML, and many people write regex parsers for that - which is a horrible idea. As langsec, the Chomsky hierarchy and many other examples have taught us, it's impossible to correctly parse a context-free (Type-2) language like HTML with regular expressions, which can only describe regular (Type-3) languages. Thus we can exploit those flawed regex parsers.
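A quick way to see the problem: BBCode tags like [quote] can nest, but a regular expression cannot count nesting levels. The following Python sketch uses a typical non-greedy pattern (an illustrative pattern, not taken from any particular forum software), and it pairs the outer opening tag with the inner closing tag:

```python
import re

# Typical regex-based BBCode "parser": replace [quote]...[/quote] with <blockquote>.
QUOTE_RE = re.compile(r"\[quote\](.*?)\[/quote\]", re.DOTALL)


def naive_quote(text: str) -> str:
    return QUOTE_RE.sub(r"<blockquote>\1</blockquote>", text)


nested = "[quote]outer [quote]inner[/quote] more outer[/quote]"
print(naive_quote(nested))
# <blockquote>outer [quote]inner</blockquote> more outer[/quote]
```

The tags end up mismatched, and exactly this kind of confusion between what the regex "sees" and what the browser later parses is what attackers exploit.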

Easy XSS could look like this:

[img]fake.png" onerror="alert(String.fromCharCode(88,83,83))[/img]

But because of regex parsers, weird stuff like this can help you break out of attribute contexts, etc.
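Here is why the [img] payload above works, as a minimal Python sketch of a hypothetical regex-based [img] handler: the captured URL is pasted into the src attribute verbatim, so the quote in the payload closes the attribute and onerror becomes a real attribute. Escaping the captured value for the attribute context (and ideally also checking the URL scheme) keeps it inside src.

```python
import html
import re

IMG_RE = re.compile(r"\[img\](.*?)\[/img\]")


def naive_img(text: str) -> str:
    # Vulnerable: the captured URL is copied into the attribute unchanged.
    return IMG_RE.sub(r'<img src="\1">', text)


def escaped_img(text: str) -> str:
    # html.escape(quote=True) turns " into &quot;, so the value cannot leave src="...".
    return IMG_RE.sub(lambda m: '<img src="%s">' % html.escape(m.group(1), quote=True), text)


payload = '[img]fake.png" onerror="alert(String.fromCharCode(88,83,83))[/img]'
print(naive_img(payload))
# <img src="fake.png" onerror="alert(String.fromCharCode(88,83,83))">
print(escaped_img(payload))
# <img src="fake.png&quot; onerror=&quot;alert(String.fromCharCode(88,83,83))">
```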

But WYSIWYG editors are a bit special, because websites that use them specifically want to allow certain HTML tags in user input. And this is a nontrivial task! Luckily, other people have already solved this for us, and there are projects like HTML Purifier or DOMPurify.
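To get a feeling for what such libraries do, here is a deliberately tiny allowlist sanitizer in Python, built on the standard library's html.parser. It is a toy sketch under strong assumptions (only a handful of attribute-free formatting tags are allowed, tags are not balanced, no links, images or CSS) and not a substitute for HTML Purifier or DOMPurify. It only shows the basic idea: rebuild the output from known-good pieces instead of trying to filter out known-bad ones.

```python
from html import escape
from html.parser import HTMLParser

# Assumption for this sketch: only these tags survive, and never with attributes.
ALLOWED_TAGS = {"b", "i", "u", "em", "strong", "p", "br"}
# The content of these elements is dropped entirely instead of being shown as text.
DROP_CONTENT = {"script", "style"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.drop_depth = 0  # > 0 while we are inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.drop_depth += 1
        elif tag in ALLOWED_TAGS and not self.drop_depth:
            self.out.append(f"<{tag}>")  # every attribute (onclick, style, ...) is discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif tag in ALLOWED_TAGS and tag != "br" and not self.drop_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.drop_depth:
            self.out.append(escape(data))  # text is always re-encoded on output

    # Comments, doctypes and processing instructions are dropped (default handlers do nothing).


def sanitize(untrusted_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(untrusted_html)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">hi</b><img src=x onerror=alert(1)><script>alert(1)</script>'))
# <b>hi</b>
```

Everything that makes real sanitizers hard (links, images, style attributes, balancing tags, browser parsing quirks) is left out here on purpose, which is exactly why it makes sense to use a maintained library.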

Evaluating Ashar Javed's solution

In his BlackHat briefing he promises us ...

... a sanitizer (very easy to use, effective and practical solution) which is based only on '11 chars + 3 regular expressions' and will show how it will safe you from an XSS in HTML, attribute, script (includes JSON context), style and URL contexts.

which seems to be this implementation, published by him in June/July this year.

It doesn't even make sense here, because we are talking about WYSIWYG editors, where we want to allow certain tags. But this filter just encodes everything (read: doesn't allow any tags).
This doesn't help prevent any of the server-side parsing/purifying difficulties we have with complex HTML.

Additionally, he publishes another solution - a JavaScript-based filter. Does this help to prevent the client-side (self-XSS) issues of all those WYSIWYG editors he exploited?

This filter has so many false positives. I can't even write a simple text like: "do you know base64?".
And yeah, it allows some tags like <b>bold</b>. But it doesn't solve the more difficult challenge that HTML purifiers face - allowing a lot of different tags with attributes, etc.

It's another "solution", which in reality is not a solution.

As I mentioned, I haven't watched his talk, but based on the slides it seems like he even makes fun of what the developers say. But I have to agree with them (see 2.1 Output not properly sanitized), because an attacker can pass any input to the server. It doesn't matter what the capabilities of a WYSIWYG editor are.

My conclusion

Ashar Javed is not very clear about what he actually did. I believe most of his XSS were just self-XSS.
And even if it was more than that (see my overview 1. - 2.), it is still old and known stuff.

Besides that, the "solutions" he provided are not solutions for his issues. Neither on the server-side nor on the
client-/editor-side do they sanitize/purify HTML to allow harmless tags -> which is the real challenge.

I mean it's not necessarily wrong what he says. It just doesn't make a lot of sense in this context and it's not really new.

It could be a nice paper if it included which parts he actually exploited, so that WYSIWYG editor (and backend) developers can actually learn from the mistakes of others.

In the end I don't understand how this got accepted by the BlackHat EU reviewers...

Now I want to finish with a quote from a friend:

If you wrap it into a confusing cloud of half-true content you can get quite far

Rebuttal

To keep it fair, Ashar Javed received this article as a draft so that he could comment on it beforehand.
He also gave the permission to publish his answers here, which is great for transparency.

I will not make any additional comments to what Ashar replied, because in my opinion it doesn't change anything about what I said above.

we had a look at your BlackHat presentation and paper and developed doubts about its content and reasoning.

We discussed it internally and couldn't arrive at a point where it all makes sense. That holds for both the attacks as well as the proposed defense.

We will publish a written criticism very soon but wanted to give you a chance to preview and comment this. [...]

While not written by me, I agree with all mentioned in there and believe it is right.

Your comment on that is welcome.

.mario

Ashar Javed answers:

Hi,

I can give you a point-wise or line-by-line feedback but for this I need more time.

The case study about WYSIWYG editors is a general study and it includes server-side, client-side and different programming languages WYSIWYG editors (in PHP. ASP, Rails, JavaScript and JQuery-based etc). I discussed the results in general and they are not specific to client or server side.

It is a debatable issue that client-side sanitization will be there or not ... I found Froala WYSIWYG editors developers were very keen in sanitizing stuff on the client side but on the other hand CKEditor developer said to me that it is a server-side problem. I used developers' comments in the slides not for FUN but I wanted to convey that developers of WYSIWYG editors want server-side sanitization while developers of server-side web applications take the product and start using it without adding sanitization stuff which makes the sites vulnerable. I had given examples of Twitter, CNET, Ebay etc ...

Fabian had written that bug in Tiny-MCE is not even worth reporting but my question is that why developers are keen in fixing it quickly ... For my 1000 USD bounty from Magento (BB-code in use), as far as I know, everything is happening on the server-side and for me it was a black-box text.

Down below I will try to make some points clear so that you will have a better understanding of the slides. I think the confusion arise because you had seen:

This was a demo where for the sake of demo, I used the client-side code and the regular expressions are in JavaScript.

What I had in my mind and what I wanted to convey and conveyed i.e., "see this filter for harmless tags" is still holds true if you will use the same regular expressions on the server-side.

This filter allows very simple tags like bold,, italic etc. It does not allow links and images. If you look at the Facebook's WYSIWYG editor (I really liked because it is very simple) which is available at: https://www.facebook.com/editnote.php and I mentioned in the slides also: http://slides.com/mscasharjaved/wysiwyg-editors-xssed#/172

It also allows simple tags without images and links. The pro of my filter (if used on server-side) is that it is open-source (since last two and half years because it is part of ModSecurity Core Rule Set also) though suffers from false positives (which is a common problem in filtering solutions). Facebook's WYSIWYG editor is not open-sourced but very good in a sense that it has no false positives ... In comparison, there is a trade-off.

Now discuss second potential solution (not a complete) but bits and pieces can be used by the WYSIWYG editors' developers. As an academia, we proposed different prototype solutions ... you also know that.

In a recent work, I had developed a per-context server-side filter or encoder which is based on minimalistic encoding of meta or trigger characters. It is a complete solution for an XSS protection in five contexts and I achieved the results with only 11 characters and 3 regular expressions in total. It supports five contexts, HTML, attribute, style, URL and script. I had developed the solution by keeping in mind XSS not WYSIWYG editors ...

But by keeping in mind WYSIWYG editor's functionality, as a developer one can leverage the code from three contexts that are also part of most WYSIWYG editors ...

In a similar manner, if you want to allow styling then you can use styleContextCleaner. Fabian had written that it encodes everything ... No. It only controls six characters that are necessary to execute JavaScript in style context. At the same time, it allows simple styling which I assume WYSIWYG editors want to offer (see http://slides.com/mscasharjaved/wysiwyg-editors-xssed#/162). One can also cut short six control characters into five characters if you know that you will be only using double quotes through-out your code then no need to control single quote in style context and vice-versa. Can be further shorten to four characters if you as a developer are sure that you will be using only style attribute not style tag then remove < from the list ...

For URL, if you are a WYSIWYG editor developer then I had written 3 regular expressions that only allow harmless URLs and do not allow JavaScript, Data and VbScript URI. I had discussed one out of three regular expression here: http://slides.com/mscasharjaved/wysiwyg-editors-xssed#/164 The other two regular expressions deals with mailto: and relative URI etc.

Because script context is not part of WYSIWYG editor's functionality and that's why I omitted scriptContextCleaner from the slides.

"No matter how you will code your WYSIWYG editor but if your WYSIWYG editor supports above three contexts then YOU MAY LEVERAGE MY CODE WHICH IS UNBREAKABLE and perfectly suites/fits (at least I think) in three contexts ..." My assumption is that even if they have some sort of sanitization then it may be flawed. Mine functions are thoroughly tested by the community and so far flawless ... (no one is able to XSSed these). They can use my functions as a replacement only for their sanitization routines ...

a) You say your filter is flawless and ready for deployment against XSS?

From: sanitization then it may be flawed. Mine functions are thoroughly tested by the community and so far flawless ... (no one is able to XSSed

So far flawless. But I can not guarantee about the future ... As far as deployment is concerned, it has already been deployed as an extension for Symphonycms: https://github.com/symphonycms/xssfilter. There are other products also. Once paper (under submission) will be accepted, all names will be public.

b) You say that if no server side validation is being used, the client side validation/sanitation will help?

From: [...] that developers of WYSIWYG editors want server-side sanitization while developers of server-side web applications take the product and start using it without adding sanitization stuff which makes the sites vulnerable.

Yes & No. It is debatable....

c) Existing XSS filters commonly introduce false alerts

d) Given you referenced the way academia works: Do you consider your work to be novel? You mentioned a proposed solution. Are you the first proposing this? I am asking these questions to get an understanding of what you mean and what the background of the presentation and publication is.

I see novelty ...

1) You were the first in literature who proposed a particular type of solution

e) Do you mind if we publish your replies in our article?

First of all, this research is legit because I have a logo and a name for it. This seems to be a trend right now (Heartbleed, Shellshock, Sandworm). AFAIK the rule is that you must invest the same amount of time into creating the logo as you did into your research.

Creating a captcha system is not as easy as it seems. Even if your captcha system doesn't have any implementation errors or logic flaws, you are still fighting against advanced research in image/voice recognition. That's almost like creating your own crypto.

But if you expected some crazy new algorithm, I have to disappoint you. It's just another design flaw.

So this post is about yet another broken captcha, and if you are not interested in the technical part, just make sure you remember this:

But let's not beat around the bush for too long. You are reading this because you are interested in what this scary CrossedCaptcha is all about, right?

The captcha system I'm talking about is called PlusCaptcha, which you can use as a standalone script or WordPress plugin. For some reason it's even ecological!

How did I come up with CrossedCaptcha? Well, a plus (+) and a cross (✝) are very similar. I know that crossed isn't technically crucified, and that makes it less funny, but whatever. You are here because you are interested in the exact vulnerability and you want to know how it's done. And you have read enough technical blah-blah articles anyway.

So let's dive in...

This is what PlusCaptcha looks like. You have to turn the circle to match the background image.

It's an embedded iframe with the URL http://syshtml5.pluscaptcha.com/i?iduso=42906365. This particular instance has the id 42906365. Every time this URL is loaded, a new captcha with a different solution is generated.

When you adjust the circle, the iframe sends a POST request to http://syshtml5.pluscaptcha.com/angulo.php with the data grados=-90&iduso=%d&size=c&green=0, where grados (Spanish for "degrees") is the angle by which the picture has been turned. It does this for every adjustment you make.

The following code is the PHP backend check that verifies whether the captcha you entered is correct. It does this by requesting http://syshtml5.pluscaptcha.com/r?iduso=42906365:

A simple solution would be to void the captcha after the server has checked it once. But then you would still have a ~14%-20% chance of being correct just by guessing. And because there is only a limited number of pictures, you can easily presolve them anyway. Besides that, this implementation is also not barrier-free.
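The "void it after one check" idea can be sketched in a few lines of Python. This is a hypothetical in-memory store, not PlusCaptcha's actual code: every challenge gets its own random, unguessable id, and checking a challenge removes it, so a solved captcha can be redeemed exactly once. How a challenge gets marked as solved (comparing the submitted angle) is simplified to a single method call here, and the sketch does nothing against the guessing and presolving problems.

```python
import secrets


class OneTimeCaptchaStore:
    """Hypothetical in-memory store: each challenge can be checked exactly once."""

    def __init__(self):
        self._solved = {}  # challenge id -> True once the user solved it

    def new_challenge(self) -> str:
        challenge_id = secrets.token_urlsafe(16)  # random and unguessable
        self._solved[challenge_id] = False
        return challenge_id

    def mark_solved(self, challenge_id: str) -> None:
        # Simplification: a real backend would compare the submitted angle here.
        if challenge_id in self._solved:
            self._solved[challenge_id] = True

    def redeem(self, challenge_id: str) -> bool:
        # pop() voids the challenge whether or not it was solved,
        # so replaying an old id (or retrying a wrong one) always fails.
        return self._solved.pop(challenge_id, False)


store = OneTimeCaptchaStore()
cid = store.new_challenge()
store.mark_solved(cid)
print(store.redeem(cid))  # True: the first check succeeds
print(store.redeem(cid))  # False: the same solution cannot be reused
```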

One day I was thinking about different techniques for source code analysis, especially since we often have access to repositories and thus to the evolution of the code.

Wouldn't it be cool to see the age of certain lines of code relative to others? So I decided to create a PoC Sublime Text plugin to visualize the age of lines. I call this method Code Archeology.
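The raw data for this is already in git. Here is a rough Python sketch (not the code from code_archeology.py; parse_line_porcelain and blame_ages are made-up names) that runs git blame --line-porcelain, which repeats the full commit header for every line, including author-time as a Unix timestamp, and pairs each line of the file with that timestamp:

```python
import subprocess


def parse_line_porcelain(output: str):
    """Turn `git blame --line-porcelain` output into [(author_time, line_text), ...]."""
    result = []
    author_time = None
    for raw in output.split("\n"):
        if raw.startswith("author-time "):
            author_time = int(raw.split(" ", 1)[1])  # Unix timestamp of the author date
        elif raw.startswith("\t"):
            # The actual source line is always prefixed with a single tab.
            result.append((author_time, raw[1:]))
    return result


def blame_ages(path: str):
    """Run git blame on a file inside a git repository (assumes `git` is on PATH)."""
    output = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_line_porcelain(output)
```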

And here is the result. This is the normal syntax highlighting:

And this is an example which highlights the oldest parts of the code and darkens the newer lines:

Update 2014-09-24:

So it turns out that somebody already thought about Code Archeology long before me: John Firebaugh - Code Archaeology With Git.
Maybe I even read this article years ago, forgot about it, and subconsciously "created" it in my head again.

And it gets worse. GitHub already has colors to indicate older and newer files.

But GitHub has only 10 different colors, and the narrow column doesn't really convey the information. So to finally do something "useful", I have created a small JavaScript snippet, which you can copy into the developer console.

It parses the <time> tag of each commit and assigns colors to each line. It also removes some of the commit info to give you a bigger view of the code.

Now I need to group them together and assign them a color. Unfortunately this gets really ugly in Sublime :(
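The grouping itself is the easy part. As a hedged sketch (hypothetical helper names, not taken from the plugin), the timestamps can be split into a fixed number of equally wide time buckets, and each bucket gets a gray level, brightest for the oldest code. In a plugin, each of those colors would then need its own scope in the generated theme, which is where the ugly part starts.

```python
def age_groups(timestamps, n_groups=5):
    """Assign each timestamp a group: 0 = oldest ... n_groups - 1 = newest (equal-width buckets)."""
    if not timestamps:
        return []
    oldest, newest = min(timestamps), max(timestamps)
    span = newest - oldest
    if span == 0:
        return [0] * len(timestamps)  # every line has the same author time
    return [min(int((t - oldest) / span * n_groups), n_groups - 1) for t in timestamps]


def group_color(group, n_groups):
    """Gray level as #rrggbb: oldest group -> #ffffff, newest group -> #404040."""
    level = 255 - round(group * (255 - 64) / max(n_groups - 1, 1))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


times = [1400000000, 1405000000, 1410000000]
groups = age_groups(times, n_groups=3)
print(groups)  # [0, 1, 2]
print([group_color(g, 3) for g in groups])
```

Equal-width buckets are just one choice; grouping by commit or by quantiles would spread the colors more evenly when most lines are from a single burst of activity.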
To create a colored line I have to generate a theme on-the-fly with different colored regions and assign them to corresponding age groups afterwards:

Then I have to go through all my lines with Sublime views and add the corresponding region to it.

There are quite a few downsides to my method. First of all, you can only look at one file at a time. Because the theme is generated on the fly based on the number of groups I have, it will change the look of other open files that share the same dynamic theme.
The generation is also slow - a file with ~5k LOC takes over 10 seconds.

I think that visualizing the age of code can be very useful, but somebody has to come up with a better idea how to implement it.

The PoC plugin can be downloaded here: code_archeology.py. Place the script in ~/Library/Application Support/Sublime Text 3/Packages/User. Then open a file in a git repository, press ctrl+` to open the console and run it with view.run_command("example"), or reset the view with view.run_command("example", {'reset': True}). But this should never ever ever be used by anybody. It's buggy and will probably only work on my machine. I just don't want to hold back any information.