Protecting from XSS with Sanitize

Transcripts

What's up guys? This episode we're talking about sanitizing your Rails application to prevent cross-site scripting attacks. What is a cross-site scripting attack? Basically, anytime you let your users type HTML into your website, if you take that HTML and print it out in the response, it will be executed in the browser. What does an example of this look like? Well, if your user types some malicious JavaScript code in their bio, like <script>alert("hello")</script>, this could be evaluated on the client side. So I could put some malicious JavaScript in here, and then anytime someone viewed my profile, they would run that JavaScript, meaning that I could steal things like their JSON Web Tokens and log in as them. Anything your browser has access to is potentially stealable by cross-site scripting.

Rails by default will automatically escape the code that you type in, and make sure that it does not render on the page. We can see that because the script tags are actually printed out on the page, and if we view source, you'll see that the < and > characters are converted into HTML entities, their &lt; and &gt; equivalents. That way, when they get printed out on the page, they're not rendered as actual script tags. This is a good default of Rails.

However, the second you want your users to be able to do things like type in an anchor tag, say to create a link to their website, this isn't going to work. You have to figure out a way to make this actually render as HTML. Now, what you might have seen people do in the past is type raw here, because this will work. This says that anything the user bio contains, just print it out in the template as real HTML. This is bad because it's going to run that script tag, of course. That's why everyone says: Do not ever use raw, and that is a good practice to follow. Don't ever use the raw method in your view.

Of course, the solution then for a lot of people has been to remove raw and use html_safe instead. What this does is mark that string as HTML safe, saying: This was safely sanitized, we can print it out. But this is effectively doing the exact same thing as raw, and we can see that by going back to our browser, where it renders the JavaScript again. html_safe is just as dangerous as using raw, and that is kind of a problem.

Rails comes with some methods that you can find in its sanitize helper, including sanitize_css, which is interesting as well. You can also have your users type in style tags, for example one that restyles the page body.

If you update this, it's going to run that JavaScript of course, but it's also going to apply those styles, which is pretty bad, because then your user can deface your website or do whatever they want really. Allowing either one of those can be bad.
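To make the default escaping described earlier concrete, here is a minimal sketch using only Ruby's standard library. It is not the Rails internals, just an illustration of the same idea: the characters that would start a tag are replaced with HTML entities, so the browser prints them instead of running them. The variable names are only for illustration.

```ruby
require "erb"

# The malicious bio from the example above.
bio = '<script>alert("hello")</script>'

# Escape the characters that have special meaning in HTML.
escaped = ERB::Util.html_escape(bio)

puts escaped
# => &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
```

Calling raw or html_safe on the bio skips this step, which is why the script tag reaches the browser untouched.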

If you're Shopify or some website that needs to let your users customize things, you will need to allow some of this on purpose. In certain cases, like if you're building a forum, you might want your users to be able to type certain tags, or in the case of this bio, you might want them to be able to write links out. We need to use the sanitize helper to take the content that we have in the bio and remove all the style tags and the script tags. The rest of them are kind of OK. We don't really mind most other things, but we want to make sure that this is properly sanitized. We do allow anchors and strong tags and em tags, and if we run our bio through that sanitize method, it's going to analyze the text inside of the bio, strip out anything that is invalid or not allowed, and then mark the result as HTML safe. Internally, it's marking that as HTML safe only after it has sanitized it. You'll want to use sanitize and never html_safe directly, unless you know exactly what your sanitizing is doing. Leave that up to sanitize. If we go back to the browser, we'll now see that we get GoRails printed out, then the alert text, but with no script tags around it, and the body style as well, but with no style tags around it. The sanitize method has stripped out both the script and the style tags, but it has allowed the anchor tag for GoRails to be inserted and evaluated as an actual tag, so it didn't print it out as text. It actually rendered it.

This is stuff you're probably already familiar with, but what about links? Anytime a user can type in a link, usually they're going to type in a valid one, but the href of any link can actually be set to JavaScript. You can do that by saying javascript:your_javascript.
What that means is that if we update this user and look at our link_to for the user URL, we're printing out a link that displays the text they typed in and also links to that text. This doesn't execute as soon as the page loads like the other one. However, if you click on this link, it's going to execute that JavaScript, and that is just as bad, except that it doesn't happen automatically on page load. It's just as dangerous to allow your link_tos to point to user-generated content. That's something that I see people overlooking quite often. In this case, with the link_to, you actually want to pass it through sanitize as well. What this will do is render the link_to first, and then pass the link through sanitize, which will take the link and strip out any of the JavaScript in the URL.

This time around, it looks a little bit different. If we hover over it, the browser knows that it's an anchor tag, except if we click on it, it just highlights the text. The reason for that, if we inspect this, is that it is just an anchor tag that has no href. The sanitize method pulled out the href, but it allowed the anchor tag to still exist. In the bio's case, because it was a valid anchor tag that we typed in, it allowed the href attribute because there was no JavaScript inside of it. That is pretty interesting, and it's one of the other attack vectors that is pretty common in Rails. I would be willing to bet that at least one of your applications links to user-generated URLs and has that security bug in it.

Pretty much anytime that you're rendering out user-generated content, you need to run it through sanitize if you want to allow any of that content to be rendered as actual HTML. In most cases you're going to be OK. For a name, you don't really need to worry about script tags, because you're probably not evaluating their name as HTML. As long as you never call html_safe or raw on a person's name, you won't have anything to worry about. Rails will just convert those to the escaped characters and you will be fine. Be careful with link_tos, as well as anytime you want to render Markdown or HTML from your users.
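The episode handles this by passing the link_to through sanitize. A complementary check, which is not shown in the episode and is only a sketch under my own assumptions, is to refuse any user-supplied URL whose scheme is not http or https before it ever reaches link_to. The method name safe_link_url and the constant are hypothetical.

```ruby
require "uri"

# Assumption: only web links are acceptable for a user's profile URL.
ALLOWED_SCHEMES = %w[http https].freeze

# Returns the URL as a string when its scheme is allowed, otherwise nil.
def safe_link_url(raw_url)
  uri = URI.parse(raw_url.to_s.strip)
  ALLOWED_SCHEMES.include?(uri.scheme&.downcase) ? uri.to_s : nil
rescue URI::InvalidURIError
  # Anything that does not even parse as a URL is rejected as well.
  nil
end

p safe_link_url("https://gorails.com")
p safe_link_url("javascript:alert(1)")
```

A view could then render the link only when this returns a value, and fall back to plain escaped text otherwise.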

There's a gem on GitHub called sanitize, which conveniently has the exact same name as the Rails helper. This gem will allow you to build a much more robust and customizable sanitization strategy. Imagine that you're WordPress or Shopify or some site that has custom themes, and you want your users to be able to type in scripts and links and stylesheets and those types of things in order to customize the themes, but you don't want them to be malicious about it. Maybe you're building your own modern MySpace, and it needs to let people put their own theme on there, but you can't allow certain things. What you can do is use a gem like this, and it will allow you to sanitize fragments of HTML, entire documents, and stylesheets and properties. This can get really fine-grained as to your sanitization methods. You can say: Well, let's allow certain types of HTML, and let's allow certain stylesheet rules as well. You can build your own custom sanitizers using this gem and it works very well. We're not going to go into using the sanitize gem. I just want to let you know that it's available, and it's something that you can dive deeper into if you need something that goes over and above the built-in Rails sanitizer.

I definitely recommend reading about cross-site scripting on Wikipedia or OWASP. They have incredibly good resources and show you very interesting examples of how your users can inject pretty dangerous things. This is particularly important if you're doing anything modern like using JSON Web Tokens to authenticate your Angular or React front-ends. Those JSON Web Tokens need to be stored somewhere in the browser, and if someone is allowed to put script tags on your website, they can potentially steal them, meaning that your users' accounts can get stolen left and right and other people can access their accounts.

That's a quick introduction to the sanitize method here in Rails. Cross-site scripting is an easy one to accidentally get wrong, and it can be very dangerous if that happens. Be very careful with this, and if you can, set up automated tests that attempt to put JavaScript into your application and make sure that your code handles it appropriately, so that this is not going to be a problem for you in production.
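As one possible starting point for those automated checks, here is a small standard-library sketch that feeds a few well-known payloads through a renderer and reports any that come out with a live tag. Everything here is a stand-in: render_bio imitates an escaping view using plain ERB (which, unlike Rails views, does not escape automatically), and the payload list and names are assumptions. In a real app you would render your actual views in your test framework of choice.

```ruby
require "erb"

# A few common payloads; a real suite would use a longer list.
XSS_PAYLOADS = [
  '<script>alert("hello")</script>',
  '<img src=x onerror=alert(1)>',
  '<style>body { display: none }</style>'
].freeze

# Matches an opening tag that should never appear unescaped in the output.
DANGEROUS_TAG = /<(script|style|img)\b/i

# Hypothetical stand-in for a view that escapes the bio before printing it.
def render_bio(bio)
  ERB.new("<p><%= ERB::Util.html_escape(bio) %></p>").result_with_hash(bio: bio)
end

# Returns the payloads that survive rendering with a live tag.
# Pass a block to check a different renderer.
def leaked_payloads(payloads = XSS_PAYLOADS, &renderer)
  renderer ||= method(:render_bio)
  payloads.select { |payload| renderer.call(payload).match?(DANGEROUS_TAG) }
end

puts "escaping renderer leaks: #{leaked_payloads.size}"
puts "raw renderer leaks: #{leaked_payloads { |bio| "<p>#{bio}</p>" }.size}"
```

The second call shows why the check is useful: a renderer that interpolates the bio without escaping lets every payload through.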

I wanted to cover this before we go into APIs and JSON Web Tokens and securing your production servers and all that stuff. This is one of those things that's easy to implement and easy to forget about, so I wanted to start this off by talking about cross-site scripting and sanitizing those inputs. That's it for this episode, I will talk to you in the next one. Peace.