General Security

Last weekend, along with countless employees and ex-employees of Microsoft, Amazon, Expedia, and Premera itself, I received a breach notification signed by Premera’s President & CEO, Jeffrey Roe.

Here’s a few things I think can already be learned from this letter and the available public information:

Don’t claim “sophisticated”

Whenever I see the phrase “sophisticated cyberattack”, not only am I turned off by the meaningless prefix “cyber”, which seems to serve only to “baffle them with bullshit”, but I’m also convinced that the author is trying to convince me that, hey, this attack was just too amazing and science-fictiony to do anything to stop.

All that does is push me in the other direction – to assume that the attack was relatively simple, and should have been prevented and/or noticed.

Granted, my experience is in Information Security, and so I’m always fairly convinced that it’ll be the simple attacks, not the complex and difficult ones, that will be the most successful against any site I’m trying to protect. It’s a little pessimistic, but it’s been proven right time and again.

So, never say that an attack is “sophisticated” unless you really mean that the attack was way beyond what could have been reasonably imagined. You don’t have to say the attackers used simple methods to get in because your team are idiots, because that’s unlikely to be entirely true, either. Just don’t make it sound like it’s not your fault. And don’t make that your opening gambit, either – this was the very first sentence in Premera’s notification.

Don’t say “may” / “might”, say “was”

“some of your personal information may have been accessed”

Again, this phrasing simply makes me think “these guys have no idea what was accessed”, which really doesn’t inspire confidence.

Instead, you should say “the attackers had access to all our information, including your personal and medical data”. Then acknowledge that you don’t have tracking on what information was exported, so you have to act as if it all was.

Say “sorry we allowed your data to be lost”

The worst apologies on record all contain some variation of “I’m sorry you’re upset”, or “I’m sorry you took offence”.

Premera’s version of this is “We … regret the concern it may cause”. So, not even “sorry”. And to the extent that it’s an apology at all, it is that we, current and past customers, were “concerned”.

Parenthetical abbreviations mean this was written by a lawyer

Premera Blue Cross (“Premera”) …

… Information Technology (IT) systems

As if the lack of apology didn’t already tip us off that this document was prepared by a lawyer, the parenthetical creation of abbreviations to be used later on makes it completely clear.

If the letter had sounded more human, it would have been easier to receive as something other than a legal arse-covering exercise.

Speed is important

The letter acknowledges that the issue was discovered on January 29, 2015, and the letter is dated March 17, 2015. That’s nearly two months. And nearly a year since the attackers got in. That’s assuming that you’ve truly figured out the extent of the “sophisticated cyberattack”.

Actually, that’s pretty fast for security breach disclosure, but it still gives the impression to customers that you aren’t letting them know in enough time to protect themselves.

The reason given for this delay is that Premera wanted to ensure that their systems were safe before letting other attackers know about the issue – but it’s generally a fallacy to assume that attackers don’t know about your vulnerabilities. Premera, and the health insurance industry, do a great job of sharing security information with other health insurance providers – but the attackers do an even better job of sharing information about vulnerable systems and tools.

Which leads us to…

Preparation is key

If your company doesn’t have a prepared breach disclosure letter, approved by public relations, the security team and your lawyers, it’s going to take you at least a week, probably two, to put one together. And you’ll have missed something, because you’re preparing it in a rush, in a panic, and in a haze while you’re angry and scared about having been attacked.

Your prepared letter won’t be complete, and won’t be entirely applicable to whatever breach finally comes along and bites you, but it’ll put you that much closer to being ready to handle and communicate that breach. You’ll still need to review it and argue between Security, Legal and PR teams.

Have a plan for this review process, and know the triggers that will start it. Maybe even test the process once in a while.

If you believe that breaches could require a credit notification or ID tracking protection, negotiate this ahead of time, so that this will not slow you down in your announcement. Or write your notification letter with the intent of providing this information at a later time.

Finally, because your notification letter will miss something, make sure it includes the ability to update your customers – link to an FAQ online that can be updated, and provide a call-in number for people to ask questions of an informed team of responders.

More to come

There’s always more information coming out about this vulnerability, and I plan to blog a little more about it later.

Let me know in particular if there’s something you’d like me to cover on this topic.

What it isn’t – the login page

And of course, the trick here is to supply the user ID “admin” and the password “' OR 1='1”.

Sure, IF you have that code in your app, that will let the user in as admin.

But then, IF you have that code in your app, you have many bigger problems than SQL injection – because your user database contains unprotected passwords, and a leak will automatically tell the world how poor your security is, and always has been.

More likely, if you have SQL injection in the logon code at all, is that you will have code like this:

Again, if you were to have designed poorly, you might allow for multiple user records to come back (suppose, say, you allow the user to reuse old passwords, or old hashing algorithms), and you accept the first account with the right password. In that case, yes, an attacker could hack the login page with a common password, and the user ID “' OR userid LIKE '%” – but then the attacker would have to know the field was called userid, and they’re only going to get the first account in your database that has that password.

Doubtless there are many login pages which are vulnerable to SQL injection attacks like this, but they are relatively uncommon where developers have some experience or skill.

So if not on the login page, where do we see SQLi?

Where do you use a SQL-like database?

Anywhere there’s a table of data to be queried, whether it’s a dictionary of words, or a list of popular kitchen repair technicians, etc, etc.

Imagine I’ve got a dictionary searching page, weblexicon.example (that doesn’t exist, nor does weblexicon.com). Its main page offers me a field to provide a word, for which I want to see the definition.

If I give it a real word, it tells me the definition(s).

If I give it a non-existent word, it apologises for being unable to help me.

Seems like a database search is used here. Let’s see if it’s exploitable, by asking for “example’” – that’s “example” with an extra single quote at the end.

That’s pretty cool – we can tell now that the server is passing our information off to a MySQL server. Those things that look like double-quotes around the word ‘example’ are in fact two single-quotes. A bit confusing, but it helps to understand what’s going on here.

So, let’s feed the web lexicon a value that might exploit it. Sadly, it doesn’t accept multiple commands, and gives the “You have an error in your SQL syntax” message when I try it.

Worse still, for some reason I can’t use the “UNION” or “JOIN” operators to get more data than I’m allowed. This seems to be relatively common when there are extra parentheses, or other things we haven’t quite guessed about the command.

I’m blind!

That means we’re stuck with Blind SQL injection. With a blind SQL injection, or Blind SQLi, you can generally see whether a value is true or false, by the response you get back. Remember our comparison of a word that does exist and a word that doesn’t? Let’s try that in a query to look up a true / false value:

So now, we can ask true / false questions against the database.

Seems rather limiting.

Let’s say we’re looking to see if the MySQL server is running a particular vulnerable version – we could ask for “example’ and @@version=’1.2.3.4” – a true response would give us the hint that we can exploit that vulnerability.

LIKE my ‘sploit

But the SQL language has so many other options. We can say “does your version number begin with a ‘4’”

Or 5.

A bit more exciting, but still pedestrian.

What if I want to find out what the currently executing statement looks like? I could ask “is it an ‘a’? a ‘b’? a ‘c’?” and so on, but that is too slow.

Instead, I could ask for each bit of the characters, and that’s certainly a good strategy – but the one I chose is to simply do a binary search, which is computationally equivalent.

What language, that’s the question…

A fifteen-year-old vulnerability (SQL injection is older than that, but I couldn’t do the maths) deserves the same age of language to write my attack in.

So I chose batch file and VBScript (OK, they’re both older than 15). Batch files can’t actually download a web page, so that’s the part I wrote in VBScript.

And the fun thing to dump would be all of the table names. That way, we can see what we have to play with.

So here you go, a simple batch script to do Blind Boolean SQL injection to list all the tables in the system.

60 lines? What, that’s it?

Yes, that’s all it takes.

If you’re a developer of a web app which uses a relational database back-end, take note – it’s exactly this easy to dump your database contents. A few changes to the batch file, and I’m dumping column names and types, then individual items from tables.

And that’s all assuming I’m stuck with a blind SQL injection.

The weblexicon site lists table contents as its definitions, so in theory I should be able to use a UNION or a JOIN to add data from other tables into the definitions it displays. It’s made easier by the fact that I can also access the command I’m injecting into, by virtue of MySQL including that in a process table.

Note that if I’m attacking a different site with a different injection point, I need to make two changes to my batch script, and I’m away. Granted, this isn’t exactly sqlmap.py, but then again, sqlmap.py doesn’t always find or exploit all the vulns that you have available.

Summary

The takeaways today:

SQLi is a high damage attack – an attacker can probably steal ALL your data from your databases, not just the data exposed on a page, and they may be able to modify ALL your data.

SQLi is easy to find – there are lists of thousands of vulnerable websites available for anyone to view. And that’s without looking on the “Dark Web”. Also, any data-handling web site is a great target to try.

SQLi is easy to exploit – a few lines of an archaic language are all it takes.

SQLi is easy to fix. Parameterised queries, input validation, access control and least-privilege combine (don’t use just one!) to protect your site.

Disclaimer

The code in this article is for demonstration purposes – I’m not going to explain how it works, although it is very simple. The only point of including it is to show that a small amount of code can be the cause of a huge extraction of your site’s data, but can be prevented by a small change.

Don’t use this code to do bad things. Don’t use other code to do bad things. Bad people are doing bad things with code like this (and better) already. Do good things with this code, and keep those bad people out.

Hopefully, you’ll all know by now what Heartbleed is about. It’s not a virus, it’s a bug in a new feature that was added to a version of OpenSSL, wasn’t very well checked before making it part of the standard build, and which has for the last couple of years meant that any vulnerable system can have its running memory leached by an attacker who can connect to it. I have a number of approaches to make to this, which I haven’t seen elsewhere:

Behavioural Changes to Prevent “the next” Heartbleed

You know me, I’m all about the “defence against the dark arts” side of information security – it’s fun to attack systems, but it’s more interesting to be able to find ways to defend.

Here are some of my suggestions about programming practices that would help:

Don’t adopt new features into established protocols without a clear need to do so. Why was enabling the Heartbeat extension a necessary thing to foist on everyone? Was it a MUST in the RFC? Heartbeat, “keep-alive” and similar measures are a waste of time on most of the Internet’s traffic, either because the application layer already keeps up a constant communication, or because it’s easy to recover and restart. Think very carefully before adopting as mandatory a new feature into a security related protocol.

This was not a security coding bug, it was a secure coding bug, but in a critical piece of security code. Secure code practices in general should be a part of all developers’ training and process, much like hand-washing is important for doctors whether they’re performing surgery or checking your throat for swelling. In this case, the code submitted should have tripped a “this looks odd” sense in the reviewer, combined with its paucity of comments and self-explanation (but then, OpenSSL is by and large that way anyway).

Check lengths of buffers. When data is structured, or wrapped in layers, like SSL records are, transform it back into structures and verify at each layer. I’m actually trying to say write object-oriented code to represent objects – whether the language is object-oriented or not. [It’s a matter of some pride for me that I reviewed some of the Fortran code I wrote as a student back in the mid eighties, and I can see object-orientedness trying to squeeze its way out.]

Pay someone to review code. Make it their job, pay them well to cover the boredom, and hold them responsible for reviewing it properly. If it matters enough to be widely used, it matters enough to be supported.

Unit tests. And then tests written by a QA guy, who’s trying to make it fail.

There’s just a few ideas off the top of my head. It’s true that this was a HARD bug to find in automated code review, or even with manual code review (though item 2 above tells you that I think the code looked perverse enough for a reviewer to demand better, cleaner code that could at least be read).

National Security reaction – inappropriate!

Clearly, from the number of sites (in all countries) affected negatively by this flaw, from the massive hysteria that has resulted, as well as the significant thefts disclosed to date, this bug was a National Security issue.

So, how does the US government respond to the allegations going around that they had knowledge of this bug for a long time?

By denying the allegations? By asserting they have a mandate to protect?

No, by reminding us that they’ll protect US (and world) industries UNLESS there’s a benefit to spying in withholding and exploiting the bug.

“You are not going to see the Chinese give up on ‘zero days’ just because we do.”

No, you’re going to see “the Chinese” [we always have to have an identifiable bogeyman] give up on zero days when our response to finding them is to PATCH THEM, not hold them in reserve to exploit at our leisure.

Specifically, if we patch zero days when we find them, those weapons disappear from our adversaries’ arsenals.

If we hold on to zero days when we find them, those weapons are a part of our adversaries’ arsenals (because the bad guys share better than the good guys).

National Security officials should recognise that in cyberwar – which consists essentially of people sending postcards saying “please hit yourself” to one another, and then expressing satisfaction when the recipient does so – you win by defending far more than by attacking.

Many eyeballs review is surprisingly incomplete

It’s often been stated that “many eyeballs” review open source code, and as a result, the reviews are of implicitly better quality than closed source code.

Clearly, OpenSSL is an important and widely used piece of security software, and yet this change was, by all accounts, reviewed by three people before being published and widely deployed. Only one of those people works full time for OpenSSL, and another was the author of the feature in question.

There are not “many” eyeballs working on this review. Closed source will often substitute paid eyeballs for quantity of eyeballs, and as a result will often achieve better reviews.

Remember, it’s the quality of the review that counts, and not whether the source is closed or open.

Closed source that is thoroughly reviewed by experts is better than open source that’s barely reviewed at all.

At some point, you realise that this is a waste of paper, so you start writing your messages on a whiteboard.

Wiping the whole whiteboard for each message is a waste of effort, so you only wipe out enough space to write each incoming message.

Some messages are long, some are short.

One day, you are asked to read a message back to the person who leaves it, just after you wrote it.

And to make it easy, they tell you how long their message is.

If someone gave you a six letter message, and asked you to read all six hundred letters of it back to them, you’d be upset, because that’s not how many letters they gave you.

Computers aren’t so smart, they are just really fast idiots.

The computer doesn’t get surprised that you sent six characters and ask for six hundred back, so it reads off the entire whiteboard, containing bits and pieces of every message you’ve had sent through you.

And because most messages are small, and only some are large, there’s almost an entire message in each response.

Don’t they already have one of those?

Well, yeah, kind of. There’s the TAM Threat Analysis & Modeling Tool, which is looking quite creaky with age now, and which I never found to be particularly usable (though some people have had success with it, so I’m not completely dismissive of it). Then there’s the previous versions of the SDL Threat Modeling Tool.

These have had their uses – and certainly it’s noticeable that when I work with a team of developers, one of whom has worked at Microsoft, it’s encouraging to ask “show me your threat model” and have them turn around with something useful to dissect.

So what’s wrong with the current crop of TM tools?

In a word, Cost.

Threat modeling tools from other than Microsoft are pretty pricey. If you’re a government or military contractor, they’re probably great and wonderful. Otherwise, you’ll probably draw your DFDs in PowerPoint (yes, that’s one of the easier DFD tools available to most of you!), and write your threat models in Word.

Unless, of course, you download and use the Microsoft SDL Threat Modeling Tool, which has always been free.

So where’s the cost?

The SDL TM tool itself was free, but it had a rather significant dependency.

Visio.

Visio is not cheap.

As a result, those of us who championed threat modeling at all in our enterprises found it remarkably difficult to get approval to use a free tool that depended on an expensive tool that nobody was going to use.

What’s changed today?

With the release of Microsoft SDL Threat Modeling Tool 2014, Microsoft has finally delivered a tool that allows for the creation of moderately complex DFDs (you don’t want more complex DFDs than that, anyway!), and a threat library-based analysis of those DFDs, without making it depend on anything more expensive or niche than Windows and .NET. [So, essentially, just Windows.]

Yes, that means no Visio required.

Is there anything else good about this new tool?

A quick bullet list of some of the features you’ll like, besides the lack of Visio requirement:

Imports from the previous SDL Threat Modeling Tool (version 3), so you don’t have to re-work

Multiple diagrams per model, for different levels of DFD

Analysis is per-interaction, rather than per-object [scary, but functionally equivalent to per-object]

The file format is XML, and is reasonably resilient to modification

Objects and data flows can represent multiple types, defined in an XML KnowledgeBase

These types can have customised data elements, also defined in XML

The rules about what threats to generate are also defined in XML

[These together mean an enterprise can create a library of threats for their commonly-used components]

Trust boundaries can be lines, or boxes (demonstrating that trust boundaries surround regions of objects)

Currently supported by a development team who are responsive to feature requests

Call to Action?

Yes, every good blog post has to have one of these, doesn’t it? What am I asking you to do with this information?

Download the tool. Try it out on a relatively simple project, and see how easy it is to generate a few threats.

Once you’re familiar with the tool, visit the KnowledgeBase directory in the tool’s installation folder, and read the XML files that were used to create your threats.

Add an object type.

Add a data flow type.

Add custom properties that describe your custom types.

Use those custom properties in a rule you create to generate one of the common threats in your environment.

Work with others in your security and development teams to generate a good threat library, and embody it in XML rules that you can distribute to other users of the threat modeling tool in your enterprise.

Document and mitigate threats. Measure how successful you are, at predicting threats, at reducing risk, and at impacting security earlier in your development cycle.

Unless you are putting a URL into an attribute, there are three simple rules:

Every attribute’s value must be quoted, whether with single quotes or double quotes.

If the quote you use appears in the attribute value, it must be encoded.

You must encode any characters which could confuse the encoding. [Encode the encoding characters]

Seems easy, right?

This is all kinds of good, except when you run into a site where the developer hasn’t really thought about their encoding very well.

You see, HTML attribute values are encoded using HTML encoding, not C++ encoding.

To HTML, the back-slash has no particular meaning.

I see this all the time – I want to inject script, but the site only lets me put user data into an attribute value:

<meta name="keywords" content="Wot I searched for">

That’s lovely. I’d like to put "><script>prompt(1)</script> in there as a proof of concept, so that it reads:

<meta name="keywords" content=""><script>prompt(1)</script>">

The dev sees this, and cuts me off, by preventing me from ending the quoted string that makes up the value of the content attribute:

<meta name="keywords" content="\"><script>prompt(1)</script>">

Nice try, Charlie, but that back-slash, it’s just a back-slash. It means nothing to HTML, and so my quote character still ends the string. My prompt still executes, and you have to explain why your ‘fix’ got broken as soon as you released it.

Oh, if only you had chosen the correct HTML encoding, and replaced my quote with “&quot;” [and therefore, also replace every “&” in my query with “&amp;”], we’d be happy.

And this, my friends, is why every time you implement a mitigation, you must test it. And why you follow the security team’s guidance.

Exercise for the reader – how do you exploit this example if I don’t encode the quotes, but I do strip out angle brackets?

Context – Apple releases security fix; everyone sees what they fixed

Last week, Apple released a security update for iOS, indicating that the vulnerability being fixed is one that allows SSL / TLS connections to continue even though the server should not be authenticated. This is how they described it:

Impact: An attacker with a privileged network position may capture or modify data in sessions protected by SSL/TLS

Description: Secure Transport failed to validate the authenticity of the connection. This issue was addressed by restoring missing validation steps.

Secure Transport is their library for handling SSL / TLS, meaning that the bulk of applications written for these platforms would not adequately validate the authenticity of servers to which they are connected.

Ignore “An attacker with a privileged network position” – this is the very definition of a Man-in-the-Middle (MITM) attacker, and whereas we used to be more blasé about this in the past, when networking was done with wires, now that much of our use is wireless (possibly ALL in the case of iOS), the MITM attacker can easily insert themselves in the privileged position on the network.

The other reason to ignore that terminology is that SSL / TLS takes as its core assumption that it is protecting against exactly such a MITM. By using SSL / TLS in your service, you are noting that there is a significant risk that an attacker has assumed just such a privileged network position.

Also note that “failed to validate the authenticity of the connection” means “allowed the attacker to attack you through an encrypted channel which you believed to be secure”. If the attacker can force your authentication to incorrectly succeed, you believe you are talking to the right server, and you open an encrypted channel to the attacker. That attacker can then open an encrypted channel to the server to which you meant to connect, and echo your information straight on to the server, so you get the same behaviour you expect, but the attacker can see everything that goes on between you and your server, and modify whatever parts of that communication they choose.

So this lack of authentication is essentially a complete failure of your secure connection.

As always happens when a patch is released, within hours (minutes?) of the release, the patch has been reverse engineered, and others are offering their description of the changes made, and how they might have come about.

In this case, the reverse engineering was made easier by the availability of open source copies of the source code in use. Note that this is not an intimation that open source is, in this case, any less secure than closed source, because the patches can be reverse engineered quickly – but it does give us a better insight into exactly the code as it’s seen by Apple’s developers.

Yes, that’s a second “goto fail”, which means that the last “if” never gets called, and the failure case is always executed. Because of the condition before it, however, the ‘fail’ label gets executed with ‘err’ set to 0.

Initial reaction – lots of haha, and suggestions of finger pointing

So, of course, the Internet being what it is, the first reaction is to laugh at the clowns who made such a simple mistake, that looks so obvious.

T-shirts are printed with “goto fail; goto fail;” on them. Nearly 200 have been sold already (not for me – I don’t generally wear black t-shirts).

But really, these are smart guys – “be smarter” is not the answer

This is SSL code. You don’t get let loose on SSL code unless you’re pretty smart to begin with. You don’t get to work as a developer at Apple on SSL code unless you’re very smart.

Clearly “be smart” is already in evidence.

There is a possibility that this is too much in evidence – that the arrogance of those with experience and a track record may have led these guys to avoid some standard protective measures. The evidence certainly fits that view, but then many developers start with that perspective anyway, so in the spirit of working with the developers you have, rather than the ones you theorise might be possible, let’s see how to address this issue long term:

Here’s my suggested answers – what are yours?

Enforce indentation in your IDE / check-in process

OK, so it’s considered macho to not rely on an IDE. I’ve never understood that. It’s rather like saying how much you prefer pounding nails in with your bare fists, because it demonstrates how much more of a man you are than the guy with a hammer. It doesn’t make sense when you compare how fast the job gets done, or the silly and obvious errors that turn up clearly when the IDE handles your indenting, colouring, and style for you.

Yes, colouring. I know, colour-blind people exist – and those people should adjust the colours in the IDE so that they make sense. Even a colour-blind person can get shade information to help them. I know syntax colouring often helps me spot when an XSS injection is just about ready to work, when I would otherwise have missed it in all the surrounding garbage of HTML code. The same is true when building code, you can spot when keywords are being interpreted as values, when string delimiters are accidentally unescaped, etc.

The same is true for indentation. Indentation, when it’s caused by your IDE based on parsing your code, rather than by yourself pounding the space bar, is a valuable indication of program flow. If your indentation doesn’t match control flow, it’s because you aren’t enforcing indentation with an automated tool.

What the heck, enforce all kinds of style

Your IDE and your check-in process are a great place to enforce style standards to ensure that code is not confusing to the other developers on your team – or to yourself.

A little secret – one of the reasons I’m in this country in the first place is that I sent an eight-page fax to my bosses in the US, criticising their programming style and blaming (rightly) a number of bugs on the use of poor and inconsistent coding standards. This was true two decades ago using Fortran, and it’s true today in any number of different languages.

The style that was missed in this case – put braces around all your conditionally-executed statements.

I have other style recommendations that have worked for me in the past – meaningful variable names, enforced indenting, maximum level of indenting, comment guidelines, constant-on-the-left of comparisons, don’t include comparisons and assignments in the same line, one line does one thing, etc, etc.

Make sure you back the style requirements with statements as to what you are trying to do with the style recommendation. “Make the code look the same across the team” is a good enough reason, but “prevent incorrect flow” is better.

Make sure your compiler warns on unreachable code

gcc has the option “-Wunreachable-code”.

gcc disabled the option in 2010.

gcc silently disabled the option, because they didn’t want anyone’s build to fail.

This is not (IMHO) a smart choice. If someone has a warning enabled, and has enabled the setting to produce a fatal error on warnings, they WANT their build to fail if that warning is triggered, and they WANT to know when that warning can no longer be relied upon.

So, without a warning on unreachable code, you’re basically screwed when it comes to control flow going where you don’t want it to.

Compile with warnings set to fatal errors

And of course there’s the trouble that’s caused when you have dozens and dozens of warnings, so warnings are ignored. Don’t get into this state – every warning is a place where the compiler is confused enough by your code that it doesn’t know whether you intended to do that bad thing.

Let me stress – if you have a warning, you have confused the compiler.

This is a bad thing.

You can individually silence warnings (with much comments in your code, please!) if you are truly in need of a confusing operation, but for the most part, it’s a great saving on your code cleanliness and clarity if you address the warnings in a smart and simple fashion.

Don’t over-optimise or over-clean your code

The compiler has an optimiser.

It’s really good at its job.

It’s better than you are at optimising code, unless you’re going to get more than a 10-20% improvement in speed.

Making code shorter in its source form does not make it run faster. It may make it harder to read. For instance, this is a perfectly workable form of strstr:

What’s its memory usage? Processor usage? How would you change it to make it work on case-insensitive comparisons? Does it overflow buffers?

Better still: does it compile to smaller or more performant code, if you rewrite it so that an entry-level developer can understand how it works?

Now go and read the implementation from your CRT. It’s much clearer, isn’t it?

Release / announce patches when your customers can patch

Releasing the patch on Friday for iOS and on Tuesday for OS X may have actually been the correct move – but it brings home the point that you should release patches when you maximise the payoff between having your customers patch the issue and having your attackers reverse engineer it and build attacks.

Make your security announcements findable

Where is the security announcement at Apple? I go to apple.com and search for “iOS 7.0.6 security update”, and I get nothing. It’d be really nice to find the bulletin right there. If it’s easier to find your documentation from outside your web site than from inside, you have a bad search engine.

Finally, a personal note

People who know me may have the impression that I hate Apple. It’s a little more nuanced than that.

I accept that other people love their Apple devices. In many ways, I can understand why.

I have previously owned Apple devices – and I have tried desperately to love them, and to find why other people are so devoted to them. I have failed. My attempts at devotion are unrequited, and the device stubbornly avoids helping me do anything useful.

Instead of a MacBook Pro, I now use a ThinkPad. Instead of an iPad (remember, I won one for free!), I now use a Surface 2.

I feel like Steve Jobs turned to me and quoted Dr Frank N Furter: “I didn’t make him for you.”

So, no, I don’t like Apple products FOR ME. I’m fine if other people want to use them.

This article is simply about a really quick and easy example of how simple faults cause major errors, and what you can do, even as an experienced developer, to prevent them from happening to you.

Update 2 – NOT FIXED

Yeah, so, I was apparently deluded, the problem is still here. It appears to be a bona-fide bug in Windows 8, with a Hotfix at http://support.microsoft.com/kb/2797356 – but that’s only for x86 versions of Windows, and not for the Surface 2.

Update – FIXED

Since I wrote this article, another issue caused me to reset my WMI database, by deleting everything under C:\Windows\System32\wbem\Repository and rebooting. After that, the VPN issues documented in this article have gone away.

Original article

I have a home VPN – everyone should, because it makes for securable access to your home systems when you are out and about, whether it’s at the Starbucks down the street, or half way across the world, like I was on my trip to China last week.

Useful as my home VPN is, and hard as it is to get working (see my last post on Windows 8 VPN problems), it’s only useful if I can get my entire computer to talk through the VPN.

Sidebar – VPN split tunneling

Note that I am not disputing the value of split tunneling in a VPN, which is where you might set up your client to use the VPN only for a range of addresses, so that (for example) a computer might connect to the VPN for connections to a work intranet, but use the regular connectivity for the major part of the public web. For this article, assume I want everything but my link-local traffic to be forwarded to my VPN.

So, in my last VPN post, we talked about setting up the client end of a VPN, and now I want to use it.

Connecting is the easy part, and once connected, most of my apps on the Surface 2 work quite happily, connecting to the Internet through my VPN.

All of the Desktop apps seem to work without restriction, but there are some odd gaps when it comes to using “Windows Store” apps, also known as “Metro” or “Modern UI” apps. Microsoft can’t call this “Metro” any more, even though that’s the most commonly used term for it, so I’ll follow their lead and call this the “Modern UI” [where UI stands for User Interface].

Most glaring of all is the Modern UI Internet Explorer, which doesn’t seem to allow any connections at all, simply displaying “This page can’t be displayed”. The exception to this is if I connect to a web server that is link-local to the VPN server.

I’d think this was a problem with the way I had set up my VPN server, or my client connection, if it weren’t for the fact that my Windows 8.1 laptop connects correctly to this same VPN with no issues on Modern or Desktop versions of Internet Explorer, and of course the undeniable feature that Internet Explorer for the Desktop on my Surface 2 also works correctly.

I’d like to troubleshoot and debug this issue, but of course, the only troubleshooting tools for networking in the Surface 2 run on the Desktop, and therefore work quite happily, as if nothing is wrong with the network. And from their perspective, this is true.

When Bagpuss goes to sleep, all his little friends go to sleep, too.

Of course, Internet Explorer has always been claimed by Microsoft to be a “part of the operating system”, and in Windows 8.1 RT, there is no difference in this respect.

Every Modern UI application which includes a web control, web view, or in some way asks the operating system or development framework to host a web page, also fails to reach its intended target through the VPN.

Technical Support – what’s their take?

Technical support had me try a number of things, including resetting the system, but none of their suggestions had any effect. Eventually I found a tech support rep who told me this is a bug, not that that is really what you’d call a resolution of my problem. These are the sort of things that make it clear that the Surface is still in its early days, and while impressive, has a number of niggling issues that need “fit and finish” work before significant other features get added.

It should be easy enough to set up a VPN in Windows, and everything should work well, because Microsoft has been doing these sorts of things for some years.

Sure enough, if you open up the Charms bar, choose Settings, Change PC Settings, and finally Network, you’re brought to this screen, with a nice big friendly button to add a VPN connection. Tapping on it leads me to the following screen:

No problems, I’ve already got these settings ready to go.

Probably not the best to name my VPN settings “New VPN”, but then I’m not telling you my VPN endpoint. So, let’s connect to this new connection.

So far, so good. Now it’s verifying my credentials…

And then we should see a successful connection message.

Not quite. For the search engines, here’s the text:

Error 860: The remote access connection completed, but authentication failed because of an error in the certificate that the client uses to authenticate the server.

This is upsetting, because of course I’ve spent some time setting the certificate correctly (more on that in a later post), and I know other machines are connecting just fine.

I’m sure that, at this point, many of you are calling your IT support team, and they’re reminding you that they don’t support Windows 8 yet, because some lame excuse about ‘not yet stable, official, standard, or Linux”.

Don’t take any of that. Simply open the Desktop.

What? Yes, Windows 8 has a Desktop. And a Command Prompt, and PowerShell. Even in the RT version.

Oh, uh, yeah, back to the instructions.

Forget navigating the desktop, just do Windows-X, and then W, to open the Network Connections group, like this:

Select the VPN network you’ve created, and select the option to “Change settings of this connection”:

In the Properties window that pops up, you need to select the Security tab:

OK, so that’s weird. The Authentication Group Box has two radio buttons – but neither one is selected. My Grandma had a radio like that, you couldn’t tell what station you were going to get when you turn it on – and the same is generally true for software. So, we should choose one:

It probably matters which one you choose, so check with your IT team (tell them you’re connecting from Windows 7, if you have to).

Then we can connect again:

……

And… we’re connected.

Now for another surprise, when you find that the Desktop Internet Explorer works just fine, but the “Modern UI” (formerly known as “Metro”) version of IE decides it will only talk to sites inside your LAN, and won’t talk to external sites. Oh, and that behavior is extended to any Metro app that embeds web content.

There is no such thing as “small sample code”, every sample you publish is an SDK of its own

OK, so the choice of calling these “SDKs” is rooted in my Microsoft dev background, where “sample code” didn’t need documentation or bug tracking, whereas an SDK does. You can adjust the terminology to suit.

The basic point here is to remind you that you do not get to abrogate all responsibility by saying “this is sample code, you will need to add error checking and security”, even if you do say it in the article – even if you say it in the comments of the sample!

Why do I care so much? It’s only three lines of code!

Simply stated, I’ve seen too many cases where people have included three lines of code (or five, or twenty, the count doesn’t matter) into a program, and they’ve stepped away and shipped that code.

“It wasn’t my fault,” they say, when the incident happens, “I copied that code from a sample online.”

This is the point at which the re-education machine is engaged – because, of course, it totally is your fault, if you include code in your development without treating it with the same rigour as if you had written every line of it yourself. You will get punished – usually by having to stay late and fix it.

It’s also the sample writer’s fault.

He gave you the mini-SDK that you imported blindly into your application, without testing it, without checking errors in it, without appropriate security measures, and he brushed you off with “well, of course, you should add your own error checks and security magic to it”.

Here’s an example of what I’m talking about, courtesy of Troy Hunt linking to an ASP forum.

No, if you’re providing sample code on the Internet, it’s important to make sure it doesn’t embody BAD design; this is code that will be taken up by people by definition less keen, less eager, less smart and less motivated to do things right than you are – after all, rather than figuring out how to write this code for themselves, they are allowing you to do it for them, to teach them how it’s done. If you then teach them how it’s done badly, that’s how they will learn to do it – badly. And they will teach others.

So, instead, make your three line samples five lines, and add enough error checking that unexpected issues or other bad things will break the sample’s execution.

Oh yeah, and what about updates, when you find a horrendous bug – how do you distribute those?

I’ve been playing a lot lately with cross-site scripting (XSS) – you can tell that from my previous blog entries, and from the comments my colleagues make about me at work.

Somehow, I have managed to gain a reputation for never leaving a search box without injecting code into it.

And to a certain extent, that’s deserved.

But I always report what I find, and I don’t blog about it until I’m sure the company has fixed the issue.

So, coffee store, we’re talking Starbucks, right?

Right, and having known a few people who’ve worked in the Starbucks security team, I was surprised that I could find anything at all.

Yet it practically shouted at me, as soon as I started to inject script:

Well, there’s pretty much a hint that Starbucks have something in place to prevent script.

But it’s not the only thing preventing script, as I found with a different search:

So, one search takes me to an “oops” page, another takes me to a page telling me that nothing happened – but without either one executing the script.

The oops page doesn’t include any of my script, so I don’t like that page – it doesn’t help my injection at all.

The search results page, however, that includes some of my script, so if I can just make that work for me, I’ll be happy.

Viewing source is pretty helpful, so here’s what I get from that, plus searching for my injected script:

So, while my intended JavaScript, “"-prompt(1)-"”, is not executed, and indeed is in the wrong context to be executed, every character has successfully made it into the source sent back to the user’s browser.

At this point, I figure that I need to find some execution that is appropriate for this context.

Maybe the XSS fish will help, so I search for that:

Looks promising – no “oops”, let’s check the source:

This is definitely working. At this point, I know the site has XSS, I just have to demonstrate it. If I was a security engineer at Starbucks, this would be enough to cause me to go beat some heads about.

I think I should stress that. If you ever reach this point, you should fix your code.

This is enough evidence that a site has XSS issues to make a developer do some work on fixing it. I have escaped the containing quotes, I have terminated/escaped the HTML tag I was in, and I have started something like a new tag. I have injected into your page, and now all we’re debating about is how much I can do now that I’ve broken in.

And yet, I must go on.

I have to go on at this point, because I’m an external researcher to this company. I have to deliver to them a definite breach, or they’ll probably dismiss me as a waste of time.

The obvious thing to inject here is “"><script>prompt(1)</script>” – but we saw earlier that produced an “oops” page. We’ve seen that “prompt(1)” isn’t rejected, and the angle-brackets (chevrons, less-than / greater-than signs, etc, whatever you want to call them) aren’t rejected, so it must be the word “script”.

That, right there, is enough to tell me that instead of encoding the output (which would turn those angle-brackets into “&lt;” and “&gt;” in the source code, while still looking like angle-brackets in the display), this site is using a blacklist of “bad words to search for”.

Why is a blacklist wrong?

That’s a really good question – and the basic answer is because you just can’t make most blacklists complete. Only if you have a very limited character set, and a good reason to believe that your blacklist can be complete.

A blacklist that might work is to say that you surround every HTML tag’s attributes with double quotes, and so your blacklist is double quotes, which you encode, as well as the characters used to encode, which you also encode.

I say it “might work”, because in the wonderful world of Unicode and developing HTML standards, there might be another character to escape the encoding, or a set of multiple code points in Unicode that are treated as the encoding character or double quote by the browser.

Easier by far, to use a whitelist – only these few characters are safe,and ALL the rest get encoded.

You might have an incomplete whitelist, but that’s easily fixed later, and at its worst is no more than a slight inefficiency. If you have an incomplete blacklist, you have a security vulnerability.

Back to the story

OK, so having determined that I can’t use the script tag, maybe I can add an event handler to the tag I’m in the middle of displaying, whether it’s a link or an input. Perhaps I can get that event handler to work.

Ever faithful is the “onmouseover” event handler. So I try that.

You don’t need to see the “oops” page again. But I did.

The weirdest thing, though, is that the “onmooseover” event worked just fine.

Except I didn’t have a moose handy to demonstrate it executing.

So, that means that they had a blacklist of events, and onmouseover was on the list, but onmooseover wasn’t.

Similarly, “onfocus” triggered the “oops” page, but “onficus” didn’t. Again, sadly I didn’t have a ficus with me.

You’re just inventing event names.

Sure, but then so is the community of browser manufacturers. There’s a range of “ontouch” events that weren’t on the blacklist, but are supported by a browser or two – and then you have to wonder if Google, maker of the Chrome browser and the Glass voice-controlled eyewear, might not introduce an event or two for eyeball tracking. Maybe a Kinect-powered browser will introduce “onwaveat”. Again, the blacklist isn’t future-proof. If someone invents a new event, you have to hope you find out about it before the attackers try to use it.

Again, back to the story…

Then I tried adding characters to the beginning of the event name. Curious – that works.

And, yes, the source view showed me the event was being injected. Of course, the browser wasn’t executing it, because of course, “?onmouseover” can’t be executed. The HTML spec just doesn’t allow for it.

Eventually, I made my way through the ASCII table to the forward-slash character.

Magic!

Yes, that’s it, that executes. There’s the prompt.

Weirdly, if I used “alert” instead of “prompt”, I get the “oops” page. Clearly, “alert” is on the blacklist, “prompt” is not.

I still want to make this a ‘hotter’ report before I send it off to Starbucks, though.

How “hotter”?

Well, it’d be nice if it didn’t require the user to find and wave their mouse over the page element that you’ve found the flaw in.

Fortunately, I’d also recently found a behaviour in Internet Explorer that allows a URL to set focus to an element on the page by its ID or name. And there’s an “onfocus” event I can trigger with “/onfocus”.

So, there we are – automated execution of my chosen code.

Anything else to make it more sexy?

Sure – how about something an attacker might try – a redirect to a site of their choosing. [But since I’m not an attacker, we’ll do it to somewhere acceptable]

I tried to inject “onfocus=’document.location=”//google.com”’” – but apparently, “document” and “location” are also on the banned list.

“ownerDocu”, “ment”, “loca” and “tion” aren’t on the blacklist, so I can do “this["ownerDocu"+"ment"]["loca"+"tion"]=” …

Very quickly, this URL took the visitor away from the Starbucks search page and on to the Google page.

Now it’s ready to report.

Hard part over, right?

Well, no, not really. This took me a couple of months to get reported. I tried “security@starbucks.com”, which is the default address for reporting security issues.

The “Contact Us” link takes me to a customer service representative, and an entertaining exchange that results in them telling me that they’ve passed my email around everyone who’s interested, and the general consensus is that I should go ahead and publish my findings.

So you publish, right?

No, I’m not interested in self-publicising at the cost of someone else’s security. I do this so that things get more secure, not less.

So, I reach out to anyone I know who works for Starbucks, or has ever worked for Starbucks, and finally get to someone in the Information Security team.

This is where things get far easier – and where Starbucks does the right things.

The Information Security team works with me, politely, quickly, calmly, and addresses the problem quickly and completely. The blacklist is still there, and still takes you to the “oops” page – but it’s no longer the only protection in place.

My “onmooseover” and “onficus” events no longer work, because the correct characters are quoted and encoded.

The world is made safer and more secure, and a half a year later, I post this article, so that others can learn from this experience, too.

By withholding publishing until well after the site is fixed, I ensure that I’m not making enemies of people who might be in a position to help me later. By fixing the site quickly and quietly, Starbucks ensure that they protect their customers. And I, after all, am a customer.

The Starbucks Information Security team have also promised that there is now a route from security@ to their inbox, as well as better training for the customer service team to redirect security reports their way, rather than insisting on publishing. I think they were horrified that anyone suggested that. I know I was.

And did I ever tell you about the time I got onto Google’s hall of fame?