First, a quick recap:

A Username is one example of a Claim. So is Group Membership, Age, Eye Colour, Operating System, Installed Software, etc…

The Proof is what allows someone to reliably trust the Claim is true.

A Password is one example of a Proof. So is a Signature, a Passport, etc…

Claims are generally public, or at least non-secret, and if not unique, are at least specific (e.g. membership of the group “Brown eyes” isn’t open to people with blue eyes).

Proofs are generally secret, and may be shared, but such sharing should not be discoverable except by brute force. (Which is why we salt passwords).

Now, the topic – password resets

Password resets can occur for a number of reasons – you’ve forgotten your password, or the password change functionality is more cumbersome than the password reset, or the owner of the account has changed (is that allowable?) – but the basic principle is that an account needs a new password, and there needs to be a way to achieve that without knowledge of the existing password.

Let’s talk as if it’s a forgotten password.

So we have a Claim – we want to assert that we possess an identity – but we have to prove this without using the primary Proof.

Which means we have to know of a secondary Proof. There are common ways to do this – alternate ID, issued by someone you trust (like a government authority, etc). It’s important in the days of parody accounts, or simply shared names (is that Bob Smith, his son, Bob Smith, or his unrelated neighbour, Bob Smith?) that you have associated this alternate ID with the account using the primary Proof, or as a part of the process of setting up the account with the primary Proof. Otherwise, you’re open to account takeover by people who share the same name as their target.

And you can legally change your name.

What’s the most common alternate ID / secondary Proof?

E-mail.

Pretty much every public web site relies on the use of email for password reset, and uses that email address to provide a secondary Proof.

It’s not enough to know the email address – that’s unique and public, and so it matches the properties of a Claim, not a Proof, of identity.

We have to prove that we own the email address.

It’s not enough to send email FROM the email address – email is known to be easily forged, and so there’s no actual proof embodied in being able to send an email.

That leaves the server with the prospect of sending something TO the email address, and the recipient having proved that they received it.

You could send a code-word, and then have the recipient give you the code-word back. A shared secret, if you like.

And if you want to do that without adding another page to the already-too-large security area of the site, you look for the first place that allows you to provide your Claim and Proof, and you find the logon page.

By reusing the logon page, you’re going to say that code-word is a new password.

[This is not to say that email is the only, or even the best, way to reset passwords. In an enterprise, you have more reliable proofs of identity than an email provider outside of your control. You know people who should be able to tell you with some surety that a particular person is who they claim to be. Another common secondary identification is the use of Security Questions. See my upcoming article, “Security Questions are Bullshit” for why this is a bad idea.]

So it’s your new password, yes?

Let’s imagine what can go wrong. If I don’t know your password, but I can guess your username (because it’s not secret), I can claim to be you wanting to reset your password. That not only creates opportunity for me to fill your mailbox with code-words, but it also prevents you from logging on while the code-words are your new password. A self-inflicted denial of service.

So your old password should continue working, and if you never use the code-word, because you’re busy ignoring and deleting the emails that come in, it should keep working for you.

I’ve frequently encountered situations in my own life where I’ve forgotten my password, gone through the reset process, and it’s only while typing in the new password, and being told what restrictions there are on characters allowed in the new password, that I remember what my password was, and I go back to using that one.

In a very real sense, the code-word sent to you is NOT your new password, it’s a code-word that indicates you’ve gone the password reset route, and should be given the opportunity to set a new password.

Try not to think of it as your “temporary password”, it’s a special flag in the logon process, just like a “duress password”. It doesn’t replace your actual password.

It’s a shared secret, so keep it short-lived

Shared secrets are fantastic, useful, and often necessary – TLS uses them to encrypt data, after the initial certificate exchange.

But the trouble with shared secrets is, you can’t really trust that the other party is going to keep them secret very long. So you have to expire them pretty quickly.

The same is true of your password reset code-word.

In most cases, a user will forget their password, click the reset link, wait for an email, and then immediately follow the password reset process.

Users are slow, in computing terms, and email systems aren’t always directly linked and always-connected. But I see no reason why the most usual automated password reset process should allow the code-word to continue working after an hour.

[If the process requires a manual step, you have to count that in, especially if the manual step is something like “contact a manager for approval”, because managers aren’t generally 24/7 workers, the code-word is going to need to last much longer. But start your discussion with an hour as the base-point, and make people fight for why it’ll take longer to follow the password reset process.]

It’s not a URL

You can absolutely supply a URL in the email that will take the user to the right page to enter the code-word. But you can’t carry the code-word in the URL.

Why? Check out these presentations from this year’s Black Hat and DefCon, showing the use of a malicious WPAD server on a local – or remote – network whose purpose is to trap and save URLs, EVEN HTTPS URLs, and their query strings.

Every URL you send in an email is an HTTP or HTTPS GET, meaning all the parameters are in the URL or in the query string portion of the URL.

This means the code-word can be sniffed and usurped if it’s in the URL. And the username is already assumed to be known, since it’s a non-secret. [Just because it’s assumed to be known, don’t give the attacker an even break – your message should simply say “you requested a password reset on an account at our website” – a valid request will come from someone who knows which account at your website they chose to request.]

So, don’t put the code-word in the URL that you send in the email.

Log it, and watch for repeats, errors, other bad signs

DON’T LOG THE PASSWORD

I have to say that, because otherwise people do that, as obviously wrong as it may seem.

But log the fact that you’ve changed a password for that user, along with when you did it, and what information you have about where the user reset their password from.

Multiple users resetting their password from the same IP address – that’s a bad sign.

The same user resetting their password multiple times – that’s a bad sign.

Multiple expired code-words – that’s a bad sign.

Some of the bad things being signaled include failures in your own design – for instance, multiple expired code-words could mean that your password reset function has stopped working and needs checking. You have code to measure how many abandoned shopping carts you have, so include code that measures how many abandoned password reset attempts you have.

In summary, here’s how to do email password reset properly

A phone call from the user, to their manager, followed by the manager resetting the user’s password is trustworthy, and self-correcting. A little slow and awkward, but stronger.

Other forms of identification are almost all stronger and more resistant to forgery and attack than email.

Bear in mind that, by trusting to email, you’re trusting that someone who just forgot their password can remember their other password, and hasn’t shared it, or used an easily-guessed password.

When an account is created, associate it with an email address, and allow & encourage the validated account holder to keep that updated.

Allow anyone to begin a password reset process – but don’t allow them to repeatedly do so, for fear of allowing DoS attacks on your users.

When someone has begun a password reset process, send the user a code-word in email. This should be a random sequence, of the same sort of entropy as your password requirements.

Encoding a time-stamp in the code-word is useful.

Do not put the code-word in a URL in the message.

Do not specify the username in the message, not even in the URL.

Do not change the user’s password for them yet. This code-word is NOT their new password.

Wait for up to an hour (see item 4.1. – the time stamp) during which you’ll accept the code-word. Be very cautious about extending this period.

The code-word, when entered at the password prompt for the associated user, is an indication to change the password, and that’s ALL it can be used for.

Allow the user to cancel out of the password change, up to the point where they have entered a new password and its verification code and hit OK.

And here’s how you’re going to screw it up (don’t do these!)

You’ll use email for password resets on a high-security account. Email is relatively low-security.

You’ll use email when there’s something more reliable and/or easier to use, because “it’s what everyone does”.

You’ll use email to an external party with whom you don’t have a business relationship, as the foundation on which you build your corporate identity tower of cards.

You won’t have a trail of email addresses associated with the user, so you don’t have reason to trust you’re emailing the right user, and you won’t be able to recover when an email address is hacked/stolen.

You’ll think the code-word is a new password, leading to:

Not expiring the code-word.

Sending passwords in plain-text email (“because we already do it for password reset”).

Invalidating the user’s current password, even though they didn’t want it reset.

You’ll include the credentials (username and/or password) in a URL in the email message, so it can be picked up by malicious proxies.

You’ll fail to notice that someone’s DoSing some or all of your users by repeatedly initiating a password reset.

You’ll fail to notice that your password reset process is failing, leading to accounts being dropped as people can no longer get in.

You won’t include enough randomness in your code-word, so an attacker can reset two accounts at once – their own, and a victim’s, and use the same code-word to unlock both.

You’ll rely on security questions

You’ll use a poor-quality generation on your code-words, leading to easy predictability, reuse, etc.

You won’t think about other ways that your design will fail outside of this list here.

Let the corrections begin!

Did I miss something, or get something wrong? Let me know by posting a comment!

I chose git, in large part because it uses an open format, and as such isn’t going to suffer the same problem I had with ComponentSoftware’s CS-RCS.

Now that I’ve figured out how to use Bash on Ubuntu on Windows to convert from CS-RCS to git, using the rcs-fast-export.rb script, I’m also looking to protect my source control investment by storing it somewhere off-site.

This has a couple of good benefits – one is that I’ll have access to it when I’m away from my home machine, another is that I’ll be able to handle catastrophic outages, and a third is that I’ll be able to share more easily with co-conspirators.

I’m going to use Visual Studio Team Services (VSTS), formerly known as Visual Studio Online, previous to that, as Team Foundation Services Online. You can install VSTS on your own server, or you can use the online tool at <yourdomain>.visualstudio.com. If your team is smaller than five people, you can do this for free, just like you can use Visual Studio 2015 Community Edition for free. This is a great way in which Microsoft supports hobbyist developers, open source projects, college students, etc.

Where do we start?

You may even have done a “git checkout” command to get the source code into a place where you can work on it. That’s not necessary for our synchronisation to VSTS, because we’re going to sync the entire repository. This will work whether you are using the Bash shell or the regular Command Prompt, as long as you have git installed and in your PATH.

If you’ve actually made any changes, be sure to add and commit them to the local Git repository. We don’t want to lose those!

I’m also going to assume you have a VSTS account. First, visit the home page.

Under “Recent Projects & Teams”, click “New”.

Give it a name and a description – I suggest leaving the other settings at their default of “Agile” and “Git” unless you have reason to change. The setting of “Git” in particular is required if you’re following along, because that’s how we’re going to synchronise next.

When you click “Create project”, it’ll think for a while…

And then you’ll have the ability to continue on. Not sure my team’s actually “going to love this”, considering it’s just me!

Yes, it’s not just your eyes, the whole dialog moved down the screen, so you can’t hover over the button waiting to hit it.

Click “Navigate to project”, and you’ll discover that there’s a lot waiting for you. Fortunately a quick popup gives you the two most likely choices you’ll have for any new project.

As my team-mates will attest, I don’t do Kanban very well, so we’ll ignore that side of things, I’m mostly using this just to track my source code. So, hit “Add Code”, and you get this:

Some interesting options here

Don’t choose any yet

“Clone to your computer” – an odd choice of the direction to use, since this is an empty source directory. But, since it has a “Clone in Visual Studio” button, this may be an easy way to go if you already have a Visual Studio project working with Git that you want to tie into this. There is a problem with this, however, in that if you’re working with multiple versions of Visual Studio, note that any attempt from VSTS to open Visual Studio will only open the most recently installed version of Visual Studio. I found no way to make Visual Studio 2013 automatically open from the web for Visual Studio 2013 projects, although the Visual Studio Version Selector will make the right choice if you double click the SLN file.

“Push an existing repository from command line” – this is what I used. A simple press of the “Copy to clipboard” button gives me the right commands to feed to my command shell. You should run these commands from somewhere in your workspace, I would suggest from the root of the workspace, so you can check to see that you have a .git folder to import before you run the commands.

BUT – I would strongly recommend not dismissing this screen while you run these commands, you can’t come back to it later, and you’ll want to add a .gitignore file.

The other options are:

“Import a repository” – this is if you’re already hosting your git repository on some other web site (like Github, etc), and want to make a copy here. This isn’t a place for uploading a fast-import file, sadly, or we could shortcut the git process locally. (Hey, Microsoft, you missed a trick!)

“Initialize with a README or gitignore” – a useful couple of things to do. A README.md file is associated with git projects, and instructs newcomers to the project about it – how to build it, what it’s for, where to find documentation, etc, etc – and you can add this at any time. The .gitignore file tells git what file names and extensions to not bother with putting into. Object files, executables, temporary files, machine generated code, PCH & PDB files, etc, etc. You can see the list is long, and there’s no way to add a .gitignore file with a single button click after you’ve left this page. You can steal one from an empty project, by simply copying it – but the button press is easier.

What I’ve found

I’ve found it useful to run the “git remote” and “git push” commands from the command-line (and I choose to run them from the Bash window, because I’m already there after running the RCS export), and then add the .gitignore. So, I copy the commands and send them to the shell window, before I press the “Add a .gitignore” button, choose “Visual Studio” as my gitignore type, and then select “Initialize”:

First, let’s start with a recap of using the rcs-fast-export command to bring the code over from the old RCS to a new Git repository:

[And finally, select your .gitignore from the “Add Code” window, and hit “Initialize” – this is optional, but will prevent you from syncing object files, etc]

But that’s not quite all…

Your solution still has lines in it dictating what version control you’re using. So you want to unbind that.

[If you don’t unbind existing version control, you won’t be able to use the built-in version control features in Visual Studio, and you’ll keep getting warnings from your old version control software. When you uninstall your old version control software, Visual Studio will refuse to load your projects. So, unbinding your old version control is really important!]

I like to do that in a different directory from the original, for two reasons:

I don’t want to overwrite or delete the working workspace I’ve been working in until the new workspace works. So I still have the old directory to work from if I need to, while I’m moving to the new place.

I want to make sure that a developer (even if it’s just me six months from now, after I’ve wiped everything in a freak electromagnet accident) can connect to this version control source, and build everything.

So, now it’s Command Prompt window time…

Yes, you could do that from Visual Studio, but it’s just as easy from the command line. Note that I didn’t actually enter credentials here – they’re cached by Windows.

Your version control system may complain when opening this project that it’s not in the place it remembers being in… I know mine does. Tell it that’s OK.

[Yes, I’ve changed projects, from Juggler to EFSExt. I suddenly realised that Juggler is for Visual Studio 2010, which is old, and not installed on this system.]

Now that we’ve opened the solution in Visual Studio, it’s time to unbind the old source control. This is done by visiting the File => Source Control => Change Source Control menu option:

You’ll get a dialog that lists every project in this solution. You need to select every project that has a check-mark in the “Connected” column, and click the “Unbind” button.

Luckily, in this case, they’re already selected for me, and I just have to click “Unbind”:

You are warned:

Note that this unbinding happens in the local copy of the SLN and VCPROJ, etc files – it’s not actually going to make any changes to your version control. [But you made a backup anyway, because you’re cautious, right?]

Click “Unbind” and the dialog changes:

Click OK, and we’re nearly there…

Finally, we have to sync this up to the Git server. And to do that, we have to change the Source Control option (which was set when we first loaded the project) to Git.

Press “OK”. You’ll be warned if your solution is still bound in some part to a previous version control system. This can happen in particular if you have a project which didn’t load, but which is part of this solution. I’m not addressing here what you have to do for that, because it involves editing your project files by hand, or removing projects from the solution. You should decide for yourself which of those steps carries the least risk of losing something important. Remember that you still have your files and their history in at least THREE version control systems at this point – your old version control, the VSTS system, and the local Git repository. So even if you screw this up, there’s little real risk.

Now that you have Git selected as your solution provider, you’ll see that the “Changes” option is now available in the Team Explorer window:

Save all the files (but I don’t have any open!) by pressing Ctrl-Shift-S, or selecting File => Save All.

If you skip this step, there will be no changes to commit, and you will be confused.

Select “Changes”, and you’ll see that the SLN files and VCPROJ files have been changed. You can preview these changes, but they basically are to do with removing the old version control from the projects and solution.

It wants a commit message. This should be short and explanatory. I like “Removed references to old version control from solution”. Once you’ve entered a commit message, the Commit button is available. Click it.

It now prompts you to Sync to the server.

So click the highlighted word, “Sync”, to see all the unsynced commits – you should only have one at this point, but as you can imagine, if you make several commits before syncing, these can pile up.

Press the “Sync” button to send the commit up to the server. This is also how you should usually get changes others have made to the code on the server. Note that “others” could simply mean “you, from a different computer or repository”.

Check on the server that the history on the branch now mentions this commit, so that you know your syncing works well.

And you’re done

Sure, it seems like a long-winded process, but most of what I’ve included here is pictures of me doing stuff, and the stuff I’m doing is only done once, when you create the repository and populate it from another. Once it’s in VSTS, I recommend building your solution, to make sure it still builds. Run whatever tests you have to make sure that you didn’t break the build. Make sure that you still have valid history on all your files, especially binary files. If you don’t have valid history on any files in particular, check the original version control, to see if you ever did have. I found that my old CS-RCS implementation was storing .bmp files as text, so the current version was always fine, but the history was corrupted. That’s history I can’t retrieve, even with the new source control.

Now, what about those temporary repositories? Git makes things really easy – the Git repository is in a directory off the root of the workspace, called “.git”. It’s hidden, but if you want to delete the repository, just delete the “.git” folder and its contents. You can delete any temporary workspaces the same way, of course.

I did spend a little time automating the conversion of multiple repositories to Git, but that was rather ad-hoc and wobbly, so I’m not posting it here. I’d love to think that some of the rest of this could be automated, but I have only a few projects, so it was good to do by hand.

Final Statement

No programmer should be running an unsupported, unpatched, unupdated old version control system. That’s risky, not just from a security perspective, but from the perspective that it may screw up your files, as you vary the sort of projects you build.

No programmer should be required to drop their history when moving to a new version control system. There is always a way to move your history. Maybe that way is to hire a grunt developer to fetch versions dated at random/significant dates throughout history out of the old version control system, and check them in to the new version control system. Maybe you can write automation around that. Or maybe you’ll be lucky and find that someone else has already done the automation work for you.

Hopefully I’ve inspired you to take the plunge of moving to a new version control system, and you’ve successfully managed to bring all your precious code history with you. By using Visual Studio Team Services, you’ve also got a place to track features and bugs, and collaborate with other members of a development team, if that’s what you choose to do. Because you’ve chosen Git, you can separate the code and history at any time from the issue tracking systems, should you choose to do so.

Sometimes I think that title is the job of the Security Engineer – as a Subject Matter Expert, we’re supposed to meet with teams and tell them how their dreams are going to come crashing down around their ears because of something they hadn’t thought of, but which is obvious to us.

This can make us just a little bit unpopular.

But being argumentative and sceptical isn’t entirely a bad trait to have.

Sometimes it comes in handy when other security guys spread their various statements of doom and gloom – or joy and excitement.

Examples in a single line

“Rename your administrator account so it’s more secure” – or lengthen the password and achieve the exact same effect without breaking scripts or requiring extra documentation so people know what the administrator is called this week.

“Encrypt your data at rest by using automatic database encryption” – which means any app that authenticates to the database can read that data back out, voiding the protection that was the point of encrypting at rest. If fields need encrypting, maybe they need field-level access control, too.

“Complex passwords, one lower case, one upper case, one number, one symbol, no repeated letters” – or else, measure strength in more interesting ways, and display to users how strong their password is, so that a longish phrase, used by a competent typist, becomes an acceptable password.

Today’s big example – password expiration

Now I’m going to commit absolute heresy, as I’m going against the biggest recent shock news in security advice.

I understand the arguments, and I know I’m frequently irritated with the unnecessary requirements to change my password after sixty days, and even more so, I know that the reasons behind password expiration settings are entirely arbitrary.

But…

There’s a good side to password expiry.

These aren’t the only ways in which passwords are discovered.

The method that frequently gets overlooked is when they are deliberately shared.

Sharing means scaring

“Bob’s out this week, because his mother died, and he has to arrange details in another state. He didn’t have time to set up access control changes before he left, but he gave me a sticky-note with his password on it, so that we don’t need to bother him for anything”

“Everyone on this team has to monitor and interact with so many shared service accounts, we just print off a list of all the service account passwords. You can photocopy my laminated card with the list, if you like.”

Yes, those are real situations I’ve dealt with, and they have some pretty obvious replacement solutions:

Bob

Bob (or Bob’s manager, if Bob is too distraught to talk to anyone, which isn’t at all surprising) should notify a system administrator who can then respond to requests to open up ACLs as needed, rather than someone using Bob’s password. But he didn’t.

When Bob comes back, is he going to change his password?

No, because he trusts his assistant, Dave, with his communications.

But, of course, Dave handed out his password to the sales VP, because it was easier for him than fetching up the document she wanted. And sales VPs just can’t be trusted. Now the entire sales team knows Bob’s password. And then one of them gets fired, or hired on at a new competitor. The temptation to log on to Bob’s account – just once – is immense, because that list of customers is just so enticing. And really, who would ever know? And if they did know, everyone has Bob’s password, so it’s not like they could prosecute you, because they couldn’t prove it was you.

What’s going to save Bob is if he is required to change his password when he returns.

Laminated Password List

Yes, this also happened. Because we found the photocopy of the laminated sheet folded up on the floor of a hallway outside the lavatory door.

There was some disciplining involved. Up to, and possibly including, the termination of employment, as policy allows.

Then the bad stuff happened.

The team who shared all these passwords pointed out that, as well as these being admin-level accounts, they had other special privileges, including the avoidance of any requirement to change passwords.

These passwords hadn’t changed in six years.

And the team had no idea what would break if they changed the passwords.

Maybe one of those passwords is hard-coded into a script somewhere, and vital business processes would grind to a halt if the password was changed.

When I left six months later, they were still (successfully) arguing that it would be too dangerous to try changing the passwords.

To change a behaviour, you must first accept it

I’m not familiar with any company that acknowledges in policy that users share passwords, nor the expected behaviour when they do [log when you shared it, and who you shared it with, then change it as soon as possible after it no longer needs to be shared].

Once you accept that passwords are shared for valid reasons, even if you don’t enumerate what those reasons are, you can come up with processes and tools to make that sharing more secure.

If there was a process for Bob to share with Dave what his password is, maybe outlining the creation of a temporary password, reading Dave in on when Dave can share the password (probably never), and how Dave is expected to behave, and becomes co-responsible for any bad things done in Bob’s account, suddenly there’s a better chance Dave’s not going to share. “I can’t give you Bob’s password, but I can get you that document you’re after”

If there was a tool in which the team managing shared service accounts could find and unlock access to passwords, that tool could also be configured to distribute changed passwords to the affected systems after work had been performed.

Without acceptance, it is expiry which saves you

If you don’t have these processes or tools, the only protection you have against password sharing (apart from the obviously failing advice to “just don’t do it”) is regular password expiry.

Password expiry as a Business Continuity Practice

I’m also fond of talking about password expiration as being a means to train your users.

Do you even know how to change your passwords?

Certificates expire once a year, and as a result, programmers write code as if it’s never going to happen. After all, there’s plenty of time between now and next year to write the “renew certificate” logic, and by the time it’s needed, I’ll be working on another project anyway.

If passwords don’t expire – or don’t expire often enough – users will not have changed their password anything like recently enough to remember how to do so if they have to in an emergency.

So, when a keylogger has been discovered to be caching all the logons in the conference room, or the password hashes have been posted on Pastebin, most of your users – even the ones approving the company-wide email request for action – will fight against a password change request, because they just don’t know how to do it, or what might break when they do.

Unless they’ve been through it before, and it were no great thing. In which case, they’ll briefly sigh, and then change their password.

So, password expiry – universally good?

This is where I equivocate and say, yes, I broadly think the advice to reduce or remove password expiration is appropriate. My arguments above are mainly about things we’ve forgotten to remember that are reasons why we might have included password expiry to begin with.

Here, in closing, are some ways in which password expiry is bad, just to balance things out:

Some people guard their passwords closely, and generate secure passwords, or use hardware token devices to store passwords. These people should be rewarded for their security skills, not punished under the same password expiry regimen as the guy who adds 1 to his password, and is now up to “p@55w0rd73”.

Expiring passwords too frequently drives inexorably to “p@55word73”, and to password reuse across sites, as it takes effort to think up complex passwords, and you’re not going to waste that effort thinking up unique words every couple of months.

Password expiration often means that users spend significant time interrupted in their normal process, by the requirement to think up a password and commit it to the multiple systems they use.

Where do we stand?

Right now, this is where we stand:

Many big security organisations – NIST, CESG, etc – talk about how password expiry is an outdated and unnecessary concept

I think they’re missing some good reasons why password expiry is in place (and some ways in which that could be fixed)

Existing security standards that businesses are expected to comply with have NOT changed their stance.

As a result, especially of this last item, I don’t think businesses can currently afford to remove password expiry from their accounts.

But any fool can see which way the wind is blowing – at some point, you will be able to excuse your company from password expiry, but just in case your compliance standard requires it, you should have a very clear and strong story about how you have addressed the risks that were previously resolved by expiring passwords as frequently as once a quarter.

So why’s it still there?

One simple reason – if I move off the platform, I face the usual choice when migrating from one version control system to another:

Carry all my history, so that I can review earlier versions of the code (for instance, when someone says they’ve got a new bug that never happened in the old version, or when I find a reversion, or when there’s a fix needed in one area of the code tree that I know I already made in a different area and just need to copy)

Lose all the history by starting fresh with the working copy of the source code

The second option seems a bit of a waste to me.

OK, so yes, technically I could mix the two modes, by using CS-RCS Pro to browse the ancient history when I need to, and Git to browse recent history, after starting Git from a clean working folder. But I could see a couple of problems:

Of course the bug I’m looking through history for is going to be across the two source control packages

It would mean I still have CS-RCS Pro sitting around installed, unpatched and likely vulnerable, on one of my dev systems

So, really, I wanted to make sure that I could move my files, history and all.

What stopped you?

I really didn’t have a good way to do it.

Clearly, any version control system can be moved to any other version control system by the simple expedient of:

For each change X:

Set the system date to X’s date

Fetch the old source control’s files from X into the workspace

Commit changes to the new source control, with any comments from X

Next change

But, as you can imagine, that’s really long-winded and manual. That should be automatable.

In fact, given the shared APIs of VSS-compatible source control services, I’m truly surprised that nobody has yet written a tool to do basically this task. I’d get on it myself, but I have other things to do. Maybe someone will write a “VSS2Git” or “VSS2VSS” toolkit to do just this.

There is a format for creating a single-file copy of a Git repository, which Git can process using the command “git fast-import”. So all I have to find is a tool that goes from a CS-RCS repository to the fast-import file format.

Nobody uses CS-RCS Pro

So, clearly there’s no tool to go from CS-RCS Pro to Git. There’s a tool to go from CS-RCS Pro to CVS, or there was, but that was on the now-defunct CS-RCS web site.

But… Remember I said that it’s compatible with GNU RCS.

And there’s scripts to go from GNU RCS to Git.

What you waiting for? Do it!

OK, so the script for this is written in Ruby, and as I read it, there seemed to be a few things that made it look like it might be for Linux only.

I really wasn’t interested in making a Linux VM (easy though that may be) just so I could convert my data.

So why are you writing this?

Everything changed with the arrival of the recent Windows 10 Anniversary Update, because along with it came a new component.

Bash on Ubuntu on Windows.

It’s like a Linux VM, without needing a VM, without having to install Linux, and it works really well.

With this, I could get all the tools I needed – GNU RCS, in case I needed it; Ruby; Git command line – and then I could try this out for myself.

Of course, I wouldn’t be publishing this if it wasn’t somewhat successful. But there are some caveats, OK?

Here’s the caveats

I’ve tried this a few times, on ONE of my own projects. This isn’t robustly tested, so if something goes all wrong, please by all means share, and people who are interested (maybe me) will probably offer suggestions, some of them useful. I’m not remotely warrantying this or suggesting it’s perfect. It may wipe your development history out of your one and only copy of version control… so don’t do it on your one and only copy. Make a backup first.

GNU RCS likes to store files in one of two places – either in the same directory as the working files, but with a “,v” pseudo-extension added to the filename, or in a sub-directory off each working folder, called “RCS” and with the same “,v” extension on the files. If you did either of these things, there’s no surprises. But…

CS-RCS Pro doesn’t do this. It has a separate RCS Repository Root. I put mine in C:\RCS, but you may have yours somewhere else. Underneath that RCS Repository Root is a full tree of the drives you’ve used CS-RCS to store (without the “:”), and a tree under that. I really hope you didn’t embed anything too deep, because that might bode ill.

Initially, this seemed like a bad thing, but because you don’t actually need the working files for this task, you can pretend that the RCS Repository is actually your working space.

Maybe this is obvious, but it took me a moment of thinking to decide I didn’t have to move files into RCS sub-folders of my working directories.

Make this a “flag day”. After you do this conversion, never use CS-RCS Pro again. It was good, and it did the job, and it’s now buried in the garden next to Old Yeller. Do not sprinkle the zombification water on that hallowed ground to revive it.

This also means you MUST check in all your code before converting, because checking it in afterwards will be … difficult.

Enough already, how do we do this?

Assumption: You have Windows 10.

Install Windows 10 Anniversary Update – this is really easy, it’s an update, you’ve probably been offered it already, and you may even have installed it. This is how you’ll know you have it:

Install Bash on Ubuntu on Windows – everyone else has written an article on how to do this, so here’s a link (I was going to link to the PC World article, but the full-page ad that popped up and obscured the screen, without letting me click the “no thanks” button persuaded me otherwise).

Git uses email addresses, rather than owner names, and it insists on them having angle brackets. If your username in CS-RCS Pro was “bob”, and your email address is “kate@example.com”, create an authors file with a bash command like this:
echo “bob=Kate Smith <kate@example.com>” > AuthorsFile

Now do the actual creation of the file to be imported, with this bash command:
./rcs-fast-export.rb -A AuthorsFile /mnt/c/RCS/…path-to-project… > project-name.gitexport
[Note a couple of things here – starting with “./”, because that isn’t automatically in the PATH in Linux. Your Windows files are logically mounted in drives under /mnt, so C:\RCS is in /mnt/c/RCS. Case is important. Your “…path-to-project…” probably starts with “c/”, so that’s going to look like “/mnt/c/RCS/c/…” which might look awkward, but is correct. Use TAB-completion on folder names to help you.]

Read the errors and correct any interesting ones.

Now import the file into Git. We’re going to initialise a Git repository in the “.git” folder under the current folder, import the file, reset the head, and finally checkout all the files into the “master” branch under the current directory “.”. These are the bash commands to do this:
git init
git fast-import < project-name.gitexport
git reset
git checkout master .

Profit!

If you’re using Visual Studio and want to connect to this Git repository, remember that your Linux home directory sits under “%userprofile%\appdata\local\lxss\home”

This might look like a lot of instructions, but I mostly just wanted to be clear. This is really quick work. If you screw up after the “git init” command, simply “rm –rf .git” to remove the new repository.

Despite having reasonably strong background in the use of crypto, and a little dabbling into the analysis of crypto, I don’t really follow the whole “blockchain” thing.

So, here’s my attempt to explain what little I understand of blockchains and their potential uses, with an open invitation to come and correct me.

What’s a blockchain used for?

The most widely-known use of blockchains is that of Bit Coin and other “digital currencies”.

Bit Coins are essentially numbers with special properties, that make them progressively harder to find as time goes on. Because they are scarce and getting scarcer, it becomes possible for people of a certain mindset to ascribe a “value” to them, much as we assign value to precious metals or gemstones aside from their mere attractiveness. [Bit Coins have no intrinsic attractiveness as far as I can tell] That there is no actual intrinsic value leads me to refer to Bit Coin as a kind of shared madness, in which everyone who believes there is value to the Bit Coin shares this delusion with many others, and can use that shared delusion as a basis for trading other valued objects. Of course, the same kind of shared madness is what makes regular financial markets and country-run money work, too.

Because of this value, people will trade them for other things of value, whether that’s shiny rocks, or other forms of currency, digital or otherwise. It’s a great way to turn traceable goods into far less-traceable digital commodities, so its use for money laundering is obvious. Its use for online transactions should also be obvious, as it’s an irrevocable and verifiable transfer of value, unlike a credit card which many vendors will tell you from their own experiences can be stolen, and transactions can be revoked as a result, whether or not you’ve shipped valuable goods.

What makes this an irrevocable and verifiable transfer is the principle of a “blockchain”, which is reported in a distributed ledger. Anyone can, at any time, at least in theory, download the entire history of ownership of a particular Bit Coin, and verify that the person who’s selling you theirs is truly the current and correct owner of it.

How does a blockchain work?

I’m going to assume you understand how digital signatures work at this point, because that’s a whole ‘nother explanation.

Remember that a Bit Coin starts as a number. It could be any kind of data, because all data can be represented as a number. That’s important, later.

The first owner of that number signs it, and then distributes the number and signature out to the world. This is the “distributed ledger”. For Bit Coins, the “world” in this case is everyone else who signs up to the Bit Coin madness.

When someone wants to buy that Bit Coin (presumably another item of mutually agreed similar value exchanges hands, to buy the Bit Coin), the seller signs the buyer’s signature of the Bit Coin, acknowledging transfer of ownership, and then the buyer distributes that signature out to the distributed ledger. You can now use the distributed ledger at any time to verify that the Bit Coin has a story from creation and initial signature, unbroken, all the way up to current ownership.

I’m a little flakey on what, other than a search in the distributed ledger for previous sales of this Bit Coin, prevents a seller from signing the same Bit Coin over simultaneously to two other buyers. Maybe that’s enough – after all, if the distributed ledger contains a demonstration that you were unreliable once, your other signed Bit Coins will presumably have zero value.

So, in this perspective, a blockchain is simply an unbroken record of ownership or provenance of a piece of data from creation to current owner, and one that can be extended onwards.

In the world of financial use, of course, there are some disadvantages – the most obvious being that if I can make you sign a Bit Coin against your well, it’s irrevocably mine. There is no overarching authority that can say “no, let’s back up on that transaction, and say it never happened”. This is also pitched as an advantage, although many Bit Coin owners have been quite upset to find that their hugely-valuable piles of Bit Coins are now in someone else’s ownership.

Now, explain your tweet.

With the above perspective in the back of my head, I read the Project Bletchley report.

I even looked at the pictures.

I still didn’t really understand it, but something went “ping” in my head.

Maybe this is how C-level executives feel.

With Microsoft releasing "blockchain as a service", how long till privacy rules suggest using blockchains to track data provenance?

Maybe in environments where privacy is respected, like the EU, blockchains could be an avenue by which regulators enforce companies describing and PROVING where their data comes from, and that it was not acquired or used in an inappropriate manner?

When I give you my data, I sign it as coming from me, and sign that it’s now legitimately possessed by you (I won’t say “owned”, because I feel that personal data is irrevocably “owned” by the person it describes). Unlike Bit Coin, I can do this several times with the same packet of data, or different packets of data containing various other information. That information might also contain details of what I’m approving you to do with that information.

This is the start of a blockchain.

When information is transferred to a new party, that transfer will be signed, and the blockchain can be verified at that point. Further usage restrictions can be added.

Finally, when an information commissioner wants to check whether a company is handling data appropriately, they can ask for the blockchains associated with data that has been used in various ways. That then allows the commissioner to verify whether reported use or abuse has been legitimately approved or not.

And before this sounds like too much regulatory intervention, it also allows businesses to verify the provenance of the data they have, and to determine where sensitive data resides in their systems, because if it always travels with its blockchain, it’s always possible to find and trace it.

[Of course, if it travels without its blockchain, then it just looks like you either have old outdated software which doesn’t understand the blockchain and needs to be retired, or you’re doing something underhanded and inappropriate with customers’ data.]

It even allows the revocation of a set of data to be performed – when a customer moves to another provider, for instance.

The Atlantic today published a reminder that the Associated Press has declared in their style guide as of today that the word “Internet” will be spelt with a lowercase ‘i’ rather than an uppercase ‘I’.

The title is “Elegy for the Capital-I Internet”, but manages to be neither elegy nor eulogy, and misses the mark entirely, focusing as it does on the awe-inspiring size of the Internet being why the upper-case initial was important; then moving to describe how its sheer ubiquity should lead us to associating it with a lower-case i.

Here’s my take

The "Internet", capital I, gives the information that this is the only one of its kind, anywhere, ever. There is only one Internet. A lower-case I would indicate that there are several "internets". And, sure enough, there are several lower-class networks-of-networks (which is the definition of “internet” as a lower-case noun).

I’d like to inform the people who are engaging in this navel-gazing debate over big-I or small-i, that there functionally is only exactly one Internet. When their cable company came to "install the Internet", there was no question on the form to say "which internet do you want to connect to?" and people would have been rightly upset if there had been.

So, from that perspective, very much capital-I is still the right term for the Internet. There’s only one. Those other smaller internets are not comparable to “the Internet”.

From a technical perspective, we’re actually at the time when it’s closest to being true that there’s two internets. We’re in the midst of the long, long switch from IPv4 to IPv6. We’ve never done that before. And, while there are components of each that will talk to the other, it’s possible to describe the IPv6 and IPv4 collections of networks as two different "internets". So, maybe small-i is appropriate, but for none of the reasons this article describes.

Having said that, IPv6 engineers work really really hard to make sure that users just plain don’t notice that there’s a second internet while they’re building it, and it just feels exactly like it would if there was still only one Internet.

Again, you come back to "there is only one Internet", you don’t get to check a box that selects which of several internets you are going to connect to, it’s not like "the cloud", where there are multiple options. You are either connected to the one Internet, or you’re not connected to any internet at all.

The winner is…

Capital I, and bollocks to the argument from the associated press – lower-cased, because it’s not really that big or important, and neither is the atlantic. So, with their own arguments (which I believe are fallacious anyway), I don’t see why they deserve an upper-case initial.

The Atlantic, on the other hand – that’s huge and I wouldn’t want to cross it under my own steam.

And the Internet, different from many other internets, deserves its capital I as a designation of its singular nature. Because it’s a proper noun.

a group of three colleagues each pointing at the other two to tell you who to blame

three guys tied to a pole desperately trying to escape the rising orange flood waters

If you work at the first place, reach out to me on LinkedIn – I know some people who might want to work with you.

If you’re at the third place, you should probably get out now. Whatever they’re paying you, or however much the stock might be worth come the IPO, it’s not worth the pain and suffering.

If you’re at the second place, congratulations – you’re at a regular, ordinary workplace that could do with a little better management.

What’s this to do with security?

A surprisingly great deal.

Whenever there’s a security incident, there should be an investigation as to its cause.

Clearly the cause is always human error. Machines don’t make mistakes, they act in predictable ways – even when they are acting randomly, they can be stochastically modeled, and errors taken into consideration. Your computer behaves like a predictable machine, but at various levels it actually routinely behaves like it’s rolling dice, and there are mechanisms in place to bias those random results towards the predictable answers you expect from it.

Humans, not so much.

Humans make all the mistakes. They choose to continue using parts that are likely to break, because they are past their supported lifecycle; they choose to implement only part of a security mechanism; they forget to finish implementing functionality; they fail to understand the problem at hand; etc, etc.

It always comes back to human error.

Or so you think

Occasionally I will experience these great flashes of inspiration from observing behaviour, and these flashes dramatically affect my way of doing things.

One such was when I attended the weekly incident review board meetings at my employer of the time – a health insurance company.

Once each incident had been resolved and addressed, they were submitted to the incident review board for discussion, so that the company could learn from the cause of the problem, and make sure similar problems were forestalled in future.

These weren’t just security incidents, they could be system outages, problems with power supplies, really anything that wasn’t quickly fixed as part of normal process.

But the principles I learned there apply just as well to security incident.

Root cause analysis

The biggest principle I learned was “root cause analysis” – that you look beyond the immediate cause of a problem to find what actually caused it in the long view.

At other companies, who can’t bear to think that they didn’t invent absolutely everything, this is termed differently, for instance, “the five whys” (suggesting if you ask “why did that happen?” five times, you’ll get to the root cause). Other names are possible, but the majority of the English-speaking world knows it as ‘root cause analysis’

This is where I learned that if you believe the answer is that a single human’s error caused the problem, you don’t have the root cause.

But!

Whenever I discuss this with friends, they always say “But! What about this example, or that?”

You should always ask those questions.

Here’s some possible individual causes, and some of their associated actual causes:

Bob pulled the wrong lever

Who trained Bob about the levers to pull? Was there documentation? Were the levers labeled? Did anyone assess Bob’s ability to identify the right lever to pull by testing him with scenarios?

Kate was evil and did a bad thing

Why was Kate allowed to have unsupervised access? Where was the monitoring? Did we hire Kate? Why didn’t the background check identify the evil?

Jeremy told everyone the wrong information

Was Jeremy given the right information? Why was Jeremy able to interpret the information from right to wrong? Should this information have been automatically communicated without going through a Jeremy? Was Jeremy trained in how to transmute information? Why did nobody receiving the information verify it?

Grace left her laptop in a taxi

Why does Grace have data that we care about losing – on her laptop? Can we disable the laptop remotely? Why does she even have a laptop? What is our general solution for people, who will be people, leaving laptops in a taxi?

Jane wrote the algorithm with a bug in it

Who reviews Jane’s code? Who tests the code? Is the test automated? Was Jane given adequate training and resources to write the algorithm in the first place? Is this her first time writing an algorithm – did she need help? Who hired Jane for that position – what process did they follow?

I could go on and on, and I usually do, but it’s important to remember that if you ever find yourself blaming an individual and saying “human error caused this fault”, it’s important to remember that humans, just like machines, are random and only stochastically predictable, and if you want to get predictable results, you have to have a framework that brings that randomness and unpredictability into some form of logical operation.

Many of the questions I asked above are also going to end up with the blame apparently being assigned to an individual – that’s just a sign that it needs to keep going until you find an organisational fix. Because if all you do is fix individuals, and you hire new individuals and lose old individuals, your organisation itself will never improve.

[Yes, for the pedants, your organisation is made up of individuals, and any organisational fix is embodied in those individuals – so blog about how the organisation can train individuals to make sure that organisational learning is passed on.]

Finally, if you’d like to not use Ubuntu as my “circle of blame” logo, there’s plenty of others out there – for instance, Microsoft Alumni:

Kind of confusing and scary if you’re not quite sure what this all means – perhaps clear and scary if you do.

BlueCoat manufactures “man in the middle” devices – sometimes used by enterprises to scan and inspect / block outbound traffic across their network, and apparently also used by governments to scan and inspect traffic across the network.

The first use is somewhat acceptable (enterprises can prevent their users from distributing viruses or engaging in illicit behaviour from work computers, which the enterprises quite rightly believe they own and should control), but the second use is generally not acceptable, depending on how much you trust your local government.

Filippo helpfully gives instructions on blocking this from OSX, and a few people in the Twitter conversation have asked how to do this on Windows.

Disclaimer!

Don’t do this on a machine you don’t own or manage – you may very well be interfering with legitimate interference in your network traffic. If you’re at work, your employer owns your computer, and may intercept, read and modify your network traffic, subject to local laws, because it’s their network and their computer. If your government has ruled that they have the same rights to intercept Internet traffic throughout your country, you may want to consider whether your government shouldn’t be busy doing other things like picking up litter and contributing to world peace.

The simple Windows way

As with most things on Windows, there’s multiple ways to do this. Here’s one, which can be followed either by regular users or administrators. It’s several steps, but it’s a logical progression, and will work for everyone.

Step 1. Download the certificate. Really, literally, follow the link to the certificate and click “Open”. It’ll pop up as follows:

Step 2. Install the certificate. Really, literally, click the button that says “Install Certificate…”. You’ll see this prompt asking you where to save it:

Step 3. If you’re a non-administrator, and just want to untrust this certificate for yourself, leave the Store Location set to “Current User”. If you want to set this for the machine as a whole, and you’re an administrator, select Local Machine, like this:

Step 4: Click Next, to be asked where you’re putting the certificate:

Step 5: Select “Place all certificates in the following store”:

Step 6: Click the “Browse…” button to be given choices of where to place this certificate:

Step 7: Don’t select “Personal”, because that will explicitly trust the certificate. Scroll down and you’ll see “Untrusted Certificates”. Select that and hit OK:

Step 8: You’re shown the store you plan to install into:

Step 9: Click “Next” – and you’ll get a final confirmation option. Read the screen and make sure you really want to do what’s being offered – it’s reversible, but check that you didn’t accidentally install the certificate somewhere wrong. The only place this certificate should go to become untrusted is in the Untrusted Certificates store:

Step 10: Once you’re sure you have it right, click “Finish”. You’ll be congratulated with this prompt:

Step 11: Verification. Hit OK on the “import was successful” box. If you still have the Certificate open, close it. Now reopen it, from the link or from the certificate store, or if you downloaded the certificate, from there. It’ll look like this:

The certificate hasn’t actually been revoked, and you can open up the Untrusted Certificates store to remove this certificate so it’s trusted again if you find any difficulties.

Other methods

There are other methods to do this – if you’re a regular admin user on Windows, I’ll tell you the quicker way is to open MMC.EXE, add the Certificates Snap-in, select to manage either the Local Computer or Current User, navigate to the Untrusted Certificates store and Import the certificate there. For wide scale deployment, there are group policy ways to do this, too.

Take a long hard look at what you’re doing, and whether you actually need to be in charge of that kind of risk.

Do you need those passwords to verify a user or to impersonate them?

If you are going to verify a user, you don’t need encrypted passwords, you need hashed passwords. And those hashes must be salted. And the salt must be large and random. I’ll explain why some other time, but you should be able to find much documentation on this topic on the Internet. Specifically, you don’t need to be able to decrypt the password from storage, you need to be able to recognise it when you are given it again. Better still, use an acknowledged good password hashing mechanism like PBKDF2. (Note, from the “2” that it may be necessary to update this if my advice is more than a few months old)

Now, do not read the rest of this section – skip to the next question.

Seriously, what are you doing reading this bit? Go to the heading with the next question. You don’t need to read the next bit.

<sigh/>

OK, if you are determined that you will have to impersonate a user (or a service account), you might actually need to store the password in a decryptable form.

First make sure you absolutely need to do this, because there are many other ways to impersonate an incoming user using delegation, etc, which don’t require you storing the password.

Explore delegation first.

Finally, if you really have to store the password in an encrypted form, you have to do it incredibly securely. Make sure the key is stored separately from the encrypted passwords, and don’t let your encryption be brute-forcible. A BAD way to encrypt would be to simply encrypt the password using your public key – sure, this means only you can decrypt it, but it means anyone can brute-force an encryption and compare it against the ciphertext.

A GOOD way to encrypt the password is to add some entropy and padding to it (so I can’t tell how long the password was, and I can’t tell if two users have the same password), and then encrypt it.

Password storage mechanisms such as keychains or password vaults will do this for you.

If you don’t have keychains or password vaults, you can encrypt using a function like Windows’ CryptProtectData, or its .NET equivalent, System.Security.Cryptography.ProtectedData.

[Caveat: CryptProtectData and ProtectedData use DPAPI, which requires careful management if you want it to work across multiple hosts. Read the API and test before deploying.]

[Keychains and password vaults often have the same sort of issue with moving the encrypted password from one machine to another.]

Can you choose how strong those passwords must be?

If you’re protecting data in a business, you can probably tell users how strong their passwords must be. Look for measures that correlate strongly with entropy – how long is the password, does it use characters from a wide range (or is it just the letter ‘a’ repeated over and over?), is it similar to any of the most common passwords, does it contain information that is obvious, such as the user’s ID, or the name of this site?

Maybe you can reward customers for longer passwords – even something as simple as a “strong account award” sticker on their profile page can induce good behaviour.

Length is mathematically more important to password entropy than the range of characters. An eight character password chosen from 64 characters (less than three hundred trillion combinations – a number with 4 commas) is weaker than a 64 character password chosen from eight characters (a number of combinations with 19 commas in it).

An 8-character password taken from 64 possible characters is actually as strong as a password only twice as long and chosen from 8 characters – this means something like a complex password at 8 characters in length is as strong as the names of the notes in a couple of bars of your favourite tune.

Allowing users to use password safes of their own makes it easier for them to use longer and more complex passwords. This means allowing copy and paste into password fields, and where possible, integrating with any OS-standard password management schemes

What happens when a user forgets their password?

Everything seems to default to sending a password reset email. This means your users’ email address is equivalent to their credential. Is that strength of association truly warranted?

In the process to change my email address, you should ask me for my password first, or similarly strongly identify me.

What happens when I stop paying my ISP, and they give my email address to a new user? Will they have my account on your site now, too?

Every so often, maybe you should renew the relationship between account and email address – baselining – to ensure that the address still exists and still belongs to the right user.

Do you allow password hints or secret questions?

Password hints push you dangerously into the realm of actually storing passwords. Those password hints must be encrypted as well as if they were the password themselves. This is because people use hints such as “The password is ‘Oompaloompah’” – so, if storing password hints, you must encrypt them as strongly as if you were encrypting the password itself. Because, much of the time, you are. And see the previous rule, which says you want to avoid doing that if at all possible.

This is not a new thing – and that’s bad

This was reported by SoftPedia as a “new attack”, but it’s really an old attack. This is just another way to execute DOM-based XSS.

That means that web sites are being attacked by old bugs, not because their own coding is bad, but because they choose to make money from advertising.

And because the advertising industry is just waaaay behind on securing their code, despite being effectively a widely-used framework across the web.

You’ve seen previously on my blog how I attacked Troy Hunt’s blog through his advertising provider, and he’s not the first, or by any means the last, “victim” of my occasional searches for flaws.

It’s often difficult to trace which ad provider is responsible for a piece of vulnerable code, and the hosting site may not realise the nature of their relationship and its impact on security. As a security researcher, it’s difficult to get traction on getting these vulnerabilities fixed.

Important note

I’m trying to get one ad provider right now to fix their code. I reported a bug to them, they pointed out it was similar to the work Randy Westergren had written up.

So they are aware of the problem.

It’s over a month later, and the sites I pointed out to them as proofs of concept are still vulnerable.

Partly, this is because I couldn’t get a reliable repro as different ad providers loaded up, but it’s been two weeks since I sent them a reliable repro – which is still working.

Reported a month ago, reliable repro two weeks ago, and still vulnerable everywhere.

[If you’re defending a site and want to figure out which ad provider is at fault, inject a “debugger” statement into the payload, to have the debugger break at the line that’s causing a problem. You may need to do this by replacing “prompt()” or “alert()” with “(function(){debugger})()” – note that it’ll only break into the debugger if you have the debugger open at the time.]

How the “#” affects the URL as a whole

Randy’s attack example uses a symbol you won’t see at all in some web sites, but which you can’t get away from in others. The “#” or “hash” symbol, also known as “number” or “hash”. [Don’t call it “pound”, please, that’s a different symbol altogether, “£”] Here’s his example:

http://nypost.com/#1'-alert(1)-'"-alert(1)-"

Different parts of the URL have different names. The “http:” part is the “protocol”, which tells the browser how to connect and what commands will likely work. “//nypost.com/” is the host part, and tells the browser where to connect to. Sometimes a port number is used – commonly, 80 or 443 – after the host name but before the terminating “/” of the host element. Anything after the host part, and before a question-mark or hash sign, is the “path” – in Randy’s example, the path is left out, indicating he wants the root page. An optional “query” part follows the path, indicated by a question mark at its start, often taking up the rest of the URL. Finally, if a “#” character is encountered, this starts the “anchor” part, which is everything from after the “#” character on to the end of the URL.

The “anchor” has a couple of purposes, one by design, and one by evolution. The designed use is to tell the browser where to place the cursor – where to scroll to. I find this really handy if I want to draw someone’s attention to a particular place in an article, rather than have them read the whole story. [It can also be used to trigger an onfocus event handler in some browsers]

The second use is for communication between components on the page, or even on other pages loaded in frames.

The anchor tag is for the browser only

I want to emphasise this – and while Randy also mentioned it, I think many web site developers need to understand this when dealing with security.

The anchor tag is not sent to the server.

The anchor tag does not appear in your server’s logs.

WAFs cannot filter the anchor tag.

If your site is being attacked through abuse of the anchor tag, you not only can’t detect it ahead of time, you can’t do basic forensic work to find out useful things such as “when did the attack start”, “what sort of things was the attacker doing”, “how many attacks happened”, etc.

[Caveat: pedants will note that when browser code acts on the contents of the anchor tag, some of that action will go back to the server. That’s not the same as finding the bare URL in your log files.]

If you have an XSS that can be triggered by code in an anchor tag, it is a “DOM-based XSS” flaw. This means that the exploit happens primarily (or only) in the user’s browser, and no filtering on the server side, or in the WAF (a traditional, but often unreliable, measure against XSS attacks), will protect you.

When trying out XSS attacks to find and fix them, you should try attacks in the anchor tag, in the query string, and in the path elements of the URL if at all possible, because they each will get parsed in different ways, and will demonstrate different bugs.

What does “-alert(1)-“ even mean?

The construction Randy uses may seem a little odd:

"-alert(1)-"'-alert(1)-'

With some experience, you can look at this and note that it’s an attempt to inject JavaScript, not HTML, into a quoted string whose injection point doesn’t properly (or at all) escape quotes. The two different quote styles will escape from quoted strings inside double quotes and single quotes alike (I like to put the number ‘2’ in the alert that is escaped by the double quotes, so I know which quote is escaped).

But why use a minus sign?

Surely it’s invalid syntax?

While JavaScript knows that “string minus void” isn’t a valid operation, in order to discover the types of the two arguments to the “minus” operator, it actually has to evaluate them. This is a usual side-effect of a dynamic language – in order to determine whether an operation is valid, its arguments have to be evaluated. Compiled languages are usually able to identify specific types at compile time, and tell you when you have an invalid operand.

So, now that we know you can use any operator in there – minus, times, plus, divide, and, or, etc – why choose the minus? Here’s my reasoning: a plus sign in a URL is converted to a space. A divide (“/”) is often a path component, and like multiplication (“*”) is part of a comment sequence in JavaScript, “//” or “/*”, an “&” is often used to separate arguments in a query string, and a “|” for “or” is possibly going to trigger different flaws such as command injection, and so is best saved for later.

Also, the minus sign is an unshifted character and quick to type.

There are so many other ways to exploit this – finishing the alert with a line-ending comment (“//” or “<–”), using “prompt” or “confirm” instead of “alert”, using JavaScript obfuscaters, etc, but this is a really good easy injection point.

Another JavaScript syntax abuse is simply to drop “</script>” in the middle of the JavaScript block and then start a new script block, or even just regular HTML. Remember that the HTML parser only hands off to the JavaScript parser once it has found a block between “<script …>” and “</script …>” tags. It doesn’t matter if the closing tag is “within” a JavaScript string, because the HTML parser doesn’t know JavaScript.

There’s no single ad provider, and they’re almost all vulnerable

Part of the challenge in repeating these attacks, demonstrating them to others, etc, is that there’s no single ad provider, even on an individual web site.

Two visits to the same web site not only bring back different adverts, but they come through different pieces of code, injected in different ways.

If you don’t capture your successful attack, it may not be possible to reproduce it.

Similarly, if you don’t capture a malicious advert, it may not be possible to prove who provided it to you. I ran into this today with a “fake BSOD” malvert, which pretended to be describing a system error, and filled as much of my screen as it could with a large “alert” dialog, which kept returning immediately, whenever it was dismissed, and which invited me to call for “tech support” to fix my system. Sadly, I wasn’t tracing my every move, so I didn’t get a chance to discover how this ad was delivered, and could only rage at the company hosting the page.

This is one reason why I support ad-blockers

Clearly, ad providers need to improve their security. Until such time as they do so, a great protection is to use an ad-blocker. This may prevent you from seeing actual content at some sites, but you have to ask yourself if that content is worth the security risk of exposing yourself to adverts.

There is a valid argument to be made that ad blockers reduce the ability of content providers to make legitimate profit from their content.

But there is also a valid argument that ad blockers protect users from insecure adverts.

Defence – protect your customers from your ads

Finally, if you’re running a web site that makes its money from ads, you need to behave proactively to prevent your users from being targeted by rogue advertisers.

I’m sure you believe that you have a strong, trusting relationship with the ad providers you have running ads on your web site.

Don’t trust them. They are not a part of your dev team. They see your customers as livestock – product. Your goals are substantially different, and that means that you shouldn’t allow them to write code that runs in your web site’s security context.

What this means is that you should always embed those advertising providers inside an iframe of their own. If they give you code to run, and tell you it’s to create the iframe in which they’ll site, put that code in an iframe you host on a domain outside your main domain. Because you don’t trust that code.

Why am I suggesting you do that? Because it’s the difference between allowing an advert attack to have limited control, and allowing it to have complete control, over your web site.

If I attack an ad in an iframe, I can modify the contents of the iframe, I can pop up a global alert, and I can send the user to a new page.

If I attack an ad – or its loading code – and it isn’t in an iframe, I can still do all that, but I can also modify the entire page, read secret cookies, insert my own cookies, interact with the user as if I am the site hosting the ad, etc.

If you won’t do it for your customers, at least defend your own page

Here’s the front page of a major website with a short script running through an advert with a bug in it.

[I like the tag at the bottom left]

Insist on security clauses with all your ad providers

Add security clauses in to your contracts , so that you can pull an ad provider immediately a security vulnerability is reported to you, and so that the ad providers are aware that you have an interest in the security and integrity of your page and your users. Ask for information on how they enforce security, and how they expect you to securely include them in your page.

[I am not a lawyer, so please talk with someone who is!]

We didn’t even talk about malverts yet

Malverts – malicious advertising – is the term for an attacker getting an ad provider to deliver their attack code to your users, by signing up to provide an ad. Often this is done using apparent advertising copy related to current advertising campaigns, and can look incredibly legitimate. Sometimes, the attack code will be delayed, or region-specific, so an ad provider can’t easily notice it when they review the campaign for inclusion in your web page.

Got a virus you want to distribute? Why write distribution code and try to trick a few people into running it, when for a few dollars, you can get an ad provider to distribute it for you to several thousand people on a popular web site for people who have money?