It happens all too often these days: we turn on the news, only to learn that a new data breach has been discovered. Millions, tens of millions, hundreds of millions, even billions of Internet user accounts have been compromised. It sounds terrifying, but how many of us truly know what that means? What was the nature of the compromise? What kind of information was included? Did it include any of my information, and if so, what? Today we're going to talk about these data breaches, hopefully shine some light on them, and maybe even dispel some misinformation.

First of all, I'm going to say this, because even though many of you don't care, some of you get quite passionate about this one point. What the movies call "hacking" is really "cracking", and their "hackers" are really "crackers". A hacker is a talented programmer and usually a good guy; a cracker is someone trying to break someone else's security and is usually a bad guy. So stop saying your password got "hacked". It didn't; it got "cracked". Now let's see whether that actually happened to you.

One easy way to do this is to visit https://haveibeenpwned.com. This site maintains a giant repository of the email addresses exposed in publicly known data breaches; it does not show you the passwords or other sensitive data from those breaches. Enter any email address you have used to sign up for online accounts, and it will tell you which breaches, if any, included it. Chances are that at least one account you've used has been compromised at some point. That means that somewhere out there, there is probably a data file containing your login credentials that some nefarious party could use. And because many of us reuse the same password on multiple online services, a breach at one site could give someone access to your PayPal or bank account.

Where do these large data files come from? When thieves acquire large tables of user data, they sometimes sell them, usually for Bitcoin in some "dark web" marketplace. Smart thieves will monetize them once or twice, then dump them publicly to protect themselves: if every thief is using the same stolen files, it's much harder for law enforcement to tell who originally stole them. Hobbyist crackers may dump them publicly as well. Often this is the only reason anyone learns that a large breach took place at all. Several major breaches, including highly publicized ones at Adobe, LinkedIn, and Yahoo, only came to full light when the stolen data turned up online.

What that data consists of depends on whom it was stolen from and how it was formatted. It is usually a table of usernames or email addresses, each paired with a hashed password. Other fields, such as names or phone numbers, are sometimes included, but what crackers want most is login credentials.

Let's talk about what a hashed password is. Say your email address is jane@yahoo.com and your password is 123456. If Yahoo simply stored both of those as-is in its database, then anyone who got hold of the database could go to any computer and log in with that email and password. So what most companies do instead is store only a "hash" of the password, and Jane is the only person in the world who knows that 123456 is the original password behind that hash. A hash is a long alphanumeric string produced by a one-way mathematical function. Some information is lost in the process, so once we have the result, it's no longer possible to work backwards to the original password. There are many hashing algorithms, but only a handful are widely used. The good ones make it astronomically unlikely that two different passwords will ever produce the same hash (a "collision"). Collisions must exist in principle, since there are far more possible passwords than possible fixed-length hashes, but nobody can actually find them.
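To make this concrete, here is a minimal Python sketch using SHA-256 from the standard library's hashlib module. SHA-256 is chosen only for illustration; nothing here says which algorithm Yahoo or any other company uses, and real password storage usually relies on deliberately slow functions such as bcrypt, scrypt, Argon2, or PBKDF2. The sketch shows the properties described above: the same input always gives the same fixed-length hash, and a tiny change gives a completely different one.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a UTF-8 string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

h1 = sha256_hex("123456")
h2 = sha256_hex("123456")
h3 = sha256_hex("123457")

print(h1)
print(len(h1))   # 64: the length is fixed no matter how long the input is
print(h1 == h2)  # True: the same password always produces the same hash
print(h1 == h3)  # False: changing one character produces an unrelated-looking hash
```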

When Jane logs in to check her Yahoo email, she still types 123456, but Yahoo's servers immediately hash it and search the database for that hash together with her email address. If a record matches both, her login succeeds, and Yahoo never needs to store her actual password.
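The login check just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the users dictionary stands in for the database, login is a hypothetical helper, and the hashing is plain unsalted SHA-256 as in the simplified example. It looks up the stored hash by email and then compares hashes, which has the same effect as searching for both together.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    # Unsalted SHA-256, matching the simplified scheme above (not recommended in practice).
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user table: the server keeps only email -> password hash.
users = {"jane@yahoo.com": hash_password("123456")}

def login(email: str, password: str) -> bool:
    """Hash what the user typed and compare it with the stored hash for that email."""
    stored = users.get(email)
    if stored is None:
        return False
    # compare_digest compares in constant time, so timing doesn't leak how much matched.
    return hmac.compare_digest(stored, hash_password(password))

print(login("jane@yahoo.com", "123456"))   # True
print(login("jane@yahoo.com", "1234567"))  # False: wrong password
print(login("bob@example.com", "123456"))  # False: no such account
```

Note that the plaintext password only exists briefly while the request is being checked; what sits in the database is the hash.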

Theoretically, hashing a password makes it impossible for anyone to recover your password. In practice, clever thieves have found a way around this. Many people use common passwords: not only 123456, but also strings like password, qwerty, abc123, football, 111111, jesus, and others. Thieves run these through the popular hashing algorithms and build their own table of passwords and their associated hashes, which allows reverse lookups from hash to password. (A rainbow table is a space-saving refinement of this idea, and the name is often used loosely for any such precomputed table.) They do this for tens of millions of potential passwords, including birthdates, names, phrases, and the whole dictionary. Now, when they obtain a list of hashes, all they have to do is look each hash up in their tables, and in many cases they will find Jane's original password. Such tables are common enough that they've been indexed by Internet search engines: do a Google search for the hash of a common password, and you will often find the original password. That should scare anyone away from ever using a common or conventional password.
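Here is a minimal Python sketch of the reverse-lookup idea, assuming unsalted SHA-256 hashes. It builds a plain precomputed dictionary rather than a true rainbow table (which uses chains of hashes to save storage), but the effect on a leaked, unsalted database is the same. The word list and the leaked hashes are hypothetical stand-ins.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Unsalted SHA-256, as in the simplified scheme above."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# A tiny stand-in for an attacker's list of tens of millions of common passwords.
common_passwords = ["123456", "password", "qwerty", "abc123", "football", "111111"]

# Precompute hash -> password once; after that, every reverse lookup is a dictionary hit.
lookup_table = {sha256_hex(p): p for p in common_passwords}

# Hashes from a hypothetical leaked, unsalted database.
leaked_hashes = [sha256_hex("qwerty"), sha256_hex("Tr0mb0ne-Marmalade!")]
cracked = {h: lookup_table.get(h) for h in leaked_hashes}

for h, password in cracked.items():
    print(h[:16] + "...", "->", password if password else "not found")
```

The common password falls out instantly; the unusual one is simply not in the table.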

Companies that follow best practices store not merely hashed passwords, but hashed and "salted" passwords. This means they add a text string of their own to the password before hashing it. If they salt passwords with the string xyzpdq, they hash 123456xyzpdq instead of just 123456, and that's far less likely to have ever made it into anyone's lookup tables. If the salt is a long random string, it's even less likely. If the salt is also unique to Jane (ideally a random value generated just for her and stored alongside her hash, though even her name or email address makes it unique), then we can say with reasonable certainty that her hash is safe from precomputed-table attacks. No conventionally generated table includes hashes for every possible combination of password and per-user salt, and even two users with the same password end up with different hashes.
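Below is a minimal Python sketch of per-user salting, assuming a random salt generated for each user with the secrets module and stored next to the hash (the salt is not secret). It still uses plain SHA-256 with the salt appended, purely to mirror the 123456xyzpdq example; make_record and check are hypothetical helper names, and a production system would use a slow salted function such as bcrypt, scrypt, Argon2, or hashlib.pbkdf2_hmac instead.

```python
import hashlib
import secrets

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_record(password: str) -> tuple[str, str]:
    """Return (salt, hash) for storage; the salt is a fresh random value per user."""
    salt = secrets.token_hex(16)  # 16 random bytes as 32 hex characters
    return salt, sha256_hex(password + salt)

def check(password: str, record: tuple[str, str]) -> bool:
    """Re-hash the typed password with the stored salt and compare."""
    salt, stored = record
    return secrets.compare_digest(stored, sha256_hex(password + salt))

jane = make_record("123456")
bob = make_record("123456")

print(jane[1] == bob[1])                          # False: same password, different salts
print(check("123456", jane))                      # True
print(sha256_hex("123456") in {jane[1], bob[1]})  # False: an unsalted lookup table misses both
```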

But even a well-salted hash is never going to protect your password completely. Although the hash cannot be reversed, thanks to the one-way nature of the algorithm, it can still be cracked by brute force: hash huge numbers of candidate passwords with the same salt and see which one matches. Cracking rigs packed with graphics cards (GPUs), the same kind of hardware that was once widely used to mine cryptocurrency, can compute billions of hashes per second for fast algorithms. Attackers exploit patterns commonly found in passwords, such as a certain number of uppercase and lowercase letters followed by a certain number of digits; these patterns are often called masks. They combine masks with dictionaries pruned down to the most commonly used words, and can find passwords that fit common masks very quickly. There is always a point of diminishing returns: some percentage of the hashes won't get cracked, and those probably belong to the most distinctive passwords, ones that are long, unusual, and don't fit typical masks. But be aware that this computing power will keep increasing, so the security of every hashed password will keep decreasing.
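Here is a minimal Python sketch of a mask attack against a single salted SHA-256 hash. The mask (one uppercase letter, one lowercase letter, two digits), the salt value, and the function name mask_attack are all hypothetical, and the mask is kept tiny so the search finishes in a moment. Real cracking tools run on GPUs and are enormously faster, but the logic is the same: the salt is stored next to the hash, so the attacker simply includes it in every guess.

```python
import hashlib
import itertools
import string
from typing import Optional

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical mask: uppercase letter, lowercase letter, digit, digit (e.g. "Jo42").
MASK = [string.ascii_uppercase, string.ascii_lowercase, string.digits, string.digits]

def mask_attack(target_hash: str, salt: str) -> Optional[str]:
    """Try every candidate that fits MASK; return the one whose salted hash matches."""
    for chars in itertools.product(*MASK):
        candidate = "".join(chars)
        if sha256_hex(candidate + salt) == target_hash:
            return candidate
    return None  # the password doesn't fit this mask

salt = "9f2c1a"  # known to the attacker, since salts are stored alongside the hashes
leaked = sha256_hex("Jo42" + salt)
print(mask_attack(leaked, salt))  # Jo42, found within 26 * 26 * 10 * 10 = 67,600 guesses
```

Every extra character or character class multiplies the search space, which is why long passwords that don't fit typical masks are the ones left uncracked.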

So how did the thieves get their hands on these hashed password tables in the first place? The truth is that we rarely find out. Some are inside jobs: a data center employee exports a database table onto a USB stick and literally walks out the door. Data thieves actively solicit this kind of help, and pay for it. Sometimes thieves obtain a key employee's credentials and access the data that way. Those credentials might be gained through social engineering, for example by someone posing as a help desk employee and simply asking for them, or through a technique called spearphishing, a targeted version of a phishing attack.

Normal phishing is not targeted. Billions of spam emails are sent out, usually appearing to be from PayPal or some other financial institution, asking you to log in and verify your account. Of course, the website the email takes you to is not the real one; it's a sham set up to look like PayPal. When you enter your credentials, they go straight to the thieves, who can then log into the real PayPal and send themselves money from your account. Spearphishing is a targeted, refined version of this: it goes after specific employees who have the desired access, and it can give outside thieves whatever credentials they need to reach huge corporate data files.

There is very little you can do to stop thieves from getting hold of these data files that may include your information. It will happen, and for many of us, it already has. But here is a short list of things you CAN do to minimize the harm to yourself in the future:

Don't fall for phishing emails. Modern web browsers will often warn you if you visit a known sham version of a sensitive website, but don't depend on that. Use your own bookmarks to reach sensitive websites rather than links you've been emailed. Look at the URL carefully and make sure you are where you think you are. Phishing remains one of the most successful ways for thieves to obtain login credentials.
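"Look at the URL carefully" comes down to one question: what is the actual host name? Here is a minimal Python sketch of that check using urllib.parse from the standard library. The helper is_expected_site is hypothetical, and it only helps if you already know the real domain; it is a way to see what a browser sees, not a replacement for using your own bookmarks.

```python
from urllib.parse import urlsplit

def is_expected_site(url: str, expected_domain: str) -> bool:
    """True only for https URLs whose host is expected_domain or one of its subdomains."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and (
        host == expected_domain or host.endswith("." + expected_domain)
    )

for url in [
    "https://www.paypal.com/signin",                    # the real thing
    "https://paypal.com.account-verify.example/login",  # real domain used as a prefix
    "https://www.paypal.com.evil.example/",             # same trick, different layout
    "https://paypal.com@evil.example/",                 # text before "@" is not the host
    "http://www.paypal.com/signin",                     # right host, but not encrypted
]:
    print(url, "->", is_expected_site(url, "paypal.com"))
```

Only the first URL passes; the others all contain "paypal.com" somewhere but lead elsewhere or aren't encrypted.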

Don't use the same password on different websites. A breach at an unimportant website, such as a forum, could give a cracker access to your bank account if you used the same password there. You might consider using password management software to keep track of all your new passwords.

Use really good passwords. Make up a nonsense phrase; include unusual words, punctuation, numbers, odd mixes of upper- and lowercase letters, even emoji (where a site accepts them). Make it long. The chances of such a password being cracked are vanishingly small (today).
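People are bad at being random, so one way to get a genuinely unpredictable nonsense phrase is to let the computer pick the words. This is a minimal Python sketch using the standard secrets module; the twelve-word list and the helper names are hypothetical, and a real generator would draw from a list of thousands of words (Diceware-style lists have 7,776). The entropy estimate counts only the word choices, so it is a rough lower bound.

```python
import math
import secrets

# Hypothetical word list; a real one would have thousands of entries.
WORDS = ["marmalade", "trombone", "glacier", "pickle", "velvet", "orbit",
         "cactus", "lantern", "walrus", "quartz", "noodle", "harbor"]
SYMBOLS = "!#%&*+?@"

def make_passphrase(n_words: int = 5) -> str:
    """Join randomly chosen words, uppercase one of them, and append a digit and a symbol."""
    words = [secrets.choice(WORDS) for _ in range(n_words)]
    i = secrets.randbelow(n_words)
    words[i] = words[i].upper()
    return "-".join(words) + str(secrets.randbelow(10)) + secrets.choice(SYMBOLS)

def entropy_bits(n_words: int, vocab_size: int) -> float:
    """Bits of entropy from choosing n_words uniformly from vocab_size words."""
    return n_words * math.log2(vocab_size)

print(make_passphrase())
print(round(entropy_bits(5, len(WORDS)), 1))  # 17.9 bits: a 12-word list is far too small
print(round(entropy_bits(5, 7776), 1))        # 64.6 bits with a 7,776-word list
```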

Use two-factor authentication on every sensitive website that offers it. This usually means that when you log in, the site sends a code to your phone by text message, or asks for a code from an authenticator app, and you must enter that code in addition to your password. This makes it much harder for crackers to log into your account even if they do have your password.
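The six-digit codes that many authenticator apps display are typically generated with the time-based one-time password (TOTP) scheme standardized in RFC 6238; text-message codes are generated however each service chooses. Below is a minimal Python sketch of TOTP with HMAC-SHA1, using the RFC's published test secret. It is for understanding only: the secret shared at setup (usually via a QR code) must be kept private, and real apps handle details such as base32-encoded secrets and clock drift.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def totp(secret: bytes, for_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238, HMAC-SHA1 variant."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time // step)  # number of 30-second intervals since 1970
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # "dynamic truncation" from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

secret = b"12345678901234567890"  # RFC 6238 test secret, not a real account secret
print(totp(secret, for_time=59, digits=8))  # 94287082, the RFC 6238 test value
print(totp(secret))                         # the code for the current 30 seconds
```

Because the phone and the server share the secret and the clock, both compute the same code, but a thief who has only your password does not.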

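Never submit a login form that includes your password unless the page's address begins with https:// (browsers typically show a padlock or similar icon for such pages, and flag plain http:// pages as "Not secure"). Otherwise your password travels over the network unencrypted, and anyone who can watch that traffic, for example on the same public Wi-Fi, can capture it with packet-sniffing filters that look for login form submissions. A closely related attack called sidejacking captures the unencrypted session cookies sent after you log in, letting the attacker take over your session without even needing the password.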
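To see why this matters, here is a rough Python sketch of the request body a browser might send when a login form is submitted over plain HTTP. The host name and the field names email and password are hypothetical, and a real browser adds many more headers. Over HTTPS the same bytes are sent inside an encrypted TLS connection; over plain HTTP they cross the network exactly as shown.

```python
from urllib.parse import urlencode

# Hypothetical login form fields, encoded the way HTML forms usually are.
body = urlencode({"email": "jane@yahoo.com", "password": "123456"})

request = (
    "POST /login HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}"
)
print(request)

# A sniffer's "filter" can be as simple as searching for the field name.
print("password=" in request)  # True: the password is readable in the traffic
```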

So that's a lot to be aware of. The idea of using a login and a password might seem archaic in this age of fingerprint recognition, but it isn't going away anytime soon. Get comfortable with better password behavior, and just know that you're in a war that will never end.