2010-12-31

I had to write a short essay about an emerging issue in US-China relations and what I think should be done about it. I figured I'd post it here too. Short story: we all need to learn Chinese.

--

The next 10 years will see the beginning of the end of the US's "free lunch". As the standard of living continues to improve in China, and as economic reform and access to information continue to spur growth, wages and prices will rise. That will raise the cost of goods manufactured in China, and much of the increase will be passed on to the American consumer. For some industries, the higher cost of consumer goods may make continuing to import from China unsustainable. Unfortunately, the US has already lost (or never developed) the ability to manufacture certain goods and materials in quantity, and has long relied on cheap manufacturing in Chinese factories. Chinese economic growth is therefore likely to cause tensions between China and the US.

Meanwhile, China has started investing heavily in outsourcing cheap manufacturing to Africa and other developing regions, so it is likely that China will emerge as the next super-consumer country. With an emerging middle class, much greater purchasing power than the US, and trillions of dollars of US debt in hand, a rising China will likely drag the US into economic doldrums.

The traditional business and economic approaches to this problem will of course all be pursued (investing in emerging Chinese markets, exporting Western brands to China and/or developing multinational business conglomerates). However, I think that to truly stay relevant, the US needs to focus on teaching Chinese language and culture to every school student, the way that every Chinese school student is taught English language and culture. The US government also needs to set up an extensive network of student exchange programs with China and other Chinese-speaking countries. If school children are exposed to Chinese language and culture, the next generation of business leaders, political leaders, scientists and engineers will be able to work alongside their Chinese counterparts rather than simply competing against them while the economies of scale turn in China's favor.

2010-12-14

I just got an email from Gawker Media stating that user login details for Lifehacker, Gizmodo etc. had been compromised and that a database of 1.3M usernames and passwords was being distributed via BitTorrent. Naturally I went and found the database and downloaded it. I extracted the subset of passwords in the file that had already been cracked, deduplicated them, and counted how many times each one was used. You can download the list at the end of this post.
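For anyone who wants to do the same kind of tally, here is a minimal sketch of the deduplicate-and-count step. It is not the script I actually ran; the function names and the sample passwords are made up for illustration, and it assumes the cracked passwords have already been read into a list of strings.

```python
from collections import Counter

def count_passwords(passwords):
    """Return (password, count) pairs, most common first; ties sorted alphabetically."""
    counts = Counter(passwords)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def to_tsv(ranked):
    """Render ranked (password, count) pairs as tab-separated lines with a rank column."""
    lines = ["rank\tpassword\tcount"]
    for rank, (password, count) in enumerate(ranked, start=1):
        lines.append(f"{rank}\t{password}\t{count}")
    return "\n".join(lines)

# Made-up sample input, not data from the leak.
sample = ["123456", "password", "123456", "0000", "letmein", "123456", "password"]
ranked = count_passwords(sample)
print(to_tsv(ranked))
```

Passwords are kept as strings throughout, so an all-digit password such as "0000" does not lose its leading zeroes the way it would if it were parsed as a number.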

I have written about the dangers of using one password on multiple accounts before, and when I used to work at a company where I had access to a massive password database, I was shocked to discover how many people use really weak passwords -- like a first name or a number like 123456, or the word "password".

The leaked Gawker data contains the following explanatory text (along with a ton of leaked private chat logs between Gawker executives, and other juicy stuff):

After gaining access to gawkers MySQL database we stumble upon a huge table containing ~1,500,000 users. After a few days of dumping we decided that 1.3 million was enough.

Gawker uses a really outdated hashing algorithm known as DES (Data Encryption Standard). Because DES has a maximum of 8chars using a password like "abcdefgh1234" only the first 8 characters "abcdefgh" are encrypted and stored in the database. If your password is longer than 8 characters you only need to enter the first 8 characters to log in!

YA DONT SAY!! :D?

Because of this we were only able to recover the first 8 characters of someones password! If the password is 8 characters long there's a good chance that it migt be longer than 8 characters! But still, there's 1000's of people using 1 - 8 character passwords for us to have some fun with!

We managed to crack ~200,000 hashes, if you want the rest of them cracking DO IT YOUR ****ING SELF! >:3 (censored)

So ~200,000 hashes out of 1.3M were cracked (actually 188281 hashes were cracked, producing 91688 unique passwords). I assume that the 189k passwords that were cracked are somewhat representative of the rest of the database.
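The 8-character truncation described in the leaked note is easy to picture with a toy example. The sketch below is not real DES crypt: it uses SHA-256 from Python's hashlib as a stand-in, and the function names and the "ab" salt are invented for illustration. The only behaviour it copies is that everything after the first 8 characters is ignored before hashing.

```python
import hashlib

MAX_LEN = 8  # traditional DES-based crypt only looks at the first 8 characters

def toy_truncating_hash(password, salt="ab"):
    """Toy stand-in for a DES-crypt-style hash (NOT the real algorithm).

    Only the first MAX_LEN characters of the password reach the hash function.
    """
    truncated = password[:MAX_LEN]
    return hashlib.sha256((salt + truncated).encode("utf-8")).hexdigest()

def check_login(stored_hash, attempt, salt="ab"):
    """Return True if the attempt hashes to the stored value."""
    return toy_truncating_hash(attempt, salt) == stored_hash

stored = toy_truncating_hash("abcdefgh1234")

print(check_login(stored, "abcdefgh"))      # True: the first 8 characters are enough
print(check_login(stored, "abcdefgh9999"))  # True: anything after character 8 is ignored
print(check_login(stored, "abcdefg"))       # False: only 7 characters match
```

This is why a cracked 8-character password in the dump may only be the prefix of a longer one.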

I ran some basic statistics on the password database because I was interested in seeing the distribution of password usage. Here is a plot of the usage count (out of 189k cracked passwords total) for the top 50 passwords:

Here is the same plot with a log Y axis and with the rank of all cracked passwords shown on the X axis:

Basically the top 5 or so passwords are used by a ridiculously high proportion of users, and the top few thousand passwords are very common and therefore very easy to guess using a dictionary attack.
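To put a number on "ridiculously high proportion", you can compute what fraction of accounts an attacker would get into by trying only the N most common passwords. Here is a rough sketch; the function name is my own and the counts are a made-up toy distribution, not figures from the Gawker data.

```python
def top_n_coverage(counts, n):
    """Fraction of all accounts whose password is among the n most common ones."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:n]) / total

# Made-up usage counts for seven passwords (100 accounts in total).
toy_counts = [50, 30, 10, 5, 3, 1, 1]

for n in (1, 2, 5):
    print(n, top_n_coverage(toy_counts, n))
```

Run against the real counts, the same calculation shows how few guesses a dictionary attack needs before it has covered a large share of users.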

DOWNLOAD LINK: Curious to see the passwords of all 189,000 users? Here's the whole password list with counts for each password. It's a .tsv file (tab-separated values) that you can load into a spreadsheet or text editor. (This file doesn't contain names or usernames, just the password info. If you want usernames you'll have to go get them yourself.)

UPDATES:

Ranks and accidentally-stripped leading zeroes fixed.

Highlighting one of my replies to a comment: if you have to even ask if your password is on this list, it's probably not secure enough!