What are you planning to name your children? If you answered with any common first and last name combination, your child may be at a digital disadvantage. They will be condemned to a life of appending numeric sequences to their user names, picking off-brand Twitter handles, and choosing unrelated domains for their websites. Their Facebook and LinkedIn profiles will be difficult for new acquaintances to locate, Google results will misrepresent them to future employers, and their children will have an even harder time of it.

At least, that's what I thought several weeks ago.

It's easy to imagine that any name you choose for a child will be common enough that somebody out there has already grabbed the domain, the Twitter handle, the Facebook username, the Gmail address, and countless other digital identifiers. In this paranoid world-view, your child is relegated to being a second-class citizen of the net (or whatever it becomes) simply by dint of having a common first name/surname combination.

But how large a problem is it really? To answer the question, I did some research on full name variation and came away surprised.

It is trivial to find popular baby names for a given period, but finding the frequency of full names in the US is another matter. Neither the CDC nor the Census Bureau publishes full name information, for confidentiality reasons, and the only list of first and last name combinations I was able to find came from a company doing greyhat Facebook advertising and data mining [1]. Here is their list of the 100 most popular first and last names on Facebook in 2009 [2].

I then ran these names through a bulk domain availability search (where the name "John Smith" turns into "johnsmith.com"). Not a single .com domain is currently unregistered.
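The name-to-domain step is simple enough to sketch in a few lines of Python. This is only an illustration of the transformation described above, not the bulk search tool I used: the `name_to_domain` helper is hypothetical, and the rule of dropping everything except ASCII letters and digits is my own simplifying assumption.

```python
import re


def name_to_domain(full_name: str, tld: str = "com") -> str:
    """Collapse a full name into a single domain label plus a TLD.

    Assumption: anything that is not an ASCII letter or digit (spaces,
    apostrophes, hyphens, accented characters) is simply dropped.
    """
    label = re.sub(r"[^a-z0-9]", "", full_name.lower())
    return f"{label}.{tld}"


# The two names here are the ones used as examples in this post.
candidates = [name_to_domain(name) for name in ["John Smith", "Sarah Smith"]]
print(candidates)
```

The resulting list is what you would paste into a bulk availability checker.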

Of course, that test doesn't tell me much except that at the very head of the full name distribution, the domains are taken. The bigger question is how long the tail is. If 98% of people have a first name/last name combination that falls outside the top 100, then your kid will be fine unless you really do name them John Smith.

Since I wasn't able to get my hands on a dataset for full names, I decided to approximate the frequency of first name + surname duplicates from the frequency of surname duplicates in the US. Because a surname is shorter than a first name and surname combined, there are fewer possible surnames than possible full names. Two people who share a full name also share a surname, so there should be a lot more duplicate surnames than duplicate first name/surname combinations. If it turned out there was a lot of surname repetition, I wouldn't be able to conclude much about how common first name/surname combinations are; I would just know that they are less common than the data I can see. However, if duplicate surnames turned out to be relatively uncommon, I could conclude that duplicate first name/surname combinations are more uncommon still, and disprove my suspicion that the uniqueness of my child's name is important to their digital future.

Among surnames, the 100 most popular cover 16.4% of the US population, and the top 1,000 cover 38.9%. That means the majority of people have surnames that lie in the long tail rather than the head of the distribution. We can therefore assume that the distribution of full names is even more skewed towards the long tail, since the space of possible full names is substantially larger. The Facebook data supports this assertion.
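To make the size of the tail explicit, here is the arithmetic as a short Python sketch. The two percentages are the ones quoted above; the variable names are mine.

```python
# Share of the US population covered by the most popular surnames (percent).
top_100_share = 16.4
top_1000_share = 38.9

# Everyone not covered by the head of the distribution is in the tail.
outside_top_100 = 100 - top_100_share
outside_top_1000 = 100 - top_1000_share

# People whose surname ranks between 101 and 1000.
ranks_101_to_1000 = top_1000_share - top_100_share

print(f"Outside the top 100:  {outside_top_100:.1f}%")
print(f"Outside the top 1000: {outside_top_1000:.1f}%")
print(f"Ranked 101 to 1000:   {ranks_101_to_1000:.1f}%")
```

In other words, roughly six in ten people have a surname that doesn't even make the top 1,000.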

This Wikipedia graph suggests there were around 350,000,000 users on Facebook in 2009. The total number of people with one of the top 100 full names comes to around 592,000, which represents just 0.17% of users on the site.
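For anyone who wants to check the division, a quick sketch in Python, using the two approximate figures quoted above (the variable names are mine):

```python
# Approximate figures quoted in this post.
facebook_users_2009 = 350_000_000
people_with_top_100_names = 592_000

# Fraction of all users who carry one of the 100 most common full names.
share = people_with_top_100_names / facebook_users_2009

print(f"{share:.2%}")  # prints 0.17%
```

Put another way, fewer than 2 in every 1,000 users had one of the 100 most common full names.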

The conclusion I have to draw is that unless you name your son John Smith and your daughter Sarah Smith, they will have perfectly viable digital identities available to them when they graduate from college and need to start looking presentable to the rest of the world. So rest easy. I know that I, for one, want to name my first child Ellis. According to White Pages Names, it is suitably obscure, and hey, I notice the domain "ellissaines.com" isn't registered. I think I'll register that ... just in case.

Special thanks go out to the staff of the Alden Reference Library at Ohio University for helping me obtain this link and other vital statistics I used in this blog post! Specifically, Sherri Saines (@bibliosanity), Tim Smith, Kelly Broughton, and Cary Singer.

[1] The company was later banned from scraping Facebook, which is presumably why their data is so old.