A few days ago, I launched a live case study in which I created a 300,000-page site. One curious reader emailed me and called it a “megasite.” She saw the value in these massive sites and wanted to build one of her own. So, let’s explore the concept further.

In my recent case study, I built a site with 300,000 pages at launch by leveraging an API. In this article, we will explore another way to create a value-added megasite: leveraging publicly available datasets instead.

First things first: what is a dataset? Borrowing Wikipedia’s definition, a dataset is a collection of data, usually presented as a table. Each column represents a particular attribute, and each row corresponds to one entry in the dataset. For instance, a dataset on cars might have the columns “model,” “make,” “color,” “year,” and “license.” A row could then hold the values “Accord,” “Honda,” “White,” “2009,” and “TMWSW23.”

You can create a megasite by converting that dataset into a website. Each row of your dataset becomes a page on your site. In other words, if our dataset of cars had 5,000,000 rows, you could convert it into a site with 5,000,000 pages: one page for each car.
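To make the row-to-page idea concrete, here is a minimal Python sketch. It assumes the car columns from the example above, uses a tiny inline CSV (the second row is made up) in place of a real download, and picks a hypothetical URL scheme, `/cars/<license>.html`, on the assumption that license values are unique. Each row becomes exactly one HTML page.

```python
import csv
import html
import io
import re

# Hypothetical sample data: the row from the car example plus one made-up row.
# A real dataset would be read from a downloaded file instead.
CARS_CSV = """model,make,color,year,license
Accord,Honda,White,2009,TMWSW23
Civic,Honda,Blue,2012,ABC1234
"""


def slugify(text: str) -> str:
    """Turn arbitrary text into a lowercase, URL-safe slug."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def render_page(row: dict[str, str]) -> str:
    """Render a single dataset row as a minimal HTML page."""
    title = html.escape(f"{row['year']} {row['make']} {row['model']}")
    items = "\n".join(
        f"<li><strong>{html.escape(col)}</strong>: {html.escape(val)}</li>"
        for col, val in row.items()
    )
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1><ul>\n{items}\n</ul></body></html>"
    )


def build_pages(csv_text: str) -> dict[str, str]:
    """Map one URL path per row to that row's rendered HTML page."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Hypothetical URL scheme; assumes each license value is unique.
        path = "/cars/" + slugify(row["license"]) + ".html"
        pages[path] = render_page(row)
    return pages


if __name__ == "__main__":
    for path in build_pages(CARS_CSV):
        print(path)
```

In practice you would write each page to disk or feed the rows into a template engine or CMS; the point is simply that the page count grows with the row count.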

At a high level, here are the three steps to creating a megasite that adds value.

1. Find a dataset.

Non-profit and government organizations are a great source of large, freely downloadable datasets. Here is a short list I’ve compiled to get you started; Google around for more.

2. Narrow the data down to a subset that people are truly interested in.

This is the hard part. The dataset contains a tremendous amount of data. You don’t care about all of it, and neither will your site’s visitors. Your job now is to think carefully about the data and figure out which slice of it people actually find interesting. In essence, you are choosing your site’s niche.
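Once you know which slice you want, you can carve it out without ever loading the full dataset into memory. The Python sketch below streams a CSV row by row, keeps only the rows that match a niche predicate, and writes them to a smaller file. The sample rows, the `is_recent_honda` niche, and the `cars.csv` filename are hypothetical stand-ins for your own dataset and subset.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, TextIO

Row = dict[str, str]
FIELDS = ["model", "make", "color", "year", "license"]

# Hypothetical sample rows; a real dataset could be a multi-gigabyte file.
SAMPLE = """model,make,color,year,license
Accord,Honda,White,2009,TMWSW23
Corolla,Toyota,Red,2010,XYZ9876
Civic,Honda,Blue,2005,ABC1234
"""


def stream_rows(source: TextIO) -> Iterator[Row]:
    """Yield one row at a time so the input is never held in memory at once."""
    yield from csv.DictReader(source)


def select_subset(rows: Iterable[Row], keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Lazily keep only the rows that belong to the chosen niche."""
    return (row for row in rows if keep(row))


def write_subset(rows: Iterable[Row], dest: TextIO) -> int:
    """Write the kept rows as CSV and return how many were written."""
    writer = csv.DictWriter(dest, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def is_recent_honda(row: Row) -> bool:
    """Hypothetical niche: Hondas from model year 2009 onward."""
    return row["make"] == "Honda" and int(row["year"]) >= 2009


if __name__ == "__main__":
    # For a real download, use open("cars.csv", newline="") (hypothetical file)
    # instead of StringIO, and write the subset to a file opened the same way.
    out = io.StringIO()
    kept = write_subset(select_subset(stream_rows(io.StringIO(SAMPLE)), is_recent_honda), out)
    print(f"kept {kept} row(s)")
    print(out.getvalue())
```

Because each stage is a generator, the input is read one row at a time, so the size of the source file matters far less than the size of the subset you keep.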

Here is an example to illustrate this step. About a year ago, a company compiled a dataset of all of Facebook’s public profiles and made it available as a free download. I downloaded it, and it was massive: several gigabytes!

Anyway, I’m not interested in all that data, and I’m not going to spend hours scouring it for the parts I do care about. Here, however, are some subsets of that dataset that I would be interested in:

Single girls in Los Angeles

People who listen to Foo Fighters

People who watch the TV show 24

People who like sushi

Models on Facebook

People who attended Cornell and now live in Los Angeles

People who have over 5,000 friends

Where people from my hometown now live across the US

Where graduates of my alma mater now live across the US

Starbucks baristas living in Los Angeles

Actresses living in Los Angeles

But because I was unwilling to spend the time dissecting that dataset, I can’t browse any of the subsets listed above. That is where you come in.

3. Create a website based on this subset.

Countless people downloaded that Facebook dataset, yet few actually explored it in any meaningful way. Why? There were two main reasons. First, it contained too much information. Second, it wasn’t presented in a friendly way: the files were plain text files. When I opened one of them in Notepad, it froze my laptop. What the…

Now suppose you’ve narrowed the data down to a specific subset. For example, you’ve kept only the profiles of single girls living in and around Los Angeles. You are also presenting the data as a website, not as text files! Visitors can browse profiles the way they would on Facebook, which feels intuitive to them. Perhaps they can filter by age or by district within Los Angeles, and search by interest.
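Behind the scenes, filters like these can be as simple as a function that applies each criterion only when the visitor supplies it. The Python sketch below assumes a hypothetical `Profile` record with `age`, `district`, and `interests` fields and uses made-up sample records; a production site would more likely push these filters into a database query.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Profile:
    # Hypothetical fields; a real dataset's columns may differ.
    profile_id: str
    age: int
    district: str
    interests: list[str] = field(default_factory=list)


def filter_profiles(
    profiles: Iterable[Profile],
    min_age: Optional[int] = None,
    max_age: Optional[int] = None,
    district: Optional[str] = None,
    interest: Optional[str] = None,
) -> list[Profile]:
    """Return profiles matching every filter the visitor supplied; None means 'any'."""
    results = []
    for p in profiles:
        if min_age is not None and p.age < min_age:
            continue
        if max_age is not None and p.age > max_age:
            continue
        # District match is exact but case-insensitive.
        if district is not None and p.district.lower() != district.lower():
            continue
        # Interest search is a case-insensitive substring match.
        if interest is not None and not any(interest.lower() in i.lower() for i in p.interests):
            continue
        results.append(p)
    return results


# Made-up sample records for illustration only.
PROFILES = [
    Profile("p1", 24, "Santa Monica", ["sushi", "surfing"]),
    Profile("p2", 31, "Silver Lake", ["Foo Fighters", "hiking"]),
    Profile("p3", 27, "Santa Monica", ["Foo Fighters", "sushi"]),
]

if __name__ == "__main__":
    for p in filter_profiles(PROFILES, min_age=25, district="santa monica"):
        print(p.profile_id)
```

Treating `None` as “no filter” lets one function serve every combination of search options a visitor might choose.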

This is where your site creates true value and convenience for your users. You are presenting data in a useful, focused, and digestible way to your visitors.

Best of all, we now have a megasite built on publicly available data. Content creation is always a pain, but we’ve bypassed it: all hundreds of thousands of pages’ worth!
