GeoCities Project

Upon the news that Yahoo would close GeoCities, Archive Team initiated the GeoCities Project, a coordinated effort to rescue as much of GeoCities' data as possible from the to-be-decommissioned GeoCities servers. The project began in April 2009 and continued through the summer of 2009 until Yahoo's closing date of October 26, 2009. A list of Frequently Asked Questions about this project was generated and is available here.

Parallel to our efforts (and in conjunction with them), archive.org began a major "deep crawl" of GeoCities to add to their Wayback Machine. The page for their project is here. Please note that Archive Team and archive.org are 100% separate entities, with different approaches to the project of saving data and history.

It cannot be stressed enough how many people were involved with this project - some preferred to be behind the scenes, while Jason Scott continued his habit of being a complete media hog, getting a lot of the interviews and face time with people asking what was up. But there were dozens of people involved, and they supplied weeks of time and effort to find efficient ways to download all of this data before it was removed.

Technical Details About GeoCities

These are now-defunct facts about GeoCities, culled from various sources, intended to provide some technical context for how GeoCities was arranged, as discovered during the data-harvesting phase.

GeoCities Neighborhoods

Before the acquisition by Yahoo, GeoCities used an unusual organization method for its userbase: Neighborhoods. Pages were grouped by subject matter into neighborhoods with names like Area51 (Science Fiction and Fantasy), Nashville (Country Music), Augusta (Golf) and others, which made it easier for a visitor to find the subject matter they were looking for. For context, search engines as the modern world knows them did not yet exist in such force.

A neighborhood would have up to 9,999 accounts underneath it, with the numbers representing the user's "block". Over time, GeoCities added "Suburbs", which allowed an expansion past 9,999 users; these would have names like "Vault" and "Cavern" under the "Area51" neighborhood. A URL would then be available in the form of www.geocities.com/NEIGHBORHOOD/SUBURB/XXXX.
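
The address scheme described above can be sketched in Python as a small parser. This is only an illustration, not a tool used by the project: the names Address and parse_address are hypothetical, the suburb-less form NEIGHBORHOOD/XXXX is assumed for accounts that predate Suburbs, and the accepted block range of 1 to 9,999 is an assumption drawn from the account limit mentioned above.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Address(NamedTuple):
    """Hypothetical container for a neighborhood-style GeoCities address."""
    neighborhood: str
    suburb: Optional[str]
    block: int


def _is_block(text: str) -> bool:
    # Plain ASCII digits only, so int() below cannot fail.
    return text.isascii() and text.isdigit()


def parse_address(url: str) -> Optional[Address]:
    """Split NEIGHBORHOOD/[SUBURB/]XXXX out of a URL, or return None."""
    # Accept URLs written without a scheme, as in the text above.
    if "://" not in url:
        url = "http://" + url
    parts = [p for p in urlsplit(url).path.split("/") if p]

    if len(parts) >= 2 and _is_block(parts[1]):
        # Assumed older form: NEIGHBORHOOD/XXXX
        neighborhood, suburb, number = parts[0], None, parts[1]
    elif len(parts) >= 3 and _is_block(parts[2]):
        # Form given in the text: NEIGHBORHOOD/SUBURB/XXXX
        neighborhood, suburb, number = parts[0], parts[1], parts[2]
    else:
        return None

    block = int(number)
    # Assumed valid range, based on the 9,999-account limit.
    if not 1 <= block <= 9999:
        return None
    return Address(neighborhood, suburb, block)


if __name__ == "__main__":
    print(parse_address("www.geocities.com/Area51/Vault/1234"))
```

A URL that does not fit the pattern, such as one built on a plain account name, makes parse_address return None, which is one rough way to tell the two addressing styles apart when sorting a list of harvested URLs.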

The Various Names and Incarnations of GeoCities

Originally called Beverly Hills Internet, the company opened up free web hosting in 1995 after a beta period. [1] It renamed itself to Geopages, and then GeoCities. After its acquisition by Yahoo, its name was changed to Yahoo GeoCities, which is what it remained until its demise.

The Size and Amount of GeoCities Accounts

GeoCities would provide a limited amount of space for its users to build websites, although this amount grew over time. While the most famous figure is about fifteen megabytes per site, the actual amount varied considerably and changed several times over the service's lifetime. This is an attempt to find citations of the size from various sources; it is clear from the various points of reference that different people got different deals through GeoCities over the years, especially with regard to paid versus free hosting.

This small size explains the usual look and feel of GeoCities accounts, as users were naturally restricted in what items they could have on their pages, and would lean towards simple graphics or use hotlinks to build their look.

Yahoo's Site Explorer showed 23M HTML pages in Yahoo's index as of April 29th, 2009.

Tips n' Tricks

Although simple directory listings aren't accessible for users' accounts, you might be able to obtain an Apache-style directory listing for their subdirectories. For example, by stripping the page filename from http://www.geocities.com/nenehs_world1/discography/homebrew.html, we can obtain an index for the subdirectory http://www.geocities.com/nenehs_world1/discography/; the benefit of this is that there may exist files which are not linked internally or externally, so crawlers are not made aware of them. Unfortunately, it seems many users do not organize their content into subdirectories, instead preferring to dump all files directly into the user directory. Also, they may have been good webmasters and provided a directory index which overrides directory listings.
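
The filename-stripping trick above can be sketched in Python with the standard urllib.parse module. The function name candidate_listing_urls is hypothetical, and the sketch assumes the first path segment is the account name (as in the example URL), so it never proposes the account root, where listings are not available. It only builds the candidate URLs; whether any of them actually returns a listing has to be checked separately.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def candidate_listing_urls(page_url: str) -> List[str]:
    """Return subdirectory URLs worth probing for a listing, deepest first."""
    scheme, netloc, path, _query, _fragment = urlsplit(page_url)
    segments = [s for s in path.split("/") if s]

    # A path that does not end in "/" ends in a page filename: strip it.
    if segments and not path.endswith("/"):
        segments = segments[:-1]

    candidates = []
    # Assumption: segments[0] is the account name, so stop before reaching
    # it, since the account root does not expose a listing.
    while len(segments) > 1:
        directory = "/" + "/".join(segments) + "/"
        candidates.append(urlunsplit((scheme, netloc, directory, "", "")))
        segments = segments[:-1]
    return candidates


if __name__ == "__main__":
    url = "http://www.geocities.com/nenehs_world1/discography/homebrew.html"
    for candidate in candidate_listing_urls(url):
        print(candidate)
```

For a page that sits directly in the user directory the function returns an empty list, which matches the caveat above about users who dump all files into the account root.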