Thoughts about technology, government 2.0, open data, collaboration, libraries and scientific publishing.

Posts categorized "Storage"

November 08, 2008

We've added 20% more photo servers and 50% more upload servers to process this year's Halloween traffic. We've also packed in 40 terabytes of additional storage (that's 40,000,000,000,000 bytes). By comparison, the words of all 20 million books in the Library of Congress could be digitized in about 20 terabytes of text. So we're making room for the equivalent of two Libraries of Congress.
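The storage comparison in the quote is simple unit arithmetic. Here is a quick check in Python. It assumes decimal terabytes (10^12 bytes, which matches the quoted byte count) and also shows what the 20-terabyte estimate implies per book:

```python
# Assumption: decimal units, since 40 TB is given as 40,000,000,000,000 bytes.
TB = 10**12

new_storage = 40 * TB        # additional photo storage from the quote
loc_text = 20 * TB           # quoted estimate for the text of the Library of Congress
loc_books = 20_000_000       # "all 20 million books"

libraries_of_congress = new_storage // loc_text
bytes_per_book = loc_text // loc_books

print(f"{new_storage:,} bytes")                      # 40,000,000,000,000 bytes
print(libraries_of_congress)                         # 2
print(f"{bytes_per_book:,} bytes of text per book")  # 1,000,000 bytes of text per book
```

In other words, the estimate works out to roughly a megabyte of plain text per book.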

The popularity of Facebook's photo application, home of more than 10 billion photos, has compelled us to think big for a while.

It's also an order-of-magnitude increase from last year. In an infrastructure posting from 2007, they said:

We have:

* 1.7 billion user photos
* 2.2 billion friends tagged in user photos
* 160 terabytes of photo storage used with an extra 60 terabytes available
* 60+ million photos added each week which take up 5 terabytes of disk space
* 3+ billion photo images served to users every day
* 100,000+ images served per second during our peak traffic windows
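The 2007 figures also imply a rough average size per photo and a peak-to-average serving ratio. The sketch below assumes decimal terabytes and treats the "+" figures as exact. The averages cover whatever Facebook counted as photo storage, so they are ballpark numbers rather than file sizes:

```python
TB = 10**12  # assumption: decimal terabytes

# Figures from the 2007 list (lower bounds where the list says "+")
photos_stored = 1_700_000_000
storage_used = 160 * TB
photos_per_week = 60_000_000
storage_per_week = 5 * TB
images_served_per_day = 3_000_000_000
peak_images_per_second = 100_000

avg_bytes_new_photo = storage_per_week / photos_per_week
avg_bytes_stored_photo = storage_used / photos_stored
avg_images_per_second = images_served_per_day / (24 * 60 * 60)
peak_to_average = peak_images_per_second / avg_images_per_second

print(f"~{avg_bytes_new_photo / 1000:.0f} KB per newly added photo")  # ~83 KB
print(f"~{avg_bytes_stored_photo / 1000:.0f} KB per stored photo")    # ~94 KB
print(f"~{avg_images_per_second:,.0f} images/s on average")           # ~34,722
print(f"peak is ~{peak_to_average:.1f}x the daily average")           # ~2.9x
```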

* More than 10 billion photos uploaded to the site
* More than 30 million photos uploaded daily

Let's see: 30 million a day × 30 days = 900 million, almost a billion photos a month, so yeah, the math works out.
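Here is the same back-of-the-envelope check, written out. The 30-day month is a simplifying assumption. The sketch also converts 2007's weekly upload figure into a daily rate for comparison:

```python
daily_uploads_2008 = 30_000_000   # "More than 30 million photos uploaded daily"
weekly_uploads_2007 = 60_000_000  # "60+ million photos added each week"
DAYS_PER_MONTH = 30               # simplifying assumption

monthly_uploads_2008 = daily_uploads_2008 * DAYS_PER_MONTH
daily_uploads_2007 = weekly_uploads_2007 / 7
growth = daily_uploads_2008 / daily_uploads_2007

print(f"{monthly_uploads_2008:,} photos per month")         # 900,000,000 photos per month
print(f"{daily_uploads_2007:,.0f} photos per day in 2007")  # 8,571,429 photos per day in 2007
print(f"daily uploads grew about {growth:.1f}x")            # daily uploads grew about 3.5x
```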

Flickr, the photo-sharing darling (which has always had buzz disproportionate to its actual number of users and photos, due to its "look, folksonomy stuff" usefulness for the technorati), reports its 3 billionth photo.

Take the pinky of either your left or right hand, hold it next to the corner of your mouth and say “3 beelleeeeon photos…”

I don't think any of us quite know what it means to have photos at the giga-scale, using storage in the tera- and peta-scale. (It's quite possible evolution has not equipped us to think at such scales, although it appears the Maya were quite comfortable thinking about time at the mega- and even giga-year scales.)

UPDATE: As libraries, we also need to think about what this means in terms of our position as information managers. Pre-digital, the central library in your town or city probably did contain the largest (or nearly the largest) collection and concentration of information available to you. My organisation has millions of items (articles mostly) that it can make available to thousands of patrons. But compare that to billions of items and tens or hundreds of millions of patrons. Or compare that to home users who may have terabytes of storage and tens of thousands of photos. Can libraries still claim to be the centre of knowledge management expertise when they are tiny players in the digital environment?

March 14, 2006

Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. ...

Amazon S3 Functionality

Amazon S3 is intentionally built with a minimal feature set.

* Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
* Each object is stored and retrieved via a unique, developer-assigned key.
* Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
* Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
* Built to be flexible so that protocol or functional layers can easily be added. Default download protocol is HTTP. A BitTorrent (TM) protocol interface is provided to lower costs for high-scale distribution. Additional interfaces will be added in the future.
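Amazon's description amounts to a simple data model: opaque objects addressed by developer-chosen keys, with per-object access control. The sketch below is a hypothetical in-memory model of that feature list only. `ToyObjectStore`, its methods, and the owner/reader rules are invented for illustration and are not the S3 REST or SOAP API. It also assumes "5 gigabytes" means 5 × 10^9 bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Size limits from the feature list; decimal gigabytes are an assumption.
MIN_OBJECT_BYTES = 1
MAX_OBJECT_BYTES = 5 * 10**9


@dataclass
class StoredObject:
    data: bytes
    owner: str
    public: bool = False
    readers: Set[str] = field(default_factory=set)  # users granted read access


class ToyObjectStore:
    """Hypothetical in-memory model of the S3 feature list (not the real API)."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}  # key -> object; no count limit

    def put(self, key: str, data: bytes, owner: str, public: bool = False) -> None:
        if not MIN_OBJECT_BYTES <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError(f"object must be {MIN_OBJECT_BYTES}..{MAX_OBJECT_BYTES} bytes")
        self._objects[key] = StoredObject(data, owner, public)  # overwrites an existing key

    def grant(self, key: str, requester: str, user: str) -> None:
        obj = self._objects[key]
        if requester != obj.owner:
            raise PermissionError("only the owner can grant access")
        obj.readers.add(user)

    def get(self, key: str, requester: Optional[str] = None) -> bytes:
        obj = self._objects[key]  # raises KeyError for an unknown key
        if obj.public or requester == obj.owner or requester in obj.readers:
            return obj.data
        raise PermissionError("access denied")

    def delete(self, key: str, requester: str) -> None:
        if requester != self._objects[key].owner:
            raise PermissionError("only the owner can delete")
        del self._objects[key]


# Usage: a private object shared with one specific user.
store = ToyObjectStore()
store.put("photos/2006/cat.jpg", b"\xff\xd8 fake jpeg bytes", owner="alice")
store.grant("photos/2006/cat.jpg", requester="alice", user="bob")
shared = store.get("photos/2006/cat.jpg", requester="bob")
```

The real service exposes operations like these over REST and SOAP rather than as in-process method calls. The toy only mirrors the rules in the list above.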

Pricing

Pay only for what you use. There is no minimum fee, and no start-up cost.

December 13, 2005

The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents -- even create their own search engines -- using Alexa's search and publication tools. Alexa provides compute and storage resources that allow users to quickly process and store large amounts of web data. Users can view the results of their processes interactively, transfer the results to their home machine, or publish them as a new web service.

March 03, 2005

The Digital Preservation Coalition (DPC) was established in 2001 to foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.

December 06, 2004

The big storage companies want you to buy big, expensive storage. While I laud their capitalist ambition, they have cranked out quite a bit of FUD (fear, uncertainty and doubt) over the years.

In meeting after meeting they told us we must have a SAN (Storage Area Network) for databases, for reliability, for whatever.

We listened politely and after extensive research, went instead with a NAS (Network-Attached Storage).

They will also tell you that SCSI and Fibre Channel (FC) drives are "enterprise class", whatever that means. However, lower-cost Serial ATA drives are now quite close to the specs of SCSI drives from a few years ago. The Network Computing storage blog has some numbers: