Tag Archives: archive

In 1999, I started a link weblog to collect news about animated films. I updated it for a few years, until there were plenty of other good news sources from industry writers more qualified than I was to run such a site. I was just a fan.

The site was a homegrown MySQL database and set of PHP scripts. Somewhere along the way, I lost the archive, and never noticed that the site had broken until today. To make matters more difficult, I had blocked it in my robots.txt, so the Internet Archive copy (which existed, at least in parts) wouldn’t load cleanly.

I took some time today to piece it back together as a new static HTML file and (partial) RSS feed. I’ve preserved the original design and HTML tags. Fun rediscoveries in the HTML include spacer GIFs, <blockquote> to indent the entire page, and RSS 0.91.

I called it The Lightbox. It was just a linkblog. But now, 16 years later, I’ve enjoyed skimming through the old posts.

“We weren’t able to find a way to keep Twitpic independent. However, I’m happy to announce that we have reached an agreement with Twitter to give them the Twitpic domain and photo archive, thus keeping the photos and links alive for the time being.”

This is much better than all those photos becoming broken links, but it’s still a sad statement on the Twitter ecosystem. Twitter threatened Twitpic, then Twitpic decided to shut down, and in the end Twitter gets all the Twitpic assets anyway for cheap or no money at all. It’s a bizarre end to what only a couple of years ago was a $3 million business.

Twitter is a big company with a lot of moving pieces. It shouldn’t surprise me that one half of Twitter is ready to sic the lawyers on Twitpic while another half wants to do the right thing for Twitpic’s customer base. Still, a bittersweet closing chapter on one of the first great third-party developers.

It’s great to see more people get access to their full archive of tweets from Twitter. In addition to just having a copy of your own tweets, it can be useful to go back and browse them by date, or search for something specific. I’d suggest putting the HTML version online as-is (mine’s here), and also checking out other apps that add a variety of different features on top of the basic archive.

Both of my apps — Tweet Library for iOS and Watermark for the web — can now import the .zip file you receive from Twitter. This file contains your full archive of tweets and retweets. Both apps can load the file directly from Dropbox, making it as simple as possible to get the tweets imported. And both apps are smart about only importing tweets that haven’t been stored yet, so you don’t have to worry about duplicates.
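As a rough illustration of duplicate-safe importing, here is a minimal Python sketch. It assumes each tweet in the archive carries a unique `id` field; the `import_tweets` function and the dictionary-based store are hypothetical stand-ins, not the actual code in Tweet Library or Watermark.

```python
def import_tweets(store, archive_tweets):
    """Add tweets to `store` (a dict keyed by tweet id), skipping known ones.

    Returns the number of tweets that were newly imported.
    Assumption: every tweet is a dict with a unique "id" key.
    """
    added = 0
    for tweet in archive_tweets:
        tweet_id = tweet["id"]
        if tweet_id in store:
            continue  # already stored, so importing again is harmless
        store[tweet_id] = tweet
        added += 1
    return added
```

Because the check is keyed on the tweet's ID, running the same import twice adds nothing the second time.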

To import into Tweet Library, first download the archive from your settings page on twitter.com. Inside Tweet Library click on the blue arrow icon next to “Archives” and walk through the steps to authorize your account with Dropbox. Then copy the .zip file from Twitter to Dropbox → Apps → Tweet Library. It will show up in Tweet Library and can be selected.

Tweet Library is good if you want easy access to your tweets on the iPhone or iPad. You can search your tweets, create filters for them, and add tweets to special collections to share with others. It also doubles as a full Twitter client, with a timeline, posting, Instapaper support, and plenty more. Check it out in the iOS App Store.

To import into Watermark, also download the archive from Twitter and put it on Dropbox. You can put it anywhere, either in Apps → Watermark if you’ve already authorized Watermark to use Dropbox (for export), or in Documents or anywhere else. Then sign in to Watermark and click Account → “Upload all your tweets” to select the file.

Watermark is good if you want to expand your archive beyond just your own tweets. It indexes tweets from everyone you are following, creating a huge searchable archive over time. My own account in Watermark now has about 400,000 tweets indexed. Sign up or learn more at watermark.io.

I posted a couple months ago about my experiment to cut the price of Tweet Library in half. I’ve decided to make this decision permanent. Tweet Library is just $4.99 as a universal app for both iPhone and iPad.

Today I’m also releasing Tweet Library 2.2. This version gains a few improvements and bug fixes, but most importantly a big new feature: support for importing Twitter’s new archive format. It does this by downloading the .zip file you receive from Twitter directly via Dropbox, to make it easy to import your full archive of tweets. (Watermark has this feature too.)

Not everyone has access to exporting their tweets from Twitter yet, but I wanted to get this feature out as soon as possible. And I already have a version 2.2.1 submitted to Apple with more improvements to the import process.

“With Twitter there’s a rich corporation minding it. They can and imho should be funding their own archive. But with the historic blogosphere, dating back to the early-mid 90s, a lot of it is already gone. The need to preserve it, by independent historians and librarians, is greater than the need for Twitter to be publicly archived.”

I have a lot to say on this, and I can’t wait to share a new web project that I started recently which could play a small role in blog backups. When I killed off my little app Wii Transfer, I did so to refocus Riverfold around preservation. I wrote:

“It also doesn’t fit into a new theme I have for Riverfold: apps that are all about keeping and remembering what matters. For Clipstart, that’s family videos. For Tweet Library and Tweet Marker Plus, that’s old tweets.”

Dave mentions libraries several times in his blog post. It’s no accident that the word “library” is in Tweet Library’s name; my ambition for this app far outpaces my coding speed. But blogs are a different problem, and they need something special — perhaps multiple solutions.

When we use Google every day and mostly work with technology and related topics that are well indexed, it’s easy to forget the truth: the web is horribly incomplete. I’ve been doing some research for an upcoming podcast and it’s very frustrating to encounter huge gaping voids in the internet where history, audio recordings, and photographs should be. Somewhere out there is an audio cassette tape recording that I’d like to hear, but it will probably gather dust in an attic for the next decade instead. It needs to be even easier for anyone to put everything they have online so that it can be preserved and shared. Already I think the current generation raised on instant messaging and the web may not realize that there’s a whole world out there that is outside the reach of our keyboards. At least I know I sometimes forget.

The other part of the problem is linkrot. And not just 404s, but old links to obsolete file formats that can no longer be accessed. I can’t even count how many links to .ram files I’ve clicked that result in an error. When your content requires a special server (RealAudio streaming server software, in this case), it’s only a matter of time before that content itself will die.
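One way to spot this second kind of linkrot before it bites is to scan a page for links that point at formats needing special software. The sketch below uses only Python's standard library; the list of extensions is an illustrative assumption (only .ram comes from the post itself), and `find_obsolete_links` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath

# Illustrative assumption: RealAudio/RealMedia extensions to flag.
OBSOLETE_EXTENSIONS = {".ram", ".rm", ".ra"}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_obsolete_links(html):
    """Return the links in `html` whose path ends in a flagged extension."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for link in collector.links:
        path = urlparse(link).path  # drops any query string or fragment
        extension = posixpath.splitext(path)[1].lower()
        if extension in OBSOLETE_EXTENSIONS:
            flagged.append(link)
    return flagged
```

This only looks at file extensions, so it says nothing about whether a link still resolves. It just lists the ones likely to need a player or server that no longer exists.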

Now, the good news is that a simple MP3 file and static HTML file with JPEG images will be around forever. It requires no special server software, no dynamic processing of any kind, and client software is so widespread and open that it’s a guarantee you can access it 10 years later. The only missing piece of the puzzle is reliable non-expiring domain registration and hosting.
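To make the "no dynamic processing" point concrete, here is a small Python sketch that renders a list of links into a single static HTML page once, so the result can be served as a plain file. The `render_static_page` function and the post fields (`url`, `title`) are hypothetical, chosen only for illustration.

```python
from html import escape


def render_static_page(title, posts):
    """Return a complete HTML document listing `posts` as links.

    Assumption: each post is a dict with "url" and "title" keys.
    """
    items = []
    for post in posts:
        items.append(
            '<li><a href="{url}">{text}</a></li>'.format(
                url=escape(post["url"], quote=True),
                text=escape(post["title"]),
            )
        )
    template = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        "<body>\n<h1>{title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )
    return template.format(title=escape(title), items="\n".join(items))
```

Writing the returned string to a file such as index.html is the whole deployment step. Nothing has to run on the server afterwards.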

The bad news is the rise of centralized web applications and data stores. What happens when YouTube shuts down? Remember, they burn through huge amounts of cash for bandwidth each month and seem to have few options for becoming profitable. I feel better about Flickr, because they get it, but Yahoo! has been known (http://www.manton.org/2002/07/yahoo_mail.html) to not treat data longevity seriously.