While randomly looking over one of my older posts last night, I realized that the YouTube video it was centered around had been removed, making the post pointless. This is extremely annoying, not just for me but for any reader of that post.
I’ve since fixed it, but obviously any of my YouTube posts could have had their videos pulled. I hate the fact that there appears to be no way for me to get notification of this.

There seems to be only one solution: download the video, get a video player, and host the video myself.

Downloading the video: there’s a Firefox plugin called DownloadHelper that exposes the files and makes them available for download. Even without that, you could find them by inspecting the resources your browser is downloading, but this is a lot easier.

Getting a video player: Flash is really the only option right now. The first open source one I like is FLV Player.

Hosting the video myself: a touchy area. While it’s incredibly annoying that YouTube will pull videos for bad reasons, their hosting means they deal with potential legal threats.

Bandwidth is an issue also. I don’t get that much traffic, but probably enough such that even a relatively minor surge in popularity for a medium-sized video would pose problems. This is related to the issue of removal, too: if some video that I’m hosting gets pulled from YouTube, but is still on my site, and people searching for it end up here instead, that would be both a significant amount of traffic and an invitation to legal action.

This isn’t a problem merely with video. A lot of my posts, and a lot of blog posts in general, refer to articles hosted elsewhere. Occasionally I just point to them, but in many cases the article is critical to the post. If those external sources disappear, then those posts are almost as useless as the video posts with no video.

I look at the internet from an archival point of view, rather than as a mishmash of ephemera. I’ve always hated linkrot, and do my best not to contribute to it. This is why any URL that has ever worked on my website in its existence will still return something. Unfortunately, my view is clearly in the minority, and most people either don’t know or don’t care about this.

Linkrot is internet pollution. It’s bad when it’s due to abandonment, but this is more or less unavoidable: if your site disappears, that’s that. It’s a lot worse when it’s due to carelessness, particularly during reorganizations. If you write an article and make it /article/article1.html, then two years later move it to /posts/post1.php, all of the broken links that will show up where other people linked to it are your responsibility. You’ve just done your bit to make the internet a little less usable.
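To make the point concrete, here is a minimal sketch of what keeping old URLs alive could look like, written in Python as a tiny WSGI app. The rule table, `resolve_redirect`, and `redirect_app` are names I made up for illustration, and the pattern assumes the numbered `/article/articleN.html` to `/posts/postN.php` move from the example above. A real site would more likely put the equivalent rule in its web server configuration.

```python
import re

# Hypothetical rule table: each entry pairs a regex for an old path with a
# template for the new one. The paths mirror the example in the text.
REDIRECT_RULES = [
    (re.compile(r"^/article/article(\d+)\.html$"), r"/posts/post\1.php"),
]


def resolve_redirect(path):
    """Return the new location for a moved path, or None if no rule matches."""
    for pattern, template in REDIRECT_RULES:
        match = pattern.match(path)
        if match:
            return match.expand(template)
    return None


def redirect_app(environ, start_response):
    """Minimal WSGI app: moved paths get a 301, everything else a 404."""
    target = resolve_redirect(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found\n"]
    # 301 tells browsers and crawlers that the move is permanent.
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]
```

The permanent (301) status matters here: it tells whoever followed the old link that the new address is the one to remember, so existing links keep working instead of rotting.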

Larger organizations doing this are even worse, because they have a larger impact, and because they should have at least one person on their staff who knows better.

(A recent example: Wizards of the Coast have just redesigned their card search/card information service, which is great, but in doing so they didn’t bother to make sure their old URLs got redirected, so this link leads to a 404 despite the fact that it would be really easy for them to add a redirection rule that takes care of the problem.)

I know linkrot is going to happen all the time, and again this raises the issue of hosting content myself. This reflects one of the problems with copyright law in general: I’m not trying to make money off of someone else’s work, I just want to avoid having my posts become nonsensical in a couple of years if their references go away.

My desire to copy those sources myself, and make them available, is an archival one, based on the urge to preserve information. But if I were to host them myself, I would be committing copyright infringement. Even if I were to only copy and host the source materials that have disappeared from the net, I would still be infringing. I suspect that a significant number of the copyright holders in those instances would either want their material kept around or not mind, but I would have no way of knowing which those were, and even a small percentage of litigiously-inclined unfriendly holders would make the whole thing risky.

Avoiding articles which become useless, and fighting linkrot, are both clearly positive for the larger culture. Copyright, a monopoly whose existence is predicated on the idea that it benefits the larger culture, doesn’t care about this, and this is yet another example of how it’s harmful. A minor one, but one whose general outline is repeated over and over.

I’ll have to look at my rights and liabilities according to the DMCA more closely, and see whether or not it would be reasonable to try to archive a bunch of this stuff, and to make it available if the source material disappears, and what other complications might be lurking. I also have to see about automating the process of finding broken links (easy for links in general, less easy for embedded media) and building both that, and the concept of substituting locally-hosted copies of source material for broken external sources, into my blog.
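As a rough sketch of the automation described above, the following Python uses only the standard library to pull external URLs out of a post's HTML, check each one, and swap in a locally hosted copy where a link is dead. All of the function names and the `local_copies` mapping are hypothetical, not part of any existing blog software. It also illustrates why embedded media is the harder case: an embed URL may still answer successfully after the video behind it has been removed, and a plain status check like this one would not notice.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect absolute href/src URLs from anchors, images, embeds, iframes."""

    URL_ATTRS = {"a": "href", "img": "src", "embed": "src", "iframe": "src"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.urls


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if the request fails outright."""
    # Assumption: the server answers HEAD requests; some do not.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def find_broken_links(html, get_status=http_status):
    """Return external URLs that fail to load or answer with a 4xx/5xx status."""
    broken = []
    for url in extract_links(html):
        status = get_status(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken


def substitute_local_copies(html, broken, local_copies):
    """Swap each broken URL for its archived copy, when one exists."""
    for url in broken:
        if url in local_copies:
            # Plain text replacement: a simplification that ignores
            # HTML-escaped URLs and URLs that are prefixes of other URLs.
            html = html.replace(url, local_copies[url])
    return html
```

The status check is passed in as `get_status` so the logic can be exercised without touching the network, and so a smarter check for embedded media could be dropped in later.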

This entry was posted on Monday, May 18th, 2009 at 22:04 and is filed under Blog.