Research and Teaching Updates from the Web Science and Digital Libraries Research Group at Old Dominion University.

Saturday, February 11, 2012

2012-02-11: Losing My Revolution: A year after the Egyptian Revolution, 10% of the social media documentation is gone.

The Egyptian revolution of the 25th of January 2011 was unlike any other revolution in history because of the role of social media. Blogs, Storify entries, web pages and YouTube channels were created to document the revolution. Several books were even published documenting the 18 days. All of these contributions were made by the public, not by historians, using the tools of Web 2.0. As a result, we have an enormous body of digital content, including thousands of posts, tweets, images, videos and sound files, narrating and documenting the revolution. Unfortunately, at the first anniversary of the revolution, over 10% of this digital content is already gone.

Websites like Twitter, YouTube, Facebook, Storify, 1000Memories, Blogger and IAmJan25 have allowed the public to document the events of the revolution in real time. Storify, for example, allows the user to create a story from a time-ordered collection of tweets, links, images, posts, map locations or videos. 1000Memories, on the other hand, allows the user to keep the memory of a loved one alive after he or she has passed away by creating collections about them, including photos, notes, testimonials, videos and other mementos. IAmJan25 is a website dedicated mainly to serving as a hub for all the videos and images about the Egyptian revolution sent to the website administrators.

It is fascinating to read the amalgamated stories assembled from the tweets, Facebook posts, links, images, videos, map-taggings, etc. of authors who were experiencing and documenting these events as they occurred. These social media contributions could give great insight into what happened in the revolution and feed the curiosity of readers by letting them relive those moments with the authors.

Even in the period when the Internet and cellular services were shut down, people still took photos and videos, which they later posted on social networks. You can often find videos and images documenting the same incident from multiple angles, which reminded me of the movie "Vantage Point".

As an Egyptian in the WS-DL research group at ODU, I have a particular interest in the web preservation of the Revolution. Fearing that its legacy was starting to vanish, we conducted an experiment to measure how many of the digital artifacts related to the revolution are missing. To do so, we assembled a number of web sites containing a broad mixture of tweets, images and videos contributed by the general public. Although we cannot say whether this collection is representative of all the resources documenting the revolution, each of these resources was deemed important enough by somebody to have been included in a collection.

Experiment:

As stated earlier, several resources curate the Egyptian Revolution, and we wanted to investigate as many of them as possible. At the same time, we needed to diversify our resources and the types of digital artifacts embedded in them, so tweets, videos, images, embedded links, entire web pages and books were included in our investigation. For the sake of consistency, we limited our analysis to resources created within the same time frame: the period from the 20th of January until the 1st of March 2011 was selected as our temporal filter. Finally, to keep transient errors from skewing the results, we repeated our experiment 3 times over a period of three weeks before declaring a resource missing.
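The two rules in the paragraph above, the temporal filter and the three-check rule, can be sketched in Python as follows. This is a minimal illustration, not the script used in the experiment. The names `in_window` and `is_missing` are hypothetical. The window is assumed to be inclusive at both ends, and each check is assumed to be recorded as a simple reachable/unreachable flag.

```python
from datetime import date
from typing import Sequence

# Temporal filter: 20 January to 1 March 2011 (assumed inclusive at both ends).
WINDOW_START = date(2011, 1, 20)
WINDOW_END = date(2011, 3, 1)

# Number of independent checks required before declaring a resource missing.
REQUIRED_CHECKS = 3


def in_window(created: date) -> bool:
    """Return True if a resource was created inside the study period."""
    return WINDOW_START <= created <= WINDOW_END


def is_missing(probes: Sequence[bool]) -> bool:
    """Declare a resource missing only if at least REQUIRED_CHECKS probes
    were made and every one of them failed (True = resource was reachable)."""
    return len(probes) >= REQUIRED_CHECKS and not any(probes)


# A single success in any round is enough to keep the resource.
print(is_missing([False, False, False]))  # True
print(is_missing([False, True, False]))   # False
print(in_window(date(2011, 2, 11)))       # True
```

Requiring every check to fail means that a temporary outage on one of the three days does not count as a loss.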

Our test collection consisted of:

Three stories from Storify, which contributed a total of 222 resources (26 of which are videos, 179 are images and the remaining 17 are links).

IamJan25.com website, from which we investigated all the pages containing user-contributed images (1225 images on yfrog and 1703 images on twitpic making a total of 2928 unique image links) and videos (2387 unique video links on YouTube).

The book Tweets from Tahrir, with 1118 tweets, 23 of which have embedded images.

The 1000Memories/egypt web page and its associated resources.
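Counting *unique* links, as in the IamJan25 figures above, requires normalizing URLs before comparing them. The following is a minimal sketch under an assumption: links that differ only in scheme, a leading `www.`, host capitalization or a trailing slash point to the same resource. The function names are hypothetical, and the YouTube IDs in the example are made up.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlparse


def normalize(link: str) -> Tuple[str, str]:
    """Reduce a link to (host, path+query) so trivial variants compare equal."""
    parts = urlparse(link.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    rest = parts.path.rstrip("/")
    # Keep the query string: YouTube identifies videos by ?v=<id>.
    if parts.query:
        rest += "?" + parts.query
    return host, rest


def count_unique_by_host(links: Iterable[str]) -> Counter:
    """Count distinct resources per hosting service after normalization."""
    unique = {normalize(link) for link in links}
    return Counter(host for host, _ in unique)


links = [
    "http://twitpic.com/3x3bxf",
    "https://www.twitpic.com/3x3bxf/",              # same image, different spelling
    "http://www.youtube.com/watch?v=AAAAAAAAAAA",   # hypothetical video IDs
    "http://www.youtube.com/watch?v=BBBBBBBBBBB",
]
print(count_unique_by_host(links))
```

Keeping the full query string is a conservative choice: two links to the same YouTube video that carry extra parameters (such as `&feature=...`) would still be counted separately.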

In the next sections we describe each part of the experiment in detail.

Storify: As mentioned earlier, Storify is a website that enables users to create stories by collecting references to other media (for example tweets, images, videos and links) and arranging them in time order. For our experiment, we collected stories posted by members during our investigation period, from the 20th of January until the 1st of March. In those entries we counted the missing resources that we could no longer view, as shown.

IamJan25: Some websites, like IamJan25, were dedicated entirely to serving as a collection hub for media documenting the revolution. The website's administrators received selected videos and images of notable events and actions during the revolution, and they published them as two separate collections. Users selected those images and videos because they vouched for their importance, and sent a reference to each resource to the website. We examined all of those resources and found that several were missing, as shown.

1000Memories: At PDA2011 last spring, Jonathan Good gave a talk about his website 1000Memories. He mentioned a special page, "1000Memories.com/egypt", created to remember the martyrs of the revolution and to describe their lives. It was a wonderful resource where the families and friends of the martyrs could post pictures, notes, videos and testimonials about their lives. Unfortunately, this entire section of the website became unavailable sometime between the 18th of July 2011 and the 20th of January 2012, when we first started investigating it.

Using curl, we see that the page returns an HTTP response of "503 Service Unavailable", although the main site is still available.
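The sketch below shows a rough Python counterpart of this `curl -I` check. It is an illustration under stated assumptions, not the tool we used. It sends a HEAD request with the standard library, and `fetch_status` and `looks_missing` are hypothetical helper names. Unlike a bare `curl -I`, `urllib` follows redirects, so it reports the status of the final URL.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the HTTP status code, like `curl -I`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, but the status code is still
        # available on the exception. Connection-level failures (DNS errors,
        # refused connections, timeouts) raise other exceptions and are
        # left to the caller.
        return err.code


def looks_missing(status: int) -> bool:
    """Treat client and server errors (e.g. 404, 503) as a failed probe."""
    return status >= 400
```

For example, `looks_missing(fetch_status("http://1000memories.com/egypt"))` would return True for the 503 response described above. A single failed probe should count as only one round of the three-check rule, because a 503 in particular can be temporary.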

This example shows the importance of preserving resources, since, as in this case, they can get lost or disappear permanently.

Tweets from Tahrir: Several books documenting the revolution were published in the last year. To bridge the gap between books and digital media, we picked a book entitled “Tweets from Tahrir”, published later in 2011. As the name suggests, the book tells a story through the tweets people posted during the revolution and the clashes with the former regime. We reviewed the book as a collection of tweets, focused on the tweeted media (in this case, images) and tried to reproduce them.

The image below is from a page in the book showing a snapshot of a tweet with an embedded resource. The embedded resource in this case is an image, and it was still present at the time of our investigation.

In the image below, by contrast, the resource embedded in the tweet, again an image, is gone.

Reading the book, you will notice several photos taken by professional photographers or provided by websites. Those photos appear in the book courtesy of the photographers. The rest of the photos in the book are images taken by individuals and tweeted to the public during the revolution. We need to state this difference to clarify our results. In our analysis we disregarded the professionally acquired images, since those sources are keen on preserving their own collections and they regulate their public dissemination. We focused only on the images published by the public in the form of tweets. We found 23 tweets out of the 1118 in the book with embedded images. We traced the embedded link in each tweet and tried to reproduce the image. Here are the results of analyzing the whole book:
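Finding the tweets with embedded images amounts to scanning each transcribed tweet for an image-hosting link. The sketch below assumes, for illustration only, that such links are twitpic.com or yfrog.com short URLs ending in an alphanumeric ID (the only host named in this post is twitpic). `image_links` is a hypothetical helper, not part of our tooling, and the tweet texts are made up.

```python
import re
from typing import List

# Assumed pattern: http(s)://[www.]twitpic.com/<id> or yfrog.com/<id>.
IMAGE_LINK = re.compile(
    r"https?://(?:www\.)?(twitpic\.com|yfrog\.com)/([0-9A-Za-z]+)",
    re.IGNORECASE,
)


def image_links(tweet_text: str) -> List[str]:
    """Return the image-hosting links found in one tweet, in canonical form."""
    return [f"http://{host.lower()}/{ident}"
            for host, ident in IMAGE_LINK.findall(tweet_text)]


# Made-up tweet texts for illustration.
tweets = [
    "Tahrir Square right now http://twitpic.com/3x3bxf #jan25",
    "Internet is back in Cairo #jan25",
]
with_images = [t for t in tweets if image_links(t)]
print(len(with_images))  # 1
```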

Tweets from Tahrir Results

Number of images that we can't reproduce: 7
Number of tweets with images: 23
Percentage missing: 30.43%
Total number of tweets: 1180
Tweets with images (as a percentage of all tweets): 2.28%

Something worth noting: while analyzing the images in the book's tweets, we came across a link that gave a totally unexpected result when we followed it, an image of Miley Cyrus. Investigating further, we found that the tweet had been printed in the book with a typo. The original tweet linked to http://twitpic.com/3x3bxf, a snapshot of Tahrir Square, but the link printed in the book was missing the final "f", so it resolved to http://twitpic.com/3x3bx, the Miley Cyrus image.
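As this example shows, a single dropped character can still resolve to a valid but unrelated image instead of returning an error. One way to recover a suspected truncated link is to list every ID that is one inserted character away from the printed one, then check the candidates by hand. Below is a minimal sketch. It assumes, hypothetically, that IDs consist only of lowercase letters and digits; `one_char_insertions` is an illustrative name.

```python
import string
from typing import Set

# Assumed ID alphabet; adjust if the service also uses capital letters.
ALPHABET = string.digits + string.ascii_lowercase


def one_char_insertions(printed_id: str, alphabet: str = ALPHABET) -> Set[str]:
    """Every ID obtained by inserting one character anywhere in printed_id."""
    return {
        printed_id[:i] + ch + printed_id[i:]
        for i in range(len(printed_id) + 1)
        for ch in alphabet
    }


candidates = one_char_insertions("3x3bx")
print("3x3bxf" in candidates)  # True
```

For a five-character ID this yields a couple of hundred candidates, few enough to compare manually against the text of the tweet.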

Summary:

While we cannot claim that the sample we investigated is statistically significant with respect to the entire web collection documenting the revolution, we believe that the number and diversity of the selected resources make it a representative sample. We also analyzed every resource in this collection exhaustively, and we encourage the public to send us other similar resources that fit the profile so that we can extend the investigation.

The book Tweets from Tahrir and the 1000Memories website both show us the huge importance of web preservation. The book, following a different preservation method, did a good job of preserving the resources (in this case, images and tweets), yet made them hard to reproduce. The snapshot of 1000Memories in the Internet Archive is another great example of online preservation of important resources: it let us reproduce a resource that is otherwise completely gone.
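The Wayback Machine availability API (`https://archive.org/wayback/available`) can automate the check for an Internet Archive copy of a vanished page. It returns the archived snapshot closest to a requested timestamp. The sketch below is an illustration under assumptions, not part of our experiment. The helper names are hypothetical, and the example timestamp `20110718` is simply the July 2011 date mentioned above for 1000Memories.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build the API query URL; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload: Dict[str, Any]) -> Optional[str]:
    """Extract the Wayback URL of the closest snapshot, or None if none exists."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the availability API over the network and parse its JSON reply."""
    with urllib.request.urlopen(availability_url(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

A call such as `lookup("1000memories.com/egypt", "20110718")` would return a `web.archive.org` URL if a snapshot exists, or None otherwise. Keeping the parsing in `closest_snapshot` separate from the network call makes it easy to test offline.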

The survival of a Storify entry, a blog, or a media collection like 1000Memories or IAmJan25 depends entirely on the survival of the resources on the provider websites. Since these entries are included by reference, any change to or loss of the original resource translates into a loss in the story. The loss of a resource, for example a video, could be caused by the owner deleting the video, the owner being banned or removed, the owner's subscription expiring, or even the publishing service shutting down completely (for example, YouTube shutting down).
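A toy model makes the fragility of by-reference collections concrete. A story stores only URLs, and rendering it looks each URL up in whatever the providers currently serve. This is a hypothetical illustration: the `Story` class and the `provider` dictionary stand in for Storify and the hosting sites, and the model does not describe how any of these services is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PLACEHOLDER = "[resource unavailable]"


@dataclass
class Story:
    """A curated story that stores references (URLs), not copies of content."""
    title: str
    references: List[str] = field(default_factory=list)


def render(story: Story, provider: Dict[str, str]) -> List[str]:
    """Resolve each reference against what the providers serve right now."""
    return [provider.get(url, PLACEHOLDER) for url in story.references]


def lost_fraction(story: Story, provider: Dict[str, str]) -> float:
    """Fraction of the story's entries that no longer resolve."""
    rendered = render(story, provider)
    return rendered.count(PLACEHOLDER) / len(rendered) if rendered else 0.0


# Hypothetical provider contents and story.
provider = {"http://example.org/video1": "video 1", "http://example.org/img1": "image 1"}
story = Story("18 days", ["http://example.org/video1", "http://example.org/img1"])

del provider["http://example.org/video1"]  # the owner deletes the video
print(render(story, provider))         # ['[resource unavailable]', 'image 1']
print(lost_fraction(story, provider))  # 0.5
```

Nothing in the story itself changed, yet half of it is gone. Only an independent copy of the content would survive such a deletion.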

Here are the final results, aggregating the resources from all parts of the experiment by type:

Accumulated Results

Resource type: missing resources / total resources (percentage missing)
Videos: 341 / 2413 (14.13%)
Images: 247 / 3107 (7.95%)
Links: 2 / 17 (11.76%)
Total: 590 / 5537 (10.66%)
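The percentages in this table follow directly from the counts: missing ÷ total × 100, rounded to two decimals. A short check in Python, using only the numbers reported above (`percent_missing` is an illustrative helper name):

```python
# Counts from the Accumulated Results table: type -> (missing, total).
RESULTS = {
    "Videos": (341, 2413),
    "Images": (247, 3107),
    "Links": (2, 17),
}


def percent_missing(missing: int, total: int) -> float:
    """Share of resources that could not be retrieved, in percent."""
    return round(100 * missing / total, 2)


for kind, (missing, total) in RESULTS.items():
    print(f"{kind}: {percent_missing(missing, total)}%")

all_missing = sum(m for m, _ in RESULTS.values())
all_total = sum(t for _, t in RESULTS.values())
print(f"Total: {all_missing}/{all_total} = {percent_missing(all_missing, all_total)}%")
```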

In conclusion, after only one year, more than 10% of the media that we thought we had stored for future generations is gone. If the decay continues at the same rate and we do nothing to preserve this digital heritage of the revolution, in less than 10 years there will be no story to tell future generations, and we will lose these magnificent collections that can convey what thousands of books could not.
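The "less than 10 years" estimate is a linear extrapolation. It assumes that every year we lose as much as we lost in the first year, about 10.66% of the original collection. Under that assumption, the time until nothing is left is roughly:

```latex
t \approx \frac{100\%}{10.66\%\ \text{per year}} \approx 9.4\ \text{years}
```

This treats the first-year loss as a constant absolute rate, so it is a rough projection rather than a measured decay curve.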