Last week, Michelle Sherman outlined the legal obligations and emerging case law regarding social media and eDiscovery. Once an organization has internalized these considerations and put them into policy, there are the practical issues involved in actually preserving and archiving social media content. No matter which vendor performs this role, there are a number of important factors to keep in mind.

There are obviously different approaches to a social media archive and a website archive, but these are the essential components of any forensically complete archive.

1. Use the API

The most accurate way to preserve web data is to make use of the Application Programming Interfaces (APIs) available from social media properties. Software programs communicate with each other through APIs. Whenever possible, archiving solutions should use an open authorization (OAuth) approach, which is the de facto standard used by Internet applications like Google and Twitter to share content without sharing passwords. For example, Nextpoint uses Facebook’s Graph API (which uses OAuth for authorization) to crawl private Facebook profiles in addition to public pages, including friend lists, wall posts, and just about anything published on a page.
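As a rough sketch of what API-based collection looks like, the Python below builds (but does not send) an OAuth 2.0 bearer-token request for a profile's posts. The `https://api.example.com/v1` endpoint, the `/posts` path, and the `build_archive_request` helper are hypothetical placeholders, not Nextpoint's or Facebook's actual interface; only the `Authorization: Bearer` header follows the standard OAuth 2.0 convention.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
API_BASE = "https://api.example.com/v1"


def build_archive_request(profile_id: str, access_token: str, limit: int = 100) -> Request:
    """Build (but do not send) an authorized request for a profile's posts."""
    query = urlencode({"limit": limit})
    url = f"{API_BASE}/{profile_id}/posts?{query}"
    # The OAuth 2.0 bearer token travels in the Authorization header, so the
    # account owner's password is never handed to the archiving tool.
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


request = build_archive_request("12345", "EXAMPLE_TOKEN")
print(request.full_url)
```

A real collector would send this request, follow the provider's pagination, and store each response exactly as received.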

2. Don’t lose the metadata

As Michelle noted last week, Facebook offers a “Download a Copy of Your Facebook Data” option through its Account Settings, but this feature misses comments and metadata fields necessary for eDiscovery. Any preservation of data for legal or regulatory reasons must include metadata for authentication and completeness. When archiving social media, consider whether the solution is actually pulling data from an API or simply taking a screen shot of what can be viewed in a browser. A screen shot won’t include metadata or other information that is often necessary in a lawsuit or regulatory hearing.
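To make the difference from a screen shot concrete, here is a minimal sketch of a capture record that keeps an API response exactly as received and adds the source URL, capture time, and a SHA-256 digest. The field names, the sample payload, and both helper functions are assumptions made for illustration, and the sketch assumes the response body is UTF-8 text such as JSON.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_capture_record(source_url: str, raw_body: bytes) -> dict:
    """Wrap an unmodified API response with capture metadata."""
    return {
        "source_url": source_url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_body).hexdigest(),
        # Store the response as received so field-level metadata
        # (IDs, timestamps, authors) survives intact.
        "raw_body": raw_body.decode("utf-8"),
    }


def verify_capture_record(record: dict) -> bool:
    """Re-hash the stored body and compare it to the recorded digest."""
    digest = hashlib.sha256(record["raw_body"].encode("utf-8")).hexdigest()
    return digest == record["sha256"]


# Hypothetical payload standing in for a real API response.
sample = json.dumps(
    {"id": "1", "created_time": "2012-01-01T00:00:00+0000", "message": "hello"}
).encode("utf-8")
record = make_capture_record("https://api.example.com/v1/12345/posts", sample)
print(verify_capture_record(record))
```

The digest gives a reviewer a simple way to show later that the stored copy has not changed since it was captured.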

3. Capture original, unaltered source files

Social media content is dynamic, which means it is continually changing, and it often looks different to different users. However, an archive that includes all original, unaltered source files, including HTML, images, video, CSS (style sheets), JavaScript, linked files such as PDFs, and any other data referenced or linked to on the page, will provide the best possible preserved copy of any given page.
Additionally, a solution that captures content in real time preserves the most accurate copy possible, since content is being updated, changed, or deleted on an ongoing basis. Failing to catch those changes can mean missing vital information or evidence.
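One way to picture change tracking is to fingerprint every source file in a capture and compare fingerprints between captures. The sketch below does that with SHA-256; the file names, contents, and the `fingerprint` and `diff_snapshots` helpers are invented for illustration and are not a description of any vendor's method.

```python
import hashlib
from typing import Dict, List


def fingerprint(files: Dict[str, bytes]) -> Dict[str, str]:
    """Map each captured file path to the SHA-256 digest of its raw bytes."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which files were added, deleted, or changed between two captures."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


# Two hypothetical captures of the same page on consecutive days.
monday = fingerprint(
    {"index.html": b"<p>v1</p>", "logo.png": b"\x89PNG", "post-1.html": b"hi"}
)
tuesday = fingerprint(
    {"index.html": b"<p>v2</p>", "logo.png": b"\x89PNG", "post-2.html": b"new"}
)
changes = diff_snapshots(monday, tuesday)
print(changes)
```

Because the digests are computed over the raw bytes, any rewriting of a source file after capture would also show up as a change.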

4. Maintain a searchable archive

Always assume that captured data will someday have to be searched, reviewed, and produced to opposing counsel or regulators. Any workable solution must be able to perform all phases of eDiscovery review and production for any archived website or social media property.
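For a sense of what "searchable" means at the simplest level, the sketch below builds a small keyword index over archived posts and answers queries that require every term to match. The post IDs and text are made up, and a production review platform would add far more (date filters, custodians, tagging, export), so treat this as an illustration only.

```python
import re
from collections import defaultdict
from typing import Dict, Set

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase term to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in TOKEN_PATTERN.findall(text.lower()):
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of documents that contain every term in the query."""
    terms = TOKEN_PATTERN.findall(query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical archived posts.
posts = {
    "post-1": "Shipping delayed until Friday",
    "post-2": "Refund issued for delayed order",
    "post-3": "Thanks for the great service",
}
index = build_index(posts)
print(sorted(search(index, "delayed refund")))
```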

5. Capture off-site content

Social media is often a series of reposted, linked, or retweeted content. A solution that cannot capture that information is of no use; it’s like archiving an email without the attachment. To archive a web page in its entirety on a given day means including content from related pages as well as from third-party servers, such as video providers like YouTube or social media streams from Twitter.
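As an illustration of how a crawler might find what it needs to fetch, the sketch below reads a page's HTML, resolves every `src` and `href` reference against the page URL, and separates on-site files from those hosted elsewhere. The page URL, the sample markup, and the `ResourceCollector` and `split_by_host` names are hypothetical, and the HTML is only read, never rewritten.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect every URL referenced through a src or href attribute."""

    URL_ATTRS = {"src", "href"}

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.resources: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                # Resolve relative references against the page's own URL.
                self.resources.append(urljoin(self.page_url, value))


def split_by_host(page_url: str, html: str) -> Tuple[List[str], List[str]]:
    """Return (on_site, off_site) resource URLs referenced by the page."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    on_site = [u for u in collector.resources if urlparse(u).netloc == page_host]
    off_site = [u for u in collector.resources if urlparse(u).netloc != page_host]
    return on_site, off_site


# Hypothetical page markup with one embedded third-party video.
page = (
    '<link href="/style.css" rel="stylesheet"><img src="logo.png">'
    '<iframe src="https://www.youtube.com/embed/abc123"></iframe>'
    '<a href="docs/report.pdf">Report</a>'
)
on_site, off_site = split_by_host("https://www.example.com/news/", page)
print(off_site)
```

Everything in both lists would need to be fetched and stored for the capture to be complete.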
Note that some archival strategies attempt to build and store a browsable version of a website. To do so, they have to modify the core source files by changing links and attempting to make static versions of dynamic website components. This does not result in an archive of the original, unaltered source files.

Social media discovery is now a clear and unequivocal legal obligation in many contexts, and courts have made plain that an archive of such content must be forensically complete and accurate. For example, in Griffin v. State, the court overturned a murder conviction at least in part over a failure to authenticate evidence obtained from a MySpace profile, which had been presented as a printout.

Any social media archiving and preservation strategy requires some thoughtful consideration about what social media data to capture and preserve. But most importantly, any social media archive needs to be captured and managed in a forensically complete, searchable, and usable format. Otherwise, even the best litigation preparedness strategy falls apart.