Month: May 2009

CNN and BBC have both covered something called ‘Attack of the Zombie Photos’ – an experiment out of the University of Cambridge that tested how long a photo that has been deleted from a website actually takes to disappear.

I found the experiment to be incredibly flawed and misleading.

The researchers tested not the networks themselves, but internet cache copies. A network could very well have deleted the image from its servers without that deletion having propagated to its Content Delivery Network (CDN) in time. In other words, the photo was deleted from the primary servers, but a distribution copy on another (possibly third-party) server had yet to be deleted or to time out.
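
To make the distinction concrete, here is a minimal Python sketch of a pull-through cache sitting in front of a primary server. Everything in it is an assumption for illustration: the `Origin` and `EdgeCache` classes, the one-hour TTL, and the `/a` path are hypothetical, and a real CDN may expire content differently (for example, after a period of not being accessed rather than on a fixed timer).

```python
class Origin:
    """Hypothetical primary photo store (the network's own servers)."""

    def __init__(self):
        self.photos = {}

    def get(self, path):
        return self.photos.get(path)

    def delete(self, path):
        self.photos.pop(path, None)


class EdgeCache:
    """Hypothetical pull-through CDN cache with a fixed TTL and no purge API."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin
        self.ttl = ttl_seconds
        self.entries = {}  # path -> (content, expires_at)

    def get(self, path, now):
        entry = self.entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # served from the cache; the origin is not consulted
        # Missing or expired: drop any old copy and ask the origin again.
        self.entries.pop(path, None)
        content = self.origin.get(path)
        if content is not None:
            self.entries[path] = (content, now + self.ttl)
        return content


origin = Origin()
origin.photos["/a"] = b"jpeg-bytes"
cdn = EdgeCache(origin, ttl_seconds=3600)  # assumed one-hour TTL

first = cdn.get("/a", now=0)        # cache miss: pulled from the origin
origin.delete("/a")                 # the user deletes the photo
at_origin = origin.get("/a")        # gone from the primary record
stale = cdn.get("/a", now=1800)     # the cached copy is still served
expired = cdn.get("/a", now=3600)   # TTL has passed and the origin has nothing
```

In this toy model the photo is gone from the primary record the moment it is deleted, yet the cache keeps answering for it until the entry times out. Probing only the cache address says nothing about what the origin did.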

While the researchers did indicate in a graph that they were testing the CDN, their text barely mentioned it, their analysis did not mention it at all, and they routinely called the CDNs “photo servers”, as if they were the primary record. The report seems to be more about FUD (fear, uncertainty, doubt) than about examining real issues.

You can view the report here: [Attack of the Zombie Photos](http://www.lightbluetouchpaper.org/2009/05/20/attack-of-the-zombie-photos)

My comment is below:

> I think this experiment is great in theory, but flawed in practice and conclusions.
> You are not testing whether an image is deleted from the social network, but whether it is deleted from their CDN. That is a HUGE difference. These social networks may very well delete images from their own servers immediately, but those servers are not exposed to the general internet, because an (often third-party) cache is employed to proxy images from their servers to the greater internet. Some of these caches do not offer delete functionality through an API; the content, whatever it is, just times out after x hours of not being accessed. The cache is also often ‘populated’ just by mapping the cache address onto the main site. Example: http://cdn.img.network.com/a may keep showing content for several hours after it was deleted from http://img.network.com/a

>Perhaps you know this already – but in that case you are presenting these findings in a way that serves your point more than the truth of the architecture.

> As for your reading of the EU and UK acts, I wouldn’t reach the same conclusions that you have. Firstly, one would have to decide that an unmarked photo, living at an odd cache address with no links in from a network identifying it or its content, would be deemed “personally-identifiable data”. I would tend to disagree. Secondly, while the purpose of it may be to “share it”, it would really be “share it online” – and given cache servers and the inherent architecture of the internet, I think the time it takes for changes to propagate after a deletion request would easily satisfy that requirement. I also wonder whether the provision to access ‘user data’ means access in real time or access in general. I’m pretty sure all these sites store metrics about me that I can’t see.

> Again, I will reiterate that we are talking about ‘cached’ data here, and that the primary records of the requested data have been deleted. At what point do you feel that privacy acts and litigation should force sites to let the user access / view *every* bit of data stored:
>
> - primary record
> - server caches
> - data center/ISP caches
> - network (university, business, building, etc.) caches
> - computer / browser caches

> Your arguments open up a ‘can of worms’ with the concepts of network optimization. I wouldn’t be surprised if your university operates a server on its internet gateway that caches frequently requested images. Would they too be complicit in this scheme for failing to delete them immediately? How would they even know to do so? How could the network operator identify and notify every step in the chain that has ever cached an instance of the image?
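
Outside the comment itself, the chain of caches it lists can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Layer` class, the layer names, and the `/a` path are hypothetical, and real caches expire or revalidate content rather than holding it forever.

```python
class Layer:
    """One hypothetical hop in the chain: keeps copies, asks upstream on a miss."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream
        self.store = {}

    def get(self, path):
        if path in self.store:
            return self.store[path]
        if self.upstream is None:
            return None
        content = self.upstream.get(path)
        if content is not None:
            self.store[path] = content  # keep a copy; upstream is never told
        return content


primary = Layer("primary record")
primary.store["/a"] = b"jpeg-bytes"

# Build the chain from the primary record outward to the browser.
chain = primary
for name in ["server cache", "data center/ISP cache", "network cache", "browser cache"]:
    chain = Layer(name, upstream=chain)
browser = chain

browser.get("/a")        # one request leaves a copy at every layer on the way back
del primary.store["/a"]  # deletion at the primary record

# Walk back up the chain and list who still holds the image.
holders = []
layer = browser
while layer is not None:
    if "/a" in layer.store:
        holders.append(layer.name)
    layer = layer.upstream
```

After the deletion, `holders` lists every cache layer but not the primary record. Nothing in the model gives the primary record a list of who pulled a copy, which is the notification problem raised above.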