UEA Can’t Find Wahl Attachments

The main target of Phil Jones’ notorious email deletion campaign was Eugene Wahl’s surreptitious correspondence with Briffa in summer 2006, in which Wahl changed the IPCC assessment of the Mann controversy from that which had been sent out for external review. Wahl’s changes were contained in attachments to his emails to Briffa. The emails mostly came out in Climategate 1 (with some interesting additions in CG2), but the key attachments were not included in Climategate documents.

As Jones had requested, Mann forwarded Jones’ deletion request to Eugene Wahl, who, according to the report of the NOAA Inspector General in early 2011, then deleted the documents. According to a contemporary (2008) email from Jones to Jean Palutikoff, Briffa removed the emails to thumb drives which he took home.

In 2011, I submitted an FOI request to the UEA for the attachments to the Wahl emails. UEA refused this request, claiming that there were no copies of these attachments on university computers and that they were unable to/not obliged to search the backup server then in the possession of the Norfolk police. I had appealed their refusal on the basis that the UEA was able to request a search of the backup server. The Information Tribunal had accepted the appeal and pleadings had been exchanged up to final replies. In the very latest stages of this proceeding, the server was returned to UEA.

Without withdrawing their previous arguments, the UEA undertook to search the backup server for the Wahl attachments. They reported on this search on August 8, the report being co-authored by FOI officer David Palmer and Chris Collins.

They stated that they were unable to locate any of the documents.
“c:\eudora\attach\AW_Editorial_July15.doc”
“c:\eudora\attach\AR4SOR_BatchAB_Ch06_ERW_comments.doc”
“c:\eudora\attach\Ch06_SOD_Text_TSU_FINAL_2000_12jul06_ERW_suggestions.doc”
“c:\eudora\attach\Ch06_SOD_Text_TSU_FINAL_2000_25jul06KRB-FJ-RV_ERW_suggestions.doc”

The documents in question were listed as attachments in two CG1 emails (716. 1153470204.txt and 733. 1155402164.txt). These emails were lengthy threads; the actual emails containing the documents are CG2-3241 and CG2-2540.

They limited their search to Briffa’s directories (stating that earlier inquiries had “confirmed” that Briffa had not forwarded the Wahl email to anyone else) and to the earliest Briffa backup, said to have been August 2, 2009.

Within Eudora, attachments are stored in a separate subdirectory from the emails to which they were attached. Accordingly, a decision was taken to search for attachments in the area of the subdirectory which was relevant to Professor Briffa’s email account.

In the course of the initial request, internal review, and subsequent ongoing investigation by the Information Commissioners Office, it was confirmed that Prof. Briffa would not have forwarded this email onto any other member of CRU staff so the focus of the search was felt to be justifiably restricted to attachments to Professor Briffa’s emails. We searched the earliest known backup which dated from 2 August 2009 on the assumption that as all the emails predated that date, if the documents were held on the server, the earliest iteration of the backup would possess them.

They stated that the search of the forensic copy of the backup server was carried out by Chris Collins, Head of Research Computing, and was based on instructions and input from Mike Salmon, Faculty System Specialist, who had been the administrator in charge of the CRU server at the time of Climategate. David Palmer, Information Policy and Compliance Officer, and Iain Reeman, ICT Systems Director, were also present. The search was recorded by a KVM over IP system.

The search was described as follows:

The folder in question contained a series of backups in numerical chronological sequence, the earliest of which dated from Aug 2, 2009. The folder was searched three times using different commands each of which were specifically designed so as to be sufficiently general to capture the documents in issue. We felt that a direct search for the full filenames would not be appropriate as this would not catch minor variations in filename. By employing search terms that were contractions of the actual file names the search would reveal any and all documents with the search terms in the name of the document, regardless of what the ‘suffix’ to the search term contained.

The first of these searches returned no matches. The second and third returned matches to other files which we confirmed by eye not to be the files in question.

They concluded:

the searches of the server indicated that documents 5-8 were not in fact physically contained on the server. Given that the server was in the possession of the Constabulary at the time of the request and given further that it must be inferred that the Constabulary did not itself remove or delete any documents, it must be inferred that documents 5-8 were equally not held on the server at the time of Mr McIntyre’s request or at any time thereafter.

I must say that I’m surprised by a couple of things in this report and would appreciate advice from readers familiar with the technicalities.

Obvious questions that occur to me: what precisely were the search terms? And, assuming that the search terms were sensible, does it make sense that the emails (attested in the Climategate dossier) still exist while the attachments don’t?

The report does not specifically confirm whether they had been able to locate the emails to which the attachments were attached. Given that we know that Briffa had deleted his Wahl correspondence in 2008, it seems counter-intuitive that they would be in an August 2, 2009 backup.

And the date of the earliest Briffa backup – August 2, 2009 – is a real surprise. Most of us had assumed that at least some of the backups had been from a much earlier period. August 2, 2009 is only a couple of days after the “Mole Incident” (as a result of which CRU re-arranged their FTP access.) It makes one wonder whether they inadvertently did something to their system that exposed something.

I don’t see any reason – at all – to restrict to certain folders or periods of time. This ain’t rocket science.
Just more of the same.

I accept the logic of looking in Briffa’s directory, rather than Mike Hulme’s. I’m more interested in comments and advice about why the attachment attested in the Climategate email wasn’t in the attachments. The August 2, 2009 date of the earliest backup is really a surprise to me as well.

That looks like a Unix command. If the mail program is Eudora, it is probably a Windows server.

Still, it isn’t hard to do a recursive search of all directories to find the file. You can do it in Windows Explorer using the “Search” menu or from a DOS prompt with something like:
dir /s *something*
where “something” is a part of the file name.

Steve – look at CG2 email 0626 and you will see that backups taken were normally incremental backups rather than full backups. Note the 0626 email also dates to very early August 2009 and implies a significant upgrade to Backuppc had just been installed. I am wondering if that upgrade or the following one mentioned was a trigger for subsequent events.

“Hi Tim,
It’s gone back to 16 mins today.
I upgraded BackupPC last week. My guess is that they’ve changed the algorithm for
incrementals so it effectively did a full backup and then some. Once it did a real “full
backup” the new algorithm works properly so incrementals go back to a more manageable
size.
I’ll be doing a further upgrade in a few days, so it may do it again. Fix would be to
force a full backup.
Mike
Tim Osborn wrote:

Hi Mike,
my cruto4 (Windows) machine incremental backups seem to have gone from 15-45 minutes
previously up to 281 minutes on Wednesday and been running since 10am today and still
going! Any idea what’s happening?”

The features of Backuppc also need to be understood. My expectation is that if they are able to locate an original email which contains the attachment via the Backuppc CGI interface then the attachment should be available for retrieval. That is provided that attachments are backed up. Did they locate any of those emails from CG1 or CG2 using the CGI interface? You should consider posing that question.

This email indicates that BackupPc had forced a new full backup for Tim Osborn and that it is probable that the full backup for Keith Briffa on 2nd August was generated in the same way. If the emails and attachments had already been deleted from Keith Briffa’s hard drive then they will not have been stored on the backup taken on 2nd August 2009.

We do know that the emails must have been retained on one of the backups because they were released in CG1 and CG2. These emails at least must therefore have been retained on another full backup that was created before or during 2006 but which was still live for incrementals during August 2006.

I have to respond to the UEA in the present proceedings early next week. I would appreciate any efforts to distill the commentary to very specific prescriptions for the sort of search that would be effective. (Some commenters have been more helpful in this respect than others.)

Steve – I would suggest that the search included a folder/directory named OldAttach in the root level or the C:\ of the Briffa backups. I would also suggest the same search of the Tom Melvin computer backups.

Melvin made copies of all of the attachments older than Oct 2008 from Briffa’s PC to his own laptop c:\OldAttach in Oct 2009. This made copies of the attachments out of the “area of the subdirectory which was relevant to Professor Briffa’s email account.”

I can’t help with the overall technical summary but I would suggest detailing an argument that is some version of this:

1) the attachment IS there on the backup server (explain technical reasons for believing so),

2) therefore, they have NOT searched adequately

Also, before the issues of the email backup server even arose, was it really established adequately that the document(s) in question truly did not exist somewhere else in the CRU server files? I.e., did they previously do some very limited search(es) and then say “nope we don’t have it”?

Because for my own practice, I almost always re-save any significant document that comes to me via email as a distinct document in a relevant folder…. unless there is reason to believe Briffa and others never did so, the doc may in reality exist in some other backup server even if UEA/CRU never identified it as such.

Steve – I believe the key is to establish which backup the emails were retained on. I can’t see that a full backup taken on 2nd August is the correct one to be looking at.

There must have been a full backup from earlier than August 2009 with incrementals which captured the CG1 and CG2 emails you are specifically mentioning in the header post. There is a possibility that any earlier backup was deleted from the server before early November 2009 as part of maintenance activity but not before RC/FOIA had harvested the content. A large number of new full backups for all users would have had a significant impact on the server capacity so may have led to follow on activity to free space. UEA should know this but may be reluctant to admit if this happened.

A different backup is taken for each PC specified in the BackupPc Hosts file so you should ask if they have considered the possibility that Keith Briffa had other PCs (or drives) backed up.

I will repeat that the fancy searches being specified elsewhere will not work. BackupPc renames each file for the purposes of storage in a table. We don’t know the revised name used by BackupPc and I don’t know if it would be possible to independently work it out.

Only one copy of any file is retained on the backup server with hardlinks to each backup where it belongs. I don’t know if an IT specialist could easily search in the table to identify if the files are still present.
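If the "one copy with hardlinks" description is right, then a file found in one backup tree can be traced to every other backup that shares it, because hardlinked paths share an inode. The following is only a sketch, assuming the server's filesystem can be read as ordinary directories; the function name hardlink_groups is hypothetical and nothing here is specific to BackupPc.

```python
import os
from collections import defaultdict

def hardlink_groups(root):
    """Group paths under root that are hardlinks to the same underlying file.

    Assumption: the backup store is an ordinary filesystem in which shared
    files are hardlinked, as the comment above describes.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                # Same device and inode means same physical file.
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only files that are reachable by more than one path under root.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Each value in the returned dictionary lists the different backup paths that point at one stored file.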

Backups were done using the software BackupPC, which reportedly cross-links multiple backups so that only one copy of a file is retained in multiple backups. According to HP’s examination, the UEA backup server contained multiple backups of 60 computers.

UEA originally argued in the Keiller case that it was cost-prohibitive to locate a single well-dated email (Jan 19, 2009) from Phil Jones to a known scientist at Georgia Tech. They stated that HP had estimated to them that it would take one day per backed-up computer and that HP had estimated to them that it would take “over a year” of work to carry out a search.

They stated that Phil Jones had used four different computers, with multiple backups of each, 22 distinct backups in total for Jones alone. They claimed that a search of these 22 backups would cost over £25,000.

A search of these 22 backups for the email from Phil Jones to Georgia Tech was carried out in June, with UEA reporting that they had been unable to locate the email.

Since it seems that the BackupPC program may change the directory path (by introducing “f”s) as part of its “name mangling” process, and given the lack of any reason to assume that all attachments would have remained in Briffa’s Eudora attachment directory, I would think that a proper search would include:
1) All directories from the backup of Briffa’s computer (not just Eudora or Eudora attachment directory).
2) Searches for fragments of the file names not including directory names or file suffixes, thus searches for filenames containing a subset including:
“AW_Editorial_July15” or
“AR4SOR” or
“Ch06_ERW_comments” or
“Ch06_SOD_Text” or
“TSU_FINAL_2000” or
“ERW_suggestions”
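The search proposed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the backup of Briffa's computer is readable as an ordinary directory tree; the function name find_by_fragment is hypothetical. Because it matches substrings, a leading "f" added by name mangling would not hide a file from it.

```python
import os

# Name fragments from the list above; matching is case-insensitive.
FRAGMENTS = [
    "AW_Editorial_July15",
    "AR4SOR",
    "Ch06_ERW_comments",
    "Ch06_SOD_Text",
    "TSU_FINAL_2000",
    "ERW_suggestions",
]

def find_by_fragment(root, fragments=FRAGMENTS):
    """Yield (path, fragment) for each file under root whose name contains a fragment."""
    lowered = [frag.lower() for frag in fragments]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lname = name.lower()
            for frag in lowered:
                if frag in lname:
                    yield os.path.join(dirpath, name), frag
                    break  # report each file once, under the first fragment that hits

# Usage (the root path is a placeholder):
#   for path, frag in find_by_fragment("/path/to/briffa/backup"):
#       print(frag, path)
```

Every directory under the root is walked, so nothing depends on the attachments still sitting in the Eudora attachment directory.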

Steve – is it possible to get them to offer up for each researcher you are interested in the date of each full backup taken and to also indicate which machine the backup was for?

Phil Jones may have had 4 computers but probably not all at the same time with new computers replacing older ones.

I would normally assume that email activity for any user would be restricted to just one or two machines over any period of time.

They will need to locate the latest full backup with incrementals for the computer being used for emails that pre dates the emails you are interested in.

The earliest backup taken after the emails is also of interest but for Keith Briffa this appears to be the 2nd August 2009 which was probably well after he deleted the emails.

If BackupPc was installed later than the date of the emails you are interested in then they will need to look at the earliest backup for that computer.

If the researcher subsequently switched to a new computer then emails may have been transferred across so the first backup for any new computer would also merit a search.
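The selection rules in the last few paragraphs can be written down as a small routine. This is a sketch under stated assumptions: the Backup record, its fields and the function name are hypothetical, and the real backup numbers and dates for any CRU machine are not known here.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Backup:
    number: int
    kind: str   # "full" or "incr"
    taken: date

def backups_to_search(backups, email_date):
    """Pick the backups of one machine worth searching for an email of a given date."""
    ordered = sorted(backups, key=lambda b: b.taken)
    fulls_before = [b for b in ordered if b.kind == "full" and b.taken <= email_date]
    later = [b for b in ordered if b.taken > email_date]

    chosen = []
    if fulls_before:
        base = fulls_before[-1]
        chosen.append(base)
        # Incrementals hanging off that full backup, up to the next full.
        for b in ordered:
            if b.taken <= base.taken:
                continue
            if b.kind == "full":
                break
            chosen.append(b)
    # The earliest backup after the email is also of interest. If backups only
    # began after the email date, this is simply the earliest backup.
    if later and later[0] not in chosen:
        chosen.append(later[0])
    return chosen
```

Run once per machine, this gives a short list of backups to ask UEA about rather than an open-ended search.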

The other issue in play is the use of themed inboxes as they would in theory need to search each inbox. If you could persuade them to offer up the names used then it may be possible to assist them with inbox selection and as a result reduce the searches needed.

Steve – reading through the link I provided this morning you should note the following warning

“Also, you need to be careful about incrementals vs. fulls since incrementals
will include only the most recently changed files while fulls might
not include the latest version if there are subsequent incrementals. ”

This suggests latest version of files could appear on an incremental backup but not on a full backup. You should ask for details including timeline of incremental backups for researchers you are interested in.

Due to the way BackupPc works it is unlikely that a new set of full backups for existing users would have led to a space issue. That is because only one copy of any file is held regardless of how many backups are taken.

Whilst it is not possible to rule out older backups being deleted before November 2009, it is unlikely that this would be due to the BackupPc upgrades in August 2009.

There are any number of ways of doing backups, so I’ll have to guess as to what was backed up and how.

It appears the “backup server” which the police had was really just used to hold copies of files from various PCs, so I’ll work on that assumption.
With a little luck it was running some form of Unix, which makes the searching easy.

On the backup server:
To find files by name:
find / -iname \*july15\*

should find AW_Editorial_July15.doc and any others with july15 in any mix of upper and lower case in the file name.

A corresponding search of file contents (for instance, grep -ril ch06_sod_text_tsu /) should find any references to Ch06_SOD_Text_TSU_FINAL_2000_12jul06_ERW_suggestions.doc and anything else with ch06_sod_text_tsu in the contents, again case-insensitive.

If the backup server is running Windows, they would need to enable file sharing on it, and then mount the filesystem on a Linux (or Unix) box using samba.
The searches are the same, just starting from the mount point for the backup server.

The “real” emails and attachments should have been backed up from the mail server, which was probably managed by a different section of UEA. The attachments would have been MIME encoded in the body of the email, with the name of the attachment in there as well.

On the other hand, if the backup server is configured like most of the systems I’ve worked with, it just copies files from the client PCs to its local disk, then writes them to tape. It should keep a record of every file backed up to tape, the tape it was written to and the date it was backed up. Depending on the backup software used, finding a particular file can range from trivially easy to extremely difficult.

>I have to respond to the UEA in the present proceedings early next week. I would appreciate any efforts to distill the commentary to very specific prescriptions for the sort of search that would be effective. (Some commenters have been more helpful in this respect than others.)

The most appropriate way to obtain the attachments for your 2011 FOI request would have been to retrieve them from backup or archival copies of the e-mail server as opposed to the PC of the recipient. This would have ensured that the file names and files themselves are in the original “as sent” form. The wording of the refusal claim “that there were no copies of these attachments on university computers” might not necessarily mean that copies did not exist on off-line (tape) and/or off-site backup or archival facilities. I understand that this point is not the focus of this thread and may have been discussed elsewhere, but it is important to stress that e-mail server infrastructure (including backup and archives) and related IT policies should have been examined first in response to the FOI request.

The latest attempt to search the backup server seemed to be targeting backup copies of the user PCs only. As others pointed out, Eudora client software does not keep the entire copy of the message as received. Instead attachments are separated and stored in the file system. They become regular files entirely under the control of the user. Attachment files can be renamed, moved, deleted or modified. Therefore looking just for the file name is not sufficient. Effective search should instead focus on all meta-data. This can include file system level meta-data (such as creation/modification time stamp ranges) and meta-data stored in the file itself (such as author, creation date for Word documents).
My final point is about the way the recent search was performed. Assuming that backups were done using BackupPC software, the actual content on the backup server file system is in special internal “file system database” format specific to BackupPC. It is NOT just the file and directory copy of the original PCs filesystems. File names may be mangled and file attribute data is saved in special additional files. BackupPC software knows about this format and hides its complexities when the user interacts with the application via web-based interface.

The statement “The folder in question contained a series of backups in numerical chronological sequence, the earliest of which dated from Aug 2, 2009. The folder was searched three times using different commands…” seems to imply that BackupPC application was NOT used and the internal database was searched directly. This means that the search command needed to account for the details of the internal storage “database”. For example, in order to find c:\dir\file.doc one may need to look for something like fc\fdir\ffile.doc (note mangled file name prefixes). The wording “By employing search terms that were contractions of the actual file names the search would reveal any and all documents with the search terms in the name of the document, regardless of what the ‘suffix’ to the search term contained.” is not specific enough to indicate how exactly it accounted for BackupPC internal file structures. Looking for filesystem meta-data would be even trickier as it is stored in separate files encoded in BackupPC-specific way.
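To make the prefix example concrete, here is a simplified sketch of the mapping from c:\dir\file.doc to fc/fdir/ffile.doc. It follows only the example given in this comment: the function names are hypothetical, forward slashes are used on the assumption that the server runs Unix, and real BackupPC storage may also escape certain characters and treat the share name differently.

```python
def mangle(windows_path, prefix="f"):
    """Prefix every path component with 'f', e.g. c:\\dir\\file.doc -> fc/fdir/ffile.doc.

    Simplified assumption based on the example above, not the full BackupPC rules.
    """
    parts = [p.rstrip(":") for p in windows_path.replace("\\", "/").split("/") if p]
    return "/".join(prefix + p for p in parts)

def unmangle_name(stored_name, prefix="f"):
    """Strip the prefix from a single stored file or directory name."""
    if stored_name.startswith(prefix):
        return stored_name[len(prefix):]
    return stored_name
```

A search that anchors on the start of the original file name would miss the stored name, whereas one that matches a fragment anywhere in the name would not.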

Going back to your question on effective search against backup server. I think that the easiest way would be to use the original version of BackupPC application to restore the entire user’s directory into a temporary location and then use meta-data searches (timestamps, author) to look for specific docs in this temporary location.
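Once a user's directory has been restored to a temporary location, a timestamp search is straightforward. The sketch below assumes the restore preserved modification times, which depends on the restore tool. The function name find_by_mtime is hypothetical, and the date window in the usage note is illustrative only (the correspondence was in summer 2006).

```python
import os
from datetime import datetime

def find_by_mtime(root, start, end, suffixes=(".doc",)):
    """Return files under root with a matching suffix and mtime between start and end."""
    lo, hi = start.timestamp(), end.timestamp()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # unreadable entry: skip it rather than abort the search
            if lo <= mtime <= hi:
                hits.append(path)
    return sorted(hits)

# Usage (path and window are placeholders):
#   find_by_mtime("/tmp/restore", datetime(2006, 7, 1), datetime(2006, 8, 31))
```

This finds Word documents from the relevant weeks even if they were renamed or moved out of the attachment folder.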

I agree. There is no valid reason for limiting the search space. It is not as if this is a manual effort, where significant human resources are expended and scaled to the size of the area searched. It’s an automated search.

My active project data storage is a few TB spread across three or four physical drives, with a similar archive space. When I misplace a file, I may start by looking in the ‘most likely’ directory, but I very rapidly transition to ‘search all attached drives’. It may take a half hour for such a search to run. So what? I frequently locate the culprit file in a place where it was not ‘supposed’ to be. Files get moved inadvertently, and sometimes get moved intentionally to locations chosen for reasons that make sense at the time but that aren’t intuitive when a search is initiated by someone else and/or at a later date.

The intent of a FOIA request is “give me this info if you have it” not “give me this info if you have it in that drawer”.

Re: Speed (Aug 10 10:36), In fact, it is always in the last place you look. Not sure about Eudora, but some other systems purge attachments after a time, if they have been marked as read. Presumably the assumption is that the attachment has been downloaded for use and is no longer needed taking up space.

Re: the question of whether an attachment can be deleted while the email survives, this is certainly possible. Most email software allows the user to ‘detach’ an attachment (in other words, to save the attachment to a file and remove it from the email); I often use this with large attachments so as to keep the size of my mailbox below my quota. I would expect the email message to show some indication that this had happened, though. I don’t know how Eudora handles it in particular.

I’m no IT specialist – but it seems to me that the email comes into the Mail Server first – before it gets to the desktop of the recipient? And that it is the Server copy that gets backed up? Not some version modified by the user?

My guess would be that the email and the attachment were deleted. If I understand how Eudora works, this would have deleted the attachment, but the email itself would not be physically removed from the mailbox, but rather marked as deleted. So someone directly accessing the mailbox file would see the marked-as-deleted email that Eudora itself wouldn’t display to you.

My first suggestion would be to make sure they look for the attachments in other areas besides the email directories — if they actually backed up user files. It wouldn’t surprise me if the email was received, opened, and saved somewhere convenient outside of email. Then the request to delete came through, the email was deleted and the email attachment was deleted. Later, FOIA gets ahold of the mailbox file, which still contains the email (now hidden from view within Eudora because it’s marked “deleted”), but not the attachments.

Not sure how likely it would be that someone saving an email attachment called “Ch-6_SOD_Text_TSU_FINAL…..doc” would just name it the same when exporting it, or if they’d change it to “Secret Ch6 changes.doc” or something else. If the backup server contains user files, it’d be worth looking for “Ch06*.doc” files to see if it was saved.

Steve asked: Would the archived email still identify a deleted attachment?

As a long time user of Eudora, I can state that the answer is “Yes”.

Eudora as normally installed and configured strips the MIME encoded attachment off the incoming email and replaces it in the body of the email with the “attachment converted” marker. The normal place to drop the attachments is in a sub-folder of the configured user data folder. Eudora is extensively configurable, so that default is easily changed if desired.

When the user browses the mailbox in Eudora, they see markers on messages that have attachments. If the attachment is still present in the folder where Eudora dropped it, then the marker will be “live” and can be clicked in various ways to open the attachment, open the file, and so forth.

If the file has been deleted from that folder, the marker will be dead, and will usually also change to indicate that the file is missing.
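The marker behaviour described above suggests a cross-check: pull the attachment paths out of the mailbox text and see which of them still exist. This is a sketch only. It assumes the marker has the form Attachment Converted: "path", consistent with the quoted paths listed in the post; the function names are hypothetical.

```python
import os
import re

# Assumed marker form:  Attachment Converted: "c:\eudora\attach\name.doc"
MARKER = re.compile(
    r'Attachment Converted:\s*"?([^"\r\n]+?)"?\s*$',
    re.IGNORECASE | re.MULTILINE,
)

def attachment_markers(mailbox_text):
    """Return the attachment paths named by markers in a mailbox's text."""
    return [m.group(1) for m in MARKER.finditer(mailbox_text)]

def missing_attachments(mailbox_text, exists=os.path.exists):
    """Return marker paths whose files are absent (the 'dead' markers)."""
    return [path for path in attachment_markers(mailbox_text) if not exists(path)]
```

Run against a restored mailbox file, this would show directly whether the emails survive in a backup while the files they point to do not.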

One incentive for moving important attachments out of the folder is that when the message is deleted, and subsequently emptied from the Trash mailbox, the associated attachment file is also removed. That makes sense for spam, but is a real annoyance if you forgot to save an important document before deleting the content-free cover letter.

IMHO, searching only in a user’s email folder was the wrong thing to do. They should have searched the entire disk image for near matches to the file names.

A question: if the attachment was detached, would it still show up as an Attachment in the backed up email?

Yes. Eudora would add the text to the original email when the email was created. If the files were then moved from the c:\eudora\attach folder to somewhere else, like copying them to the USB sticks as Briffa says he did, the files would no longer be there. This part of the response:

Accordingly, a decision was taken to search for attachments in the area of the subdirectory which was relevant to Professor Briffa’s email account.

Sounds to me like UEA’s trying to game FOI legislation again by limiting their searches to somewhere they probably know they’d be unlikely to find them. Few people who are even slightly organised would leave them sitting in the default attachment folder on the local disk. FOI’ng the search criteria used may show how they were trying to game the process by restricting the search, which I doubt the ICO would be happy with.

IIRC, Eudora keeps the text portion (To/From/Subject/Body/etc) of every message in a given “mailbox” in a single giant text file for that mailbox.

As others have noted above, attachments are separated from the message text upon receipt and stored individually in a separate attachments directory. Thus, it is quite straightforward and typical even for non-experts to copy, edit, move, delete or otherwise access each attachment outside the Eudora program.

However, performing such operations on the text of a message outside the Eudora program requires one to directly edit the (often quite large) text file containing all the messages from the given mailbox, which might be daunting for the novice.

Thus, as R. Berteig notes above, it’s quite typical for attachments to have been moved to other directories, breaking the link from the original email. As he and others have suggested, it’s important to check all disks/directories, not just C:/Eudora, that contain Briffa’s files to look for the attachment files.

clivere makes important points above regarding the backup process.

In order to do a proper complete search, one must:

1) Know at least a semi-unique fragment of the file’s name, or,
know a semi-unique fragment of the file’s contents.

It seems that the UEA search assumed that the file names were not changed. Based on my personal practice while using Eudora, this is likely but not certain.

A content search with a fragment of the file’s contents would take longer, but be much more likely to find the file even if its name had been changed. Compression, encryption or non-text-file formats may complicate such a search, though.

2) Search the entire set of disks/directories that might hold the file for the semi-unique name fragment or content fragment.

The UEA search clearly did not do this, but should have. When performing such a search, it’s important to make sure that the file names and contents searched have not been changed, truncated, or encoded by the backup program (for example, as described above by clivere). If so, one will have to perform the search using the backup program’s provided search function, or one will have to restore all the backed up files to a filesystem and search that using normal tools.

Assuming the UEA wish to perform a good-faith search at this point, it seems it would be quite useful to have someone with relevant expertise (perhaps Tall Bloke?) present at any further search.

I think the issue is that the search isn’t being conducted by an adversarial organization. I’m guessing that if they gave the mirror copy over to some (how shall I put this..?) group skeptical of their intentions – they would somehow find it.

Perhaps we are looking in the wrong place. According to my conclusion, CRU staff and PhD students had PCs, connected by cable, making a small network with a network server. Their server was maintained by Salmon and was located at ground floor level in room 2A. All mail, after being sent or read by the network participants, was archived on that server. The mails had a nine-digit code assigned chronologically by the University mail server. The order in the archive was semi-chronological and in 2009 the file consisted of 220,000 mails.

Besides the University domain there was a domain on the server where backups of the connected PC’s were stored. These included the Eudora attachments.

Not only are there various versions of .doc (modern .docx is compressed XML), but the search method itself is highly OS-dependent. Vista for example doesn’t search file content by default. I have Windows 2003 servers on which the search functionality appears to be broken and I have to use WinGrep to find content within any files.

Word 97-2003 Docs typically preserve text, so they are searchable by simple search tools (including grep, for example), unless it has special formatting w/in the searched phrases.
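A content search along these lines can be sketched as a raw byte scan. It assumes the text is stored either as 8-bit characters or as UTF-16LE (both occur in the older binary Word format), that the phrase is plain ASCII, and that the files have been restored uncompressed. The function name contains_phrase is hypothetical.

```python
def contains_phrase(path, phrase, chunk_size=1 << 20):
    """True if the file holds the ASCII phrase as 8-bit or UTF-16LE text, ignoring case."""
    if not phrase:
        raise ValueError("phrase must not be empty")
    needles = [
        phrase.encode("cp1252").lower(),
        phrase.encode("utf-16-le").lower(),
    ]
    # Carry enough bytes between chunks to catch a match straddling a boundary.
    overlap = max(len(n) for n in needles) - 1
    tail = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return False
            window = (tail + chunk).lower()  # bytes.lower() affects ASCII letters only
            if any(n in window for n in needles):
                return True
            tail = window[-overlap:]
```

A phrase broken up by formatting inside the document would still be missed, which is the caveat raised in the comment above.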

Within Eudora, attachments are stored in a separate subdirectory from the emails to which they were attached. Accordingly, a decision was taken to search for attachments in the area of the subdirectory which was relevant to Professor Briffa’s email account.

This is a very narrow assumption. If the backup is of Briffa’s computer, why assume the attachment is in the Eudora client folder, which is just the default location? An obvious course of action would have been for Briffa to move the document to another directory, say one dedicated to the IPCC or … Who knows what his approach to organization was.

Why limit it to Briffa’s backups? Search the entire drive. Maybe Jones and others had backups earlier than August 2009. Seems logical that the emails and attachments had been deleted by then and would not be found, since Briffa had removed and/or deleted them by the time of that backup.

I work in a large health care provider network with a vast email system. After 90 days all emails not in your ‘trash’ are archived. If the archived email has an attachment the attachment still shows in the email header/information but the attachment doesn’t actually exist in the archive, it is discarded by the archiving process. So you can retrieve the original email from archive, it will show that there was something attached to it originally but you will be unable to retrieve that attachment.

The UEA hasn’t taken the position that there are no attachments to look through. They seem to have lots of attachments. The question is why the Wahl attachment went AWOL without the email going AWOL (if indeed the email is in the backup.)

What EJD said does not imply that there would be no attachments to look through only that emails older than 90 days would be archived and could potentially lose attachments. There could be lots of recent attachments to look through, but none older than a certain date.

EJD, you talk about backup of a mail server, but here the question is: How did Eudora and how did Briffa handle this locally, on his hard drive? One would need to know what Eudora version he used and how it handled local storage of emails and attachments on the local hard drive.

Haven’t used Eudora for over one and a half decades, so there isn’t much I can contribute except to ask everybody to focus on what Eudora version might have been used, what it does out of the box, and what options the user has (change configurations or move attachments, e.g.).

I see now they only searched by shortened variations of the filename. Could it be their system changes attachment filenames to match some sort of indexing system unique to their mail-server? Why not also search by date?

I agree that the search is limited. I noted from the Eudora folks that file locations are also settable: “There are good reasons, though, why you might want to explore the folder where your Eudora data is stored; in particular, the email attachments that you’ve received are there, in the Attach sub-folder (unless, of course, you’ve told Eudora to store attachments elsewhere). To make this easier, Eudora creates a shortcut to your Eudora data folder in the Application Data folder’s parent folder, which is typically “C:\Documents and Settings\”username”” – from the Eudora knowledge base.

They really need to search the entire database. Given the time it would take to set up the hard drive, the search time itself would be trivial.

In answer to the question “does it make sense that the emails (attested in the Climategate dossier) still exist while the attachments don’t?”:

It is easy to delete individual attachments from the attach directory while preserving the emails (combined in in.mbx and the corresponding index in.toc) in the eudora directory. As attachments arrive, I copy them to a subject-oriented directory tree, then periodically delete the older attachments. If I go back to the host email and Eudora no longer finds the corresponding attachment, Eudora places a large red X to the left of the name of the attachment. Of course, this does not address server backups.

If I wished to delete a document that someone sent me, while preserving the host email, I would go first to the attachment directory, then look for copies stored elsewhere in the system.

The email program does not keep track of file deletions or file move operations on the machine so there is no guarantee that the pointers in email message and the files referred to by those pointers are in synchronization. However, Acton said that there were no deletions, having asked Briffa and Jones. Even if there were deletions, incremental backups would probably have caught the files prior to their deletion. (It is the intention of backups to save people from accidental data loss after all). Consequently it is quite likely that an examination of the backup server will reveal if and when specific attachments were deleted.

In my opinion, no credence whatever can be placed on Acton’s testimony to Parliament. I raised this in my appeal and the UEA said that Acton’s testimony was entitled to parliamentary privilege i.e. he could make untrue claims with impunity.

“Could someone directly confirm whether the pointers in the email continue to exist if an attachment is deleted? I can see how this might happen but would like confirmation from someone who knows?”

Yes, in Eudora they do. However, they only reveal that the attachment is not in the directory in which it was originally downloaded, not whether it was deleted, moved or renamed. It would not be strange if Briffa moved the attached file to another directory if he was using the file actively. This is why it would be important to search the whole drive for the file and not just the attachment directory.
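A whole-drive search of that kind is not much code. The sketch below walks every directory under a starting point and matches on a fragment of the file name, ignoring case and location; the function name is hypothetical and the root would be wherever the restored backup is mounted.

```python
import os


def find_by_name_fragment(root, fragment):
    """Yield full paths under `root` whose file name contains `fragment`.

    The match is case-insensitive and ignores the directory the file sits in,
    so a moved copy is found as long as the name still contains the fragment.
    A renamed file is NOT found; that needs a search by date or by content.
    """
    wanted = fragment.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if wanted in name.lower():
                yield os.path.join(dirpath, name)
```

For example, `list(find_by_name_fragment(root, "ERW_suggestions"))` would pick up both of the "ERW_suggestions" documents named in the request wherever they had been moved, provided they kept that part of the name.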

I mentioned it above, but I think it’s worth repeating that if Eudora does store mail in “mbox”-format files as Wikipedia claims, an email can be marked as deleted and yet have its entire contents (not pointed-to attachments) still existent in the mbox file until some kind of “compression” or “cleanup” operation is performed. This is similar to the files on your computer: when you delete a file on your computer, the data on your hard drive is not deleted, it is simply added back into the “free to use this space” list and is no longer directly accessible by you. That’s how file-recovery software works on your hard drive: the data’s still there until it’s overwritten.

So my question would be: if FOIA took backups of mbox-style mail files, then broke the contents up into separate .txt files (easy to do), some of the emails could actually be emails that had been deleted and were no longer visible to the person running Eudora.
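To show how little is involved in the "easy to do" part, here is a rough sketch of splitting mbox-style text into individual messages. It assumes the classic convention that each message begins with a line starting "From "; whether Eudora's .mbx files follow that exactly is not something I have checked, and the function name is made up.

```python
def split_mbox_text(text):
    """Split mbox-style text into a list of message strings.

    A new message starts at each line beginning with "From " (the classic
    mbox separator). Nothing here looks at any "deleted" status, so messages
    the mail client no longer shows would be returned as well. A body line
    that itself starts with "From " would be split wrongly.
    """
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages
```

Each returned string could then be written out as its own numbered .txt file.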

Could someone directly confirm whether the pointers in the email continue to exist if an attachment is deleted?
Affirming others – Yes.
I stopped using Eudora a couple of years ago after using it for many years. I could always delete the attachment with no impact on the text in the email. Eudora would not notice the deletion until I tried to access the attachment from inside Eudora. Even then, it did not change the prose that stated where the attachment was originally stored.
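That leftover prose is itself searchable. As a sketch, and assuming the pointer appears in the message text as a line beginning "Attachment Converted:" followed by the quoted path (my recollection of Eudora's behaviour, not something confirmed for the version CRU ran), the paths could be pulled out of mailbox text like this:

```python
import re

# Assumed marker format: Attachment Converted: "c:\some\path\file.doc"
POINTER = re.compile(
    r'^Attachment Converted:\s*"?([^"\r\n]+?)"?\s*$', re.MULTILINE
)


def attachment_pointers(mailbox_text):
    """Return every path named on an 'Attachment Converted:' line."""
    return [match.group(1) for match in POINTER.finditer(mailbox_text)]
```

The list says where each attachment was originally saved. It says nothing about whether the file is still there, was moved, or was renamed.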

In the case of Eudora, it is much easier to separate the attachment from the email than is the case for say Outlook or Thunderbird.

You mentioned Thunderbird. IIRC during one of the inquiries they mentioned the emails were in Thunderbird. I don’t know whether this was for the purposes of the inquiry or if CRU themselves had migrated from Eudora to Thunderbird.

If the emails had been successfully moved from Eudora to Thunderbird, I wonder whether they would have remained in the Eudora folders or been moved or copied to a Thunderbird folder.

1) The email contained attachments
2) Eudora stores the attachments in a specific place
3) This place gets backed up
4) The attachments are therefore in the backup
5) If a search of the backup does not retrieve the attachments then either:

A) The search was inadequately performed
B) The files were deleted from the backup

There are no other possibilities. “They are not there” is simply not a valid response, as unless the files were deleted for some reason, they ARE there.

I have used Eudora for many years and emails are always stored separately from attached files. You can define for yourself where the attachments are stored and these are standard directories (folders in Windows terminology) allowing you to perform any kind of file manipulations (delete, copy, rename etc.).

The issue appears to be where the emails were received originally. If on Briffa’s PC, he would have had full access to change where attachments were stored and to move, delete or rename attachments after they were received. The search as listed would not be sufficient if I were looking for a file in my own archives, and I don’t see quite why it is necessary to restrict the search to certain folders; the speed of filename searching is quite up to the task. More problematic is whether the filename has been changed and thus whether the search terms they used were sufficient.

Eudora can also be configured to leave emails (and attachments) on the mail server either indefinitely or for a certain amount of time, and to control whether attachments are deleted from the server when the email to which they were attached is deleted. In my experience from the early years of University email (up until 2005), IT departments were always complaining about storage capacity and the standard policy was for emails (and attachments) to be deleted from the server after downloading. The relatively trivial cost of very large storage capacity may have changed policies, as would the various legal issues of accountability, but I doubt that such things would have been followed terribly conscientiously in 2008/9, certainly not given the other information we have about UEA.

Not my area but I think that Eudora is just the local client. It sounds like it unbundles the attachment on receipt by the user and locates it on the local C: drive.

The information given indicates where to look on the user’s PC.

If the attachment had been unbundled on a communal file server, I would expect to see a server name in the extended filename, but I might be mistaken.

Organisations, according to size, may have some beefy component of the mail network. I think there is another level between the client and the big mail switches that drive the system.

If they have a communal mail agent of some sort, that would be the bit to back up, as you would have a complete trail suitable for internal or external audit. Most large organisations have to do something like this if they want to protect themselves against their staff.

By and large, formats are only defined at the interfaces, where interoperability is required. How mail servers, agents or clients store data is their own business, and hence searching is best done using the mail system’s built-in applications.

If any backups are to be useful I would hope there would be a better way of finding things than the method described.

I run whatever is the modern equivalent of Outlook Express, which is a client and provides the same function as Eudora. I still don’t use a browser interface. If I need to find something I have to use my mail software to find it. The mails are not stored individually in files. That would waste huge amounts of disc space. Each mail folder has one file which holds all the mail under that folder name, e.g. Inbox, Sent, or Junk. That file has all the emails and attachments stored in a proprietary fashion. That structure could be anything. There is no guarantee that it wouldn’t do something cute with the data in general or filenames in particular.

I am horrendously out of date on this. Mail preceded the internet, the stored formats were highly proprietary and interoperability was limited. My speciality was message switching, which predates even such internal mail systems but was an open and integrated system similar to email. We stripped the address headers (equivalent to xxx@yyy.com) from the body, timestamped the body and stored it in one format, then linked it to the header and stored that in a different format. Without using our application tools there was a near zero chance of locating a particular message or its body.

Now I come to a point that has puzzled me. Were I to try to do a climategate on my little PC and its email system, I would transfer the proprietary files and use Outlook to extract the ones I wanted after the act. I could extract them on this PC, zip them up and transfer the zipped file, but that would need much more time.

It is too late to cut a long story short, but were it I who was doing the looking, I would make sure that there wasn’t a big chunky file (not directory) called Briffa, or just some xxx999 with a promising file extension, find the tool that could decode it and hopefully search through it. My personal email file “Inbox.dbx” is huge (>100Mb) and stuffed full of headers, text, and attachments. I could try to search “in it” for a filename, but I have no guarantee that such data would be in a searchable format or stored contiguously.

Perhaps someone will come along, or already has, who knows for certain how Eudora does things but more importantly what sort of mail agent an organisation the size of a University or any organisation subject to FOI would or should have, how to identify its files, backups and archives, etc.

Based on what I know about “unixy” email systems (and almost all email systems are probably very unix-like under the hood, i.e., at the mail server level), I have a sneaky suspicion that their search methodology (as they described it) won’t work. I wonder if they did a search for a known good email and associated attachment before they did their search for the attachments that are the subject of the FOIA request. That way they would know if their search methodology actually worked as they think it should. If this simple test is unsuccessful, then they would know that they need to figure out another way to find attachments. I would run this test on an email with attachment intact (i.e., not saved somewhere), and on another email with attachment that was saved to some “random” directory in the user’s storage space (local hard drive, network storage, etc.). Only after I’d successfully run those tests would I be somewhat confident that I’d be able to find an email attachment on any other email. Anyway, that’s how this former Unix SysAdmin and LAN Administrator’s Engineering-trained mind would approach the problem.

Steve McIntyre Posted Aug 10, 2012 at 12:55 PM ‘In my opinion, no credence whatever can be placed on Acton’s testimony to Parliament. I raised this in my appeal and the UEA said that Acton’s testimony was entitled to parliamentary privilege i.e. he could make untrue claims with impunity.’

Steve,

I am not an expert on ‘parliamentary privilege’, but I think you may be giving a somewhat misleading impression when you state ‘… i.e. he could make untrue claims with impunity.’

It is true that if ‘parliamentary privilege’ applies (‘if’ as I am not sure I trust anything that UEA state) my understanding is that it protects one from defamation / libel. I also understood that ‘lying’ to parliament was a serious offense.

[On the other hand, I left England over 35 years ago, perhaps my memory fails me and/or perhaps standards have changed].

I am surprised that their earliest backup available is from August 2009; our local primary/middle school has backups earlier than that. With minimal money in a third world rural area they beat a UK university that is the “3rd for facilities and 5th overall in the Times Higher Studies experiences Survey 2010”. World policies formulated in the IPCC have to depend on this amateurism.

I think the first relevant question is whether the original email which contained the attachment is on the backup save they searched. There is a question of type (incremental or full) not specified. Once a backup containing the original email is found go to the index and see if that email has an attachment associated with it. If so pull whatever is identified in the index of attachments.

But that is not the end of it. The email was sent from Wahl to Briffa. When I get an attachment I’ll save it to my base working area or a special subject subfolder. So a search of “documents” in all of Briffa’s subfolders, with file dates near when the email was received (perhaps up to a week or two after the sent date/time), should be done as well. Any documents found should be listed with file type, time, date and size, and all preserved on the server. You can’t predict what the name is; I often save an attached file with a file name more meaningful to me.
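A date-window search of that sort could be sketched as follows. The function name and the two-week default are illustrative only, and it relies on the restored files carrying their original modification times, which depends on how the backup was made.

```python
import os
from datetime import datetime, timedelta


def files_modified_between(root, start, days=14):
    """Return (path, modified, size) for files under `root` in the window.

    The window runs from `start` to `start + days`. Modification times come
    from the file system, so they are only as good as the backup's metadata.
    Results are sorted oldest first.
    """
    end = start + timedelta(days=days)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            modified = datetime.fromtimestamp(info.st_mtime)
            if start <= modified <= end:
                hits.append((path, modified, info.st_size))
    return sorted(hits, key=lambda hit: hit[1])
```

Because it never looks at the file name, this catches a renamed copy that a name search would miss.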

A couple of other people have given the answer, but I’d like to do the same from a slightly different perspective.

Emails are received by an email server. An email that contains one or more attachments is stored in that email server as a SINGLE message in a multi-part MIME format that allows the computer system to identify what is the text of the message and what are the attachments.

Eudora is an email client that retrieves messages from an email server. It can either take a copy (preserving the message on the email server) or it can remove the message from the email server once it has retrieved it.

Eudora then splits the multi-part MIME format message into multiple parts. Attachments are placed in the configured folder for attachments and the text of the message is kept in the Eudora mail file. Once this has happened, attachments can be manually moved or removed, breaking the link with the text of the email in Eudora.
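To make the single-message point concrete, here is a small sketch using Python's standard `email` package. It builds a made-up message with one attachment (the file name "example.doc" and its contents are placeholders) and then lists the attachment names found inside the raw message, which is roughly the information a client has before it splits the parts apart.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_example():
    """Build a made-up multi-part MIME message with one attachment."""
    msg = EmailMessage()
    msg["Subject"] = "example"
    msg.set_content("Body text of the message.")
    msg.add_attachment(
        b"placeholder bytes",
        maintype="application",
        subtype="msword",
        filename="example.doc",
    )
    return msg.as_bytes()


def attachment_names(raw_message):
    """Return the file names of all attachments in a raw MIME message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return [part.get_filename() for part in msg.iter_attachments()]
```

Run against a server-side copy of a message, `attachment_names` would report the original attachment name even if the client-side file had since been moved or deleted.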

So… the question becomes where did the ClimateGate emails come from? If they’re from the Eudora mail file, then it’s legitimate that there’s an email without the corresponding attachment (as the Eudora email client separates them and makes them different entities). If they’re from the email server itself, however, then the original attachment would still have been there.

As for backups, they appear to be only looking at the backup of the Eudora client, and not the central email server. As I don’t know precisely what was on the server in question, I don’t know if that’s reasonable or unreasonable. There are a lot of permutations as to how backups and emails can interact, and if they don’t have a backup of the email server from the appropriate time, but only a later backup of the email client, then there’s not much they can do.

As for where they searched, I agree with others that searching ONLY the Eudora default attachment folder is unlikely to be a good search. When I used Eudora, I used to move things from that folder regularly, as otherwise it became very difficult to find anything. It is quite probable that if the file was important, it would have been moved from that folder after Eudora placed it there.

Heuristic: the last Climategate harvest was on Friday, 13 November 2009, in the afternoon. All emails harvested were in Jones’ inbox and outbox only. Probably, this was done at Jones’ office computer at CRU, while he was absent. Some ‘Agatha Christie’ work but it may be helpful.

Do we know exactly what operating system their e-mail server (and presumably back-up server) runs?

If it’s Linux, BSD or some other variant of UNIX, I would expect each e-mail is stored as a single text file in MIME format with a filename that is just a number (or some combination of letters and numbers which is not terribly meaningful to a human). Searching the hard disk for that file name will not locate anything. You have to search the contents of the e-mail MIME files for the file name. This can be done quite easily using grep although it may take a while to complete. For example:
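A rough Python equivalent of a recursive grep for the attachment name inside message files might look like the sketch below. The function name is made up, and any directory passed to it (a mail spool, a restored backup tree) is a placeholder rather than a known UEA path.

```python
import os


def files_mentioning(root, needle):
    """Return paths of files under `root` whose raw contents contain `needle`.

    Files are read as bytes and compared case-insensitively (ASCII letters
    only). Unreadable files are skipped rather than stopping the scan.
    """
    target = needle.encode("utf-8").lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    if target in handle.read().lower():
                        hits.append(path)
            except OSError:
                continue
    return hits
```

This finds the name where it appears in the MIME headers of a message. It does not decode the base64 body of the attachment, so it locates the message that carried the file rather than searching the file's own text.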

I’m not aware of any Unix e-mail server software that stores attachments separately, and certainly not under the attached file name. Such software may exist, but if it does I haven’t come across it. I wouldn’t expect Windows mail server software to operate this way either. If a server does store the attachments separately, the file names are probably just going to be a seemingly random combination of letters and numbers, or else the attachments are just stored in a big database file.

I do not know how the Eudora Mail Server works (I know others), but this seems strange.

“Within Eudora, attachments are stored in a separate subdirectory from the emails to which they were attached. Accordingly, a decision was taken to search for attachments in the area of the subdirectory which was relevant to Professor Briffa’s email account. ”

Is this a true statement for a Eudora Mail Server (not the Eudora Mail Client)? On the mail server, does Eudora separate out the attachment from the email body? I can see the client doing this (as others have explained) but not the server; it would be a waste of time and computing resources.

I think I asked the wrong question. I assumed they were searching a backup of the mail server itself. From a cursory look at the EIMS documentation, the answer is:

“Email Archive Filter – This filter forwards a copy of all messages received using SMTP (which includes outgoing messages from users) to an account named archive in the default domain. That account can then be accessed directly, forwarded to an alternative address, or the Inbox could be aliased to another IMAP account. ”

To archive emails in EIMS you really end up just sending a copy of the email to some other email account – by default named archive. This would not break down into “the area of the subdirectory which was relevant to Professor Briffa’s email account. ” This would just be a big honking single email account named something else.

So now I have to ask: is the backup server that they searched a general server that stores backups of each individual computer’s files? If so, this becomes a different animal completely. Tom Melvin’s email (3939.txt) step 3 has real meaning: look in Tom’s backup folders.

Or, is the email server something other than EIMS, and the backup server is really a backup of the email server? If so then there is probably no Attachment folder. I know there was a lot of breakdown of the emails themselves and the Eudora client aspect, but I do not remember an answer on the email server(s) in use by UEA.

Specific answers to your questions: it is perfectly possible for the attachment to be lost or deleted, or just for UEA’s IT staff to be unable to find it despite looking in good faith. The problem is that we know they’re pretty incompetent, and use bad tools wrongly configured, so it’s hard to rely on anything having been done properly.

I would suggest putting in an FOI request for the binary contents of the backup server’s disks. It’s a single (very long) string of ones and zeros, and you want the whole thing. Then you can build your own virtual version of the server and search yourself.

Keep in mind that when a file like the missing attachment is “deleted”, the file is not actually removed from the hard disk; rather, the file header is removed from the disk’s directory (whether FAT, NTFS, EXT, etc.) and the area that the file resides on is marked as available to be re-used. Until it’s overwritten, however, it can easily be located and recovered by forensics software or even undelete utilities, if a) you know what you’re doing, b) you know what you’re looking for, and c) you actually want to find it . . .

Could software used to detect plagiarism be useful in this case? Wahl could have sent Briffa text that was incorporated into AR4 with little or no change. The server could be searched for documents containing phrases similar to the key passage from AR4. You’d really like to know if the highly specific information in this passage was drafted by Briffa himself or if it came via email to Briffa (presumably from Wahl).

“McIntyre and McKitrick (2003) reported that they were unable to replicate the results of Mann et al. (1998). Wahl and Ammann (2007) showed that this was a consequence of differences in the way McIntyre and McKitrick (2003) had implemented the method of Mann et al. (1998) and that the original reconstruction could be closely duplicated using the original proxy data. McIntyre and McKitrick (2005a,b) raised further concerns about the details of the Mann et al. (1998) method, principally relating to the independent verification of the reconstruction against 19th-century instrumental temperature data and to the extraction of the dominant modes of variability present in a network of western North American tree ring chronologies, using Principal Components Analysis. The latter may have some theoretical foundation, but Wahl and Amman (2006) also show that the impact on the amplitude of the final reconstruction is very small (~0.05°C; for further discussion of these issues see also Huybers, 2005; McIntyre and McKitrick, 2005c,d; von Storch and Zorita, 2005)”
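One crude way to approximate that idea without dedicated plagiarism software is word n-gram overlap: break the reference passage into overlapping runs of words and measure what fraction of those runs turn up in a candidate document. This is only a sketch; the function names and the run length of five words are arbitrary choices of mine, not a description of how any real plagiarism tool works.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping `size`-word runs in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def containment(candidate, reference, size=5):
    """Fraction of the reference's word runs that also occur in `candidate`.

    1.0 means every run of the reference passage appears in the candidate;
    0.0 means none do. Punctuation and case are ignored.
    """
    ref = shingles(reference, size)
    if not ref:
        return 0.0
    return len(ref & shingles(candidate, size)) / len(ref)
```

Scoring the extracted text of every document in the backup against the AR4 passage and sorting by score would put near-verbatim drafts at the top, whatever their file names.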

There are free utility tools that greatly simplify such a search. “cathy” is one. Give it a few minutes to create an index of every filename, size and date on your system (and all accessible network drives). Then easily search for any portion of any name. With the right settings it will immediately show up to 10000 matching files. Sort by date or whatever. (My Cathy database contains a few million files. It’s not a difficult or time consuming challenge.)

It is NOT hard to find the file.

They clearly limited the search on purpose, not to speed up the search nor to make an intractable challenge doable.

No, clearly they limited the search to lower the probability of finding the file.

The appropriateness of their search methodology depends on the kind of backups, format of data files and filename storage scheme used on the backup server, and the desire to search by file name or by file contents. If they can extract a record of a particular user’s local hard drive contents from a backup archived on the server, then an exact file name should be entirely knowable as present or absent from the backup without resorting to any non-exact search terms.

However, it is possible for files to have more than one valid file name (for instance, an MS-DOS 8.3 character file name, typically containing something like “~1”, that maps to a long file name without those character count limits; or, the ability of Unix and newer Windows operating system versions to create symbolic links or file junctions, where a named file system entry is created to point to the contents of a file of a different name or in a different location). If they are not using a search methodology or a backup mechanism that fully exposes original long file names to searchability, then a file system search could fail to locate a file using an expected original file name. This could be a reason to use general search terms to locate a file of a known name, rather than simply using the known name of the file in a simple search.
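As an illustration of why a shortened search term might be used, the sketch below generates the kind of 8.3 alias a long name could be given. It is a simplification of the Windows scheme as I understand it (first six usable characters, a "~" and a digit, three-character extension, all upper case); the real rules handle collisions and odd characters differently, and the function name is made up.

```python
import re


def short_name_candidates(long_name, max_index=4):
    """Return plausible 8.3-style aliases for `long_name` (simplified rules).

    Names that already fit the 8.3 pattern would normally have no separate
    alias; this function does not check for that.
    """
    base, dot, ext = long_name.rpartition(".")
    if not dot:  # no extension at all
        base, ext = long_name, ""

    def clean(part):
        # Keep only characters this sketch treats as safe in a short name.
        return re.sub(r"[^A-Z0-9_\-]", "", part.upper())

    stem = clean(base)[:6]
    suffix = clean(ext)[:3]
    tail = f".{suffix}" if suffix else ""
    return [f"{stem}~{index}{tail}" for index in range(1, max_index + 1)]
```

Under these simplified rules, both of the requested "Ch06_SOD_Text_TSU_FINAL_2000_..." documents collapse to the same six-character stem and differ only in the digit after the "~", which is why an exact long-name search and a short-name search can give different answers.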

However, you can’t always get accurate search results for expected terms in file contents rather than file names. If a file is encrypted (an option with PDF files, for example), or attached as something like a .ZIP compressed file or an encoded version of a file (Mac BinHex transmitted to a Windows user, say), then a plain text search term might not locate the contents. If the file uses styling such as bold or italic within a particular expected phrase, a plain text search may not locate the contents. Certain file formats also insert document structure information in unexpected places, even in the middle of a text with no style changes (as I’ve seen Microsoft Office do), and certain file formats store information within fixed size blocks of the file without worrying that a particular phrase might be split up into non-contiguous blocks (as I’ve seen desktop publishing programs do); those types of files can’t be conclusively searched with a plain text search term.

Bottom line: The search methodology’s sufficiency is certainly testable if the details of all the software layers involved are well known. I would have expected a testably sufficient search of a non-incremental backup archive of a user’s local hard drive content to have located the Eudora text that mentions the existence of a file attachment with an exact final file name (without any specific folder path), even if the exact file name is not present in the backup. I would also have expected a search of a network mail server to turn up such text with the exact final file name, even if for some reason the attachment itself is not stored on the server with that final file name.

There could be a different answer regarding Eudora’s behavior with file names of attachments on a Mac compared to a PC. Depending on the file system APIs used to store a reference to a file on the file system, it is possible for a Mac application to locate a file that has been renamed or moved to a different folder, by storing only an integer file system identifier. The new folder path and file name can be recovered, at least on the original drive. (It seems I’ve only seen references to Windows versions of Eudora here, though.)

Maybe I don’t have the terms of art correct, but in my mind there is a difference between the process of ARCHIVING and one of an automated BACKUP…? I recognize that I may not be as helpful as others on the specifics of which folder to search, but when I search for documents I NEVER mess with folders. So it is a red flag for me.

It seems to me that the Mail Server would need to be backed up daily for some set period of time, and then archival copies of some lesser duration written to tape or DVD for archival purposes. These would be in the form of the ‘incoming’ mail and, in the case of an inadvertent deletion, can be called upon to restore. Maybe UK Universities don’t do this… If someone can mess with this set of files, it really isn’t a ‘backup’. Retention guidelines should govern this set of files.

Then there are the individual’s Archival records, which record the incremental files and folders of the user’s directories as things are pulled off to manage disk space. These are Archives in my mind, not backups (despite what the programs are called). They clearly have these, but are hiding behind an estimate of the amount of time to fully search these data. MrPete has it right: this is entirely silly. Irrespective of the name of the file, one can also search upon a string of text in the paper.

Sorry to be late to the game and I did not review the prior comments, but it seems to me odd that the FOIA request server, holding emails and documents made public, only goes back to 2009. We know the emails go back to before 2000, and many of the files go back well prior to 2009.

So we seem to have a bait and switch here. I would renew your request, make it general for all versions of the file (attached and otherwise) and include a file we have from Climategate 1.

It seems to me they looked in Eudora only, which may be a thin slice of time. But the files being collected for FOIA (which is clearly what was stolen; there was no scavenging of selected files on the server) prove there is another place on the server they decided to avoid.

And that BS about shortened file names is ridiculous. You should request a listing of all instances (and paths) of that file in the server as well.

If you feel it is worth the time.

Whoever copped those files in the first place should now come out with a listing of what they have folder and name wise, and if they have this file, bring it out to show the world what was going on.

Would it be possible to get a review of UEA’s claim that a search would cost 25k and on the basis of their obvious incompetence/obstructionist attitude get the backups and archives of backups searched by a competent third party?

So the use of a thumb drive was known back then? If you checked even further through archived electronics websites, you would find that, though they used Eudora for their system, an old 2 GB thumb drive could have held some other operating system and another e-mail client (I like Thunderbird Portable) linked through another account. In other words, he wasn’t doing this from a school system, and they would have no record, because the server would be different. It’s called covering your tracks: they cannot prove you’re lying then.

This would take a mere few minutes to accomplish, well under the 18 hour allotment, making it hard to be ruled as vexatious. The results may be lengthy, but searching through it could be crowd sourced here. There may be a directory/ file named “Wahl-Amman Ch06”, for example. A future request could be made based on more detailed Directory/ file names that are obtained.

This is a great place to start. I cannot find the reference right now, but it was noted in some document from the inquiries that the UEA CRU BackupPC was (mis)configured to never delete backups. Thus, the Host log file for each of Briffa’s computers (and I would include all of Tom Melvin’s as well, but especially his laptop from Oct 2009) should give a road map into which backup numbers most likely contain the attachments (by dates of the emails – 7/21/2006). These Host Log files appear to be flat text files residing on the BackupPC server itself and should be easily retrieved.

From the Host Log file, a subsequent request to search the most likely backup numbers, based on dates, can be made for each computer. (Or if only one response is allowed, these instructions can be spelled out.)

If the statement about the BackupPC is correct (and given that the CRUBACK3 server has #3TB of compressed data on it, it seems likely) the FOIA files are on the CRUBACK3 server, even if Briffa deleted them from his PC any time after a full backup occurred after July 21 2006. I am not sure about deleting them with only incremental backups occurring between receipt/backup operation and deletion action.

re: UEA’s statement that “there were no copies of these attachments on university computers”: this may well be an evasion, since storage media (tapes, detachable hard drives, thumb drives, etc.) are often not part of a “university computer”.

I can’t help with describing the ways in which the FOI search ought to be conducted, but clearly they will limit/evade achieving the goals of the FOI whenever possible. Perhaps the follow up request needs to specify every kind of storage medium they might have used, unless it has to be restricted only to that one back-up server. I know that even for a smaller system in a company I worked with we had mirrored free-standing hard drives onto which the back-ups were copied, a system which replaced a previous back-up system onto tape drives.

The common denominator for all forms of mail is that it is sent to recipient(s). Whether it travels by special emissary, pony express, royal mail or internet, the best place to find it later on is where it wound up, not where it started.

I used to work for oil companies. We operated and drilled wells. We had lots of data and studies relating to each well. We had consortium partners, each entitled to receive copies of the entire suite of data. Later on well trades took place, and the trading partnership also had entitlement to all the data. Years later still, there may well have been office moves, archive storage transfers, management changes, or even full company takeovers, mergers, etc. As they do, embarrassments occur: things occasionally get sent to or received from competitors by mistake, get lost or become illegible. The problem is always resolved by honest admission and an appeal to those original recipients to restore the archive. Now that we have the internet and digital data, this really should be a semi-automatic reflex action.

If UEA really were serious about locating something when their own searches fail to find it, their first thought should be to appeal for assistance to the original recipients.

Steve: are you not familiar with the story? UEA doesn’t want to find anything. Nor will the recipients be of any help. They were part of the plan to delete documents.

Need to ask someone who knows exactly how outlook works with attachments. Had a case where a secure delete program had been applied to a computer after the emails with attachments had been sent, but the emails themselves had first been archived with a view to removal and retention, so had not been securely deleted.

Amateurs at work of course.

Result was in the recovered archive you could read the emails, but outlook does something odd with the attachments, they seem to be more like links than actual attachments. Unlike Linux email clients. Never did recover the attachments in readable form.

However, they have the server, which we did not, so you would think that what was on the server would be what the recipient got, which would mean readably encoded attachments. Dunno, find someone who knows how Outlook works in detail, and they can give you the answer.

Or simply. Search all of Briffa’s computer’s directories for the files. Limiting to the default eudora attachment folders is an obvious obfuscation.

At some point, the ICO should just realize (or be made aware) that UEA’s turning an FOI request into an obscure game of 20 questions is just not in the spirit of the law. The ICO should take possession of an image of the server, and provide real assistance in resolving dozens of these requests.

Since BackupPC does name mangling and we don’t know the search method they are using, a proper search specification is difficult.

Backup file names are stored in “mangled” form. Each node of a path is preceded by “f” (mnemonic: file), and special characters (\n, \r, % and /) are URI-encoded as “%xx”, where xx is the ascii character’s hex value. So c:/craig/example.txt is now stored as fc/fcraig/fexample.txt.

Assuming that they are using something like find -name ‘filename’ the search would fail. If they were using find | egrep -i filename it might fail, depending upon the regex they used. If they searched for exactly “c:\eudora\attach\AW_Editorial_July15.doc” they wouldn’t find it since that would have been converted to “fc/feudora/fattach/fAW_Editorial_July15.doc”.
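To make the mangling rule concrete, here is a minimal sketch that applies it exactly as quoted above: an “f” in front of each path node, and “%xx” encoding for newline, carriage return, “%” and “/”. It is not BackupPC’s own code. Treating backslashes as path separators and using lowercase hex digits are assumptions. Note that, read literally, the rule leaves the colon alone, so this sketch produces “fc:/…” rather than “fc/…”; how the drive or share name is really stored on the server cannot be established from here.

```python
def mangle_node(node):
    """Prefix one path component with 'f' and %xx-encode the special characters."""
    out = []
    for ch in node:
        if ch in "%\n\r/":
            out.append("%%%02x" % ord(ch))  # e.g. '%' becomes '%25'
        else:
            out.append(ch)
    return "f" + "".join(out)

def mangle_path(path):
    """Mangle a path node by node; backslashes are treated as separators (an assumption)."""
    nodes = [n for n in path.replace("\\", "/").split("/") if n]
    return "/".join(mangle_node(n) for n in nodes)

print(mangle_path("c:/craig/example.txt"))
```

Whatever the exact stored form, the practical point stands: a search for the literal Windows path will not match a mangled name, whereas a substring such as “Editorial” survives the mangling.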

For this reason I’d specify all files in Briffa’s directory (not the entire machine) matching *[Ee]ditorial* (if using file system semantics) or “.*[Ee]ditorial.*” if using regex. I would specify both versions to avoid misunderstandings. You could consider requesting a listing of the files matching the specified patterns rather than the files themselves so that a selection could be made to determine which, if any, are appropriate.
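As a quick check that the two forms of the pattern select the same names, here is a small sketch using Python’s fnmatch and re modules in place of the shell and egrep. The file names in the list are made up for illustration, with a leading “f” to mimic mangled names.

```python
import fnmatch
import re

# Hypothetical names, prefixed with "f" as a mangled backup would store them.
names = [
    "fAW_Editorial_July15.doc",
    "feditorial_notes.txt",
    "fCh06_SOD_Text.doc",
]

# File-system (glob) form of the pattern, matched case-sensitively.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "*[Ee]ditorial*")]

# Regular-expression form of the same pattern.
regex_hits = [n for n in names if re.search(r".*[Ee]ditorial.*", n)]

print(glob_hits)
print(regex_hits)
```

Both lists contain the first two names and skip the third, and the leading “f” makes no difference because the pattern is open at both ends.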

Another approach would be to specify a limited range of dates and file types. Thus all .doc files within a two week period since you know when the email was sent.
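A rough sketch of the date-and-type approach follows. It assumes the backup has been restored (or mounted) as an ordinary directory tree and that restored files keep their original modification times; neither is known to be true of the UEA server. The function name and parameters are hypothetical.

```python
import os
from datetime import datetime

def docs_in_window(root, start, end, suffix=".doc"):
    """List files under root whose name ends with suffix and whose mtime lies in [start, end]."""
    hits = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(folder, name)
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            if start <= modified <= end:
                hits.append(path)
    return sorted(hits)
```

For the emails in question one might call it with a window such as datetime(2006, 7, 15) to datetime(2006, 7, 29), then review the resulting list by hand.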

There’s a Perl script called backuppc-fuse (I think) that unmangles the file names and restores the attributes. One would assume that Chris Collins and Mike Salmon are competent enough to use it for regex or grep arguments from the command line.

I’m coming in way late on this, and I haven’t used Eudora in years. How does it actually handle/save attachments? In the old days, attachments were often sent & saved in a compressed and/or archived format, which meant normal file searches wouldn’t work. You needed a utility that could read the archive in order to find out what was in it, and not all search utilities could do that.

I am also suspicious that there is a serious misdirection occurring in all of this. I think this backup server may contain backups of the mail server as well, not just the individual client machines, which would be a much more complete archive than someone’s personal folders.

It seems likely that the CG1 and CG2 files were sorted and searched directly off server backups, and not by rummaging through everyone’s personal folders.

I think CRU may be hiding the pea in the thimble in the other hand, in the mail server backups, while pointing to the client. How do you get them to acknowledge that they exist?

I am a unix admin. If the file was on any drive on the backup server, any competent admin should have found it quickly.

Since they do not want to find the files, if you tell them where to look, they will look nowhere else. Think of the admin as an unmotivated 5-year-old who would rather be doing anything else. The directions need to be specific and with no room for error.

I think the real problem is we do not know whose mailbox the original climategate e-mail came from. I think you really need to ask them to check every person the e-mail was sent to. So, try to figure out all of the people at UEA who originally got the e-mail and send them that list to check.

As I understand it, this is a Unix box (Linux). This is the backup server though and the path specification is relative to the client machine. Since there are (potentially) multiple backups for each client there may be files that are in one day’s image and not in another. This, I suspect, is the case here. The files were deleted from the client but may still be on the backup server in an earlier image. Given that name mangling is being performed (along with path mapping) the search is also more complicated (though your comment about competent admin applies).

As Steve noted earlier, they don’t want to find the stuff. I think the chance that they will is fairly small.

Regardless of Eudora, if the files were deleted from the file-system they were contained on, there is a chance to recover these files by way of a file recovery utility.

When you delete a file contained on a file-system (NTFS, DOS, ext2/3/4 for example) , the file isn’t actually scrubbed from the file-system. Rather, the file is marked as being “deleted”, and the space which was taken up by that file’s contents is subsequently freed for future use by the operating system, to write data there when a new file is created.

The type of utility I refer to above, scans a file-system for lost or deleted files, and attempts to recover their contents. Depending on how much that file-system was used subsequent to the file deletion, it is possible to recover the contents of the original file. If you are lucky, you can retrieve the file completely.

There are a number of possible locations where these files would have been stored;

a decision was taken to search for attachments in the area of the subdirectory which was relevant to Professor Briffa’s email account.

They giggled when they wrote that. At a minimum, all of Briffa’s document folders should have been searched. It is unlikely when using Eudora that a user leaves everything in the default attachments folder. You move it to the relevant folder with other documents. The game playing is so transparent that one wants to pound the table.

Anyway, they should be instructed to search all of Briffa’s backup directories/folders (or the entire drive) for *ERW*.* “*” being the wildcard assuming Windows, and *AW_Editorial*.*

I have a Case conference in a few hours. I received a letter two days ago from the University describing their search:

the University searched for files using a case insensitive substring search using the grep commands: Is f grep —i AW Editorial; Is J grep -i AR4SOR BatchAB and Is J grep -i
Ch06 SOD Text. By employing these commands (which in effect amounted to contractions of the actual file names identified in your request) the search would reveal any and all documents with the search terms in the name of the document, regardless of what the ‘suffix’ to the search term contained.

Also

You have perhaps understandably raised the question of how it was that the emails to which documents 5-8 were attached were held on the server (at least at the time of climategate) but not the documents themselves. The University is unable to say categorically why it is that the emails were apparently stored on the server but not the attachments. However, it has tentatively surmised that this may be a product of the way in which Professor Briffa managed and stored emails versus attachments on his work PC. On this point, you will note that, as has previously been confirmed by the University, the Eudora system which applied to the computer system which operated within CRU when the back server was in use, stored attachments and emails separately. This in itself created a situation in which the backing up of emails did not automatically correlate with a backing up of relevant attachments. In any event the reason why the attachments may not be stored on the server is immaterial for present purposes. What is material is that reasonable searches of the backup server conducted by the University have indicated that documents 5-8 are not held.

I’m reviewing the file for the meeting. Will check this thread in a couple of hours; thoughts appreciated.

Those grep commands look garbled, possibly by formatting, but briefly:

Grep searches for particular strings (sometimes expressed as regular expressions) in the contents of a file or set of files. However, it doesn’t search for the target string(s) in the filename of the file.

If the attachments are stored separately from the emails, each attachment being stored as a stand-alone file (presumably with the filename as specified for the attachment in the email), then these grep commands will not necessarily locate the attachments, unless the attachments contained their own filename contractions within their contents, which they won’t necessarily do.

Also, in addition to the -i flag, which tells grep to search case-insensitively, I would also expect the -r flag to have been used, telling grep to search in sub-directories as well.

(If the backup application creates just a single file (holding all the files that have been backed up), rather than recreating portions of the filesystem tree, then the above comments don’t apply. However, if grep is searching just a single file produced by the backup application, it’s not clear without knowing the format of the file whether the target filenames would appear in the file as plain text for grep to find).

Indeed those unix commands have been lost in translation.
Probably the “Is” should be “ls” (unix command to list files) and the “f” and “J” should be “|”, (the unix ‘pipe’ command that feeds the output of one command to the input of another). So
ls | grep AW_Editorial
would find files whose names contain AW_Editorial, in the current directory only. To search the tree structure you’d say
ls -R | grep AW_Editorial

Or, as people have said above, the more usual way would be to use the unix ‘find’ command.
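The difference between the two listings can be shown with a small sketch that mimics them in Python rather than running the shell commands themselves. The function names are hypothetical, and matching is case-insensitive, as in the -i search the University described.

```python
import os

def names_here(directory, term):
    """Roughly `ls | grep -i term`: match names in one directory only."""
    return sorted(n for n in os.listdir(directory) if term.lower() in n.lower())

def names_below(directory, term):
    """Roughly `ls -R | grep -i term`: match names in the directory and everything beneath it."""
    hits = []
    for folder, subdirs, files in os.walk(directory):
        for name in subdirs + files:
            if term.lower() in name.lower():
                hits.append(os.path.join(folder, name))
    return sorted(hits)
```

If the file sits one level down, names_here returns nothing while names_below finds it, which is why the directory the operator was in when the command was run matters so much.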

Paul, I think your deduction of what the actual search commands might have been is likely correct, unlike mine i.e. it looks like they’ve used ‘grep’ to search a directory listing produced by ‘ls’.

As you also point out, it appears that ‘ls’ was run without the -R flag, so only filenames in the current directory would’ve been examined (subdirectories would have been excluded). Unless the operator had navigated to the directory which contained the attachments prior to running the grep, the search wouldn’t have located the attachments even if they’d been backed up. However, according to UEA, they did navigate to the relevant directory (and the pathnames indicate that the attachments were not held in a subdirectory of that).

For completeness, one would want confirmation that Keith Briffa’s PC was the only place where the attachment might be stored e.g. many organisations keep an archive of all emails sent to and from, and sometimes within the organisation, independently of what happens on the users’ PCs. (To give UEA the benefit of the doubt, one would assume that they would’ve mentioned and searched such an archive if it existed.)

OK, so, here is what I am inferring from what they have told Mr. McIntyre:

1) The back-up server seems to be running some UNIX variant (implied by the use of ls and grep).
2) The users’ PCs are running Windows (implied by the use of the Eudora mail client).
3) They only searched their backup for a file with a particular name in a particular directory.

Now, if someone told me that they had a back-up mail server, I would assume it was running the same operating system as the primary mail server and that they are backing up the actual e-mail files, rather than backing up the contents of users’ PCs (such a machine would be referred to as a back-up server but I wouldn’t call it a back-up e-mail server, even if the user backups included e-mail).

On a UNIX server, normally the attachments are stored along with the e-mail itself in a file with a random-looking name. Searching a back-up of such a machine for a particular file name will not find e-mail attachments with those names. Searching a back-up of the user’s PC would but only if they are using Eudora which automatically strips the attachments out. Other e-mail programs such as Thunderbird will store/cache the e-mails in a similar way to the way they are stored on the server, ie, in a database format or with the text and attachments combined in a single file with a non-human-readable filename.

The type of search they have performed only makes sense if they have a Unix back-up server which has backed up the contents of the users’ PCs and the users are running Eudora. In that case, you would have a Unix machine with (some of) the attachments stored with their actual file names, in a single directory, that you could search using an ls/grep command. This is an odd back-up situation; in any other case, those commands would fail to find the attachments even if they were present.

So in summary, this type of search doesn’t make much sense and is unlikely to succeed, based on the information that we have.

I back up the mail server at the company I work for by using the “rsync” program to make a copy of all the mail directories. If someone asked me to find an attachment with a given name in the backup, I would have to search INSIDE the back-up mail files for that text. I imagine most other organisations who have a mail server on a Unix machine and do regular back-ups would be in a similar situation.
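To illustrate what searching inside a mail file involves, here is a minimal sketch using Python’s standard email package. It builds a throwaway message with one attachment, serialises it the way a mail store would hold it, and then reads the attachment names back out. The message and the attachment name are invented for the example, and nothing here is specific to the UEA mail system.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

def attachment_names(raw_message):
    """Return the filenames of all attachments in one raw message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return [part.get_filename() for part in msg.iter_attachments()]

# Build a small sample message so the sketch is self-contained.
sample = EmailMessage()
sample["Subject"] = "example"
sample.set_content("See attached.")
sample.add_attachment(
    b"placeholder bytes",
    maintype="application",
    subtype="msword",
    filename="example_attachment.doc",
)

print(attachment_names(sample.as_bytes()))
```

In a stored message the attachment body is normally encoded (for example as base64), so its text is not directly readable, but the attachment’s filename sits in the part headers and can be found by parsing or by searching the mail file’s contents.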

I concur here: the search is constructed to search “inside a file”, not a filesystem, so I conclude they must be searching an index of sorts. Typically when searching a directory tree I’d use something like

find / -iname \*AW_Editorial\*

If I was searching for a file containing content inside a file with a standard unix grep I’d use
find /dir -type f -exec grep -il AW_EDITORIAL {} +

I think it is necessary to know what they are searching and the format of the index.

Now assuming they are searching index files then I’d actually use something like this, which would tell me which index held the file

find / -name \*.index -exec grep -i AW_EDITORIAL {} \; -print

I think their search has failed at this point, IE the index search hasn’t turned up the filenames, the index probably has the path to the stored files. So what they have done seems right, but they have seemingly only searched one index out of many. The key is that you have requested an “Email attachment” so that implies only files in Eudora email attachment directories where it was not found.

I think you need to broaden the search criteria to any file owned by Briffa that contains XXXXX in the filename, where XXXXX is what you want found.

This is probably a total distraction and not particularly helpful, but the constructive thing that UEA should do to reduce the total burden on themselves and reduce the legal trench warfare is to simply ask Wahl for the files – just as Briffa would do if he “misplaced” the files.

I’m coming into this very late and I have no great IT knowledge, but I have to throw this into the fray:

Their argument that the attachments were all dated before August 2, 2009 is not sound. If they didn’t find it on that backup date, they should clearly go back and look at the first backup after each attachment date. Though full backups are done from time to time, many backups are only partials and only include new and re-edited files. The original backup for each should have been done the day of or within the week after each attachment was attached to the emails. If they don’t go back to that backup, they are being incredibly sloppy and lazy with their search. IMHO.

Seems to me that a more or less bulletproof way to see whether a backup set stored on a server contains a certain file is to use the backup software either to search the backup set with its own search features, or to use the backup software to restore the backup set to its original form on another disk and then use an appropriate search method to locate the specific restored file, by name, contents or both.

The instructions given to the technician are designed to look like they are cooperating while continuing to pull the wool over one’s eyes. “It’s not where we think it should be, therefore it doesn’t exist”.

What crap.

To think that the UEA doesn’t have an archival policy that protects all its email (inc attachments) is not credible in this day and age.

And the problem is, you can ask all the specific questions you like but it will be like trying to shoot ducks at night. You will be wasting your breath.
The IT systems administrator will know EXACTLY where to find the documents.
Ideally an independent technician should be provided access to that server. Should take no more than a couple of hours to hunt said documents down.

If UEA continue to roadblock this, the next FOI request should be to disclose their archive/backup procedures. Those documents will be sitting on a backup drive or tape…somewhere. If they don’t, it also means the UEA does not have the means to recover from the loss of its email server through hardware failure. Not only is that not believable, if true, would open the university up to a major interruption to its operations.

Differences:
* ell-ess not eye-ess
* needs -R to search subfolders
* needs to search the whole filesystem (“/”) or at least a specified filesystem “/mnt/J”, not an arbitrary subfolder of the current location (“J”)
* needs pipe symbol between the two commands
* grep doesn’t find two words the way they showed. It’s a bit trickier than that.
* I provided two examples: the first finds the two words in sequence. The second finds either word

All they have done is demonstrate that they don’t know what they are doing. (Or else a poor front-line admin assistant was tasked with translating a technical description into human language, and understandably made a hash of it.)

egrep -i 'EDITORIAL|AR4SOR|Ch06' <<-EOF
c:\eudora\attach\AW_Editorial_July15.doc.
c:\eudora\attach\AR4SOR_BatchAB_Ch06_ERW_comments.doc.
c:\eudora\attach\Ch06_SOD_Text_TSU_FINAL_2000_12jul06_ERW_suggestions.doc.
c:\eudora\attach\Ch06_SOD_Text_TSU_FINAL_2000_25jul06KRB-FJ-RV_ERW_suggestions.doc.
chapter six (ch06) of this process is somewhat silly
ar4sor is also silly
This editorial brought to you by mister bob
EOF

I am a lawyer of many years experience but hold an honours science degree and I have actually performed similar searches in significant litigation myself.

“However, on the basis that the search terms were sensible, does it make sense that the emails (attested in the Climategate dossier) still exist while the attachments don’t?” Sadly, yes this can happen. I learnt this the hard way many years ago. I lost attachments myself that way. They can be stored on a separate directory.

So unfortunately attachments with emails can be deleted separately from the emails.

However the search seems, prima facie, inadequate, although without seeing the information in detail I can’t say that conclusively. If the emails were backed up through, for example, Outlook, they would be in a special Outlook file and would not have been found. I have not used Eudora for many many years so I do not know how old emails may be backed up. The best way I can explain my discomfort is that in one search I performed, one important email that only had a few recipients had 31 copies. Copies get inadvertently kept in many ways and in different directories. The organisation may have had automated backing up or, in my case, at least one secretary was able to read and save emails and backed up her boss’s emails on a regular basis in Outlook .pst files. Secretaries’ folders should be searched. Unless you know what people could do, searches should not be restricted to particular folders, especially if the emails may have been forwarded to others.

The other critical comment is I would not place much store in the date. Computer systems automatically update the so-called date and so considerable care needs to be taken when identifying the precise real date of a backup. If these things are put on a backup server, the first question is how the backing up was done etc and therefore where those files might be. eg Was some compression process used on older files or attachments? This would effectively hide them.

Looking for a particular email attachment is perhaps best done by finding a key word or term within it that is reasonably unique and searching the contents of all the files in the whole server back up. This is not hard, you just set it going, go home and get the results in the morning. Care then has to be taken to search all possibly relevant file types.
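A minimal sketch of that kind of content search follows, assuming the backup has first been restored to an ordinary directory tree (a compressed backup pool would not be searchable this way). It looks for the term both as plain ASCII and as UTF-16LE, on the assumption that older Word .doc files may store text in either form. The function name is hypothetical and the match is case-sensitive.

```python
import os

def files_containing(root, term):
    """Yield paths of files under root whose raw bytes contain term as ASCII or UTF-16LE."""
    needles = [term.encode("ascii"), term.encode("utf-16-le")]
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()  # whole file in memory; fine for documents
            except OSError:
                continue  # unreadable file: skip rather than abort the whole run
            if any(needle in data for needle in needles):
                yield path
```

Because it ignores file names and extensions entirely, a search like this would also catch a copy that had been renamed or moved to another folder.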

In short the process is much more complex than people think and their account sounds more like what we in Australia would often derisively refer to as the “she’ll be right on the day mate” approach.

First post on your very interesting site. Background: professional engineer (CEng) specialising in technology in the telecoms and IT sectors; did a spell in management consultancy, used to doing due diligence, business development, blah, but IT literate to say the least.

You have three very pertinent recent comments which should be put together in my view:

i) PennDragon raises several good points which match my experience: his points about PAs, individuals’ own policy (ie did they keep copies of attachments in a separate directory, which is what I do for key files, especially ones you want to work with) and again what policy was used in the email system itself. Sorry, I don’t know Eudora, but Msoft mail has default archiving settings; but again, what was their policy and how exactly was it implemented?

ii) the whole backup policy and the tools used need to be understood, along with how it was actually implemented in the real world, so that the correct set of backup files can be searched. For example, full and partial backups need to be put together for the correct dates, after it was sent but before it was deleted. A lot of backup tools mangle the file names, so it’s no good searching for a file in the backup unless you are searching through the backup tool itself.

iii) GREP is a very powerful tool but, as you say, is a Linux command. There is hope, however: attach the server hard drive as an additional drive in a Linux system and boot up using your existing system. You can also use a Linux live boot system on a thumb drive and boot from there rather than Windows (I understand that is the server OS) and run grep using the key-word-in-the-file approach.

Of course the file may actually have disappeared, then you get into really nasty forensic examination of deleted files that still exist in partial form on hard drives.

Then what is wrong with asking the guy to find it himself – he is willing?

Much to do to ensure a thorough systematic search is undertaken before it can be concluded the file has truly been lost.

I wonder if they have actually searched into *all* archives in the backups, including archives inside of archives… Basically they need to take all backups of relevance – put them on a unix box with a suitably large harddrive (given how cheap drives are nowadays they should be able unpack the lot with plenty of room to spare). Then recursively unpack everything in place. A simple recursive Perl script would do this without trouble, should be done in an hour easy.
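The recursive unpacking step might look something like the sketch below. It is written in Python rather than Perl and handles only zip archives, both purely for illustration; real backup sets may use tar, gzip or a tool-specific format, and the function name is hypothetical. Each archive is extracted into a sibling folder named after it, and the walk repeats until no unprocessed archive remains, so archives inside archives get opened too.

```python
import os
import zipfile

def unpack_nested_zips(root):
    """Extract every zip archive under root next to itself, repeating until none are left."""
    done = set()
    found_new = True
    while found_new:
        found_new = False
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                if path in done or not zipfile.is_zipfile(path):
                    continue
                target = path + ".unpacked"  # sibling folder named after the archive
                with zipfile.ZipFile(path) as archive:
                    archive.extractall(target)
                done.add(path)
                found_new = True  # new files appeared, so walk the tree again
    return sorted(done)
```

One caveat: formats such as .docx are themselves zip files and would be unpacked as well, which is harmless for a search but inflates the file count.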

Then you end up with a set of files that can be examined in detail using unix find and grep.

They should *not* be depending upon the backup utility’s built-in search functionality – it could be buggy or enforce policies that deliberately exclude certain files or file types. Unix find and grep are well proven rock solid ways of searching a file system. Not the quickest but certainly complete.

The CompSci department should be able to do this quite easily; it’s an afternoon’s job for a postgrad. No excuse for not doing it properly. In fact there might well be scripts available to do this for you.

I don’t follow your logic on not using the backup utility’s search features. True, it may be buggy, but you are just as much depending on the backup utility not to be buggy in storing or extracting backups. Unix find and grep can’t be any more wonderful than the backup software itself is.

The suitability of a search process is easily testable, if you use the same process to search for something you know you actually do have present in a backup. It shouldn’t be too hard for anyone to document the search process as adequate and reasonable in this way.

One Trackback

[…] server for the Wahl Attachments, reporting on Sep 28 that the search had been unsuccessful. (See here for most recent previous status report). They refused to provide some requested crosschecking […]