63 thoughts on “Guccifer 2.0 NGP/VAN Metadata Analysis”

Your “method” is basically a variant of those grade-school “guess a number” fake magic tricks. The result is 23 MB/s, simply because 23 MB/s is the speed you used earlier, to “approximate the transfer time”. Y/N?

Your “method” is basically a variant of those grade-school “guess a number” fake magic tricks.

The method used is an iterative approximation which is a standard technique. Please see my previous reply.

There is a series of files and directories that have no time gaps: it includes some top-level files and the FEC directory. The total size is 869 MB, which is 40% of the total. Using only the earliest last mod time and the latest in that series of files, the total elapsed time is 31 seconds. The transfer rate for those files works out to 28 MB/s. On that basis alone, we can be fairly confident that 23 MB/s is in the right ballpark.

The transfer speed calculation is not very sensitive to the initial estimate of average transfer speed. If we guess 2 MB/s, the result is 20.5 MB/s. If we guess 200 MB/s, the result is 27.3 MB/s.

1. Take the difference between one file’s timestamp, and the next. […] 8. Conclude something about transfer speeds from the result. The result is whatever “average transfer speed” you used in step 2.

Not quite. As explained in the analysis: “We can estimate the transfer speed of the copy by dividing the total number of bytes transferred by the transfer time. The transfer time is approximated by subtracting the time gap total from the total elapsed time of the copy session.” and “We further calculate the “time gap” which is the difference between the last mod. time of a current entry and its previous entry; from this we subtract an approximation of the transfer time (using our knowledge of average transfer speed) to go from the last mod time to a likely time that the transfer started. We use a cut off of at least 3 seconds to filter out anomalies due to normal network/OS variations. Here are the entries with significant time gaps.”

The key here is that the overall transfer time is calculated across the entire elapsed time of the copy, subtracting out the sum of the time gaps. As shown in one of the figures, there are only 9 time gaps, which all occur at the top-level. The difference between the earliest last mod time and the latest is 14:15 (14 mins, 15 secs) and the sum of the time gaps is 12:48 (12 mins, 48 secs). The difference is 87 seconds – that’s the transfer time. If we divide the total bytes transferred (1976 MB) by the transfer time of 87 seconds – the resulting transfer speed is 22.6 MB/s.
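As a sanity check, the aggregate calculation above can be reproduced directly. The 14:15, 12:48, and 1976 MB figures are taken from the analysis as quoted; dividing the rounded inputs gives about 22.7 MB/s, in line with the 22.6 MB/s quoted (the small difference is rounding in the inputs):

```python
# Reproduce the aggregate transfer-speed calculation quoted above.
elapsed = 14 * 60 + 15        # 14:15 between earliest and latest last mod time
gaps = 12 * 60 + 48           # 12:48 total of the 9 top-level time gaps
transfer_time = elapsed - gaps
rate = 1976 / transfer_time   # total MB transferred / transfer time in seconds

print(transfer_time)          # 87 seconds
print(round(rate, 1))         # ~22.7 MB/s
```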

It would be a mistake to look only at file-by-file differences in last mod times because that wouldn't account for the time gaps, and the time gaps are a significant percentage of the total elapsed time (90%). Also, there can be some variability in recorded last mod times; other OS activities will add variability. Thus, it makes sense to work on the overall time in aggregate; this averages out the effect of "noise".

When calculating “time gaps” a correction factor is subtracted from the last mod time to approximate the “first write time”. This is important for large files. Consider the following.

File Last Mod Size
A 6:45:00.000 10,000
B 6:45:10.000 226,000,000

If we just took the difference between B's last mod time and A's last mod time, we'd get 10 seconds. Viewing that 10 seconds as a "gap" would lead to an incorrect estimate of the time between when A was last written and B was first written (the actual "gap"). But if we subtract (226,000,000 / 22,600,000), or 10 seconds, from the last mod time of B, we arrive at a time gap of zero (0) rather than ten (10) seconds. Since we don't know the overall transfer rate until we calculate the time gaps as above, the process is iterative. I did the iteration by hand, but Excel's Goal Seek feature could have been used to good advantage.
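The iteration can be sketched in a few lines of Python. The file series below is made up for illustration (it is not the NGP-VAN data): four 46 MB files written 2 seconds apart around one 100-second idle gap, so the true rate is 23 MB/s. Note that the estimate converges to the same answer whether the starting guess is far too low or far too high:

```python
# Iterative transfer-rate estimate, as described above (illustrative data only).
mod_times = [2.0, 4.0, 6.0, 106.0, 108.0]   # last mod times, in seconds
sizes = [46e6] * 5                           # file sizes, in bytes

def estimate_rate(mod_times, sizes, rate_guess, cutoff=3.0, iters=50):
    """Estimate bytes/sec as: bytes moved / (elapsed - sum of time gaps),
    where each gap is corrected using the current rate estimate."""
    rate = rate_guess
    elapsed = mod_times[-1] - mod_times[0]
    moved = sum(sizes[1:])   # bytes written inside the elapsed window
    for _ in range(iters):
        gaps = 0.0
        for i in range(1, len(mod_times)):
            # Gap = time since the previous file finished, minus this
            # file's approximate transfer time at the current rate.
            g = (mod_times[i] - mod_times[i - 1]) - sizes[i] / rate
            if g >= cutoff:  # ignore small jitter, per the 3-second cutoff
                gaps += g
        rate = moved / (elapsed - gaps)
    return rate

for guess in (2e6, 23e6, 200e6):
    est = estimate_rate(mod_times, sizes, guess)
    print(f"guess {guess / 1e6:g} MB/s -> estimate {est / 1e6:.1f} MB/s")
```

Each pass re-prices the gaps at the latest rate and re-divides; the answer is the fixed point where the rate reproduces its own gap corrections, which is what Goal Seek would solve for directly.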

I posted this to the “bytes vs. bits blog” entry, but its content is probably germane to the main thread here, also.

I think that all the feedback and responses above are helpful in understanding counter-claims and responses to them, but the overall length of those comments is becoming unwieldy. I will look into how to provide a focused summary of the issues and my responses. Perhaps some of that info will be incorporated into the main analysis.

Thank you for the feedback. As you point out, the 80% figure is incorrect. That is unfortunately a misplaced comment that was meant to apply to the next section on peripherals (USB-2). Even then, that theoretical number is in fact closer to 65% (not 80% – I eyeballed the percentage – didn’t take into account that the lower scale is log). I will make the corrections.

Although those detailed figures in this short write up were incorrect (and will be fixed), they won't change the overall conclusions in the analysis. In practice, you'll find that a rate somewhere between 20 MB/s and 25 MB/s is a typical speed when writing to a USB-2 flash drive. (As mentioned in the write up, file by file copy operations will slow things down to well below the theoretical speed.)

Although many have pointed out that their Internet provider or their company’s fiber link may provide theoretical speeds that perhaps exceed 23 MB/s, we need to put this rate into perspective. Guccifer 2 claims he is a Romanian; some have claimed Russian; some have claimed neither, or even that Guccifer 2 may in fact be several people. Putting that controversy aside, ThreatConnect determined that Guccifer 2 likely used a commercial VPN service originating in France. If we accept the theory that Guccifer 2 is working out of Eastern Europe (or Russia), using a commercial VPN service as a relay to Washington, DC then I think it is fair to claim that the rate achieved will be nowhere close to 23 MB/s.

The key point of the 23 MB/s rate is that it provides support for the conclusion that a local copy was made; that rate also happens to be consistent with a local copy to a USB-2 flash drive. Combine this with the observations that the copy was likely done on the East Coast and that ‘cp’ (inherently a local copy operation) was probably used, which would produce the observed last modified time pattern. Those related observations lead directly to the conclusion that the initial copy operation was likely a local copy.

Other observations strongly argue against Guccifer 2’s claim that he hacked the DNC: the analysis noted that a second copy operation was done on Nov 1, 2016 which built the precursors of the final 7zip. Key conclusions: (1) this second copy operation was also likely done on the East Coast and (2) those precursors (the regular files and .rar file present in the 7zip file) were likely copied to a thumb drive. It would be difficult for a hacker in Eastern Europe (or Russia) to arrange for a thumb drive to be plugged into a system on the East Coast, and we would have to ask how this is consistent with Guccifer 2’s claim that he hacked into the DNC.

well, now i am realizing that this analysis does not pertain to the DNC emails, but to the DNC documents. so, the stuff about the most recent email and the creation of the zip file is irrelevant, sorry.

“well, now i am realizing that this analysis does not pertain to the DNC emails, but to the DNC documents. so, the stuff about the most recent email and the creation of the zip file is irrelevant, sorry.”

Thanks for the clarification. There are a lot of moving parts (and facts) surrounding the DNC, Crowdstrike, Guccifer 2.0 and so on, which make it difficult to keep track of the interdependencies.

crowdstrike apparently asserts that they have evidence of exfiltration of data from the allegedly compromised DNC servers. if they had evidence of exfiltration consistent with the dates, times and file sizes reflected in the zip file, to me, that would be pretty persuasive evidence, assuming that the evidence of exfiltration is not fabricated or orchestrated with the zip file to stage a crime scene.

i was referring to the gap between the most recent email in the leaked emails, which is from May 2016, and the apparent earliest date reflected in the creation of the zip file. one possible explanation for this gap is that the means by which the DNC systems were made remotely accessible were discovered by the IT staff, and remediated. IIRC, the DNC announced in early June that it had been hacked. I don’t think you are making this argument, but many people have cited your analysis to buttress a theory that, rather than a hacking, the release of the DNC emails was the result of what amounts to a leak, so, in this theory, someone who legitimately had access to the systems in question made copies of files and later they wound up in the hands of wikileaks. the gap between the date of the newest email and the earliest date reflected in the zip creation seems to be inconsistent with this theory, and more consistent with the theory of remote compromise.

as far as I can tell, there could have been any number of intermediate steps between the obtaining of the emails and the creation of the zip file. The emails could have been obtained singly over any amount of time, and then the zip file and its constituents created at any other location, at any other time, between the most recent email and the release of the zip. I believe the latest email in the DNC release dates from May 2016. This zip file and its constituents seems to date from early July. If it was an inside job, what theory accounts for the lag between the most recent email and the creation of the zip file?

“as far as I can tell, there could have been any number of intermediate steps between the obtaining of the emails and the creation of the zip file”

This study analyzes only the metadata in a 7zip file attributed to Guccifer 2. There were no emails in that disclosure, AFAIK.

The “local copy” conclusion in this study is predicated on two observations: (1) the original copy operation was likely done on the East Coast and (2) transfer speeds of 23 MBytes/sec. are consistent with a local copy.

Some reviewers have suggested a “pivot” where the files were first locally copied (onto a server at the DNC) and then transferred upstream from there. Some reviewers have stated that perhaps the “hacker” set the time zone on his computer to East Coast time. Some have suggested that the “pivot” occurred at a nearby site, suggesting without support that 23 MB/s could be sustained from the DNC HQ to this local “safe site”. Some have even suggested that this local “safe site” was in fact located in one of the nearby compounds used by the Russian embassy mission before they were kicked out.

On the theories of using a local “safe site” or setting the time zone to East Coast time, this is inconsistent with Guccifer 2’s previous behavior: (1) he worked quite hard to convince the public that he is an Eastern European (Romanian) hacker that some have suggested is in fact a Russian proxy, (2) ThreatConnect concluded that Guccifer 2 used a VPN service (Elite VPN) host located in France (Adam Carter covers this in detail at http://g-2.space/#3). Both actions of setting the time zone and/or using a local site closer to the DNC are inconsistent with Guccifer 2’s efforts to appear to be an Eastern European hacker and his prior use of a VPN located in France.

On the theory of a “local pivot”, where the hacker first copied (using a program like ‘cp’) to a directory on the (presumed) DNC server – yes that scenario would be consistent with the observed facts. What has not been answered is the question: why did the hacker make this unnecessary local copy? Why risk creating a 2G to 20G “footprint” (even if it is only temporary)? In my experience, hackers try to ex-filtrate data quickly and quietly and would therefore avoid making this “local pivot” copy. On this basis, I rank this theory as less likely than the conclusion reached: the initial copy operation was a local copy. I’d go so far as saying that this “local pivot” theory is unlikely.

“This zip file and its constituents seems to date from early July. If it was an inside job, what theory accounts for the lag between the most recent email and the creation of the zip file?”

The second copy operation has been under-reported. The study determined that the precursors of the final 7zip file were likely [on Nov 1, 2016] (1) copied to a thumb drive and (2) copied at a location on the East Coast. No reviewers have offered compelling theories which might provide alternative explanations. Certainly, setting the time zone to Eastern might come up. Note: the presence of a thumb drive puts a real live person on site. Also, note that this second presumed East Coast site need not be the original site.

I don’t have a theory on the two month lag between July 5, 2016 and Nov. 1, 2016 (with ultimate release on Nov. 13, 2016). I also don’t have a theory or rationale for the content in that NGP-VAN 7zip file; although there is 2 GB of data there, it is somewhat of a hodge podge and some reviewers question its overall usefulness or relevance as a “leak”. As mentioned in the study, there may be another 18 GB that was held back. One speculative theory is that the purpose of the disclosure was to tell the owner of the original data, “We have your data. Act accordingly.”.

I am a Systems Design/Integration expert for a large Defense contractor. Common sense only need be applied in addition to the supporting data provided by the Forensicator. First, the DNC wasn’t hosted by idiots; it was done by pros with multiple layers of security that would be extremely difficult to penetrate without leaving very obvious tracks. We are not talking about a simple file share here or Sharepoint, we are talking mail storage of multiple users. In our perimeter we are attacked by automation over 10,000 times daily from sources in this order of frequency: China, Ukraine, Iran, Russia, UNITED STATES and Saudi Arabia. All of them unsuccessful. If data loss occurs within this infrastructure, it is the result of insider cooperation. Mail storage level? Please…. total insider.

Next, examine the actual dates of all messages and find the most recent. You will likely find that all the messages have a most recent date consistent with the others, maybe off by a week at most. This supports the theory that the data set was done in large chunks or one lump depending on the thought process. Hell, I would even play along with remote access, but not by hack. Whoever did this had access.

Lastly, if this was criminal and you wanted to find the “hacker”, you are going to give the Feds full access to the data, not a report created by CrowdStrike with logs that can be manipulated and logs that most certainly display attempts by foreign hackers like every other perimeter security device would capture. If you are the feds and you are going to publicly state the Russians attacked us AND take action based on the data theft, you are going to have full access for forensics. Of course if you are trying to sell the notion of a serious offense to the public you might not. Especially if you have done it before when you said that Benghazi was a response to a YT video.

Let’s just use Occam’s razor. Is it likely that a Russian password guesser breached perimeter security, gained internal LAN access and emulated Admin level control to mail storage which was used to copy GBs of data through a firewall that would surely flag or even potentially interrupt the transfer?

OR… Someone with local access to the Internal LAN or physical device simply bulk copied the data to media for easy removal from the site with little fingerprints left to track.

The truth is that IT Security is only as good as the integrity of your clients. We can make unwanted access a nearly impossible event, but it only takes one internal client with an axe to grind or dollar to make for an entire organization to fall.

In the end, the data was honest. Why isn’t the behavior of the Democrat party more the issue? It is like the husband who has the affair blaming the wife for seeing text messages to his mistress rather than the fact that he sent them in the first place.

Have not read all of your article, but I would not use “touch” on any of the files, as you are modifying the timestamp (I know you know that…)! That is, you are corrupting your own data. If the goal is to adjust to a different timezone then change your TZ env var. If you are using cygwin, it has a package with all the timezone datafiles in the world; it takes a while to install, but I always installed it. cygwin is great, but thankfully I have not needed to use it since I live in fedora or centos now.

The problem is that you have two sets of files: (1) those that come from the .7zip file and (2) those contained in the .rar files. For example, CIR.zip is in the 7zip file and has a last mod date of 7/5/2016 3:52:00 PM (when opened in the Pacific Time Zone). This needs to be advanced by 3 hours to agree with the times in the .rar files (which show local time).

Generally, it is not a good idea to change metadata when reviewing evidence, but in this case you have files with two differing time representations (UTC for the 7zip, local for the .rar files). We need to adjust the .7zip files to bring them into agreement, as they would appear when the files were originally copied.

If that doesn’t quite make sense, consider that if the .7zip file were opened on the East Coast that the last mod date/times would fall into the same range as those shown in the .rar files. That’s because the 7zip GUI will adjust the UTC times to the equivalent time in the time zone in force when you open the 7zip file. As an aside, most Windows based programs don’t act on the TZ setting.
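The adjustment can be illustrated with Python’s zoneinfo module. The 22:52 UTC value below is a hypothetical stored timestamp chosen to match the 3:52 PM Pacific example above; the same instant renders 3 hours later on the East Coast:

```python
# Show how a UTC timestamp stored in a 7zip archive renders 3 hours
# apart in Pacific vs. Eastern time (hypothetical example value).
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

stored_utc = datetime(2016, 7, 5, 22, 52, 0, tzinfo=timezone.utc)
pacific = stored_utc.astimezone(ZoneInfo("America/Los_Angeles"))
eastern = stored_utc.astimezone(ZoneInfo("America/New_York"))

print(pacific.strftime("%I:%M:%S %p %Z"))  # 03:52:00 PM PDT
print(eastern.strftime("%I:%M:%S %p %Z"))  # 06:52:00 PM EDT
```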

“I’d encourage you to post a link to this comment area at the top of the main post (perhaps in that Acknowledgements area) so that readers can easily see dissenting viewpoints.”

Good idea. It may take a day or two; I’ll update the article per your suggestion.

“1) I would suggest removing the leading commentary from your conclusions. […]”

Thanks for the suggestion, but the document will stay as is unless major technical issues are found or clarifications are needed. After each conclusion, there is a statement regarding the basis for the conclusion. Hopefully, that helps.

“2) […] 23MB/s is not anywhere near out of the realm of possibility for remote file transfers, especially not for large organizations or government agencies. […]”

The analysis report notes that if you take the last mod time stamps of all the constituent files (after unpacking the top .rar files) they’re all compressed into a 14 minute period with significant gaps amounting to 13 minutes. The analysis follows the theory that the 13 minutes of gap represent files that were copied, but left out of the final .7zip file. If we look at only the files copied and use their total size (in bytes) divided by (elapsed – total gaps), we get a transfer rate of 23 MB/s. Because of the pattern of the last mod dates we conclude that a command like Unix’s ‘cp’ was used.

Some people have suggested that the first copy operation might have been out to a location close to the DNC and then those files were copied from there. Let’s call that location BASE1.

Let’s first note that the usual use of ‘cp’ is a local copy operation. There is another form of ‘cp’ called ‘scp’ (secure copy); it works pretty much like ‘cp’ but can go remote. It will require some setup on BASE1, but it would be the natural way to create the ‘cp’ pattern of last mod times while copying over the net. It might look like this:

scp -q -r 'NGP-VAN' BASE1:

(above: “-q” for ‘quiet’, “-r” for ‘recursive’)

In practice, ‘BASE1’ might be the IP address of our clandestine server.

Nothing wrong with that; it fits the facts: a last mod pattern with the appearance of a file-by-file copy. You may even get a tail wind: ‘scp’ encrypts the data by default, and with the ‘-C’ option it will also compress the data before encrypting it. The rationale is that encryption can be a CPU-intensive, slow operation; performing it on less data might make things go faster, as long as your compression algorithm runs faster than your encryption algorithm. You get the extra benefit that the content of your packets will be difficult to sniff on the wire.

Now the only thing you’ll need to do is time it. What you’ll find is that file-by-file copying will slow things down a lot. How much is a lot? Some testing is needed, but 3x to 10x worse is possible. File-by-file copying introduces per-file and directory creation overheads, plus back-and-forth handshakes for each file, that have nothing to do with raw communication transfer speeds.

The bottom line is that just because you have a fast link, you may not come close to hitting its peak transfer rates because there are other overheads involved. If I have some time, I will try and back up those claims. Otherwise, I encourage you to try a few experiments. Ideally using the actual NGP VAN 7zip data. Try it on your local net.
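One such experiment can be run entirely locally (a sketch; the file count and sizes are arbitrary, and local disk copies only hint at the per-file overhead a network copy would add):

```python
# Compare copying many small files vs. one file of the same total size.
import shutil
import tempfile
import time
from pathlib import Path

base = Path(tempfile.mkdtemp())
many = base / "many"
many.mkdir()

chunk = b"x" * 4096
for i in range(500):                          # 500 x 4 KB small files
    (many / f"f{i:03d}.bin").write_bytes(chunk)
big = base / "big.bin"
big.write_bytes(chunk * 500)                  # same 2 MB in one file

t0 = time.perf_counter()
shutil.copytree(many, base / "many_copy")     # file-by-file copy
t_many = time.perf_counter() - t0

t0 = time.perf_counter()
shutil.copy(big, base / "big_copy.bin")       # single-stream copy
t_big = time.perf_counter() - t0

print(f"file-by-file: {t_many:.4f}s, single file: {t_big:.4f}s")
```

On most systems the file-by-file copy takes noticeably longer even though the byte counts are identical; over a network, each file also adds round-trip handshakes on top of that.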

I’d encourage you to post a link to this comment area at the top of the main post (perhaps in that Acknowledgements area) so that readers can easily see dissenting viewpoints. As for my feedback on your analysis:

1) I would suggest removing the leading commentary from your conclusions. Statements like “The data was likely initially copied to a computer running Linux” are misleading – that is only one possibility and in my professional opinion, not the most likely. I’d suggest changing statements like this to “The data may have initially been copied…”

2) I know you somewhat address this in a follow-up post, but 23MB/s is not anywhere near out of the realm of possibility for remote file transfers, especially not for large organizations or government agencies. My clients with international connections easily reach these speeds, so they are not *necessarily* indicative of a local transfer.

3) You mention that the files were copied individually, not as a single large package. This can actually help speed up remote transfers, as multiple files can be sent synchronously, bypassing a lot of the bottlenecking you can experience in international peering.

4) All of the above is somewhat of a non-issue in my experience. It would actually be relatively uncommon for individual files to be exfiltrated in this manner. *Far* more common would be for them to be collected on a local machine under remote control, packaged nicely, then exfiltrated as a single package. Depending on the level of security, this can be accomplished in a single big transfer, or the package can be fragmented to speed up the transfer.

5) If the files were collected locally before being extracted, this would easily explain the EDT times, the FAT timestamps, and the NTFS timestamps. None of this indicates one way or the other whether the attacker was local or remote. It is impossible to tell from any of this evidence, and suggesting otherwise is disingenuous.

6) The conclusion that this also involved a USB drive and a Linux OS is also likely flawed. As you point out, ‘cp -r’ is an easy explanation, but booting to Linux is not the only way to accomplish this type of transfer. Many remote access tools use ‘cp’ and ‘scp’ as the base for their file copy tasks. This would leave the timestamps in exactly the format you describe. In my experience, it is *very* common to see this sort of timestamp in a breach investigation.

7) The scenario you envision, frankly, is overly complex and unlikely. It is, in my opinion, far more likely that a remote attacker utilized a single breached DNC machine to locate and collect the desired data, did so using their attack tool (rather than RDP and drag+drop), and packaged it all for exfiltration on that machine. This would be supported by all of the evidence you describe and matches the most common breach scenarios we’ve seen over and over again.

Overall, I think your investigation of the data is good. You pull out some interesting information and were thorough in your research. However, your analysis seems tainted by the intent to draw specific conclusions from this data. Looked at objectively, the most likely scenario supported by your data is not the one you propose. This article could be rewritten to be very informative without the obvious slant and doing so could make it a valuable resource for those interested in the information. As it stands now, however, the bias in your conclusions makes the analysis difficult to take at face value, because the reader is left having to separate technical evidence from personal bias.

I hope you’ll consider re-writing (or at least amending where the evidence supports other potentially more likely possibilities) because this is certainly research that is worth a read. If you can separate your personal feelings from the technical analysis and conclusions, this would be worth submitting to a journal for peer review, rather than leaving it sitting on an anonymous blog.

I hope this feedback was useful, if for no other reason than to present a different viewpoint.

“3) You mention that the files were copied individually, not as a single large package. This can actually help speed up remote transfers, as multiple files can be sent synchronously, bypassing a lot of the bottlenecking you can experience in international peering.”

The saying goes: “In theory, the difference between theory and practice is small. In practice, the difference between theory and practice is large.”

The problem is that ‘cp’ and its close cousin ‘scp’ are simple, non-threaded programs. They are *not* Robocopy or FileZilla; and if those tools had been used, they would have preserved the last mod times.

I encourage you to run a few experiments and get back to us with both positive and negative results.

“4) All of the above is somewhat of a non-issue in my experience. It would actually be relatively uncommon for individual files to be exfiltrated in this manner. *Far* more common would be for them to be collected on a local machine under remote control, packaged nicely, then exfiltrated as a single package. Depending on the level of security, this can be accomplished in a single big transfer, or the package can be fragmented to speed up the transfer. ”

Far more common, in my experience, is for them to be copied over the wire and not deposited in a local directory first. A local directory leaves a footprint. A 20G directory leaves a *big* footprint. That, and it is an unneeded extra step to make a local copy of the data.

Something like this on Unix:

$ tar czf - /mnt/server/NGP-VAN | ssh BASE1 'tar xzf -'

There’s a lot of ways to do that; the command is intended as an example that the files can be streamed over the ‘net without the need to make a local copy of NGP-VAN.

Below is something like what you’re describing.

$ cp -r /mnt/server/NGP-VAN .
$ zip -r NGP-VAN.zip NGP-VAN
$ rm -rf NGP-VAN

and then transfer NGP-VAN.zip back to Romania. This will produce NGP-VAN’s last mod pattern created by ‘cp’ when the zip file is ultimately unpacked. Again, why did you make a local copy?

“5) If the files were collected locally before being extracted, this would easily explain the EDT times, the FAT timestamps, and the NTFS timestamps. None of this indicates one way or the other whether the attacker was local or remote. It is impossible to tell from any of this evidence, and suggesting otherwise is disingenuous.”

Before I answer, please clarify/restate: “If the files were collected locally before being extracted, this would easily explain the EDT times, the FAT timestamps, and the NTFS timestamps.” Outline your proposed scenario in enough detail that we can follow it, and comment on it. Explain how that scenario supports your claims.

The analysis doesn’t say “with 100% certainty the attacker was not remote.” It says that the fact pattern indicates a local copy was made and the file times in that local copy showed the pattern of using ‘cp’, which is primarily used for local copying operations. It further states that the effective transfer rate of 23 MB/s is too fast to support the idea of file-by-file copying back out over the Internet (although that would be an unusual way to use ‘cp’, it does allow for the use of ‘scp’).

Readers can decide, or opine, on whether they think it makes sense that a hacker would first make a local copy of the files before shipping them offsite, which creates a big intermediate directory and adds more time to the overall operation. *That* does, IMO, seem like an extra step added to fit the facts.

A big hurdle for anyone claiming Guccifer 2 hacked the DNC (either in the way he claimed or otherwise) is explaining why neither the DNC, the FBI, Crowdstrike, nor NGP-VAN supports the claim that Guccifer 2 hacked the DNC. In fact, the DNC hasn’t acknowledged that the files in the disclosed NGP-VAN .7z file are the DNC’s files. That is one pretty strong reason to come into the analysis with a “not a hack” bias.

“6) The conclusion that this also involved a USB drive and a Linux OS is also likely flawed. As you point out, ‘cp -r’ is an easy explanation, but booting to Linux is not the only way to accomplish this type of transfer. Many remote access tools use ‘cp’ and ‘scp’ as the base for their file copy tasks. This would leave the timestamps in exactly the format you describe. In my experience, it is *very* common to see this sort of timestamp in a breach investigation. ”

On this point: “Many remote access tools use ‘cp’ and ‘scp’ as the base for their file copy tasks.” If the host runs Linux/UNIX, I can accept that statement, because UNIX has those commands already installed. I can’t see why they’d bother shipping in ‘cp’ because Windows has “COPY” already. ‘scp’ maybe, but I’d like to hear that you/others have either seen this in practice or can point to a document that supports that statement.

When you say “it is *very* common to see this sort of timestamp in a breach investigation”, was that a breach of a Windows-based system? Did you also see the hackers making a large local copy of a (20G) directory before shipping it out?

“7) The scenario you envision, frankly, is overly complex and unlikely. It is, in my opinion, far more likely that a remote attacker utilized a single breached DNC machine to locate and collect the desired data, did so using their attack tool (rather than RDP and drag+drop), and packaged it all for exfiltration on that machine. This would be supported by all of the evidence you describe and matches the most common breach scenarios we’ve seen over and over again.”

Complex (and simple) are always in the eye of the beholder. Rather than debating the vague quality of complexity, let’s clearly state our cases and let others decide on which of the two interpretations of the facts matches up with their experience and their sense of what makes sense to them.

Here is what I see as a simple scenario, in the paragraphs below.

First, we assume that this was not a hack. We come in with that bias because no one who should know is saying G2 hacked DNC. Maybe they have their reasons (ongoing investigation, etc).

Our bias won’t matter anyway, if the facts don’t support it.

We note that fast transfer times support the idea of a local copy. We discard the idea of making a temp copy locally, because it seems unnecessary (more complex) and in my experience hackers work hard *not* to leave big footprints. 20G (or even 2G) is a big footprint.

We note last mod time patterns that are consistent with the use of the ‘cp’ command, which is a Unix command. Linux is Unix. Bootable Linux drive images are widely available; they are easily burned to a USB drive. They are commonly used by IT admins, pen testers, forensics types, and hackers (said it).

So, we think: let’s look at “boot Linux from a USB drive”. Is that simple? Before answering, let’s decide whether phishing, hacking a firewall, or escalating privileges sufficient to access someone’s Documents directory or some network file share is “simple”. I’ll say “no”.

To me the idea of an insider going to an employee’s desktop PC, on the day after a 3 day July 4 weekend, after hours, booting a Linux USB drive and then taking 15 minutes to copy off a big directory/two is simple. No hack, no authentication, no logs. Alternatively, you might access a network share. For that though, you’ll probably need authentication. As an insider you can side step that, esp if you have some sort of network admin privileges. Maybe you’ll leave some log entries behind you, but with a 60 day retention policy there won’t be any when you release the docs 2.25 months later.

In passing, did it occur to anyone at the DNC, that they should download the NGP-VAN 7zip file produced by Guccifer 2 and take a look? Those 7/5/2016 dates are pretty obvious. Would that prompt them to check their logs? Would it prompt them to track down the locations where the data in that .7zip file can be found?

Note: I’m not saying this is what happened, just that the facts both support the scenario and don’t negate it. If we saw a 2 MB/s transfer rate, I would back off the idea of local copy. If we saw a 200 MB/s transfer rate, I’d say that there is something wrong with the metadata.
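For readers who want to see the shape of the rate estimate in concrete terms, here is a minimal sketch of the iterative approximation described earlier (total bytes divided by elapsed time minus significant time gaps, with the estimate refined until it settles). The `(size, mtime)` pairs below are fabricated illustrative values, not the actual 7zip metadata; `GAP_CUTOFF` is the 3-second threshold quoted in the analysis:

```python
# Sketch of the iterative transfer-rate estimate described in the analysis.
# `files` is a list of (size_bytes, last_mod_time) pairs in copy order.
GAP_CUTOFF = 3.0  # seconds; filters out normal network/OS jitter

def estimate_rate(files, guess_bps):
    """One refinement: total bytes / (elapsed time minus significant gaps)."""
    elapsed = files[-1][1] - files[0][1]
    total_bytes = sum(size for size, _ in files)
    gap_total = 0.0
    for (size, t), (_, t_prev) in zip(files[1:], files):
        # Back out an approximate transfer time (at the guessed speed) to
        # find when this file's copy likely started; keep only real pauses.
        gap = (t - t_prev) - size / guess_bps
        if gap >= GAP_CUTOFF:
            gap_total += gap
    return total_bytes / (elapsed - gap_total)

def iterate_rate(files, guess_bps, rounds=10):
    for _ in range(rounds):
        guess_bps = estimate_rate(files, guess_bps)
    return guess_bps

# Fabricated example: three files copied back-to-back, then a long pause
# before a small fourth file.
files = [(100e6, 0.0), (200e6, 9.0), (300e6, 22.0), (50e6, 125.0)]
rate = iterate_rate(files, guess_bps=23e6)
print(f"estimated rate: {rate / 1e6:.1f} MB/s")
```

With this toy data, starting guesses of 2 MB/s and 23 MB/s both settle near 27 MB/s within a few rounds, which illustrates the self-correcting character of the iteration.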

“I hope this feedback was useful, if for no other reason than to present a different viewpoint.”

Yes, thanks for taking the time to provide detailed counter-points and for encouraging discussion.

“To me the idea of an insider going to an employee’s desktop PC, on the day after a 3 day July 4 weekend, after hours, booting a Linux USB drive and then taking 15 minutes to copy off a big directory/two is simple.”

Let me clarify here, before someone starts warming up the phasers. Only Guccifer 2 has stated that the files disclosed in the NGP-VAN 7zip are from the DNC and were somehow obtained as the result of exploiting vulnerabilities in NGP/VAN or the DNC firewall. Please read the statement above as hypothetical. We don’t know; it might be from some DNC ex-employee’s backup drive inadvertently left on the counter of the Starbucks across the street from the DNC. We don’t even know whether the data can be authenticated as coming from the DNC.

The hypothetical above is based on the same premise as the “remote hack” theory — someone named Guccifer 2 collected DNC data, presumably from behind the DNC firewall, and this data was later disclosed on Sept 13, 2016.

“To me the idea of an insider going to an employee’s desktop PC, on the day after a 3 day July 4 weekend, after hours, booting a Linux USB drive and then taking 15 minutes to copy off a big directory/two is simple. No hack, no authentication, no logs. Alternatively, you might access a network share. For that though, you’ll probably need authentication. As an insider you can side step that, esp if you have some sort of network admin privileges.”

The act itself sounds simple but fitting it into context could generate a great deal more complexity. How many DNC employees would you say there were who could have conceivably accomplished this? What are some likely motives for carrying out this act and are any of them consistent with the data?

I’ve seen many people suggest some manner of whistleblower scenario, usually relating to favoritism the DNC showed to Hillary Clinton over Bernie Sanders. That a DNC employee would randomly decide to steal a great deal of data on the off chance of finding something incriminating that they could leak, just for the sake of becoming a whistleblower, sounds to me like a rather far-fetched scenario. Our hypothetical whistleblower would more likely have been privy to DNC malfeasance prior to accessing the server, and would later have downloaded the data in order to obtain evidence of that malfeasance along with other useful information that may have been in the same directory. This in turn narrows down the suspect pool: we need a person who might have become privy to the emails concerning undermining the Sanders campaign, who was not sympathetic to this end, and who had the kind of access needed to steal from the server.

I can think of two other scenarios: someone actively infiltrating the DNC by becoming an employee in order to steal information on behalf of some third party, or a previously loyal DNC employee flipped by an outside motivation. Both are complicated and push the boundaries of plausibility, especially since there is no evidence to support either.

Then there’s further complexity generated by fitting it into the greater context of the investigation, as well as national and global politics.

If there was no hack, would this not imply that Crowdstrike was lying to the FBI? What would compel a cybersecurity firm to commit such a felony? How does one even go about contracting a firm for that purpose?

This is to say nothing of the backdrop of Russian-funded electioneering campaigns and other documented hacking attempts against government and near-government organizations.

You go on to suggest that the data may not even have come from the DNC, which adds further layers of complication. You said: “We don’t know; it might be from some DNC ex-employee’s backup drive inadvertently left on the counter of the Starbucks across the street from the DNC. We don’t even know whether the data can be authenticated as coming from the DNC.” I assume this particular scenario was in jest, but it raises the question: what is a plausible alternative scenario?

So what at first appears physically and technically simple in context becomes logistically complex.

You make some good points and thanks for your reasoned response. The purpose of the study is to analyze the available metadata, make observations, and to some degree speculate based on the observations. The point of the speculations is to illustrate whether the analysis supports or disputes the claim that Guccifer 2 hacked the DNC and then published the “NGP VAN” 7zip file. In that context, we have to assume (1) the 7zip file represents data derived from a DNC source, (2) there may have been a hack or leak.

We make those assumptions because we are trying to test Guccifer 2’s claim that he hacked the DNC, then obtained the “NGP VAN” data and later disclosed it. So yes, the suggestion that this data might have been derived from an ex-employee’s backup thumb drive was partly in jest, but also to remind us all that we don’t know the actual source of the data. For the purpose of this study we need to follow Guccifer 2’s claims, because we are testing the veracity of those claims.

The study doesn’t try to speculate on whether there was a whistleblower, an insider, or even an agent of some state government. It simply disputes the scenario claimed by Guccifer 2, that there was a hack initiated from Eastern Europe or Russia. The point of describing a scenario involving booting Linux from a USB drive was mainly to illustrate a feasible and reasonable scenario that fits the facts.

“If there was no hack, would this not imply that Crowdstrike was lying to the FBI?”

As far as Crowdstrike goes, from what I recall Crowdstrike stated that they found indicia of malware which they attribute to two alleged Russian-sponsored hacking groups (COZY BEAR and FANCY BEAR). They were not able to determine whether any information had been exfiltrated. IIRC, Crowdstrike never made any claims re: Guccifer 2 and therefore did not link Guccifer 2 to the alleged Russian hacks of the DNC. If you/others have information to the contrary, please post a reply.

Thus, based on the public record, there is no information that shows that Crowdstrike lied to anyone. Their findings were sufficiently limited to make people question whether those findings fully support the conclusions that Russian-sponsored groups hacked the DNC and later leaked DNC documents and emails to Wikileaks, all in an effort to influence the election in favor of Trump.

Let me play devil’s advocate for a moment. How do we know that the 9-1-2016 6:45 PM copy was when the files were copied off the server? The files could have been exfiltrated some time prior and then copied within the attacker’s own system using the cp command at that moment. This may have been only the last of several cp copies. And these hacking groups have been known to adopt sleep schedules that match their target’s timezone. It’s not inconceivable that hackers in Russia would have their computers set to US Eastern time.

Where we need to go from here is to examine the system logs on the server and look at the shutdown and startup times. If we find that the Windows server (I assume it was Windows, they’re Democrats) was shutdown just before the start of the copy and came back online shortly after it finished (if they show an unusually long reboot of highly coincidental timing) then we can be very confident that it was an inside job. This would also require that the server has a USB3 port to connect a suitably fast flash drive. But if the logs show that the server was running smoothly right through that time period then it would not contradict the Russian-hacker theory. A Linux server running smoothly at the time could support either theory.

Crowdstrike presumably still has the hard drive images. And they claim to have sent copies to the FBI. Either could quickly check the logs and settle the question in five minutes.

Well, Crowdstrike and the FBI have already examined all of this and released their findings months ago. We are unlikely to hear more from them barring any publicly disclosed information resulting from the Mueller investigation. I wouldn’t hold your breath for that. Also, it is doubtful that CrowdStrike still has the images. The FBI certainly does, but once the investigation was concluded, CrowdStrike was likely required to destroy them (standard practice).

“Well, Crowdstrike and the FBI have already examined all of this and released their findings months ago.”

Per Comey’s testimony, as I understand it, he said that the DNC denied access to their servers, even after being asked repeatedly (“at multiple times and several levels”) by the FBI. Comey also stated that they (the FBI) depended upon Crowdstrike for analysis of the servers and the (alleged) hacks. Crowdstrike declined an invitation to appear at a Congressional hearing subsequent to Comey’s testimony. If you have a different understanding please follow up, ideally with cites.

“Also, it is doubtful that CrowdStrike still has the images.”

Update: Previously, I said: “I have not seen/heard any statements/testimony by Crowdstrike that they made images […]”. Recently, Alec Dacyczyn followed up with a cite to a July 5, 2017 WT article, http://www.washingtontimes.com/news/2017/jul/5/dnc-email-server-most-wanted-evidence-for-russia-i/
which states,
“In May 2016 CrowdStrike was brought to investigate the DNC network for signs of compromise, and under their direction we fully cooperated with every U.S. government request,” a spokesman wrote. The cooperation included the “providing of the forensic images of the DNC systems to the FBI, along with our investigation report and findings. Those agencies reviewed and subsequently independently validated our analysis.”

My main questions would run along the lines: (1) which systems were imaged, (2) were they full images, or excerpts (such as providing only the artifacts that CS found of interest), (3) when were they made, would the time interval include both claimed Russian hacks, (4) who were the other agencies?

This news that images were provided to the FBI and perhaps other agencies is coming late in the game, and was somehow omitted by Comey and others in testimony. Further, an email to the WT is a somewhat surprising (and weak) method of making this information public.

If they have, I haven’t seen/heard where the FBI said that. Please share a cite, if you have it.

“but once the investigation was concluded, CrowdStrike was likely required to destroy them (standard practice).”

In my experience, the DNC might have the court/tribunal direct the FBI to destroy their copies (if they have them) after the trial/investigation has concluded, but not before. For Crowdstrike, it is the DNC’s decision, since it is the DNC’s data. Good practice might be to hang onto it for three or so years, just in case something else comes up. At their choosing, they could decide to do something like destroy all laptop images, or retain only logs, hacking artifacts, and so on.

It isn’t clear that the FBI performed their own independent investigation. Instead, the FBI decided to “check in” with Crowdstrike and then decided that no further action was needed, or so it seems.

The first copy was on 7-5-2016 at 6:45 PM. The second copy was on 9-1-2016.

A way to look at this report is that it asks the question does the available data support the scenarios/conclusions claimed? It is not that other scenarios aren’t possible, and readers are welcome to state their opposing theories here. I may not challenge them point-for-point though, because we would just be arguing one speculation against the other and one person’s experience against the other. Ultimately, the readers/reviewers can decide for themselves whether the conclusions in this report seem plausible.

“Where we need to go from here is to examine the system logs on the server and look at the shutdown and startup times. If we find that the Windows server (I assume it was Windows, they’re Democrats) was shutdown just before the start of the copy and came back online ”

To date, to the best of my knowledge (correct me if I am wrong): Neither the DNC, the FBI, nor any other source that might be in a position to know have acknowledged Guccifer 2, a hack that might be attributed to Guccifer 2, nor have they confirmed/denied that the data/docs released by Guccifer 2 originated in the DNC or a related organization. The NGP/VAN company denies that the “0 day” vulnerability claimed by Guccifer 2 exists.

On the face of it, only Guccifer 2 claims that he successfully hacked the DNC.

If you refer to the material at http://g-2.space and elsewhere, you will see reports that Crowdstrike was on site as early as late April 2016. Per CS’s own reports, they “mitigated” the alleged hack(s) by re-installing software on all systems, inclusive of each individual’s laptop. CS does not say whether they made image copies of hard drives, preserved logs or backups, and so on. If such actions were *not* taken, no one will be able to access the relevant logs and other relevant files now.

When you say “on the server”, are you suggesting that this analysis presupposes that a server was rebooted and a USB drive plugged into the server? That may be the case, but taking a server offline is fairly disruptive and might be noticed. Besides, these days services are often run on VMs on a server, and taking down one physical server may take down many business processes, and *that* will probably get someone’s attention.

Instead, I contemplated rebooting an employee’s desktop PC. Here, two scenarios are considered: (1) an employee’s desktop is rebooted and files are copied over the LAN, or (2) the data is copied directly from the employee’s desktop PC’s hard drive.

Alternatively, a laptop is brought in by the individual performing the collection; it may have Linux installed on it already, or a Linux USB drive is plugged into it and the laptop is rebooted into Linux. This latter idea has some appeal because you don’t have to commandeer someone’s desktop computer. On the other hand, as some have suggested, if the content of the “NGP VAN” 7zip has little to do with “NGP/VAN” (apart from a few spreadsheets and reports here and there) and looks more like the dump of some Dem worker’s work product (their Documents directory), then the collection can be made by going into that person’s office/cube, rebooting their desktop PC, and copying off the data. No servers required, no logs made, no authentication needed. After hours, the day after a 3-day July 4 weekend, might be a good time to do that.

You state: “Crowdstrike presumably still has the hard drive images. And they claim to have sent copies to the FBI”. I missed that Crowdstrike claim. Can you provide a cite? It doesn’t square with Comey’s testimony that the FBI was denied access to the DNC servers.

It is possible that the DNC might have had servers or VMs running Linux. Linux-based systems might even run the NGP-VAN software, for all I know. They might serve up users’ mail and their shared home directories. If a Linux-based server was accessed, it would certainly be easy to find the Unix ‘cp’ command on that system, and a USB device can be plugged directly into that system without a reboot (there would probably be a log entry, though, if anyone cared to check it).

“In May 2016 CrowdStrike was brought to investigate the DNC network for signs of compromise, and under their direction we fully cooperated with every U.S. government request,” a spokesman wrote. The cooperation included the “providing of the forensic images of the DNC systems to the FBI, along with our investigation report and findings. Those agencies reviewed and subsequently independently validated our analysis.”

I assume they were referring to full, block-device-level hard drive images.

Thanks, that is an interesting disclosure. For those who didn’t click through the Wash Times URL, that article was posted quite recently, on 7/5/2017. Yes, “image copies” is equivalent to “bit-for-bit” (or “block-by-block”) image copies. In the article, it was difficult to determine when exactly the images were made. Crowdstrike was on scene at the DNC as early as April 2016, per some reports. Anyway, it appears that the images were made well ahead of the 7/5/2016 date that the timestamps indicate Guccifer 2 took the (so-called) NGP-VAN data.