63 thoughts on “Guccifer 2.0 NGP/VAN Metadata Analysis”

Comments are closed. They have been open for over a month; hopefully this has given ample opportunity for readers to comment. Responding to comments is worthwhile, but time-consuming; The Forensicator needs to turn his attention to other projects. Thank you everyone who has taken the time to comment.
— The Forensicator

New blog post: Summarizes the Internet speed issue, adds new transfer speed calculations that raise the bar for transfer speed over the Internet, discusses alternative theories, and corrects the record.

Thanks for your diligent and detailed work on this. I noted from the Wikileaks Vault 7 dump that obscuring timestamps is standard CIA tradecraft. For example, one of the Vault 7 documents entitled “Development Tradecraft DOs and DON’Ts,” included this instruction to developers: “(S//NF) DO NOT leave dates/times such as compile timestamps, linker timestamps, build times, access times, etc. that correlate to general US core working hours (i.e. 8am-6pm Eastern time).” It added that this is important because it “(S//NF) Avoids direct correlation to origination in the United States.”

This suggests that nation-state actors are quite conscious of the need to sterilize timestamps. Would we not assume that Russia employs similar tradecraft? If so, the Eastern Time Zone settings may not be all that meaningful.


This NYT article, dated Dec. 13, 2016, states: “Another clue: The Russian hacking groups tended to be active during working hours in the Moscow time zone.” in reference to Cozy Bear and Fancy Bear and their alleged DNC hacking activities.

Apparently, those Russian hackers didn’t get the Vault 7 memo. There are several other “bread crumbs” that have led back to Russia; the presence of obvious clues has raised eyebrows among a few security researchers.

The method used to determine that East Coast time zone settings were in effect is non-obvious and unlikely to have been anticipated by individual(s) linked to Guccifer 2. Thus, it is highly unlikely that Guccifer 2 intended to communicate that fact. Some have suggested that Guccifer 2 set the time zone on his computer to Eastern Time, when in fact he lived somewhere else. An argument that challenges that idea is that Guccifer 2 spent a lot of time and effort to convince everyone that he is a Romanian hacker. Many have challenged that claim; some have suggested that he might be a Russian hacker. No one, however, has suggested that Guccifer 2 might operate on the East Coast.

1. We do not know *where* the DNC server (source) was located, do we? A datacenter hosted somewhere in the US? A server room within the DNC offices?

2. We do not know what kind of OS that DNC server ran on, nor what was the email server software running on that box, do we?

3. We do not know what kind of connectivity was used to connect that email server out to the world, do we? As you know, it is *common* for datacenter-connected servers to be able to access the Internet at gigabit speed. Large ISPs criss-cross the country at 100 Gbps without breaking a sweat.

4. Therefore, that figure of 23 MB/s copy rate (which represents a mere approx. 184 Mbps throughput) is not that impressive. Without the knowledge of where the source was located, and how it was connected to the Internet, I fear you cannot draw any conclusion regarding the initial copy target.

5. You are correct to point to the fact that TCP streams are much more affected by latency (compared to UDP). When one does a transfer speed test over TCP, the total capacity of a circuit will be reached through multiple, parallel TCP “streams” which, taken together, will represent the total speed capacity for a given circuit. If you test with only one (1) TCP “stream,” the speed you will obtain will not be representative of the total, real capacity of a circuit, which will rather be obtained by adding all the TCP streams operating together in parallel. For long-distance links, this type of calculation depends upon several factors, including the total latency of a circuit, the “TCP window size” that was used during the tests, the size of the packets used during the tests, etc. You can use the following tool to determine those various parameters, and therefore obtain the maximal speed which can be obtained for *each individual stream*: https://www.switch.ch/network/tools/tcp_throughput/
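The per-stream limit described in point 5 can be sketched with the usual window-over-round-trip bound: a single TCP stream cannot carry more than one window of data per round trip. The window size and round-trip time below are assumed, illustrative values, not measurements from the transfer under discussion, and the function name is hypothetical.

```python
def max_stream_throughput(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on a single TCP stream's throughput, in bytes per second."""
    return window_bytes / rtt_seconds


# Assumed values for illustration only.
window = 64 * 1024   # 64 KiB window, i.e. no window scaling
rtt = 0.100          # 100 ms round trip

bound = max_stream_throughput(window, rtt)
print(f"single-stream bound: {bound / 1e6:.2f} MB/s")

# Window a single stream would need to sustain 23 MB/s at that round-trip time.
needed_window = 23e6 * rtt
print(f"window needed for 23 MB/s: {needed_window / 1e6:.1f} MB")
```

Under these assumptions a single stream falls far short of 23 MB/s unless the window is scaled up or several streams run in parallel, which is the distinction the comment draws.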

So my point has several parts:

a) You are wrongly assuming that one cannot reach that sustained speed over the Internet
b) You are wrongly assuming that the files were copied directly from the source (East Coast) off to Romania:
i. even as you don’t know *where* the source was located
ii. even as you don’t know where the *target* was located. What if the hacker (in Romania) remote-controlled a PC on the East Coast? How would you know that the target was *his* PC?
c) Maybe this was a high-capacity circuit, on a short-ish distance, which was busy doing something else at the same time the copy was taking place

Regarding location, there are statements from Crowdstrike that they found indications of malware on DNC servers. For example, here is a July 2016 Wired article that states that DNC servers were attacked. In Congressional testimony, the head of the FBI was asked whether the FBI had been given access to the DNC servers. Admittedly, I don’t recall a source that actually nails down where the DNC servers are located, but it seems reasonable to assume that they had some servers on site and that these were the targets of various hacking attempts.

You mention “email servers” several times. Just want to point out that Guccifer 2.0 leaked documents (and perhaps a few incidental emails in those documents). It wouldn’t surprise me to find out that email services were outsourced.

Regarding your hypothetical scenario that files may have been transferred from the DNC server (via a hack) to another computer system that just happens to be close to the DNC and has an Internet link likely in excess of 300 Mbits/sec., all in order to demonstrate the fact pattern of 23 MB/s and East Coast time settings, then whether conclusion 7 stands or falls depends upon your assessment of the likelihood of that scenario. I consider that possibility to be highly unlikely, YMMV.

In my view, the “standard of proof” should only be sufficient enough to encourage a formal, thorough, investigation of the various claims of Russian hacking and interference. My goals align with the VIPS who have formally requested such an investigation.

Hacker creates beach head exploit on the network and places a virtual machine. Virtual machine collects data over the LAN. Virtual machine copies data to WAN. Maybe there was some docker command that used a FAT file system. Get some sleep…

Most likely the hacker logged into a machine inside the target network, ran the copy to that local machine and ran the compression, all over Remote Desktop/SSH/X-Windows/VNC. All the work would be done on the remote machine (the one located on the LAN) and the resulting compressed tarball or source file was uploaded in one shot.

Guccifer 2 remains an enigma for many security researchers. Adam Carter at g-2.space has done a solid job of covering the controversy surrounding Guccifer 2. As to whether it is important to discover more about Guccifer 2, there are probably as many motives as there are people who care about the issue. For me, my motives run along the lines of the VIPS who are asking for formal investigations into the “Russia hacking efforts influenced the elections” narrative. Ideally, such an investigation would result in fact based public disclosures that would provide convincing evidence to support the conclusions that result from such an investigation.

Although many security researchers have significant doubts about Guccifer 2’s legitimacy, his presence is still influencing US public policy. As recently as two weeks ago, his name came up at the prestigious Aspen Security Forum. In this Youtube video clip, one of the panelists mentions Guccifer 2 and says that “At a certain point, you would have to have blinders and ear muffs on not to know that Guccifer 2 is a Russian intelligence agent.”

If it could’ve been done from the Russian Embassy by another person at those speeds, then why do we care if it’s Guccifer?

There are many possible conclusions that can be drawn from the observations made in the analysis, some more probable and plausible than others. On your specific suggestion that someone at the Russian Embassy aided Guccifer 2, that would be (IMO) a pretty big deal if true. In any event, such a scenario is certainly counter to Guccifer 2’s narrative.

Although this is a non-technical argument, I don’t know why the Russians would introduce additional risk by executing part of their operation on US soil, especially out of the Embassy. They know that they will be surveilled out the wazoo.

Shouldn’t the conclusion be: “It’s highly unlikely that Guccifer 2.0 is responsible for this portion of the hack. It would have had to have EITHER been someone on site OR someone at a nearby remote location.”?

The point of the analysis is to make its observations public so that the community/public at large can arrive at their own preferred conclusions. Hopefully, the study might encourage additional investigation and research.

What if the host was a virtual host on AWS east/west, and the use of FAT was because of being copied to a mobile device? A mobile device accessing a virtual host to transfer a file wouldn’t need to be physically mounted to the host, and the east/west zone selection could explain the time zone.

If I understand this idea correctly, the AWS server is used as a collection point for the first copy operation. If we assume that the origin for the files is the DNC, then we have a situation like this: DNC-to-AWS and the second copy goes like this: AWS-to-cell-phone?

Presumably, the AWS server is a collection point on the East Coast. I think it is an interesting idea, but it has the same issues as any other local host: (1) how do you mask its IP address without going through a VPN, and if you’re going to use a VPN, why choose a host on the East Coast rather than, say, France, as had been done before? (2) if the AWS host is used only as a collection point, why is it needed at all?

AWS, in particular, presents a major risk for our hypothetical Russian hacker: Amazon closed a $600 million deal with the CIA back in 2014. If you’re a Russian hacker, you will probably look for a lower risk, simpler solution.

For the second copy operation, when the .rar files were built, the situation is more complex than simply copying some files to a FAT-formatted file system. WinRAR is needed to build the .rar files and WinRAR is a Windows program.

Also, the study states that East Coast time was in force for the second copy operation as well. If you bring in a cell phone as the device that the .rar files are copied to, then the odds have it that the cell phone is on the East Coast.

Although there may be a way of putting those pieces together so that they theoretically yield the observed metadata, to this author the scenario you describe seems unlikely and overly complex.

Thanks for your response! But if the assumption that a flash drive was used is based solely on the use of FAT, you can’t rule out that the file was stored on possibly a mobile device instead, which would eliminate the need for physical access to the data. And just because the CIA is USING AWS doesn’t mean they’re monitoring anybody else’s environment. You don’t need much in the way of ID or other to set up a virtual host, and they can be stood up or torn down in literally seconds. You can create a VPC with a publicly routable CIDR block that falls outside of the private IPv4 address ranges, and you can configure subnetted hosts without private IPs.

Conclusion 7. A transfer rate of 23 MB/s is estimated for this initial file collection operation. This transfer rate can be achieved when files are copied over a LAN, but this rate is too fast to support the hypothesis that the DNC data was initially copied over the Internet (esp. to Romania).

Below, performance data is tabulated that demonstrates that transfer rates of 23 MB/s (megabytes per second) are not just highly unlikely, but effectively impossible to accomplish when communicating over the Internet at any significant distance.

Further, local copy speeds are measured, demonstrating that 23 MB/s is a typical transfer rate seen when writing to a USB-2 flash device (thumb drive).

“The initial copying activity was likely done from a computer system that had direct access to the data. By ‘direct access’ we mean that the individual who was collecting the data either had physical access to the computer where the data was stored, or the data was copied over a local high speed network (LAN).”

How did you determine that the July 5 copying was the initial copying?

The study discusses two copy operations: the first was done (per the metadata) on July 5, 2016 and the second on Nov. 1, 2016. In this context, initial copy is another way of referring to the first copy operation of the two.

Some reviewers have noted that the July 5, 2016 dates present in the metadata overwrote any previously recorded dates/times, which of course is true. They further note that prior intermediate copy operations may have been performed, which is also true. Some have opined that if Guccifer 2 pulled data from his previously claimed hack and simply copied that data to, say, his local hard drive on July 5, 2016, the pattern present in the metadata might result; also true.

We should also keep in mind that the study concludes that Eastern time zone settings were in force on both the first (initial) and second copy operations. Some reviewers have noted that Guccifer 2 could have manually set his timezone to Eastern time – also true.

Such an action (manually setting the time zone to Eastern time, when not physically being located there) seems out of character for Guccifer 2 who went to a lot of trouble to convince the public he is a foreign (Romanian) hacker.

Further, for anyone who wants to claim that Guccifer 2 might have set his time zone to Eastern time in order to intentionally give the impression of being on the East Coast, that can only make sense if we are to believe that he thought ahead about the relationship between the local times recorded in the .rar files and the UTC times recorded in the 7zip file. That relationship is quite obscure and went unnoticed for almost a year. The idea that Guccifer 2 decided to depend upon someone stumbling onto that relationship as a method of disclosing his East Coast time setting is far-fetched, to say the least.
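As a rough illustration of the relationship mentioned above, a local time and a UTC time recorded for the same moment differ by exactly the time zone offset that was in force. The two timestamps below are hypothetical and chosen only for illustration; they are not taken from the archives, and the study's actual procedure may differ in its details.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for the same moment (not values from the archives).
local_stamp = datetime(2016, 7, 5, 18, 39, 2)  # local time, as a .rar entry would record it
utc_stamp = datetime(2016, 7, 5, 22, 39, 2)    # UTC, as the 7zip archive would record it

# The difference between the two is the offset in force when the file was written.
offset = local_stamp - utc_stamp

# US Eastern Daylight Time is UTC-4.
EDT_OFFSET = timedelta(hours=-4)

print("offset in hours:", offset.total_seconds() / 3600)
print("consistent with Eastern Daylight Time:", offset == EDT_OFFSET)
```

The offset only becomes visible when both archives are compared, which fits the point that the relationship is easy to overlook.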

You may not have intended it, but your report is being widely misread as addressing the original migration of the files off the DNC’s network, when, as you seem to acknowledge, it actually addresses the packaging of the files for public release, which might have occurred weeks later on the attacker’s own machine. It’s sad to see your painstaking analysis so wildly misunderstood because of ambiguous language in the “key findings” section at the top.

“There is a series of files and directories that have no time gaps: it includes some top-level files and the FEC directory. The total size is 869 MB, which is 40% of the total. Using only the earliest last mod time and the latest in that series of files, the total elapsed time is 31 seconds. The transfer rate for those files works out to 28 MB/s. On that basis alone, we can be fairly confident that 23 MB/s is in the right ballpark. (…) there are only 9 time gaps, which all occur at the top-level.”

Might want to clarify that in the article. I thought you were suggesting that there were time gaps, of various lengths, between every top-level file or directory.

…But then why bother with the rest? That alone, proves local not remote.

“The transfer speed calculation is not very sensitive to the estimate of average transfer speed. If we guess 2 MB/s, the result is 20.5 MB/s. If we guess 200 MB/s, the result is 27.3 MB/s.”

Now that I understand why the estimate & result aren’t identical, I’m still surprised those results aren’t more tightly bound. There must be something else I’m missing.

How about posting the spreadsheet? It would probably save us both a lot of time going back&forth like this.

For now, I have decided not to make the spreadsheet available, due to privacy and anonymity concerns. In spite of Microsoft’s assurances that the method they provide for removing metadata is adequate, I have my doubts.

But then why bother with the rest? That alone, proves local not remote.

I didn’t think about selecting just one part of the overall transfer as a way of giving a representative transfer speed. Besides, if I had done that, some reviewers might have accused me of cherry-picking (/sarc).

I’m still surprised those results aren’t more tightly bound[ed].

There are 9 time gaps. Only the start times of the files at the beginning of each time gap need to be estimated in order to arrive at a more accurate estimate of the overall transfer time. The total size of the 9 files at the beginning of each time gap is about 17 MB. At 1.7 MB/s transfer rate, that would amount to 10 seconds being added to the transfer time. That’s about a 12% change to the calculated transfer time, which is noticeable, but not large.
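The arithmetic in this reply can be checked in a few lines. The sketch below uses only the rounded figures quoted in this thread; the 87-second baseline is the transfer time given in another reply, and the variable names are illustrative.

```python
# Rounded figures quoted in the thread.
gap_leading_files_mb = 17.0   # total size of the 9 files that follow a time gap
slow_guess_mb_per_s = 1.7     # deliberately low guess for the transfer rate
baseline_transfer_s = 87.0    # transfer time quoted elsewhere in the thread

# Extra transfer time attributed to those 9 files under the low guess.
extra_seconds = gap_leading_files_mb / slow_guess_mb_per_s

# Relative change to the calculated transfer time.
relative_change = extra_seconds / baseline_transfer_s

print(f"extra time: {extra_seconds:.0f} s")
print(f"relative change: {relative_change:.1%}")
```

With these rounded inputs the change comes out a little under 12%, in line with the figure given in the reply.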

1. Take the difference between one file’s timestamp, and the next. […] 8. Conclude something about transfer speeds from the result. The result is whatever “average transfer speed” you used in step 2.

Not quite. As explained in the analysis: “We can estimate the transfer speed of the copy by dividing the total number of bytes transferred by the transfer time. The transfer time is approximated by subtracting the time gap total from the total elapsed time of the copy session.” and “We further calculate the “time gap” which is the difference between the last mod. time of a current entry and its previous entry; from this we subtract an approximation of the transfer time (using our knowledge of average transfer speed) to go from the last mod time to a likely time that the transfer started. We use a cut off of at least 3 seconds to filter out anomalies due to normal network/OS variations. Here are the entries with significant time gaps.”

The key here is that the overall transfer time is calculated across the entire elapsed time of the copy, subtracting out the sum of the time gaps. As shown in one of the figures, there are only 9 time gaps, which all occur at the top-level. The difference between the earliest last mod time and the latest is 14:15 (14 mins, 15 secs) and the sum of the time gaps is 12:48 (12 mins, 48 secs). The difference is 87 seconds – that’s the transfer time. If we divide the total bytes transferred (1976 MB) by the transfer time of 87 seconds – the resulting transfer speed is 22.6 MB/s.
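The calculation in the preceding paragraph can be reproduced as a short sketch. It uses only the rounded figures quoted there, so the result may differ slightly from the 22.6 MB/s stated, presumably because the quoted inputs are rounded.

```python
from datetime import timedelta

# Figures quoted in the reply above.
elapsed = timedelta(minutes=14, seconds=15)    # earliest to latest last mod time
gap_total = timedelta(minutes=12, seconds=48)  # sum of the 9 time gaps
total_mb = 1976                                # total size transferred, in MB

# Transfer time is the elapsed time with the gaps subtracted out.
transfer_seconds = (elapsed - gap_total).total_seconds()

# Average transfer rate over the whole copy session.
rate_mb_per_s = total_mb / transfer_seconds

print(f"transfer time: {transfer_seconds:.0f} s")
print(f"transfer rate: {rate_mb_per_s:.1f} MB/s")
```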

It would be a mistake to look only at file-by-file differences in last mod times because that wouldn’t account for the time gaps, and the time gaps are a significant percentage of the total elapsed time (90%). Also, there can be some variability in recorded last mod times, because other OS activities add noise. Thus, it makes sense to work on the overall time in aggregate; this averages out the effect of “noise”.

When calculating “time gaps” a correction factor is subtracted from the last mod time to approximate the “first write time”. This is important for large files. Consider the following.

File  Last Mod       Size (bytes)
A     6:45:00.000          10,000
B     6:45:10.000     226,000,000

If we just took the difference between B’s last mod time and A’s last mod time, we’d get 10 seconds. If we view that 10 seconds as a “gap”, it will lead to an incorrect estimate for the time between when A was last written and B was first written (the “gap”). But if we subtract (226,000,000/22,600,000), or 10 seconds, from the last mod time of B, we arrive at a time gap of zero (0) rather than ten (10) seconds. Since we don’t know the overall transfer rate until we calculate the time gaps as above, the process is iterative. I did the iteration by hand, but Excel’s “Goal Seek” could have been used to good advantage.
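The iteration described above can be sketched roughly as follows. This is an illustrative reconstruction, not the spreadsheet used in the study: files A and B come from the example, file C is a hypothetical third entry added so that one real gap exists, and the low starting guess is an assumption. The 3-second cutoff is the one quoted from the analysis.

```python
GAP_THRESHOLD_S = 3.0  # cutoff quoted from the analysis

# (name, last mod time in seconds after the first entry, size in bytes)
# A and B are from the example above; C is hypothetical.
FILES = [
    ("A", 0.0, 10_000),
    ("B", 10.0, 226_000_000),
    ("C", 125.0, 22_600_000),
]


def estimate_rate(files, guess_bytes_per_s, iterations=20):
    """Repeat: estimate gaps from the current rate, then recompute the rate."""
    total_bytes = sum(size for _, _, size in files)
    elapsed = files[-1][1] - files[0][1]
    rate = guess_bytes_per_s
    for _ in range(iterations):
        gap_total = 0.0
        for (_, prev_mod, _), (_, mod, size) in zip(files, files[1:]):
            # Back up from the last mod time to a likely first-write time.
            est_start = mod - size / rate
            gap = est_start - prev_mod
            if gap >= GAP_THRESHOLD_S:
                gap_total += gap
        # Transfer time is elapsed time minus the gaps found in this pass.
        rate = total_bytes / (elapsed - gap_total)
    return rate


rate = estimate_rate(FILES, guess_bytes_per_s=2_000_000)
print(f"{rate / 1e6:.1f} MB/s")
```

On this toy data, starting from a low guess of 2 MB/s, the estimate settles near 22.6 MB/s within a few passes. Other data or other starting guesses may behave differently, so the sketch should not be read as a statement about the real metadata.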

Your “method” is basically a variant of those grade-school “guess a number” fake magic tricks.

The method used is an iterative approximation which is a standard technique. Please see my previous reply.

There is a series of files and directories that have no time gaps: it includes some top-level files and the FEC directory. The total size is 869 MB, which is 40% of the total. Using only the earliest last mod time and the latest in that series of files, the total elapsed time is 31 seconds. The transfer rate for those files works out to 28 MB/s. On that basis alone, we can be fairly confident that 23 MB/s is in the right ballpark.
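As a quick cross-check of the gap-free subset, the rate follows directly from the two quoted figures. The conversion to megabits per second is added only for comparison with link speeds quoted elsewhere in the thread; the variable names are illustrative.

```python
# Figures quoted above for the gap-free series of files.
subset_mb = 869        # total size of the series, in MB
subset_seconds = 31    # earliest to latest last mod time in the series

rate_mb_per_s = subset_mb / subset_seconds
rate_mbit_per_s = rate_mb_per_s * 8  # 8 bits per byte

print(f"{rate_mb_per_s:.1f} MB/s, about {rate_mbit_per_s:.0f} Mbit/s")
```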

The transfer speed calculation is not very sensitive to the estimate of average transfer speed. If we guess 2 MB/s, the result is 20.5 MB/s. If we guess 200 MB/s, the result is 27.3 MB/s.

If the above refers to the transfer speed estimate, that is just one part of the analysis. It is the part of the analysis that receives the most heat, but is not necessarily the most compelling factor. Consider, for example, the second copy operation done on Nov. 1, 2016, likely on the East Coast with indications that the results were written to a thumb drive. That suggests the physical presence of someone to plug in and retrieve the thumb drive. Yes, we can bring another actor into the picture to explain that observation, but we have then moved well away from the “remote Russian hacker” narrative.

[…] the attacker could be remoted into a states side host via RDC or other remote protocol

ThreatConnect reported in their analysis that Guccifer 2 used a commercial VPN service vectoring through Russia (IIRC) for previous communications. Did he decide to use a different approach when grabbing the “NGP VAN” files? If you contemplate the use of a host close to the DNC, you’ll also have to address: (1) how did Guccifer 2 obtain access to this host? (2) how would Guccifer 2 avoid the risk of disclosing that IP address in DNC logs? (3) even though this hypothetical host is close to the DNC, can it sustain a 23 MB/s transfer rate? and lastly (4) why would Guccifer 2 introduce this additional host?

Re: the transfer speed, although the average transfer rate was estimated at 23 MB/s, if we look at a subset of the metadata (the FEC directory and some other top-level files) which has no internal gaps and represents 40% of the total bytes transferred (869 MB), the calculated transfer rate for that chunk of files is 28 MB/s; that speed will be difficult to obtain over the Internet even with very high speed connections at both ends.

Given those complications, some reviewers have posited a “local pivot”, where the files are first copied in bulk to a local directory on a DNC server and then uploaded back to wherever Guccifer 2 is located. As I mentioned in another comment, unexplained in that scenario is why would a remote hacker need to make that local copy, or want to? It leaves a large footprint (perhaps 20 GB per the analysis) and is unnecessary.

Essentially this proves nothing.

The purpose of the study is to analyze the file metadata present in the “NGP VAN” data disclosed by Guccifer 2, which he attributes to the DNC. Guccifer 2 also claims to be Romanian; a claim that has been disputed. He also claims to have obtained the data by hacking DNC servers (remotely).

The analysis does not prove anything, but tries to reach plausible conclusions based on the data. Those conclusions generally dispute Guccifer 2’s claims. It is up to those who review the analysis to decide on the degree to which those conclusions are compelling.