Time and Tide Wait for No Protocol: The SSH Keystroke Timing Attack of Song, Wagner, and Tian

At the 10th Usenix Security Symposium (Washington D.C., August 2001), U.C. Berkeley researchers Dawn Song, David Wagner, and Xuqing Tian presented a paper titled, Timing Analysis of Keystrokes and Timing Attacks on SSH. The paper describes their research into applying traffic-analysis techniques to interactive SSH connections in order to infer information about the encrypted connection contents. The paper concludes that the keystroke timing data observable from today's SSH implementations reveals a dangerously significant amount of information about user terminal sessions--enough to locate typed passwords in the session data stream and reduce the computational work involved in guessing those passwords by a factor of 50.

Not surprisingly, this paper initiated a great deal of discussion among SSH users, developers, and the security community in general, especially in public forums such as Slashdot. In this article, I will summarize the issues involved, discuss the paper's methods and conclusions, and dispel some of the often-repeated misconceptions in the public's reaction to this research.

Traffic Analysis

The paper revolves around the notion of traffic analysis, and while it uses SSH as a concrete example, the techniques involved are not specific to SSH, but rather apply to most interactive remote-terminal protocols as they are implemented today. The principle of traffic analysis is that there is a lot of useful information to be gleaned from the amount, timing, and direction of network traffic, even if you can't actually read the traffic content itself. Suppose I'm monitoring the network port leading to a system administrator's office. Perhaps I can't read or alter her SSH connections, but I can see:

when those connections are made and torn down, telling me when she arrives at work and leaves for the evening.

the addresses of the connected machines, telling me which hosts she's responsible for (and which ones she's not looking at).

the amount, rate, and timing of traffic going over the SSH sessions (separately in each direction), giving me some information about what she's doing. (For example, I can distinguish between terminal and file-transfer sessions.)

when all the terminal sessions go quiet, suggesting she may be out to lunch and now's a good time to hack her workstation (or break into her office).

These sorts of information leaks are very difficult to prevent. If you control all the physical components in your network and you can ensure that they all use link-level encryption and appropriately random idle-traffic generation, then that would be a start towards preventing traffic analysis--but except for very specific high-security installations, most of us have neither the opportunity nor the benefit of such designs. Considering the expense (or practical impossibility) of such countermeasures, traffic analysis is usually considered an unavoidable risk.

When accepting this risk, however, we generally assume that the danger is wholly indirect: an observer may be able to guess some useful ancillary information from traffic patterns, but the actual data is safe as long as we're using a good protocol, such as SSH. The disturbing conclusion of the recent Usenix paper is that this is not so; using sophisticated mathematical techniques it is possible to gather significant amounts of information about the contents of an SSH terminal session, by taking advantage of its interactive nature.

Interactive Traffic and Keystroke Timing

The delivery requirements for interactive traffic are different than those for bulk data transfer. When transmitting a file, TCP data can be batched up and sent in large IP datagrams for efficiency. But when you type a character in a terminal session, you expect to see it echoed immediately--not after you type the next thousand or so! Local echo and line editing can address that, but that won't work with curses-style software like Pine, or with the fancy line editing, command matching, recall, and other features of your favorite shell. So programs like Telnet and SSH send terminal data as soon as it becomes available, which means that the TCP connections for these programs transmit many short IP packets containing individual keystrokes and their echoes.

This real-time reflection of the user's typing behavior in the network stream is the basis of the paper's attack. Song, et al., observe that typical transmission latency in LANs, or indeed over the Internet, are negligible when compared with the inter-keystroke delays revealed by timing the arrival of single-keystroke TCP packets. Therefore, measuring packet-arrival times gives an accurate picture of the user's actual gestures at the keyboard.

What do you think about the SSH Keystroke Timing Attack of Song, Wagner, and Tian? Please share your security experiences with SSH.

The bulk of the paper then goes on to treat a topic that is entirely separate from SSH or network protocols in general, but which has important implications for them. The question is: does human keystroke timing leak significant information about the keys pressed? If so, can that information be used to guess at keystroke sequences? The answer to both questions is unfortunately "yes."

To summarize very broadly, the authors use a statistical technique called a Hidden Markov Model (HMM) to capture a user's keyboard habits, given data gathered from observing typing patterns and used to "train" the model. Then they use a prediction algorithm (a variation of the Viterbi algorithm) to guess at key sequences, given only the inter-keystroke timings together with the previously constructed model. The researchers tested their technique experimentally, using training data from real human subjects. The results suggest that timing data can leak as much as 0.71 bits per character on a typed password, corresponding to a 50-fold reduction in the work required for an offline dictionary attack. To estimate the practical effect of this, they write:

"For example, for a password containing randomly selected, lowercase letter keys and number keys, without timing information, the attacker would need to try 368/2 candidate passwords on average before he finds the right one. Benchmarks indicate that a 840MHz Pentium III can check about 250,000 candidate passwords per second in an offline dictionary attack. Thus, exhaustive search would take about 65 PC-days to crack a password composed of randomly selected, lowercase letter keys and number keys. If the attacker uses the timing information, the computation can be done in 1.3 days...."

On the face of it, this is pretty serious leak. But remember that so far, this is all still a fairly abstract investigation in an ideal, laboratory setting. There is still the question of whether this translates into a real threat when applied to actual network traffic. The paper goes on to address this as well.

A Real Attack

Remember that the investigators' tests involved predictions based on typing isolated passwords. In real life, an attacker would first have to isolate the packets corresponding to a password in the stream of TCP segments in an SSH connection. But there are multiple ways this might happen. First, observation may show repeated habits that leak this information. For example, Eve, the attacker, may observe from network traffic that her victim, Paul, often connects first to host A, then immediately connects from A to a second host B, again using SSH. If Eve suspects that Paul is using password authentication for the second connection, then by experimentation on her own she can know at least approximately where in the second SSH stream to look for the typed password. Furthermore, there may be clues again in the traffic timing that reveal the positions of sensitive data. The observed pattern of using the "su" command, for example, is peculiar: three echoed characters, followed by a single larger packet containing the "Password:" prompt, followed by several characters from the client that are not echoed: the password. Identifying such timing signatures in a recorded SSH stream could identify likely keystroke sequences for attack.

Of course, in order to conduct a dictionary attack, Eve needs a password verifier: some way to test whether a guessed candidate password is correct. This might be, for example, a hash of the password lifted from the account database of a compromised system on which Paul has an account. The figure of checking 250,000 passwords per second is predicated on being able to verify guesses efficiently. Without a verifier, Eve would need to attempt a login to test guess--and hopefully thousands of failed logins would raise a red flag long before Eve could discover Paul's password. Remember, though, that attempting a login via SSH may not be the only way to verify the password. The same password will likely be used for authentication for a number of protocols: Telnet, SMB, RADIUS, and so on.

It's important that all services that perform authentication detect unusual authentication failure rates and take some action: notify the system administrator, and perhaps automatically lock out the account until someone can determine what's going on. This is also a good argument for using a centralized authentication system, such as PAM, so that such security monitoring can be configured once for all services, rather than requiring that separate support be written into and enabled for every service.

Another obvious difference between the laboratory and a real attack is that the researchers had the advantage of cooperative subjects, who thoughtfully sat down and provided training data for the HMM's. Unless she's also a hypnotist, Eve will presumably not be able to first get Paul to sit down and type random passwords for her--and if she has infiltrated his workstation sufficiently to capture his keystrokes for this purpose, she won't need to go to the trouble of this sophisticated attack to capture his password! So, this begs the question of whether a personalized model is necessary for a successful attack.

The paper examines this question as well, both abstractly and experimentally. The authors write that comparing characteristics of timing data from different users "suggests that typing statistics have a large component that is common across a broad user population and which thus can be exploited by attackers, even in the absence of any training data from the victim."

They also experimented with this, using models from one subject to analyze the password keystrokes of another. They conclude that while using a model from the same person is (unsurprisingly) more effective, nonetheless "training data from one user can successfully be applied to infer passwords typed by another user. Hence the attack can be effective even when the attacker does not have typing statistics from the victim." Of course, the number of subjects in these experiments was quite small (four), and it seems that further research is needed to discover how broadly these relationships hold in the population, and thus confirm these findings and their significance.

Quite aside from the ability to infer keystrokes, there also is the possibility of using timing data simply to identify the user. Keystroke timing patterns have been used in commercial products as a biometric identification device; the same principle might be applied here. This would require the attacker to have specific typing profiles on hand for his victims--but if a company were using timing biometrics that database would be stored somewhere and might be stolen for this very purpose. Of course, with personal workstations so common today, the source IP address is often a pretty conclusive indication, but this is still an issue to be noted.

Common Misconceptions

Here are a few often-seen misconceptions regarding the paper's methods and conclusions.

This attack applies to SSH "password authentication:" It's unsafe and we should stop using it immediately.

Confusingly, this is both true and false, depending on how you interpret the sentence. The technique in question does not apply to SSH password authentication, when the attacker is observing an SSH protocol session using that method for its own authentication phase. This is because the password is transmitted all at once in a single SSH protocol message, not a character at a time as the user types it (recall that this happens before the SSH terminal session has even started).

However, if Paul happens to use the first SSH session to start another one--that is, ssh to one host and then run ssh there again, to connect elsewhere--and if that second SSH connection uses password authentication, then he is vulnerable. The password will be safe while traveling inside the second SSH connection, but it will be exposed for a timing attack as he types in it over the first connection. Still, even interpreted this way, the statement is not quite fair, as this is not a vulnerability of SSH specifically; any passwords typed in the first session, for any reason, are similarly vulnerable.

Another point confuses this issue, which is that the paper mentions a separate potential vulnerability: one which is specific to SSH password authentication but unrelated to the timing attack. If a client sends the SSH protocol message containing the password in a single TCP segment, then this can reveal information about the length of the password, to within the block length of the cipher being used (commonly 8 bytes). This can give an observer close upper and lower bounds on the password's length, again reducing the amount of work needed for a brute-force attack.

However, it turns out that this is not as serious a problem as the timing attack, as there is an easy way to address this issue: simply pad the password message out to a fixed length. The SSH transport protocol allows for arbitrary padding, so this is a simple thing to do. OpenSSH already implements this defensive move, but in a slightly different way: it appends an SSH "no-op" message immediately after the password message, whose length is calculated to extend the sum of the two message lengths to a fixed value. The two messages are then transmitted in a single TCP segment. In the SSH-2 protocol the message lengths are encrypted along with the rest of the message, so Eve cannot break the segment up into its component messages, and she thus gets no information about the password length (assuming there's no other structural subtlety that reveals the message boundaries; there shouldn't be).

Just use SSH public-key authentication instead; it avoids the problem.

Again, this can be right or wrong, depending on what's meant. Certainly, it avoids the password-length vulnerability just mentioned, if you're using a client that does not already address this by the padding technique. If you're doing consecutive SSH sessions, then using public-key authentication for the second connection can avoid the timing attack--if you use an SSH key agent on the first host, with agent forwarding to the second. Otherwise, you will end up typing in your passphrase on the second host to decrypt your SSH private key, and your passphrase will be exposed to timing analysis in the first SSH connection. Of course, your passphrase by itself is useless without also stealing your private key file, so the compromise is not immediate if your passphrase is cracked, but you still don't want that to happen.

Learn Dvorak.

It's not clear how much this would really help. Presumably, the same attack will work with model data derived from Dvorak typists instead. For all we know, it may be simple to guess whether someone is using QWERTY or Dvorak based on timing patterns as well. Even if not, running both models at once would yield a work reduction factor of 25 instead of 50 (assuming a Dvorak model is as equally effective as the QWERTY ones studied), which is still a serious problem. Besides, this is not terribly practical advice, since millions of people are simply not going to switch keyboard systems overnight (or at all).

While SRP is a cool protocol with much to recommend it, this idea misses the point just as does the previous exhortation to use public-key authentication: The problem is when you type in a password over an SSH terminal session, regardless of what spiffy thing is being done with the password once it's entered.

One-time passwords would be an improvement because they're typically much longer and useless by the time the attacker cracks them anyway (if the OTP system is well designed and administered).

Defenses

So, if timing analysis is such a problem, what do we do?

In the long term, if this sort of attack proves to be a serious problem, we will have to develop better protocols and deploy different networking technologies to address these attacks. Traffic analysis is a hard problem with no immediate, easy answers. The paper suggests some approaches, all of which need further research and testing:

Introduce random jitter into keystroke packets as they are sent. This obvious approach has problems, though, because if the jitter is in fact random, an observer may be able to filter it out over repeated timings of the same password.

Send a constant stream of packets even when idle, thus masking the real traffic among the fake. The dummy packet timings would have to be carefully crafted to be indistinguishable from the real ones, and the rate chosen is a tradeoff between security and both user-visible latency and network overhead.

In the short term, here are a few countermeasures that may help to varying degrees:

Defeat echoing signatures. Make it harder to guess the positions of passwords in a session by having the server simulate echoes using dummy messages when the client disables terminal echo. Of course, there are other ways to guess when passwords are being typed, but anything that makes it harder is good.

Don't use passwords on remote machines. To whatever extent possible, implement single sign-on: you should only have to enter passwords on your local machine. The ssh-agent available in many SSH implementations, together with agent forwarding and public-key authentication, provides this capability for SSH specifically. Combining SSH with a broader technology such as Kerberos, which also has a credential-forwarding feature, can give you the best of both worlds. The author uses a system whereby he can authenticate once, using Kerberos (kinit) on his local machine, then SSH elsewhere and he'll have access to not only other SSH servers, but also to Kerberized versions of su, secure Telnet authentication to Cisco routers, and so on. SSH1 has Kerberos-5 support, including ticket forwarding, and there is a patch for GSSAPI/Kerberos support in OpenSSH for protocol 2. Of course, it's almost impossible to get pervasive single sign-on in a heterogeneous system, but anything you do is an improvement.

Protect hashed passwords. As mentioned above, the attacker will want a verifier for performing an offline dictionary attack. Make her work for it! Make sure you're using shadow password files that are readable only by root, and do not store your password hashes in the world-readable /etc/passwd file. If you use NIS, use it with secure RPC to avoid revealing hashes over the network. If you use Kerberos or a similar third-party system, you can avoid having distributed password verifiers, instead keeping them on the single-purpose KDC's (key distribution centers), which are easier for you to secure than a bunch of general-purpose login hosts.

Conclusion

The issues raised in the paper by Song, Wagner, and Tian are not a reason for panic (as with, say, the announcement of a practical root exploit in deployed software). The attack presented is sophisticated and it has not yet demonstrated how well this research will translate into general real-world attacks. However, the issues are serious and real, and hopefully the paper will heighten awareness of traffic analysis as a threat and spur more research into practical countermeasures. In the meantime, users and system administrators should address what they can by considering the suggestions made in this article, which are, for the most part, already accepted security "best practices."

There is an optional TCP feature called the Nagle algorithm, which batches up sequential small TCP segments for better transmission efficiency. This would interfere with timing measurements; however, interactive applications often specifically turn off this feature by setting the TCP "NODELAY" option, in order to get better interactive response. OpenSSH, for example, does this.