Description

In channel_tls_process_netinfo_cell() we have a section that says

if (!(chan->conn->handshake_state->authenticated)) {
tor_assert(tor_digest_is_zero(
(const char*)(chan->conn->handshake_state->
authenticated_rsa_peer_id)));
tor_assert(tor_mem_is_zero(
(const char*)(chan->conn->handshake_state->
authenticated_ed25519_peer_id.pubkey), 32));
/* If the client never authenticated, it's a tor client or bridge
* relay, and we must not use it for EXTEND requests (nor could we, as
* there are no authenticated peer IDs) */
channel_mark_client(TLS_CHAN_TO_BASE(chan));

That's great and awesome -- it's a clear solid definition of client: a client is somebody who did the client handshake with us, not the relay handshake.

This definition of client is much murkier: a client is anybody who, at this moment of receiving a create cell, had earlier presented an identity digest that isn't in our current consensus and that we don't have a cached descriptor for.

I think we should standardize on one definition, and I think we should pick the first one: it's clean and not full of false positives.

Change History (25)

I discovered this bug while working on the current circuit / conn overload patches, considering blacklisting client IP addresses that misbehave in certain ways. I found that my fast relay has a *lot* of conns that are marked as channel_is_client() that are very clearly connections to/from relays.

I assume the conns started off not marked as clients, and then the clause in command.c declared them as clients.

Right now, in channel_tls_process_netinfo_cell(), for the OR_CONN_STATE_OR_HANDSHAKING_V3 case, we call channel_mark_client() properly.

I propose that we call everything a client that handshakes with tor_tls_used_v1_handshake() or OR_CONN_STATE_OR_HANDSHAKING_V2, since relays should never use those legacy handshakes with other relays at this point.

It's on master though, and we might want to backport this. I'm not sure how far back to backport it -- we could go back to maint-0.3.1 pretty easily (there's a conflict but it's just because of a comment), but going back to 0.2.9 would be quite tricky because #21406 went into 0.3.1.

The question of whether to backport more depends on how much we want these fixes in 0.2.9 -- this one is a cornerstone of some of the DoS mitigations that might come next.

(This code sets chan->is_client = 0; directly in 0.3.0, and it does the rep_hist thing in another place, so we'll need another branch if we want to backport there. The good news is that the create cell bug was introduced in 0.3.1 with the netflow padding code, so we only need to backport the v1/2 fix to 0.3.0 and earlier.)

Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.

We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.

Why (did we introduce the create-cell check in the first place? ISTR that we thought we had a reason to do so. It looks like Mike introduced it in b0e92634d85a3bf7612a6ce0339b96e4aad1e0bb, and moved it in 02a5835c27780e45f705fc1c044b9c471b929dbe. Adding mike to cc.

Mike's comments say that it led to chutney failures.
I haven't observed any failures, but I have observed warnings in 0.3.2.9 when we apply this patch to master in a mixed network.
That really is a bug in 0.3.1 and 0.3.2, because Mike's 0.3.1 fix is identifying clients incorrectly.
There's nothing we can do about these warnings, except backport the fix.

We could skip the 0.3.0 backport, I think, since 0.3.0 is EOL at the end of this month. But we should do an 0.2.9 backport if we think this is important: 0.2.9 is supported till 2020.

It does appear to me that if we use the handshake patch from #21406 that we're going to be much better about identifying well-behaved clients correctly, and this should be better for the netflow code too (and not break it). I am a little worried about misbehaving clients - for those it feels smarter to check the digest in cases where that matters for DoS. But that should be a separate bug.

I have made a bug24898-029 branch, designed for inclusion in 029 and 030. It will conflict with 031 because stuff is already fixed there, so you'll want to do the right version of 'merge -ours' after 030.

We will want this branch merged when we put the #24902 DoS patch into 029.

Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?

We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.

Why, are we going to regret that? :) See also the discussions in #22805.

Do we still use CREATE_FAST to mark clients in 029? Or did we backport that fix as well?

We still do it. That is, anybody who sends you a create-fast cell when you're running 029 or 030 now gets counted as a client, meaning the DoS defenses will be triggered against them (once #24902 gets backported), and meaning when we consider whether a channel can satisfy an extend request, this one can't.

Why, are we going to regret that? :) See also the discussions in #22805.

I think we are going to regret that, and we should backport the parts of #22805 that remove the CREATE_FAST mark client.

We should also take them out on general principles, because it reduces the number of connections between relays.

Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805. I think we are good to go now.

To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.

Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)

Ok, I added a second commit to my bug24898-029 branch that backports that part from #22805. I think we are good to go now.

To be clear, that means commit 4bd208b wants to make it into 0.2.9 and 0.3.0, and its changes are already present in 0.3.1. Whereas commit 2076db4 wants to make it into 0.2.9 and 0.3.0 and 0.3.1, and its changes are already present in 0.3.2.

Sorry in advance for the mess Nick. I would just merge it all myself but I think I would screw up the "ours" aspects of the conflict handling. :)

Yes, we must backport 4bd208b and 2076db4 to 0.2.9, and backport 2076db4 to 0.3.1, so that the DoS backport identifies clients correctly.