July 17, 2003

Eliminating Spam

Steven Green points to this John Dvorak article, and asks: "can some of you smart network-type people tell me if any of these ideas [for reducing spam technologically] are doable without needing an entirely new email system and software?"

I'm not going to go into detail on all of Dvorak's proposals, but I will say that he misses the point, for an entirely good reason. The Internet was built as a survivable network, meant in fact to survive nuclear strikes on some of its nodes and still keep going. It was also designed on an assumption of security: every user would be authenticated by a node before obtaining access to the network, and all of the nodes were trusted.

This was an excellent architecture in 1988 (when I first got onto the Internet), because all of the users were accountable for their actions - to their employer, or their university, or their government agency, and so on. As a result, abuse was a minor problem (mostly incoming freshmen at colleges) and easily dealt with. But it was the system that dealt with abuse - the human part of the system, not the network part. You could actually be shamed off of the Internet back then.

Removing the human enforcer of netiquette and good practice was like giving the government the power to raise almost unlimited revenue from a very small proportion of the population: corruption and abuse exploded. Open routing - where everyone passed traffic by the shortest route, so that my traffic could go right through IBM's internal network if that was the fastest way to the destination - was the first to go, replaced by a few backbone networks, with the multiply-connected large private networks protected by layers of firewalls and large IT staffs. (This increased the brittleness of the network at the same time that it increased security.)

So how do we fix this? We cannot build a safe, secure and reliable system on top of an inherently insecure, self-regulating and untrustworthy network. We would have to build a new network from the ground up. And to do that, we'd have to scrap everything except the physical layer - the NICs, wires, routers, bridges, modems and so forth would be all that's left. The intelligence of the network would have to change, some of it drastically.

The first problem to solve is design: how do you create a secure, trusted network which still allows connections from anywhere to anywhere, using any protocol, as the default? How do you do this without excluding people, without forcing a central controlling agency (brittle, arbitrary and power-hungry as any bureaucracy), and without limiting the ability of people to use the network in reasonable ways? How do you manage traffic so that the network stays flexible, without overwhelming small multiply-connected companies by passing external traffic through their networks? How, in other words, do you provide authentication and authorization to a global network using only local resources?

It turns out that if you are willing to start with a blank page, it's not that difficult. The major issue to solve is trust: how do I know whom I'm talking to? Since you don't want a brittle network that will fall apart when something happens to a small number of nodes, the only trust model that works is to authenticate yourself to some local authentication source. For example, an ISP or a company would provide a directory which lists all of their users and contains the information necessary to trust each user. Each node, then, has to trust its neighboring nodes. (This is already the case today, in that you cannot establish a physical connection to another node without being their customer or their provider, or entering into an agreement to do so.) Each node would need to cryptographically authenticate itself to its neighbors, and vice versa, and each user would have to authenticate themselves to their node. No node would pass traffic that did not include its partner node's connection key in the message portion of the packet, and no node would alter that information.
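A toy sketch of that hop-by-hop scheme may make it concrete. Everything here - the names, the use of shared HMAC secrets per link, the packet layout - is my own illustration, not a real protocol: each pair of neighboring nodes shares a link secret, a node refuses to forward any packet whose tag doesn't verify against the upstream neighbor's key, and it records the hop before re-tagging the packet for the next link.

```python
import hashlib
import hmac

def link_tag(secret: bytes, payload: bytes) -> bytes:
    """Per-link authentication tag (HMAC-SHA256 as a stand-in for real node auth)."""
    return hmac.new(secret, payload, hashlib.sha256).digest()

class Node:
    def __init__(self, name: str):
        self.name = name
        self.keys = {}          # neighbor name -> shared link secret

    def link(self, other: "Node", secret: bytes) -> None:
        # Mirrors today's physical reality: a link exists only by mutual agreement.
        self.keys[other.name] = secret
        other.keys[self.name] = secret

    def originate(self, payload: bytes, next_hop: "Node") -> dict:
        return {"payload": payload,
                "path": [self.name],
                "tag": link_tag(self.keys[next_hop.name], payload)}

    def forward(self, pkt: dict, prev_hop: "Node", next_hop: "Node") -> dict:
        # Refuse any traffic the upstream neighbor cannot vouch for.
        expected = link_tag(self.keys[prev_hop.name], pkt["payload"])
        if not hmac.compare_digest(expected, pkt["tag"]):
            raise ValueError(f"{self.name}: unauthenticated traffic from {prev_hop.name}")
        pkt["path"].append(self.name)   # record the hop; never alter earlier entries
        pkt["tag"] = link_tag(self.keys[next_hop.name], pkt["payload"])
        return pkt
```

A mail relayed A to B to C thus arrives bearing a path in which every entry was vouched for link by link; a packet with a forged tag is dropped at the first honest hop rather than delivered.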

On the good side, this lets any traffic be traced back to its source. You cannot forge traffic as coming from a node you are not attached to, because the network will not pass on a message unless it has authenticated the upstream source - that is, you - and your information is in the packet header. You could still put bogus preceding-node information in the header, but the trace would simply stop being verifiable at you, the sender. On the bad side, this would dramatically increase the amount of traffic on the Internet (by increasing the size of every packet) and would slow down all traffic, bandwidth being equal, because of the overhead of nodes authenticating to each other and the larger byte count of a given data set.
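To put a rough number on that size cost - and the figures here are my own guesses, not anything from a real spec - suppose each packet carries one 32-byte authentication tag plus a 16-byte node identifier per hop traversed. Against the standard 1500-byte Ethernet MTU, the overhead grows with path length:

```python
MTU = 1500          # standard Ethernet payload size, in bytes
TAG = 32            # one authentication tag per packet (assumed size)
NODE_ID = 16        # per-hop node identifier (assumed size)

def overhead(hops: int) -> float:
    """Fraction of a full-size packet consumed by the trust metadata."""
    return (TAG + hops * NODE_ID) / MTU

for hops in (5, 15, 30):
    print(f"{hops:2d} hops: {overhead(hops):.1%} of the packet")
```

At five hops that is under 8 percent of every full-size packet; at thirty hops it is over a third - before counting the latency of the per-link authentication handshakes themselves.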

Let's say, though, that we were to use that or some similar measure of trust to guarantee that the network was trusted. We would still have a whole host of problems to solve, because the IP spec would have to be rewritten at a pretty fundamental level. The NICs would need updated firmware - or, for cheap cards with their firmware burned into the hardware, outright replacement. Then you would have to rewrite the network stack to account for the new protocol, along with the higher-level protocols: UDP, FTP, SMTP, POP, NNTP, LDAP, SSH and the like. Some of these would be huge efforts, while others would need few or no changes. The OS network stacks would have to be rewritten for each OS, and some applications would need to be rewritten as well, if they deal with network information at a low level.

All in all, it would be a huge load of work, and not likely to get done as an organized effort. The better way to do it would be to set up such a network privately, amongst friends as it were, and expand that to their friends, and their friends, and so on, and so on... And if you were to gateway to the global internet, you'd lose a lot of the benefits right there.