Historical:IRC Meshing

Historical Material - Information posted here may be inaccurate as a result of being obsolete. This information is kept for historical reference purposes.

Introduction

Meshing is a method of linking servers together that differs from the one typically used on IRC, and from how IRC was originally created and specified.

In a typical IRC network, servers are linked as a spanning tree: each server may only be connected to the network once, so there is exactly one path between any two servers. This is (of course) vulnerable to network interruptions: when a link fails, the network is split into two halves (a "netsplit").
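As a rough illustration of why a tree splits so easily, here is a minimal sketch. The server names and the `components` and `split_after_losing` helpers are made up for this example and have nothing to do with any real IRC server code. It simply shows that removing any single link from a tree leaves two groups of servers that can no longer see each other.

```python
from collections import defaultdict


def components(servers, links):
    """Return the groups of servers that can still reach each other."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in servers:
        if start in seen:
            continue
        # Walk outwards from `start` to collect everything reachable from it.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical four-server network linked as a tree (N servers, N-1 links).
SERVERS = ["hub", "leaf1", "leaf2", "leaf3"]
TREE = [("hub", "leaf1"), ("hub", "leaf2"), ("leaf2", "leaf3")]


def split_after_losing(link):
    """Show which halves remain once one link of the tree goes away."""
    remaining = [l for l in TREE if l != link]
    return components(SERVERS, remaining)


print(components(SERVERS, TREE))
print(split_after_losing(("hub", "leaf2")))
```

With every link up there is one group. Dropping the hub-to-leaf2 link leaves `hub` and `leaf1` on one side and `leaf2` and `leaf3` on the other, which is exactly a netsplit.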

Meshing is used in many other technologies to avoid similar problems, and many have attempted to apply this to IRC to reduce the fallout from netsplits.

While this is a noble aim, we do not believe it is possible in any practical sense. This page outlines why.

Types of Mesh

Full Mesh

Description

A full mesh is where every server in a network connects to every other server, and all connections are actively used to send and receive data.

This is the model most frequently proposed by people wanting to "mesh IRC".

Why it doesn't work

Routing

There is no reliable way of knowing whether a protocol message got from point A to point B in a network with no set routing pathway.

Solution, and why it doesn't work

Attach a serial number to each message.

It doesn't work, because what happens if you re-send your messages with their unique serial numbers and they again don't get through? Yet again, you don't know whether or not they arrived. Worse still, perhaps they did arrive the first time, and you have just introduced a bunch of duplicates. Hooray!

Message ordering

IRC messages are *very* much dependent on order, both to look sane and to work reasonably. An example of this would be receiving the QUIT for a client down one pathway from server A to server B before actually receiving its introduction down another: you're quitting a user that doesn't yet exist, and then proceeding to introduce a ghost user that will probably only exist on half the network, and probably never go away.
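The ghost user problem can be sketched in a few lines. This is a toy model only: `INTRODUCE` and `QUIT` here are simplified stand-ins rather than real server protocol commands, and `apply` and `replay` are hypothetical helpers for this example.

```python
def apply(users, message):
    """Apply one server-to-server event to a server's view of the network."""
    kind, nick = message
    if kind == "INTRODUCE":
        users.add(nick)
    elif kind == "QUIT":
        # Quitting a user we have never heard of does nothing at all.
        users.discard(nick)
    return users


def replay(messages):
    """Return the set of users a server believes exist after these events."""
    users = set()
    for message in messages:
        apply(users, message)
    return users


in_order = [("INTRODUCE", "alice"), ("QUIT", "alice")]
# The same two events, but the QUIT arrived first via a faster pathway.
reordered = list(reversed(in_order))

print(replay(in_order))
print(replay(reordered))
```

In the correct order the server ends up with no users. With the two events swapped, it ends up believing `alice` is still online, even though she quit.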

Solution, and why it doesn't work

Message acknowledgement.

It doesn't work - because ...well, perhaps it might even work.

...But the code for it would be horrific to write, error prone, scary, and would lag the whole of IRC out ridiculously while having to wait for a response on every action before taking the next action.

Also, for stability it would need to be a three-way handshake in the same manner as a TCP connect(), i.e. "message send, message ack, ack the ack", to ensure that both ends received both parts of the sequence.

This would also then require the necessary retries, timeouts and backoff periods to make it actually reliable enough for use (see the documentation of TCP SYN packets for information).
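To put a rough number on the lag, here is a back-of-the-envelope sketch. It assumes the simplest possible scheme, where only one message is in flight at a time and the sender waits one full round trip for each acknowledgement. The figures (an 80 ms round trip, a 5000-message burst) are invented purely for illustration, and retries and the extra "ack the ack" leg would only make things slower.

```python
def stop_and_wait_seconds(message_count, round_trip_seconds):
    """Time to send messages one at a time, waiting for each acknowledgement."""
    return message_count * round_trip_seconds


def max_messages_per_second(round_trip_seconds):
    """Upper bound on throughput when each message costs one round trip."""
    return 1.0 / round_trip_seconds


# Hypothetical link with an 80 ms round trip and a 5000-message burst.
print(stop_and_wait_seconds(5000, 0.08))   # roughly 400 seconds
print(max_messages_per_second(0.08))       # roughly 12.5 messages per second
```

In this sketch throughput is capped by latency rather than bandwidth, so a faster pipe does not help.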

Really, this is NOT something you want to consider.

Conclusion

We're not saying this is utterly impossible as such, just incredibly messy and overcomplicated. So much so that nobody really would or should want to go down this path for the minuscule benefit it provides.

Redundant Links

Description

A redundant links mesh is where each server is connected to the network multiple times, by means of dormant connections that are established and maintained while the server is linked. They are not used actively. One link ONLY is used to send and receive most traffic. When a problem with that link is detected, another connection is chosen and used.
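A minimal sketch of the link selection described above might look like this. The `RedundantUplinks` class and the server names are hypothetical, and the policy of "pick the first standby that has not failed" is an assumption, since nothing here says how the replacement would be chosen.

```python
class RedundantUplinks:
    """One active link plus dormant standbys that are kept open but unused."""

    def __init__(self, links):
        if not links:
            raise ValueError("at least one link is required")
        self.links = list(links)
        self.failed = set()
        self.active = self.links[0]  # only this link carries traffic

    def send(self, message):
        """Return (link, message) to show which link would carry the message."""
        if self.active is None:
            raise RuntimeError("no working uplink left")
        return (self.active, message)

    def report_failure(self, link):
        """Mark a link as broken and fail over if it was the active one."""
        self.failed.add(link)
        if link == self.active:
            candidates = [l for l in self.links if l not in self.failed]
            self.active = candidates[0] if candidates else None
        return self.active


uplinks = RedundantUplinks(["hub-a", "hub-b", "hub-c"])
print(uplinks.send("PING"))             # carried by hub-a
print(uplinks.report_failure("hub-a"))  # fails over to hub-b
```

Choosing the next link is the easy part. The sketch says nothing about what was in flight on `hub-a` when it failed, which is where the trouble starts.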

This is the model most frequently proposed by people after they realise how hard/impossible full mesh is.

Why it doesn't work

Guaranteed Delivery

You cannot be sure that you haven't missed anything while renegotiating uplinks. To get around this, it would be necessary to split and re-burst internally, which removes all the real gain of doing this anyway.

Also, this still doesn't remove the risk of dropped messages, since query type messages (remote WHOIS, PRIVMSG, et al) are not sent in burst, meaning they'd just disappear with no error message - dangerous, especially in the case of PRIVMSG, where no error reply means success.
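The difference between state and queries can be shown with a toy model. It assumes, as a simplification, that a re-burst replays everything in a made-up `STATE_KINDS` set and nothing else. The event names and the `received_after_failover` helper are illustrative only and are not real protocol commands.

```python
# Simplified: the kinds of event that a re-burst would replay.
STATE_KINDS = {"INTRODUCE", "JOIN"}


def received_after_failover(sent, lost_indexes):
    """Return what the remote side ends up with after a failover and re-burst.

    `lost_indexes` holds positions in `sent` that were dropped while the
    uplink was being renegotiated. Lost state comes back via the burst;
    lost queries do not.
    """
    delivered = []
    for index, (kind, text) in enumerate(sent):
        if index not in lost_indexes or kind in STATE_KINDS:
            delivered.append((kind, text))
    return delivered


sent = [
    ("INTRODUCE", "alice"),
    ("PRIVMSG", "alice to bob: hi"),
    ("JOIN", "alice #chat"),
]
# The last two messages were in flight when the link failed.
print(received_after_failover(sent, lost_indexes={1, 2}))
```

The JOIN is recovered by the burst, but the PRIVMSG is simply gone, and the sender never hears about it.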

Conclusion

Probably implementable. But there's not a big gain, so, why?

Solutions

Failover Links, Autoconnect

To help minimise the duration of a netsplit, it is advised to set up autoconnect on server to server links, and set failover links on all connections. Both of these options are currently supported by InspIRCd.

More Reliable Network

Don't use $1 shells or dialup links to host your servers. Their network connectivity sucks. Yes, really. If you have a decent hosting provider, your network will not split as often.

Further Discussion

If you've got something you'd like to add, talk to us. But, please do keep in mind that we have been thinking and working on this for a very, very long time - so we've heard most ideas (which is what this page is now for).

If you're interested in working on meshing idea xyz, great! Start work on a new protocol module. They are modular. See m_spanningtree for our (default) implementation of one. Come talk to us about it, too. We'd like to hear about it.