Original post by John Schultz: For a large MMOG, unless the custom UDP protocol has excellent bandwidth adaptation, TCP may be a better choice for keeping data flowing smoothly. This is one reason why custom UDP protocol authors should study TCP very carefully during the design phase (see my previous posts for links to excellent TCP/design/implementation papers).

I completely agree with understanding how and why TCP works before implementing your own UDP or even IP based protocol (I'm liking SCTP more & more), but using TCP does not solve the problem, it exacerbates it!

Think through what happens when there's not enough bandwidth, and all your TCP streams are now being throttled back. Now you are producing data and dumping it into many local TCP socket buffers faster than it is being pushed out over the network... So now you have to write code to throttle the data production even if you are using TCP, just like you would with UDP (or your server will either crash or very slowly catch back up – perhaps never catch back up).

If you're new to networking, learn how TCP & UDP work first. If you're not new, do it right on the first cut. I've never regretted doing that. You don't have to 100% implement the UDP solution before you move on, but start by using UDP and get that framework in place. Implementing a reliable UDP protocol isn't terribly difficult. You just need to sequence-stamp your packets and keep the reliable ones in a circular queue (or hash table), along with some book-keeping that lets you retire packets once they are ACK’d and re-send the ones that are not. NAK’d packets can come later, if you need them at all.
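To make the sequence-stamp-and-ACK idea above concrete, here is a minimal sketch of the sender-side book-keeping. It is an illustration only: the class name `ReliableSender`, the 16-bit sequence space and the 0.25 second resend timeout are assumptions, not anything specified in this thread, and real socket I/O is left out.

```python
import time
from collections import OrderedDict


class ReliableSender:
    """Hypothetical sender-side book-keeping for a reliable-UDP layer."""

    def __init__(self, resend_after=0.25):
        self.next_seq = 0
        self.resend_after = resend_after   # seconds before a resend (assumed value)
        self.unacked = OrderedDict()       # seq -> (payload, last_send_time)

    def send(self, payload, reliable, now=None):
        """Stamp a packet; remember it only if it must be delivered."""
        now = time.monotonic() if now is None else now
        seq = self.next_seq
        self.next_seq = (self.next_seq + 1) & 0xFFFF   # assumed 16-bit wrap
        if reliable:
            self.unacked[seq] = (payload, now)
        return seq, payload

    def on_ack(self, seq):
        """The peer confirmed receipt, so the packet can be forgotten."""
        self.unacked.pop(seq, None)

    def due_for_resend(self, now=None):
        """Return packets whose ACK is overdue and restart their timers."""
        now = time.monotonic() if now is None else now
        due = []
        for seq, (payload, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.resend_after:
                self.unacked[seq] = (payload, now)
                due.append((seq, payload))
        return due
```

Unreliable packets (movement, voice) get a sequence number but are never stored, so they cost nothing once sent. Only the reliable ones sit in the table until `on_ack` retires them.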

Also, in this context of MMOG, the reliability provided by TCP usually is not good enough. You need guarantees at the application logic (game transaction) level that the packet/information/request was not only received, but also processed and successfully completed. You have to write this code whether you use TCP or UDP (or SCTP). (Consider an item trade or sale.)
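As a rough illustration of transaction-level guarantees (as opposed to transport-level delivery), the sketch below has the server record the outcome of each trade under a client-chosen transaction ID, so a retried request replays the stored result instead of executing twice. The names `TradeServer`, `handle_trade` and the inventory layout are hypothetical and only serve the example.

```python
class TradeServer:
    """Hypothetical server that makes trade requests safe to retry."""

    def __init__(self):
        self.inventory = {"alice": {"sword"}, "bob": set()}
        self.completed = {}   # txn_id -> result already reported to the client

    def handle_trade(self, txn_id, giver, taker, item):
        # A retry of a known transaction replays the stored result
        # rather than moving the item a second time.
        if txn_id in self.completed:
            return self.completed[txn_id]
        if item in self.inventory[giver]:
            self.inventory[giver].remove(item)
            self.inventory[taker].add(item)
            result = "completed"
        else:
            result = "rejected"
        self.completed[txn_id] = result
        return result
```

The client keeps re-sending the same `txn_id` until it sees a result, regardless of whether the bytes travel over TCP, UDP or SCTP.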

The only other thing you need with UDP is another hash table to get at the connection data (or connection tracking object) for a particular client, and that can be made quite transparent by keying off the object ID that everything in an MMOG is going to have.

For this purpose using TCP only provides you with the illusion of less work. It’s a lot of work whether or not you use TCP. TCP is for local stream emulation (e.g. a terminal or a file copy), and it's a logical & architectural mistake to use it for something else.

$0.02

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted. -- Tajima & Matsubara

For *nix, SIOCOUTQ/TIOCOUTQ/tcp_ecn should enable the TCP sender to monitor the output queue for throttling.

For some games, the only option will be to drop the slow client channel(s). For other games, the client will have to disconnect/reconnect (buffer flush if supported) and reset to the server state.

When a slow client channel is detected, UDP (non-state critical such as movement/voice data) should be cut until the channel can catch up. I call this "reliable recovery mode" in my custom UDP protocol.
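One way to picture the slow-channel cut-off described above is a per-client backlog counter with two thresholds. This is only a sketch under assumed numbers: the class name `ClientChannel` and the high/low water marks are made up for illustration and are not taken from the custom protocol mentioned in the post.

```python
class ClientChannel:
    """Hypothetical per-client state that pauses unreliable traffic for slow clients."""

    def __init__(self, high_water=64 * 1024, low_water=16 * 1024):
        self.high_water = high_water   # backlog (bytes) that triggers recovery mode
        self.low_water = low_water     # backlog (bytes) at which normal sending resumes
        self.backlog = 0               # reliable bytes queued but not yet ACK'd
        self.recovering = False

    def _update_mode(self):
        # Two thresholds give hysteresis, so the mode does not flap.
        if self.backlog >= self.high_water:
            self.recovering = True
        elif self.backlog <= self.low_water:
            self.recovering = False

    def queue_reliable(self, size):
        self.backlog += size
        self._update_mode()

    def on_acked(self, size):
        self.backlog = max(0, self.backlog - size)
        self._update_mode()

    def should_send_unreliable(self):
        """Movement/voice data is skipped while the channel is catching up."""
        return not self.recovering
```

A server loop would call `should_send_unreliable()` before emitting movement or voice packets, and could drop the client outright if the backlog keeps growing.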

For the "distributed simulation" view of networked gaming, the main problem with TCP is that it will re-send old data if a packet is lost; meanwhile, it will withhold newer data from being delivered.

SCTP can solve one of those problems (withholding newer data), but only a custom UDP protocol lets you re-send newer data when you detect that you need a re-transmit. Also, SCTP doesn't work so well through most NAT boxes :-(

TCP is a reliable protocol. Although this sounds like I'm paying it a compliment, it can be a bad thing for games.

Reliable means that the packets you send are received in the same order. If you send packets:

X -> Y -> Z they are received as X -> Y -> Z

This is great for chat/communication, but horrible for just about anything else.

In UDP, packets are received in the order that they arrive, which is great for movement or actions, because the last action is less important than the current action. You can mitigate the problems with out-of-order sequences on UDP very easily, simply by queuing the messages as they arrive and then re-requesting those that didn't make it. Once your packets are ordered, you can process them.

Unfortunately there is no mitigation for the reliability of TCP. If a packet is dropped, then no new packets will be accepted until the missing packet arrives. This can cause a great amount of perceived lag. Have you ever played a game where you are moving your sprite, and then all of a sudden you stop moving, so you keep hitting the forward button? Then, all of a sudden, your sprite bursts into a fast-forward sprint. This is because the movement packets were sent with TCP.

I would strongly suggest that you use a network API that mitigates these problems for you. Most of them do. I see no feasible reason to use TCP, except to be lazy.

Good luck.

The above issues apply to all reliable protocols, including those built into custom reliable-UDP protocols. Such issues are network game design issues, as opposed to network protocol issues. The moving sprite example problem can be solved with local client simulation and server reconciliation.

UDP+TCP (with throttling) is a reasonable choice for a first-time network programmer working on a game project. A custom reliable-UDP protocol can only be more efficient; it can't be more reliable, since reliability is a boolean trait. Packet ordering and reliable system design are relatively straightforward. Creating a bandwidth-efficient system in addition to the reliable system is not trivial.

TCP is an excellent protocol. The advantage to a custom UDP system is the ability to optimize performance and to have total control of system behavior. TCP could have been more efficient for game/simulation applications if more TCP network options were available and/or more options were standardized across platforms.

Again, a good reliable-UDP protocol will emulate TCP very closely, only differing where more control and efficiency can be gained for game/simulation applications.

UDP's biggest strength for a shipping product may be the firewall issue (worst case, only 1 (or a few) fixed ports must be opened). This is less of an issue for a MMOG with a centralized server (never hosted by other players), as fixed TCP ports can be used (including port 80 (HTTP): it will always work (unless HTTP is specifically blocked)).

Original post by Anonymous Poster: In UDP packets are received in the order that they arrive.

This isn't true. UDP can deliver packets out of order, and in fact can deliver duplicates and other nasty things as well.
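Because UDP may reorder or duplicate datagrams, a receiver of "latest wins" data such as movement usually filters on a sequence number. The sketch below is a minimal example assuming a 16-bit sequence field that wraps around; `is_newer` and `NewestWinsReceiver` are illustrative names, not part of any library discussed here.

```python
def is_newer(seq, last, bits=16):
    """True if seq is ahead of last in modular (wraparound) arithmetic."""
    half = 1 << (bits - 1)
    mask = (1 << bits) - 1
    diff = (seq - last) & mask
    return 0 < diff < half


class NewestWinsReceiver:
    """Accepts a packet only if it is newer than everything seen so far."""

    def __init__(self):
        self.last_seq = None

    def accept(self, seq):
        if self.last_seq is None or is_newer(seq, self.last_seq):
            self.last_seq = seq
            return True
        return False   # duplicate or late arrival: drop it
```

This suits data where an old update is worthless once a newer one has arrived. Messages that must all be processed need a reliable channel instead.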

BTW, this thread is really making me want to fire up my IDE and pick up where I left off with FTA... if I could even remember where that was [grin]. I was just in the middle of switching movement over to use the keyboard... unfortunately real life is taking over lately and I don't have time for that stuff right now... deadline this Tuesday [grin].

Original post by Saruman: I should have stated the other alternative that we were looking at and is still possible. We were looking at using RakNet as a viable networking solution for our project, although there are some major issues with handling a large number of players at the current time. We have talked to Kevin Jenkins about this and know exactly where the problems are, and will be looking into this if we do select it as our approach. Not to say RakNet is a bad library; I actually find it very good. It just needs some work for having a large number of players online.

Currently this is something we are looking at to avoid having to use both TCP and UDP, which can cause some trouble. We will either be using that approach or a full RakNet approach, although we will have to modify the library to get it working for a large number of players.

Do you think you could go into more detail about this? I've looked through a lot of the RakNet source, and it seems very elegant... although why he made all his own data structures is beyond me... What's wrong with RakNet right now that doesn't allow large numbers of players? And could you define 'large'?

The above issues apply to all reliable protocols, including those built into custom reliable-UDP protocols.

Not so. If you build a state-reliable protocol (rather than a stream-reliable protocol), then you can deliver things out of order just fine, because the promise of a state reliable protocol is that the mirrored state will eventually be consistent with the sent state. For a stream-reliable protocol (such as TCP), the promise is that the *data* sent is the same on the receiving end as the sending end, which is a much stronger guarantee (and unnecessarily strong for networked games).

Also, there are classes of reliability such as "integrity and sequencing but not lossless" which are quite useful to games. That semantic is typically available in a "reliable" UDP wrapper, but can't be provided by TCP.

The above quoted statement, taken in its original context, referred to the fact that the issue was a game design issue, and not a reliable protocol issue. In the case of the example problem under discussion, player movement keystrokes would have to be processed in order, otherwise the state would rapidly diverge.

On the topic of out-of-order reliable support (OOORS): while I agree with some of your points regarding levels of reliability, in practice I haven't seen any evidence that a custom reliable UDP protocol that supports OOOR packets has a significant advantage over TCP, provided everything else is equal (both have similar behavior with respect to bandwidth adaptation, etc.). While all packets should be CRC'd (integrity), and sequencing support (ordered data) is indeed useful for movement data, I would not classify these concepts as reliable by themselves. CRC and sequence support can be applied to straight UDP (e.g. movement data) alongside TCP for reliable messages (critical state only).

I can see immediate feedback for a power-up/game-item activation being considered a benefit of OOORS, but given the infrequent occurrence of OOO packets, I don't see OOORS being a strong argument (by itself) for a custom reliable UDP protocol over TCP. One argument for OOORS over TCP would be a reliable signaling packet, so that slow clients can be instructed to perform special actions (such as reset to a known state so that the server can reset the client back to a (last) known synchronized state, allowing the server to flush/reset its send queue(s)). While TCP provides for out-of-band data support (OOB), it is not supported on all TCP implementations. However, a second TCP critical message channel (that takes near zero bandwidth) can provide equivalent behavior.

In my network simulations and internet tests, I did not see a major performance issue with out of order packets. Additionally, given that certain (major) encrypted and authenticated protocols drop all out of order packets, there was no point in implementing a protocol that has to worry about out of order packets (in one particular project, a lower level network layer dropped all out of order packets).

While I'm defending TCP, I also encourage developers to explore creating custom reliable UDP protocols, tailored to their needs. However, if the custom UDP protocol provides no real benefit over TCP in practice (or performs worse than TCP on average across low- to high-congestion situations, i.e. does not scale well), there's no point in reinventing the wheel, especially if the new wheel ends up being square.

The problem is that, if I send the state for object A over TCP at time T, and there's packet loss and re-transmission, the TCP stack will send another copy of A at T across the wire, even though A has since evolved to time T+dt. With a custom protocol, you can at that point send the state of A at T+dt, so that re-transmits are more efficient.
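The retransmit behaviour described above can be sketched as follows: the sender remembers which version of each object went into each packet, and when a packet is reported lost it sends the current state of the affected objects rather than the stale bytes. This is a simplified illustration; `StateSender`, its method names and the per-object version counter are assumptions made for the example, and loss detection itself is out of scope.

```python
class StateSender:
    """Hypothetical state-reliable sender: on loss, resend the latest state."""

    def __init__(self):
        self.state = {}       # object_id -> latest value
        self.version = {}     # object_id -> version of the latest value
        self.in_flight = {}   # packet_seq -> {object_id: version that was sent}
        self.acked = {}       # object_id -> highest version known to be delivered
        self.next_seq = 0

    def update(self, object_id, value):
        self.state[object_id] = value
        self.version[object_id] = self.version.get(object_id, 0) + 1

    def build_packet(self, object_ids):
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight[seq] = {oid: self.version[oid] for oid in object_ids}
        return seq, {oid: self.state[oid] for oid in object_ids}

    def on_ack(self, seq):
        for oid, ver in self.in_flight.pop(seq, {}).items():
            self.acked[oid] = max(self.acked.get(oid, 0), ver)

    def on_loss(self, seq):
        """Build a replacement packet carrying current values, or None if nothing is stale."""
        lost = self.in_flight.pop(seq, {})
        stale = [oid for oid in lost if self.acked.get(oid, 0) < self.version[oid]]
        return self.build_packet(stale) if stale else None
```

If object A was sent at time T and the packet is lost after A has moved on, `on_loss` yields A at T+dt, which is the efficiency gain the post is pointing at.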

Similarly, for a single object it may make sense to deliver messages in order, but trying to enforce that across objects will lead to the entire world hitching when a single packet is dropped (not just the objects who had updates in the packet). You may think that a separate TCP connection per object would avoid this, but then instead you have the full packet overhead of TCP per packet per object, which is a rather high price to pay -- you can do much better yourself over UDP.

From my personal experience, these are two of the three major reasons to use UDP. The third reason is that UDP can do peer-to-peer punch-through through NAT. This recommendation does, however, assume that you know what you're doing... not only with congestion avoidance, but with things like security and authentication as well!

Original post by hplus0603: The problem is that, if I send the state for object A over TCP at time T, and there's packet loss and re-transmission, the TCP stack will send another copy of A at T across the wire, even though A has since evolved to time T+dt. With a custom protocol, you can at that point send the state of A at T+dt, so that re-transmits are more efficient.

Right: send such data using UDP in the unreliable channel. Stale data is thus never retransmitted. Only transmit data that absolutely must arrive via the TCP channel.

Quote:

Original post by hplus0603: Similarly, for a single object it may make sense to deliver messages in order, but trying to enforce that across objects will lead to the entire world hitching when a single packet is dropped (not just the objects who had updates in the packet). You may think that a separate TCP connection per object would avoid this, but then instead you have the full packet overhead of TCP per packet per object, which is a rather high price to pay -- you can do much better yourself over UDP.

That is a good argument for OOORS for UDP when the simulation requires objects to be updated in such a manner. However, the game state management can be designed around the player's view space, for example, and there really shouldn't be a big difference in object update order (in terms of what the player will experience). The point is that there should never be such a huge backlog of queued data that a "hold up / ordering" case diminishes the player experience. This can be mathematically modeled to a limit case based on network bandwidth and maximum allowed latency for game/simulation effects. Put another way, if the player is experiencing delayed effects due to a "hold up / ordering" issue in the reliable channel, the data rate is too high.
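A back-of-the-envelope version of that limit case: if the link drains at a given rate and an effect may be delayed by at most a given time, the reliable queue should never hold more than rate times delay. The helper names below are hypothetical and the model deliberately ignores round-trip time and retransmissions, so treat it as a rough bound rather than a measured result.

```python
def max_backlog_bytes(bandwidth_bytes_per_s, max_latency_s):
    """Largest reliable-queue backlog that can still drain within the latency budget."""
    return bandwidth_bytes_per_s * max_latency_s


def drain_time_s(backlog_bytes, bandwidth_bytes_per_s):
    """Time needed to flush the current backlog at the given link rate."""
    return backlog_bytes / bandwidth_bytes_per_s


def rate_is_too_high(backlog_bytes, bandwidth_bytes_per_s, max_latency_s):
    """True when queued reliable data would delay effects past the budget."""
    return drain_time_s(backlog_bytes, bandwidth_bytes_per_s) > max_latency_s
```

For example, at an assumed 8000 bytes per second and a 0.25 second budget, the reliable queue should stay under 2000 bytes. Anything above that means the game is producing reliable data faster than the design allows.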

Quote:

Original post by hplus0603: From my personal experience, these are two of the three major reasons to use UDP. The third reason is that UDP can do peer-to-peer punch-through through NAT. This recommendation does, however, assume that you know what you're doing... not only with congestion avoidance, but with things like security and authentication as well!

I agree with the P2P NAT point, and as stated previously, I use UDP exclusively for my current game/simulation projects. However, I can understand why some MMOG developers chose TCP when the game/simulation resides on a dedicated server. In such a case, all I/O can go through port 80 (HTTP) or even 443 (HTTPS). Thus, there will never be issues with NAT/firewalls (unless HTTP/HTTPS is blocked, as in some corporate environments where web access is not allowed).

Original post by graveyard filla: Do you think you could go into more detail about this? I've looked through a lot of the RakNet source, and it seems very elegant... although why he made all his own data structures is beyond me... What's wrong with RakNet right now that doesn't allow large numbers of players? And could you define 'large'?

Sure thing.

By large I mean that with anything over 200 players you are going to start having some major issues, maybe even with fewer connected players.

The main bottleneck in the RakNet API is memory usage for tracking duplicate packets. In ReliabilityLayer.h you will see a giant array, and this is a problem space that needs to be solved, as for any large (>100) number of connected clients you are going to have issues. I know Kevin has worked on this, but I do not know how far he has gotten or what design he chose, and I am pretty sure he does not want to commit until the doxygen and OS X port are complete, as it would set back other people's work.

There are other minor issues that really should be cleaned up, and IOCP support is something that you would definitely want back in if you are running on a Windows platform server.
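For comparison, one common way to keep duplicate tracking at a small, fixed cost per connection is a sliding window: remember the highest sequence number seen plus a bitmask for the few numbers just below it. The sketch below shows that generic technique only. It is not RakNet's actual design or a proposed patch to it, and `DuplicateWindow`, the window size and the non-wrapping sequence numbers are assumptions for the example.

```python
class DuplicateWindow:
    """Remembers the newest sequence number and a bitmask of the `size` before it."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1   # no packet seen yet
        self.mask = 0       # bit i set => sequence (highest - i) has been seen

    def check_and_record(self, seq):
        """Return True for a first-time packet, False for a duplicate or one too old to judge."""
        if seq > self.highest:
            shift = seq - self.highest
            # Slide the window forward and mark the new highest as seen.
            self.mask = ((self.mask << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False    # fell out of the window; treat as already handled
        bit = 1 << offset
        if self.mask & bit:
            return False    # already seen
        self.mask |= bit
        return True
```

The trade-off is that a packet arriving more than `size` sequence numbers late is rejected even if it was never seen, which a reliable layer would then recover through its normal resend path.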

Hope that helps.

EDIT:

Also, John Schultz, I see you mentioning using TCP on specific ports such as HTTP and SSL (80/443). You should be able to use any port without worries, as the client is the one opening the TCP connection.

Original post by Saruman: Also, John Schultz, I see you mentioning using TCP on specific ports such as HTTP and SSL (80/443). You should be able to use any port without worries, as the client is the one opening the TCP connection.

That's true, however some environments block everything except HTTP/HTTPS. For example, some users won't be willing to ask the IT department to open ports (security policy may prevent additional port openings).

Quote:

Original post by Saruman: There are other minor issues that really should be cleaned up, and IOCP support is something that you would definitely want back in if you are running on a Windows platform server.

Does RakNet use TCP? If not, how do you see IOCP helping a UDP-only based protocol, especially if the server is single-threaded (for maximum performance due to zero (user-level, network) context switching)?

Once the completion port has been created and sockets have been associated with it, one or more threads are needed to process the completion notifications. Each thread will sit in a loop that calls GetQueuedCompletionStatus each time through and returns completion notifications.

It would appear that thread context switching overhead might outweigh kernel (paging) advantages with IOCP, especially given the nature of UDP (not using memory/queues as with TCP).

I could see that as well and would have to investigate the matter further. RakNet is a UDP library and at one point did have IOCP support built into it. Kevin seemed to insist that IOCP was a good thing to have which is why I mentioned it, as I have only used IOCP with TCP based solutions in the past.

Original post by Saruman: I could see that as well and would have to investigate the matter further. RakNet is a UDP library and at one point did have IOCP support built into it. Kevin seemed to insist that IOCP was a good thing to have which is why I mentioned it, as I have only used IOCP with TCP based solutions in the past.

I performed some quick benchmarks with overlapped I/O and UDP, and observed no improvement over simple sockets. I also recall hplus0603 mentioning benchmarking multithreaded UDP (even with MP hardware) and they found single-threaded to be faster. I have not tested overlapped I/O with IOCP, but I suspect that it may be slower than simple sockets for a single-threaded UDP design. Perhaps that's why RakNet dropped IOCP? Benchmark results would be helpful.

Original post by John Schultz: I performed some quick benchmarks with overlapped I/O and UDP, and observed no improvement over simple sockets. I also recall hplus0603 mentioning benchmarking multithreaded UDP (even with MP hardware) and they found single-threaded to be faster. I have not tested overlapped I/O with IOCP, but I suspect that it may be slower than simple sockets for a single-threaded UDP design. Perhaps that's why RakNet dropped IOCP? Benchmark results would be helpful.

The main reason for IOCP being missing was that the feature was broken in a recent version and it just hasn't made its way back into the API as of yet. Also I totally believe that a single-threaded server will have the best performance as you don't get any threads mashing heads.

As I mentioned, though, this is not the biggest bottleneck in RakNet for a large number of users; the Reliability Layer is the critical design flaw that would need to be adjusted.

As a general rule, the commercial MMOs that have had problems with network latency, connection issues and limits on the number of players able to play at the same time used TCP. The MMOs that have had fewer problems use UDP. For example, compare the problems with WoW, which uses TCP, with the lack of problems with Everquest, which uses UDP.

This is because UDP allows the programmer better control over what data to send and how to send it. Custom UDP protocols are able to compensate for poor network conditions because the application can choose to drop some data to allow the most vital data to get through.

However, there has to be a choice for programmers, and that is why I programmed ReplicaNet to offer the choice of UDP or TCP for connections. The network transport layer can also be expanded by the user to take advantage of other forms of network communication. For example, at least one of the consoles requires socket connections to be authenticated by their online service, so this can be incorporated into a modified transport type.

Original post by Martin Piper: As a general rule, the commercial MMOs that have had problems with network latency, connection issues and limits on the number of players able to play at the same time used TCP. The MMOs that have had fewer problems use UDP. For example, compare the problems with WoW, which uses TCP, with the lack of problems with Everquest, which uses UDP.

Although I agree with what you said about UDP allowing greater control, etc., I do not agree with this statement in any way. I can point out 3 MMO games right now that are built on a UDP architecture and are ten times more horrendous latency-wise than any of their TCP counterparts. Look how fraught with problems Asheron's Call 1 & 2 have been over their entire lifespan (still broken to this day). Everquest actually had much worse connectivity problems on launch than WoW did, and EQ is even a TCP/UDP hybrid.

I have yet to be convinced that it is the protocol that causes these issues, rather than the actual netcode and implementation of the server.

Everquest actually had much worse connectivity problems on launch than WoW did, and EQ is even a TCP/UDP hybrid.

EQ only uses UDP for game communication. This can be verified using numerous methods, such as capturing the packets or injecting your own code into the winsock DLL and logging the socket calls. EQ1 was also much better than WoW at launch in terms of network performance. I know because I was playing both from the start, and played WoW from the alpha. :) Everquest2, which uses UDP for game data, also has much better network performance than WoW.

I've also not noticed Asheron's Call problems that are related to them using a specific network protocol. The game might be rubbish, but that is not a network protocol related problem.