After launching Simutrans, I clicked the "Play Online" button to join a network game, and Simutrans stopped responding. No error message appears either on the GUI or on the console, even when running under gdb. I tested with pak128 and pak.nippon; the freeze was reproduced with both paksets, so the problem seems to be pakset-independent. This bug was found in the Simutrans nightly r8423 on macOS.

Simutrans will freeze until the TCP connection attempt times out. This varies from 10 to 30 seconds depending on the network stack.

Unfortunately there is no easy programming solution to this. As far as I am aware, the Berkeley sockets API does not support asynchronous connections, which would allow connection attempts to avoid blocking the main thread. Additionally, multi-threading support is still optional in Simutrans Standard, so the game must be designed around a single thread (the main thread), which means one cannot hack together an asynchronous connection API using worker threads. On top of this, much of the network code relies on global state, which is not thread safe and potentially limits the number of parallel network operations that can occur at a given time.

Non-blocking sockets are a possibility, although Simutrans is in general not designed for asynchronous operations. The multi-threading that has been done is fork-join style, which is something different from this.

The current situation can be improved; e.g., waiting for a server to respond could be done in a separate thread, without blocking the GUI.

Not without creating an entirely different code path for those builds without pthread, making the already nightmarish network code even more so. This is why I keep saying pthread should be made a requirement rather than an option.

The current situation can be improved; e.g., waiting for a server to respond could be done in a separate thread, without blocking the GUI.

The problem is not easily solved by just slapping on a thread. The main part of the problem is that you suddenly have something else to tidy up. If the user clicks connect and the program asynchronously starts a connect operation, one suddenly has to deal with the fact that the user may close the dialog before the connect succeeds, and that the user may repeatedly click connect while a connect is underway. Blocking the event thread is an easy, but also very inelegant, way to solve the former problem, and maybe also the latter.

With a background thread, the background thread has to wait for one of two things: connection completion (successful or not) or a cancellation message from the event thread. I know a single function for this in Win32, but I don't know enough about pthreads to know if it can do the same. The server will also likely experience more spurious connects, caused by users clicking cancel after the server has acknowledged the connect but before the client received confirmation. It should be hardened against such things already for other reasons, but I don't know.

That said, I also do not know how easy it is for the main event thread to periodically poll the status of a non-blocking connect, but it is a possibility.

Not without creating an entirely different code path for those builds without pthread, making the already nightmarish network code even more so. This is why I keep saying pthread should be made a requirement rather than an option.

With mingw64 configured with the POSIX thread model (as is the case with MSYS2), the standard C++ library forces a dependency on pthreads already. (I've posted a patch to force this dependency to be statically linked, since almost everything else is so by default.) My complaints about threading have not been so much about a dependency on pthreads as about using threads in the most difficult way possible (concurrent operation on shared state), without any benefit over the old tried-and-tested single-threaded code. In this case, the threading would hopefully be in a part of the code I don't use anyway.

Threading is not on by default because 1) the server does not need it (and may even run on a single virtual core) and 2) some things are hard to debug with multithreading. So it is nice to be able to disable threading.

The problem with a crashed list server is more subtle. The server is there, it just does not answer. Even Google Chrome waits for 30 s before giving an error, so Simutrans is in good company there. One could of course show a waiting bar with a 30 s countdown. If that reaches zero, then one could kill the non-blocking thread waiting for the server and give up.

Even Google Chrome waits for 30 s before giving an error, so Simutrans is in good company there. One could of course show a waiting bar with a 30 s countdown. If that reaches zero, then one could kill the non-blocking thread waiting for the server and give up.

I like that idea: cancel the frustrating wait, then continue and enter a server manually.

The problem with a crashed list server is more subtle. The server is there, it just does not answer. Even Google Chrome waits for 30 s before giving an error, so Simutrans is in good company there. One could of course show a waiting bar with a 30 s countdown. If that reaches zero, then one could kill the non-blocking thread waiting for the server and give up.

Not entirely sure what you mean. A thread cannot both be non-blocking and waiting for a server. One also should avoid killing threads, especially if resource management is involved.

Quote

Threading is not on by default because 1) the server does not need it (and may even run on a single virtual core)

The server would benefit from multi-threading because it has to communicate over a network. Although there is some buffering to minimise it, there is also a chance that network transmission calls block for some length of time, which results in the server wasting computational time that could be spent on the game. This might be the case for a very laggy client, where retransmissions cause the send buffer towards that client to overflow.

This would also open up the ability to reduce multiplayer joining time, because one could transfer saves in parallel with advancing the game state, allowing people to join and catch up (as in OpenTTD), rather than the current situation where everyone must remain paused while another client joins.

Not entirely sure what you mean. A thread cannot both be non-blocking and waiting for a server. One also should avoid killing threads, especially if resource management is involved.

I was wondering about the same things, but for the first part I assume he meant that the thread is using select or poll. As for killing threads, yes, that is bad, but pthreads does not appear to have a way of doing it. (Java is also removing its "support" for doing so, after having it marked as deprecated for probably twenty years or even more.) The only sad thing is that I can't seem to find any way of waiting on a socket and an event at the same time in a platform-independent way (or rather, for anything but Windows).

The server would benefit from multi-threading because it has to communicate over a network. Although there is some buffering to minimise it, there is also a chance that network transmission calls block for some length of time, which results in the server wasting computational time that could be spent on the game. This might be the case for a very laggy client, where retransmissions cause the send buffer towards that client to overflow.

This would also open up the ability to reduce multiplayer joining time, because one could transfer saves in parallel with advancing the game state, allowing people to join and catch up (as in OpenTTD), rather than the current situation where everyone must remain paused while another client joins.

Non-blocking sockets would perhaps be enough for the first case. Using threads to do blocking I/O is, as far as I understand it, a bit old-fashioned. If the non-blocking call fails because the buffer is full during play, the connection is perhaps too slow to continue anyway. Non-blocking sockets and a single background thread are perhaps the modern way of doing bulk transfers.

The server does not benefit from threading, since it has to transfer the game state at load. At that time the game is frozen for all clients, so the game should rather be transferred as quickly as possible. So no benefit from threading. The alive messages to the list server should fit in one packet, so there too I see no advantage in threading. (Especially since a server with slow upload is useless.)

The server does not benefit from threading, since it has to transfer the game state at load.

I think Dr Supergood was suggesting an improvement whereby the savegame is transferred to the new client while it continues to run on the existing clients; once the new client has downloaded the save it would effectively run the game on fast forward until it catches up.

I think an idealised timeline would look something like the following:

Server                                               | Existing Clients           | New Client
---------------------------------------------------- | -------------------------- | ----------------------------------------------------
                                                     |                            | Requests to join game
Saves game                                           | Saves game (at same point) |
Loads from save, and begins sending it to new client | Loads from save            | Begins receiving save
Resume game, not yet synchronised with new client    | Resume game                |
                                                     |                            | Finishes receiving save, and loads the game
                                                     |                            | Runs the game as fast as possible (without graphics?) to catch up
Now running synchronously with new client            |                            | Catches up with server, and begins running normally

This might lead to a significant reduction in waiting time for existing clients, at the expense of a longer joining time for new clients and much increased code complexity. Having not yet played any network games myself (ignoring some ultimately failed attempts about 7 years ago), I have no idea whether this would actually be a useful thing to have.

That does not sound quite like this reported issue, in which case you would just have to wait a little while.

Perpetual hangs do rather sound like an unstable network connection (if it is indeed network-related), in which the connection has been established and some data has been sent successfully, but then a reply from the other end goes missing. With sockets, a receive operation will by default not time out, because the lack of incoming data might simply mean there isn't any; the transport layer has no way of knowing whether data is to be expected unless explicitly told so with setsockopt. The sending side, however, will notice something is wrong because it doesn't get ACKs, and will likely terminate the connection at its end, but any termination signal might get lost as well. We had a lot of trouble with this at work when the network infrastructure started dropping lots of packets after a certain amount of uptime/use, until some equipment was finally replaced.