I am still seeing slower-than-expected results with the sample code and would welcome any further feedback.

Arrgh. I think I've just encountered the same problem on a different server. I've got a semi-production server (which has been running a game OK for a month) which has been fine with three Selectors. Performance has been good.

I just added a new processing module, and suddenly the performance on the Write Selector has gone through the floor and is busily tunnelling its way to Australia.

First response takes 3 seconds (!) to send (it's only 63k, on a LAN!). If the request is repeated, the response takes 60 seconds (!) to send almost exactly the same data. If you try a third time, Linux crashes - or sometimes you get a lot of disk thrashing, Linux kills the JVM process without crashing [phew!], and you are left with 12 zombie JVM processes sharing 200 MB of RAM. EACH process has to be force-killed (signal 9) individually. Note: normally on Linux, killing one JVM process kills the lot, because the other processes are just the different threads within the JVM.

Normally, typical responses of the order of a few kb are transferred at LAN speeds (multiple Mbits, i.e. you'd expect perhaps 10 milliseconds for the transfer, not multiple seconds).

Profiling shows nothing - no CPU hogs, no memory hogs. This therefore seems to be a problem with the JDK 1.4.2 implementation of Selectors and/or ByteBuffers (it's the only way I can think of that RAM and/or CPU usage could be taking place while remaining invisible within the JVM).

Interestingly, none of my base code has changed - all my processing is abstracted away from the Selectors (I just add a new module that returns Strings to the lower layer, which sends them, etc.), so the only things that differ between when I had no problems and now are the length of the response and the actual bytes involved.

Interesting points: 1. My SO_SNDBUF is currently approx 25k. Previous responses have typically been 10k-20k (although I have NOT measured this for all of them); it is possible that this is the first time I'm exceeding the size of the default send buffer. 2. I've been making excessive use of the convenience method in java.nio.charset:


ByteBuffer bb = ((Charset) ch).encode( string );

...unfortunately, the NIO team didn't document what kind of ByteBuffer that call generates (direct, non-direct, etc.?) or any performance considerations. This is one of the few "unknown quantities" in my code, so I'm treating it as potentially suspicious.
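For what it's worth, you can at least probe at runtime what kind of buffer encode() hands back - the Javadoc doesn't say, so checking isDirect() is the only portable way to find out. A small sketch (the class and method names are mine):

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class EncodeProbe {
    // Returns true if the convenience Charset.encode() handed back a direct buffer.
    static boolean encodesDirect(String s) {
        ByteBuffer bb = Charset.forName("ISO-8859-1").encode(s);
        return bb.isDirect();
    }

    public static void main(String[] args) {
        ByteBuffer bb = Charset.forName("ISO-8859-1").encode("hello");
        System.out.println("direct=" + bb.isDirect() + ", remaining=" + bb.remaining());
    }
}
```

On Sun's JDK this reports a non-direct (heap) buffer, but since the spec doesn't guarantee it, it's worth checking on your own VM.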

Could you try experimenting with limiting your requests to slightly less than SO_SNDBUF? (Have a look at Socket.getSendBufferSize().) You can also change the size of SO_SNDBUF, but that may have further unpredictable effects, so I would suggest that you first of all just limit the amount of data you're sending...
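A sketch of the suggested experiment, assuming a non-blocking SocketChannel (the names and the 1k safety margin are my own choices, not from the original server):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class CappedWriter {
    // How many bytes of 'remaining' to attempt in one write, leaving a
    // 1k safety margin below the OS send buffer (the margin is arbitrary).
    static int capFor(int sendBufferSize, int remaining) {
        return Math.min(sendBufferSize - 1024, remaining);
    }

    static void writeCapped(SocketChannel ch, ByteBuffer data) throws IOException {
        int sndbuf = ch.socket().getSendBufferSize();   // current SO_SNDBUF
        while (data.hasRemaining()) {
            ByteBuffer slice = data.duplicate();
            slice.limit(slice.position() + capFor(sndbuf, data.remaining()));
            int n = ch.write(slice);                    // may write less than the cap
            data.position(data.position() + n);
            if (n == 0) break;  // kernel buffer full - wait for OP_WRITE instead
        }
    }
}
```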

If that has no effect, then are you making similar use of Charsets?

First response takes 3 seconds (!) to send (it's only 63k, on a LAN!). If the request is repeated, the response takes 60 seconds (!) sending almost exactly the same data. If you try a third time, linux crashes...Normally, typical responses of the order of a few kb are transferred at LAN speeds (multiple Mbits, i.e. you'd expect perhaps 10 milliseconds for the transfer, not multiple seconds).

Using almost precisely the same ByteBuffers (generated by the same algorithm, filled with 63k of data, etc.):

1. I altered the position() of each buffer from the tail forwards so that the gathering write would only process the first 20k of data (well within the 25k of my send buffer).

2. The first request STILL takes 3 seconds. The amount of data has been reduced by a factor of 3; the only things that are constant are the number of ByteBuffers and their contents.

3. Successive requests now also take 3 seconds (variation of +-20%).

4. Re-enabling the full 63k of data, I noted that the individual ByteBuffers in the gathering write are delivering their content VERY fast (too fast for a human to see), but there is an inter-buffer delay of up to 5 seconds (demonstrated by opening a telnet client and watching the data appear in chunks exactly correlating to the contents of successive ByteBuffers).

The implications of this might be one or more of:

1. GatheringByteChannel.write() is extremely inefficient for large numbers of ByteBuffers in the array - I'm using approximately 60. If this is true and deliberate, it should be documented, but there is no mention of it in the JDK docs at the moment. However, it could be a bug, due to the following point:

2. Sun's implementation of GatheringByteChannel in SocketChannel is very poor (for JDK 1.4.2). The GatheringByteChannel docs state that:

"Unless otherwise specified, a write operation will return only after writing all of the requested bytes. Some types of channels, depending upon their state, may write only some of the bytes or possibly none at all. A socket channel in non-blocking mode, for example, cannot write any more bytes than are free in the socket's output buffer."

In practice, Sun's implementation appears to only write one ByteBuffer's worth at a time. This can be very much less than what the socket's output buffer could hold (in my current example, approximately 2% of the size!).

This may indirectly be the cause of the performance problems (perhaps the overhead of doing 60 select() operations in quick succession is causing the massive delays?).
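If you want to measure this yourself, counting the fully-drained buffers after a single gathering write shows how many buffers the implementation consumed per call. A sketch (the class and method names are illustrative, not from the test case):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;

public class GatherProbe {
    // Count how many leading buffers in the array have been fully drained.
    static int drainedCount(ByteBuffer[] bufs) {
        int n = 0;
        for (int i = 0; i < bufs.length; i++) {
            if (bufs[i].hasRemaining()) break;
            n++;
        }
        return n;
    }

    // One gathering write, then report how many buffers it consumed.
    // The spec quoted above permits a partial write in non-blocking mode.
    static int probe(GatheringByteChannel ch, ByteBuffer[] bufs) throws IOException {
        ch.write(bufs);
        return drainedCount(bufs);
    }
}
```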

Just try running it (it's only one class file) and it will print instructions.

I'd be interested to hear if other people's experience differs from my own...particularly on non-linux platforms.

I find that with one or very few buffers (fewer than 5), performance is what I'd expect given I'm going from localhost to localhost. With 100 buffers, the performance drops to around 50 kB/s, which is pathetic even for a 10 Mbit LAN, let alone the local machine.

I'm still hoping I've made a stupid mistake here, but I wrote this test case from scratch and I'm still seeing this bad behaviour.

In practice, Sun's implementation appears to only write one ByteBuffer's worth at a time. This can be very much less than what the socket's output buffer could hold (in my current example, approximately 2% of the size!).

With the test case, I always get 16 buffers processed per WRITE operation, whether I use 100 buffers or 200. 16 is a suspiciously round power of 2, so perhaps it is some "preferred" number of buffers to send at once.

With a large number of buffers (e.g. 100) it still sends 16 buffers at once, even though that's more data than my send buffer can handle (approximately 32k against a 25k buffer).

Those results are the same whether the client telnet is run from the local machine or from another one over the LAN. It is clear from the telnet clients that WinXP is NOT slowing down throughout the transfer (unlike linux/OS X and win2k). I can't remember whether win98 even has real OS-level async I/O (Windows completion ports)? I'm surprised that win2k is doing badly here.

Unfortunately, telnet clients are ultimately a poor choice for this test because they can't reliably say much about the actual *transfer* speed (note: the testcase only measures the "apparent write-to-network-card speed" or similar).

My original problem is two-fold: both the apparent write speed and the actual transfer speed are extremely low. I want to check that WinXP (and 98?) aren't just experiencing the same problem but reporting it differently. So, here's a (very simple) test client with source code, which does NOT output what it gets back to screen (System.out.println being notoriously bottlenecking...).

NOTE: Server HAS changed slightly, so you need to download both class files (or source files, if you prefer)

Apparently completion ports (Windows' only decent implementation of async I/O) are only available on NT, 2k, and XP; this may be 50% of your specific problem.

The message I'm printing out is merely Exception.toString(), so that's the error message your JVM is deciding to create. ISTR that Windows has very low hard-coded limits on various aspects of its networking - I'm guessing that win98 is hard-coded to 16 buffers. There was a bug in JDK 1.4.0, IIRC, whereby no Windows implementation could do more than 64 sockets at once, or something, because of one of these hard limits. But I'm almost totally ignorant of the implementation details of Windows I/O, so don't assume I'm right.

ISTR also that these values can be changed, something like via the registry or via system calls to reconfigure the limit (just like the good old days of having to specify the maximum number of file handles at boot time in your autoexec.bat / config.sys).

The thousand-buffer test generated, according to the task manager, 250 MB of garbage. So take that with a grain of salt, since about 3/4 of the way through the test my cache started thrashing like mad.

Thanks for that detail. That mirrors the worst of my experiences with linux, and it's useful to know it happened on win2k too, although I've not yet managed to cause such excessive RAM usage with the simple test case. 1000 buffers use 125 MB according to linux. There must be a bug in my test case somewhere, because either way that sounds an insane amount of RAM usage. There have been bugs in Sun's JVMs for years where they fail to GC I/O object references that have gone out of scope, so I've changed the server to explicitly null most references just before they go out of scope. This may or may not help (it makes no difference on linux).

I'm afraid I've also spotted a bug in the server - it was artificially reporting much lower speeds than it should have (for the speed calculation, I was using the most recent number of bytes written, rather than the total number).

I've fixed this, and added a line to summarise the total number of bytes actually sent, so you can easily verify the speed yourself.
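For reference, the corrected speed calculation amounts to dividing the running total of bytes by the elapsed time, rather than using only the most recent write's count. A minimal sketch (the field and method names are mine, not from NioTester):

```java
public class SpeedMeter {
    private long totalBytes;        // running total, not just the last write
    private final long startMillis;

    SpeedMeter(long startMillis) { this.startMillis = startMillis; }

    void recordWrite(int bytesWritten) { totalBytes += bytesWritten; }

    // Bytes per second over the whole transfer so far.
    double bytesPerSecond(long nowMillis) {
        long elapsed = Math.max(1, nowMillis - startMillis);  // avoid divide-by-zero
        return totalBytes * 1000.0 / elapsed;
    }

    long totalBytes() { return totalBytes; }
}
```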

Re-testing on my machines, the speeds are somewhat better - but still hundreds of times slower than they should be.

Does anyone have any explanation for the behaviour we're seeing here, other than the possibility of a bug in NIO?

There have been some bugs in Sun's JVM's for years where it fails to GC I/O object-refs that have gone out of scope, so I've changed the server to explicitly null most references just before they go out of scope. This may or may not help (makes no difference in linux).

I know nothing about NIO, but I have encountered input/output streams being a source of memory leaks before as well. It wasn't in J2SE though; it was J2ME MIDP on the Siemens mobiles. If you lost your reference to an input/output stream before you called close() on it, it never got GC'ed :S

Unlikely to be the problem you're encountering, but it was related... kinda [ok, maybe not, but still something worth knowing if you ever branch into J2ME].

Does anyone have any explanation for the behaviour we're seeing here, other than the possibility of a bug in NIO?

Why don't you ask the guy who "wrote the book"? The 3rd section of his site is "Gotta Question" and he was friendly enough to exchange a series of 4 emails with me as I explained and described my problem and two of those emails were before I had bought the book.

Why don't you ask the guy who "wrote the book"? The 3rd section of his site is "Gotta Question" and he was friendly enough to exchange a series of 4 emails with me as I explained and described my problem and two of those emails were before I had bought the book.

Thanks, leknor, that was a good idea. Ron sensibly suggested doing a direct comparison with a manual implementation of iterating over the buffers and reading from each of them.

I've tried this, and now it seems you can achieve almost 100% speed this way, instead of using the (supposedly faster!) gathering write! I've updated the server (NioTester) again, and it now takes an extra argument: either 1, which works exactly as before, or 2, which does:

This is not perfect: I'm afraid it may distort the readiness notification, and the interleaving of writes to different clients (because instead of doing one write to all buffers, it does consecutive writes to each buffer).
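Based on that description (consecutive writes to each buffer instead of one gathering write), mode 2 presumably looks something like this sketch - the names are mine, not from NioTester, and it's written against WritableByteChannel so it applies to SocketChannel as well:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class ManualGather {
    // Write each buffer individually; returns total bytes written. Stops at
    // the first zero-length write so the caller can re-register for OP_WRITE
    // and resume from the same buffer array later.
    static long writeAll(WritableByteChannel ch, ByteBuffer[] bufs) throws IOException {
        long total = 0;
        for (int i = 0; i < bufs.length; i++) {
            while (bufs[i].hasRemaining()) {
                int n = ch.write(bufs[i]);
                if (n == 0) return total;  // send buffer full - try again later
                total += n;
            }
        }
        return total;
    }
}
```

Because the positions advance in the buffers themselves, a later call simply carries on from wherever the previous one stopped.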

If you care to try this version, and compare the two different modes, I think you'll find that mode 2 works quite nicely:

I've submitted a bug-report, and included a pointer to this forum/topic. From current experience (I just got a bug confirmed yesterday), it's taking many weeks to get bugs processed, so it could easily be a month before it gets turned into an "official" bug.

In the meantime, I'm going to use the workaround in my production server, and see if it holds up when loaded.

So far, so good. I haven't subjected it to heavy load, but it's been handling website-tasks (HTTP) on the net for a week nonstop with no crashes. I've recently added some simple load and performance-measures which give figures representing "typical" and "recent" performance (e.g. using rolling-means/ranges and mean-with-feedback/hysteresis).
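The "mean-with-feedback" measure mentioned here is essentially an exponential moving average; a minimal sketch (the alpha value and the class name are arbitrary choices of mine):

```java
public class Ema {
    private final double alpha;  // feedback weight, 0 < alpha <= 1
    private double mean;
    private boolean primed;      // first sample seeds the mean directly

    Ema(double alpha) { this.alpha = alpha; }

    // Fold one sample into the running mean and return the new mean.
    double update(double sample) {
        mean = primed ? alpha * sample + (1 - alpha) * mean : sample;
        primed = true;
        return mean;
    }

    double mean() { return mean; }
}
```

A small alpha gives the "typical" long-term figure; a large alpha tracks "recent" performance.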

We're going to external alpha test very soon, so I'll hopefully have some stats to verify things are going OK. However, I wish SNMP were integrated into Java in a much easier fashion (PS thanks to leknor for the pointer to JMX). It appears right now that the best you can get is to use a J2EE server which supports JMX (it seems that only J2EE vendors currently have JMX solutions?), and even JMX is a bitch to get started with. And there's no way I'm touching J2EE for this project - even if Sun bribed me *hint, hint* (nah, not even then).

Quote

I am thinking of putting in a bug report for the problem I was seeing with multiple clients, and so revisited your test.

What java version were you running?

1.4.2 - I waited until I could test it on the gold release before submitting the bug etc.

The current downloads pump out (on the client side only, oops) the Java version etc., so people could just cut and paste that along with the figures.

It appears right now that the best you can get is to use a J2EE server which supports JMX (it seems that only J2EE vendors currently have JMX solutions?), and even JMX is a bitch to get started with. And there's no way I'm touching J2EE for this project - even if Sun bribed me *hint, hint* (nah, not even then).

If you only want to use JMX, maybe you should give JBoss a try? They have a modular server which supposedly gives you the ability to strip out all the services you don't want (EJB, JNDI, web server, etc.) and keep only the JMX bit; this is done by editing an XML configuration file.

Not sure if this is what you want, but I thought I could as well let you know (if you didn't already).

If you only want to use JMX, maybe you should give JBoss a try? They have a modular server which supposedly gives you the ability to strip out all the services you don't want (EJB, JNDI, web server, etc.) and keep only the JMX bit; this is done by editing an XML configuration file.

Ah, yes, their move to a micro-kernel. I'm afraid I still haven't got out of the habit of thinking "JBoss == open-source J2EE". You're correct that I ought to try a stripped-down version and see how good it is, although I still have nightmares from having tried to do that with things like linux before (where it is architecturally and theoretically easy, but in practice often just pain, pain, and more pain, depending on what you are trying to do).

The biggest question is going to be how easy it is to integrate with an existing server - whilst maintaining security policies and architectural constraints.
