Uwe Schindler
added a comment - 13/Jul/09 15:20 Attached is a patch using JRE_IS_64BIT in Constants. I set the default to 256 MiB (128 seems too small for large indexes; if the index is, e.g., about 1.5 GiB, you would get 6 chunks).
I have no test data showing which size is best; it is just trial and error (and depends, e.g., on how often you reboot Windows, as Eks said).
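A minimal sketch of how such a bitness-based default and the resulting chunk count could be computed. This is illustrative only, not the attached patch's actual code; the class name and the bitness check via `sun.arch.data.model` are assumptions (Lucene's real check lives in `Constants.JRE_IS_64BIT`):

```java
// Sketch (not the committed patch): choose the mmap chunk size from the
// JVM's pointer width, as the Constants.JRE_IS_64BIT check suggests.
public class ChunkSizeSketch {
    // Rough bitness detection; Lucene's Constants does something similar.
    static final boolean JRE_IS_64BIT =
        "64".equals(System.getProperty("sun.arch.data.model"));

    // 256 MiB on 32-bit VMs (virtual address space is scarce there);
    // effectively unlimited (Integer.MAX_VALUE) on 64-bit VMs.
    static int defaultMaxChunkSize() {
        return JRE_IS_64BIT ? Integer.MAX_VALUE : 256 * 1024 * 1024;
    }

    // Number of chunks needed to map a file of the given length (ceiling division).
    static int chunkCount(long fileLength, int chunkSize) {
        return (int) ((fileLength + chunkSize - 1) / chunkSize);
    }

    public static void main(String[] args) {
        long fileLength = 3L * 512 * 1024 * 1024; // 1.5 GiB
        // A 1.5 GiB index file with 256 MiB chunks needs 6 chunks.
        System.out.println(chunkCount(fileLength, 256 * 1024 * 1024)); // prints 6
    }
}
```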

Uwe Schindler
added a comment - 13/Jul/09 21:56 Eks Dev wrote in java-dev:
I have no test data showing which size is best; it is just trial and error
Sure, for this you need a bad OS and a large index; you are not as lucky as I am to have them
Anyhow, I would argue against a default value. The algorithm is quite simple: if you hit OOM on map(), reduce this value until it fits
no need to touch it if it works...

Uwe Schindler
added a comment - 13/Jul/09 22:03 OK, we have two patches; we can think about using one of them.
In my opinion, there is no problem with limiting the chunk size on 32-bit systems. The overhead of choosing the right chunk is negligible, as it only affects seeking. Normal sequential reads only need to check whether the current chunk has enough data and, if not, move to the next one. The non-chunked stream does this check, too (to throw EOF). With a chunk size of 256 MB, the theoretical maximum number of chunks is 8 (which can never be reached...).
Any other comments?
Eks: What was the value that fixed your problem without rebooting? And how big was your biggest index file?
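The per-read check described above can be sketched like this. It is a simplified illustration, not Lucene's actual MMapIndexInput code; the class and field names are made up:

```java
import java.nio.ByteBuffer;

// Simplified sketch of the per-read check: a chunked input only tests
// whether the current buffer still has data and, if not, advances to the
// next chunk. An unchunked stream needs the same check anyway (for EOF),
// so sequential reads pay no extra cost.
class ChunkedInputSketch {
    private final ByteBuffer[] buffers; // one mapped chunk each
    private int curBufIndex;
    private ByteBuffer curBuf;

    ChunkedInputSketch(ByteBuffer[] buffers) {
        this.buffers = buffers;
        this.curBufIndex = 0;
        this.curBuf = buffers[0];
    }

    byte readByte() {
        if (!curBuf.hasRemaining()) {
            if (curBufIndex + 1 >= buffers.length) {
                throw new RuntimeException("read past EOF");
            }
            // Only at a chunk boundary do we do any extra work.
            curBuf = buffers[++curBufIndex];
            curBuf.position(0);
        }
        return curBuf.get();
    }
}
```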

Paul Smith
added a comment - 13/Jul/09 22:07 An algorithm is nice when no specific settings are given, but in an environment where large indexes are opened more frequently than in the common use cases, the memory layer hits OOM conditions too often, forcing too much GC activity to attempt the operation.
I'd vote for checking whether settings have been requested and using them, and if none are set, relying on a self-tuning algorithm.
In a really long-running application, the process address space may become more and more fragmented, and the malloc library may not be able to defragment it, so the auto-tuning is nice, but it may not be great for all people's needs.
For example, our specific use case (crazy as this may be) is to have many different indexes open at any one time, closing and opening them frequently (the realtime-search stuff we are following very closely indeed...). I'm just thinking that our (64-bit) VM may find it difficult to find contiguous non-heap space for the mmap operation after many days/weeks in operation.
Maybe I'm just paranoid. But for operational purposes, it'd be nice to know we could change the setting based on our observations.
thanks!
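The setting-precedence idea above (explicit setting wins, self-tuning as fallback) can be sketched in a few lines; the names here are hypothetical, not Lucene's API:

```java
// Sketch of the suggested precedence: an explicitly requested chunk size
// wins; otherwise a self-tuned default is used. Names are illustrative.
class ChunkSizeSettingSketch {
    // userRequested == null means "no setting requested".
    static int resolveChunkSize(Integer userRequested, int selfTunedDefault) {
        return (userRequested != null) ? userRequested : selfTunedDefault;
    }
}
```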

Eks Dev
added a comment - 13/Jul/09 22:32 Uwe, you convinced me; I looked at the code, and indeed, there is no performance penalty for this.
What helped me was 1.1G... (I tried to find the maximum); the max file size is 1.4G... but 1.1 is just an OS coincidence, no magic about it.
I guess 512 MB makes a good value; if memory is so fragmented that you cannot allocate 0.5G, you definitely have some other problems. We are talking here about VM memory, and even on Windows a 512 MB block is not an issue (or better said, I have never seen problems with this value).
@Paul: It is a misunderstanding; my "algorithm" was meant to be manual... no catching OOM and retrying (I've burned my fingers already on catching RuntimeException; do that only when absolutely desperate). Uwe made this value user-settable anyhow.
Thanks Uwe!

Michael McCandless
added a comment - 13/Jul/09 23:10 I'd be more comfortable w/ 256 MB (or smaller); I think fragmentation could easily cause 512 MB to give the false OOM. I don't think we'll see real perf costs from buffer switching unless the chunk size is very small (e.g. < 1 MB).
In any event, Uwe can you add to the javadocs describing this false OOM problem and what to do if you hit it?

Uwe Schindler
added a comment - 13/Jul/09 23:28 Javadocs state (in FileChannel#map): "For most operating systems, mapping a file into memory is more expensive than reading or writing a few tens of kilobytes of data via the usual read and write methods. From the standpoint of performance it is generally only worth mapping relatively large files into memory."
So it should be as big as possible. A second problem with too many buffers is that the MMU/TLB cannot handle too many of them effectively.
In my opinion, maybe we could enhance MMapDirectory to work together with FileSwitchDirectory or something like that, to use mmap only for large files and have all others handled by NIO/Simple. E.g., mapping the segments.gen file into memory is really a waste of resources. So MMapDir would only return the MMapIndexInput if the underlying file is > X bytes (e.g. 8 megabytes by default) and fall back to SimpleFSIndexInput otherwise.
In any event, Uwe can you add to the javadocs describing this false OOM problem and what to do if you hit it?
Will do this tomorrow, will go to bed now.
Here are also some other numbers about this problem: http://groups.google.com/group/jsr203-interest/browse_thread/thread/66f6a5042f2b0c4a/12228bbd57d1956d
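The size-based fallback suggested above could look roughly like this. A sketch under assumptions: the 8 MiB threshold and all class/method names are illustrative, not Lucene's actual Directory API:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Sketch of the size-based fallback idea: mmap only files above a
// threshold and use plain reads for small ones (segments.gen, *.del).
// Threshold and names are illustrative, not Lucene's API.
class SizeSwitchSketch {
    static final long MMAP_THRESHOLD = 8L * 1024 * 1024; // e.g. 8 MiB

    static boolean shouldMmap(long fileLength) {
        return fileLength > MMAP_THRESHOLD;
    }

    // Map a whole file read-only (assumes file < 2 GiB; larger files
    // would need the chunking discussed earlier in this thread).
    static MappedByteBuffer mapWhole(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }
}
```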


Uwe Schindler
added a comment - 14/Jul/09 09:34 Committed revision: 793826
Thanks Eks!
About the automatic fallback to a SimpleFSIndexInput for small files like segment*, *.del: I will open another issue targeted at 3.1. MMapping small files wastes system resources and may be slower than just reading a few bytes with SimpleFSIndexInput.