I don't see why you keep holding on to the idea that when you multithread file writing, the disk is writing each file at the same time and therefore seeking a lot... that's not true. You only seek once per file.

If you're reading/writing concurrently, you have to seek at every context-switch (thread switch).


Riven, try concurrently writing to the same file. It should be faster than writing serially to 1 file. It triggers NCQ. To be clear, please test with NCQ enabled, i.e. AHCI. Also, Java's I/O is written to work concurrently to some extent, so the said context switching won't occur if it's the same file.
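
For anyone wanting to try this, here's a minimal sketch of concurrent writes to a single file, using FileChannel's positional writes so the threads write disjoint regions without locking (the file name, chunk size, and thread count are all arbitrary, and a real NCQ test would need much bigger writes):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SharedFileWrite {

    // Writes `threads` disjoint chunks of `chunk` bytes to one file concurrently;
    // returns the resulting file size.
    static long writeConcurrently(String path, int threads, final int chunk) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");
        final FileChannel channel = raf.getChannel();

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final long offset = (long) i * chunk; // disjoint regions, so no locking needed
            workers[i] = new Thread(() -> {
                try {
                    // Positional write: doesn't touch the channel's file pointer,
                    // so several threads can safely write to the same channel at once.
                    ByteBuffer buf = ByteBuffer.allocate(chunk);
                    long pos = offset;
                    while (buf.hasRemaining()) {
                        pos += channel.write(buf, pos);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();

        long written = channel.size();
        channel.close();
        raf.close();
        return written;
    }

    public static void main(String[] args) throws Exception {
        long bytes = writeConcurrently("shared.tmp", 4, 1 << 20);
        System.out.println("wrote " + bytes + " bytes from 4 threads");
        new java.io.File("shared.tmp").delete();
    }
}
```

Whether the OS actually turns those into queued device commands is up to the driver, of course.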

Sorry, I feel the need to sit on the fence yet again - a PITA for you guys and for me!

Isn't what Riven and counterp are saying both right?

Riven is right about the inherently serial nature of the disk write, but as others have pointed out, in most modern OS's / file systems isn't the FileOutputStream actually writing to RAM to be later flushed to disk by the filesystem, and therefore the implication that each thread context shift results in a disk seek is not correct?

Oh, and the effect of the above will also be that benchmarks will be quite different across different file systems.

in most modern OS's / file systems isn't the FileOutputStream actually writing to RAM to be later flushed to disk by the filesystem, and therefore the implication that each thread context shift results in a disk seek is not correct?

No, the RAM will be sync-ed with the storage device, in a blocking operation.

If we didn't have such a guarantee, lots of critical applications (like databases) couldn't restore to a 'known state' after a crash.

hmm ... from the JavaDoc for flush()

Quote

If the intended destination of this stream is an abstraction provided by the underlying operating system, for example a file, then flushing the stream guarantees only that bytes previously written to the stream are passed to the operating system for writing; it does not guarantee that they are actually written to a physical device such as a disk drive.

It seems you are right: flush() does not block, while sync() and force(...) do.
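
The difference is easy to demonstrate in a small sketch (the file name and size here are arbitrary): flush() just hands the bytes to the OS, while getFD().sync() blocks until the OS reports they have reached the device.

```java
import java.io.File;
import java.io.FileOutputStream;

public class FlushVsSync {

    static long writeAndSync(File f, int bytes) throws Exception {
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[bytes]);

        out.flush();        // only hands the bytes to the OS; may return before any disk I/O
        out.getFD().sync(); // blocks until the OS reports the data has reached the device
        // NIO alternative with the same blocking behaviour: out.getChannel().force(true);

        out.close();
        return f.length();
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("sync-demo", ".tmp");
        System.out.println("synced " + writeAndSync(f, 4096) + " bytes");
        f.delete();
    }
}
```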

But you wouldn't do these for every block of data you write. To use your example above, there are databases which don't even bother to do this after every write. There are other ways of achieving data integrity, and for that matter, depending on the hardware, these still don't guarantee the data is actually written to the disk.

Still, you can observe the dramatic performance degradation when doing concurrent file writes on HDDs.

I wasn't disputing you were seeing this, just your comment about head-seek for every context-shift. It'd be interesting to see how this differs across filesystems and OS's. For that matter, I assume you've tried the above benchmark with multiple threads as opposed to just multiple writes? I could understand why these two circumstances could be treated very differently in the underlying filesystem.

I don't think anyone disputes that concurrent access harms throughput. The reasons why are more likely to do with context-switching overhead and buffer-cache busting than with any universal truth about the physical properties of the HDD, since you can see the effect even on an SSD (though it seems to take more threads to do it).

Even putting aside I/O, there's plenty of other reasons to avoid using threads unless you really need parallel execution on multiple cores (and most I/O doesn't need it). Even then you should at least be using java.util.concurrent.

Yes, I do! If you'd said "I don't think anyone disputes that concurrent access harms throughput on some systems" maybe. I've no doubt of the results people have seen so far, but the whole thing is very dependent on the underlying OS and filesystem.

E.g. Riven's benchmark on Linux with ext4:

wrote 1 files of 512MB in 9 sec, total throughput: 56MB/sec
wrote 2 files of 512MB in 16 sec, total throughput: 64MB/sec
wrote 3 files of 512MB in 29 sec, total throughput: 51MB/sec
wrote 4 files of 512MB in 39 sec, total throughput: 52MB/sec
wrote 5 files of 512MB in 48 sec, total throughput: 50MB/sec
wrote 6 files of 512MB in 57 sec, total throughput: 48MB/sec
wrote 7 files of 512MB in 69 sec, total throughput: 49MB/sec
wrote 8 files of 512MB in 78 sec, total throughput: 48MB/sec

That seems fairly consistent to me. Though my laptop HD is so slow to start with, I'm not sure it's a fair benchmark!

Even putting aside I/O, there's plenty of other reasons to avoid using threads unless you really need parallel execution on multiple cores (and most I/O doesn't need it). Even then you should at least be using java.util.concurrent.

Not sure exactly what you're getting at here, but there are plenty of good reasons to use Threads. While I/O is not necessarily one of them, I'm not sure unnecessarily serializing (synchronizing) your threading model for I/O is always justified either.

... just off to delete all them tmp files before I forget they're there and wonder why my disk has shrunk.

Sounds like you're measuring burst latency then. The valuable lesson we learned in that case is how badly HDD makers lie.

Can we assume FileOutputStream.close() sync's to the storage device?

If so, then my benchmark still stands

I wouldn't assume that. I don't think Java would force something like that, as it's an OS feature. Most of the time with hard drive access you aren't interested in the data actually being written to the hard drive immediately, only that it will be written eventually. If close() forced a cache flush, it would negate everything gained from actually having the cache in the first place.

I wouldn't assume that. I don't think Java would force something like that, as it's an OS feature. ... If close() forced a cache flush, it would negate everything gained from actually having the cache in the first place.

I agree. In fact, I'd go as far as to say it's safe to assume it doesn't. I've found something definitive on Android, but not on Java yet. However, I assume that's the point in having the FD.sync() method in the first place. Maybe add that into the benchmark? And threads too - I'm interested to know if there's any thread-local stuff affecting caching (I'm too busy/lazy to write it myself atm).

I think it's pretty easy to rule out the OS caching data: writing so much that the OS simply can't cache it anymore.

If you write a file that is roughly 4 times the available RAM, and it has the same performance as writing a file about as big as the available RAM, you pretty much know (almost) everything is written to disk when the stream is closed (within a small margin of error).

If you want to force an fsync, you have to call the output stream's getFD().sync(). Closing a stream only flushes it, which means the data is no longer the application's problem, but the OS and filesystem are free to buffer it as long as they feel like.

fsync() itself is only a very strong suggestion though, and HDD manufacturers often still cache things in the drive's onboard RAM, which won't survive a power failure.

At some point you have to ensure the data is written to persistent storage, like prior to system shutdown / reboot. Same applies for 'safe removal' of external devices. Obviously this is done at the OS level, but I'd highly doubt we can't access that functionality somehow (without actually shutting down the device).


Riven, your benchmark is nice and all... but writing files sequentially at that size and amount on the same thread will also quickly slow down the hard drive at the SAME RATE... I don't get what point you're trying to make.

It doesn't change the fact that single-threaded is still slower than multithreaded lol...

You're also forgetting that 1) the max file size is 2MB, and 2) Minecraft will not be saving these files that often (you would have to be writing thousands at the same time or sequentially to get that kind of slowdown).

And I said object creation was only an example of one of the factors that can be done concurrently (in parallel) on different threads. (Creating a FileOutputStream object is actually really slow - almost an entire millisecond. Most of that is probably the actual opening of the file, but there are still bulky operations like checking permissions.)
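
If anyone wants to measure that open cost themselves, here's a rough sketch (the iteration count is arbitrary, and a serious benchmark would warm up the JIT first):

```java
import java.io.File;
import java.io.FileOutputStream;

public class OpenCost {

    // Average time to open and close a FileOutputStream, in microseconds.
    static long averageOpenMicros(File f, int iterations) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            FileOutputStream out = new FileOutputStream(f); // opens (and truncates) the file
            out.close();
        }
        return (System.nanoTime() - start) / iterations / 1000;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("opencost", ".tmp");
        System.out.println("avg open+close: " + averageOpenMicros(f, 200) + " us");
        f.delete();
    }
}
```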

What you're lacking is a benchmark relevant to the argument we're having; that is, one comparing multithreaded vs. single-threaded writing (which would clearly prove I'm right; I've posted one, and you should post your own too, just to make sure that my results aren't biased or anything).

(Either NCQ or queue depth is the slowing factor; Java blocks until the OS can queue I/O requests. I'm going to take a wild guess and say nsigma doesn't have NCQ enabled.)

Make sure you delete the large file (in your project root directory) afterwards. You should stop the program after you see the delay go up; if you want to see throughput, it's only a small modification to the above.
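
For reference, a single-threaded vs. multithreaded comparison could be sketched like this (the file sizes, counts, and names are made up; as discussed above, to defeat the OS cache for real you'd want files larger than RAM, or at least the fsync shown here):

```java
import java.io.File;
import java.io.FileOutputStream;

public class WriteBenchmark {

    static void writeFile(File f, int size) throws Exception {
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[size]);
        out.getFD().sync(); // force to the device so the OS cache doesn't hide the cost
        out.close();
    }

    // Elapsed milliseconds for writing `files` files of `size` bytes, one after another.
    static long serial(File dir, int files, int size) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            writeFile(new File(dir, "serial" + i + ".tmp"), size);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Elapsed milliseconds for writing the same amount with one thread per file.
    static long parallel(File dir, int files, final int size) throws Exception {
        Thread[] threads = new Thread[files];
        long start = System.nanoTime();
        for (int i = 0; i < files; i++) {
            final File f = new File(dir, "parallel" + i + ".tmp");
            threads[i] = new Thread(() -> {
                try { writeFile(f, size); } catch (Exception e) { e.printStackTrace(); }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        int files = 4, size = 4 << 20; // 4 files of 4MB each; a real run needs far more data

        System.out.println("serial:   " + serial(dir, files, size) + " ms");
        System.out.println("parallel: " + parallel(dir, files, size) + " ms");

        for (int i = 0; i < files; i++) {
            new File(dir, "serial" + i + ".tmp").delete();
            new File(dir, "parallel" + i + ".tmp").delete();
        }
    }
}
```

Which side wins will depend on the drive, filesystem, and whether NCQ is enabled, which is rather the point of this whole thread.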
