> > That reordering is exactly what I'm talking about. It wasn't my idea.
> But if I understood it correctly, it's possible that the kernel
> commits writes of an application, _to one and the same file_, in a
> non-FIFO order, if the application does not fsync. And this _afaiu_
> could result in the loss not only of new data, but complete corruption
> of previously existing data in laptop mode without fsync.

No, you're not understanding the problem. All layers of the storage stack, including the hard drive, are allowed to reorder writes. So even if the kernel sends data to the disk in exactly the order in which the application wrote it, it could still be written in a different order, because the hard drive itself can reorder writes. This is necessary for performance; without it, the storage stack would be dog slow, and would consume even more power.

So at each level, the only thing you can count on is this: if you want to make sure everything is flushed to stable storage, you need to issue an fsync() call at the application-to-file-system level, or a barrier or cache-flush command at the OS-to-hard-drive level.
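A minimal Python sketch of the application-to-file-system half of this, assuming a POSIX-like system. The function name durable_write is made up for illustration. The point it shows: write() alone only hands the bytes to the kernel's page cache, and os.fsync() is the call that asks for them to reach stable storage. For a newly created file, the directory entry needs an fsync() of its own, which this sketch does not do.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: write data to path, returning only after fsync()."""
    with open(path, "wb") as f:
        f.write(data)         # goes into Python's userspace buffer
        f.flush()             # pushes that buffer into the kernel (page cache)
        os.fsync(f.fileno())  # asks the kernel to commit it to stable storage


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "notes.txt")
        durable_write(p, b"hello\n")
        with open(p, "rb") as f:
            print(f.read())
```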

So what databases do is first write the changes they intend to make to an intent log. Then they call fsync(); then they write a commit block to the intent log; then they call fsync() again; and only then, now that the transaction has been committed to the log, do they start updating the table files. (This is a highly simplified model, but it's good enough for this discussion.)

Ordering doesn't matter, because nothing, including the hard drive, guarantees ordering. What does matter is that the fsync() calls act like barriers: writes issued before the fsync() are guaranteed to be written to the disk, and to survive a reboot, before any writes issued after the fsync() are processed. See?
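To make the barrier point concrete, here is a toy Python model. It is an assumption, not a description of any real drive or driver: a volatile write cache that may destage pending blocks in any order, plus an fsync() that drains the cache. Without the barrier, a crash can leave the commit block on disk without the intent record it refers to; with it, that outcome cannot happen.

```python
import random


class ReorderingDisk:
    """Toy model: a volatile write cache that destages in arbitrary order."""

    def __init__(self, seed: int = 0) -> None:
        self.stable: dict[str, str] = {}  # what survives a crash
        self.cache: dict[str, str] = {}   # volatile; lost on crash
        self.rng = random.Random(seed)

    def write(self, block: str, data: str) -> None:
        # A newer write to the same block simply replaces the cached copy.
        self.cache[block] = data

    def destage_some(self) -> None:
        # The device writes back an arbitrary subset of its cache, in arbitrary order.
        blocks = list(self.cache)
        self.rng.shuffle(blocks)
        for block in blocks[: self.rng.randint(0, len(blocks))]:
            self.stable[block] = self.cache.pop(block)

    def fsync(self) -> None:
        # Barrier: everything written before this call is stable once it returns.
        self.stable.update(self.cache)
        self.cache.clear()

    def crash(self) -> dict[str, str]:
        # Power loss: the cache is gone, only stable blocks remain.
        self.cache.clear()
        return dict(self.stable)


def commit_without_intent(use_fsync: bool, seed: int) -> bool:
    """True if, after a crash, the commit block is on disk but the intent is not."""
    disk = ReorderingDisk(seed)
    disk.write("log:intent", "change X")
    if use_fsync:
        disk.fsync()
    disk.write("log:commit", "commit")
    disk.destage_some()
    survived = disk.crash()
    return "log:commit" in survived and "log:intent" not in survived


if __name__ == "__main__":
    trials = range(1000)
    without = sum(commit_without_intent(False, s) for s in trials)
    with_barrier = sum(commit_without_intent(True, s) for s in trials)
    # The second count is always 0; the first depends on the random choices.
    print(f"commit block without its intent: {without}/1000 without fsync, "
          f"{with_barrier}/1000 with fsync")
```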

This is why getting fsync() right is so critical; things are defined to work this way, and programs like MySQL and SQLite depend on things working this way. You are proposing to break this.

> (Though we're not talking about writing hundreds of
> MBs in laptop mode in my average use case scenario of office
> applications and maybe a browser running.)

Firefox, in order to make its "awesome bar" work, is responsible for 300+ MB worth of writes per click; so for every three clicks, you've written a gigabyte. Any other questions?

> > No, what I meant is that if there is a bug at any step of the
> coordination between the applications and the daemon: in the daemon,
> the software, their communication connection, etc., writes may not
> occur and we may lose data without need.

But the application will know that, and at the end of the day, if the coordination is wrong, the application can always ignore the daemon, write the data, and call fsync(). So if there is any failure, it fails safe; worst case, you just waste more battery.
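A minimal Python sketch of that fail-safe behaviour on the application side. The daemon and its protocol are hypothetical; a `threading.Event` stands in for whatever channel it would use to announce "the disk is spinning, write now". Whether the announcement arrives or the wait times out, the data is still written and fsync()ed; only the timing differs. For brevity the save blocks while waiting, whereas a real application would do this in the background.

```python
import os
import tempfile
import threading


def save_with_daemon(path: str, data: bytes,
                     flush_signal: threading.Event, max_delay: float) -> bool:
    """Hypothetical app-side half of a "write when the daemon says so" protocol.

    Returns True if the daemon announced a flush window in time, False if the
    application gave up waiting. Either way the data is written and fsync()ed.
    """
    # Wait for the daemon's go-ahead, but never longer than max_delay seconds.
    # A dead daemon, a lost message, or a protocol bug all end up as a timeout.
    announced = flush_signal.wait(timeout=max_delay)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # fail safe: a timeout only costs an early disk spin-up
    return announced


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        silent_daemon = threading.Event()  # never set: simulates a broken daemon
        ok = save_with_daemon(os.path.join(d, "doc.txt"), b"draft", silent_daemon, 0.1)
        print("daemon answered" if ok else "daemon silent; wrote and fsync()ed anyway")
```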

> Your scenario sounds like this:
> daemon announced when to flush data
> until then application buffers data in its user space.
>
> This means if you save a file and the application crashes, e.g. segfaults
> and is killed, the data is still in its queue and thus lost.

If the application crashes, it will always lose data. If the application thinks it's flaky, it can always ignore the protocol and force a disk write; as I said, that will just burn battery, which is preferable to losing data.

> > Exactly. Great example! Again, I very much agree. ("Even") I don't want
> to end up with corrupt data. But I accept old data. Is there really no way to get there without
> rewriting each and every application's fsync code?

If the application is using a binary database file format, then no: if you subvert fsync(), you risk losing the entire database. And even if you use hundreds of flat files, if the relationships between those files carry critical meaning, you can still end up with corrupted data.

If you are willing to rewrite the *entire* database to a completely new file each time you want to write out some data, and only delete the old database once the new one has been written out, then you're fine. If the file is too big, you can lengthen the interval between complete writeouts of the database. But then if you drop your laptop and the battery slips out, you'll lose more data. Life is full of tradeoffs.
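One common way to do "write a complete new copy, then retire the old one" is the write-to-temp-then-rename idiom, sketched here in Python under the assumption of POSIX rename semantics. The function name is made up for illustration. At every instant the name `path` refers either to the complete old file or to the complete new one; the rename plays the role of "delete the old database", and it happens only after the new copy has been fsync()ed.

```python
import os
import tempfile


def replace_file_atomically(path: str, data: bytes) -> None:
    """Hypothetical helper: write a full new copy of `path`, switch over once durable."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new contents are on stable storage...
        os.replace(tmp_path, path)  # ...before the old file is retired
    except BaseException:
        os.unlink(tmp_path)  # never leave a half-written temp file behind
        raise
    # Make the rename itself durable (works on Linux and most POSIX systems, not Windows).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "bookmarks.db")
        replace_file_atomically(db, b"version 1")
        replace_file_atomically(db, b"version 2")
        with open(db, "rb") as f:
            print(f.read())
```

The cost Ted describes shows up directly: every save rewrites the whole file, so a large file means either more I/O per save or fewer, more widely spaced saves.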

If the only editor you use is vi, and the only web browser you use is lynx, then life is much simpler. If you want more complexity AND more safety, then you'll have to pay for that in terms of more battery usage.