-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
This is interesting. What is the ddl used to create the test table, I
would like to understand the size of the log records being written
Darby internally uses buffered I/O so from the darby code I would only
expect a single write call to the jvm per insert/commit (unless the
records are really long due to data types in the table). We use a JVM
buffered stream, I wonder if this is somehow changing our one write into
2 - the second writes all seem to be 4k page size writes on a 4k size
block boundary.
It may have something to do with the fact that log writes are not block
bounded, so when the commit comes in we are likely doing an write call
beginning at a non 4k offset. It may be better in the future to only do
log I/O on page boundaries as the system already does with data pages.
Originally the system was designed to use as few resources as possible
as so that it could run on small devices like cell phones, so the log
file was not preallocated, the size of log files was very small, and
they were deleted as quickly as possible. Now that the system is
preallocating log file space it may be worth doing to work to block the
log file I/O from the darby code into page size chunks.
I have not used this tool, does it trace actual OS system calls coming
out of the JVM?
The I/O's I was talking about are actual I/O's to the disk, I believe
these writes are to the OS cache. Before doing preallocation the sync
call made would force the OS to force both the most recently changed
page (for small log records this was usually one page), and also
whatever page the OS maintained metadata on (like allocation information
and last modified time). This is very OS and filesystem dependent
(logged filesystems are likely to perform very differently).
Which OS/filesystem are you using?
Steen Jansdal wrote:
| Mike Matrigali wrote:
|
|> I think that was tried when the original change was made and the
|> system did not see the expected IO performance that was expected at
|> log commit time. The problem this change is addresses is that
|> performance of the log sync call at commit time was sometimes 2 to 3
|> times slower than other databases for single user short transaction
|> benchmarks. The difference depends on OS/JVM/and disk types.
|>
|> Previous to jdk1.4 the only pure java way to sync data to disk was a
|> call that told the JVM to basically sync the whole file. Our
|> investigation determined that at least on windows each extending write
|> using this mechansism was causing multiple I/O's to the log file -
|> likely one for the data and one for system metadata about the file.
|>
|> In jdk1.4 a new interface was added which guaranteed sync of writes
|> for each write executed. Unfortunately on some jvm's in jdk1.4.1 it
|> did not
|> actually do the sync. This bug was fixed in jdk1.4.2 jvms.
|>
|> So in jdk1.4.2 jvm's the new interface was used. But testing showed the
|> system still got the multiple I/O performance unless the system wrote to
|> a preallocated log file rather than an ever growing one. So at the
|> same time the code being discussed here was added to preallocate log
|> files.
|>
|> I would worry that the suggested trick would actually not allocate all
|> pages to the file (on all OS/JVM's), and that subsequent sync writes
|> at commit time would again see the multiple I/O peformance that was
|> trying to be avoided.
|>
|> Making allocation run fast is good, but only if the subsequent syncing
|> writes as part of commits are optimized. Log file allocation can be
|> made asynchronous to client application work, but the syncing
|> peformance directly affects clients performance.
|>
|> It is unfortunate that the jvm documentation is so bad in this area,
|> as you say the behaviour in this situation is undocumented. From
|> documentation we can't even tell what "metadata" is being discussed
|> when using the "rws" and "rwd" modes.
|>
|
|
| Are you telling me that on my jdk1.4.2_03 commits will only generate one
| I/O write to the log file. By using a file monitoring tool I can see
| that each commit generate two log writes.
|
| I wrote a little test program:
|
|
|
| stmt.execute ( "INSERT INTO test (idx,str) VALUES (1,'hello')" );
|
| // wait 10 seconds
| for ( int sec = 0; sec < 10; sec++ ) {
| start = System.currentTimeMillis();
| while ( System.currentTimeMillis() < start + 1000 )
| ;
| System.out.print ( "%" );
| }
|
| stmt.execute ( "INSERT INTO test (idx,str) VALUES (1,'hello')" );
|
| // wait 10 seconds
| for ( int sec = 0; sec < 10; sec++ ) {
| start = System.currentTimeMillis();
| while ( System.currentTimeMillis() < start + 1000 )
| ;
| System.out.print ( "&" );
| }
|
| con.setAutoCommit(false);
| for ( int j = 0; j < 10; j++ )
| stmt.execute ( "INSERT INTO test (idx,str) VALUES (1,'hello')" );
| con.commit();
|
|
|
| And by monitoring with FileMon (freeware from www.sysinternals.com)
| I get the following output
|
| 1 09:17:02 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 332944 Length: 132
| 2 09:17:02 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 331776 Length: 4096
| 3 09:17:12 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 333076 Length: 132
| 4 09:17:12 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 331776 Length: 4096
| 5 09:17:22 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 333208 Length: 699
| 6 09:17:22 javaw.exe:304 WRITE
| C:\src\DatabaseBenchmark\databases\derby\log\log2.dat SUCCESS
| Offset: 331776 Length: 4096
|
|
|
| This shows that each commit generates two disk writes. And whats more
| strange is that first 132 bytes are written and then its overwritten by
| 4096 bytes. (See the offsets and lengths of the writes)
|
| Steen
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBOJiWEpeslyHqPs0RAq4EAJ9veEdr4hnaDkhLrjNt7C5tXSgElgCfQila
m/If5qEmcEhqFy2z+LTy/v4=
=snfd
-----END PGP SIGNATURE-----