Unsafe use of DataInput.skipBytes() in StoredPage and StoredFieldHeader

Details

Description

Some methods in StoredFileHeader and StoredPage call java.io.DataInput.skipBytes(int) with the assumption that it always skips the requested number of bytes. According to the javadoc for skipBytes, it may skip fewer bytes than requested, possibly 0, even if the end of the stream hasn't been reached.

Activity

Hi, Knut. I'm interested in this issue, and wonder what's your favorite way to correct this problem.

1.) to throw a IOException.
2.) retry to skip for some define times, if failed throw a IOExcetion.

Besides, there are still many other invocation on skipBytes(), i.e.*ImageReader, BlockDataInputStream, an so on. Should they be correct altogether for this issue? If so, is there a wrapped function to resolve this problem uniformly.

Yun Lee
added a comment - 29/Mar/09 06:23 Hi, Knut. I'm interested in this issue, and wonder what's your favorite way to correct this problem.
1.) to throw a IOException.
2.) retry to skip for some define times, if failed throw a IOExcetion.
Besides, there are still many other invocation on skipBytes(), i.e.*ImageReader, BlockDataInputStream, an so on. Should they be correct altogether for this issue? If so, is there a wrapped function to resolve this problem uniformly.
Thanks!

My preferred solution would be to have a variant of org.apache.derby.iapi.services.io.InputStreamUtil.skipFully() that could take a DataInput argument. That method uses skip until 0 is returned, then it uses read() which is guaranteed to block until there is something to read. If read returns -1, an EOFException is thrown. Currently skipFully() is only implemented for InputStream, I think.

I'm not sure I understand your question about *ImageReader and BlockDataInputStream. Those classes are part of the JDK, aren't they? I didn't find any references to them in the Derby code.

Knut Anders Hatlen
added a comment - 31/Mar/09 10:53 Hi Yun,
My preferred solution would be to have a variant of org.apache.derby.iapi.services.io.InputStreamUtil.skipFully() that could take a DataInput argument. That method uses skip until 0 is returned, then it uses read() which is guaranteed to block until there is something to read. If read returns -1, an EOFException is thrown. Currently skipFully() is only implemented for InputStream, I think.
I'm not sure I understand your question about *ImageReader and BlockDataInputStream. Those classes are part of the JDK, aren't they? I didn't find any references to them in the Derby code.

>My preferred solution would be to have a variant of org.apache.derby.iapi.services.io.InputStreamUtil.skipFully() that could take a DataInput argument. That method uses skip until 0 is returned, then it uses read() which is guaranteed to block until there is something to read. If read returns -1, an EOFException is thrown. Currently skipFully() is only implemented for InputStream, I think.

With the util, the problem can be resolved easily. I just doubt, skipPersistent() used in InputStreamUtil.skipFully() will cause a new problem on time efficiency, as it's possible to wait a long time for the block finishing.

>I'm not sure I understand your question about *ImageReader and BlockDataInputStream. Those classes are part of the JDK, aren't they? I didn't find any references to them in the Derby code.

I'm sorry to have seen the two classes in a careless look at 'call hierachy' window in Eclipse, .

Yun Lee
added a comment - 31/Mar/09 15:20 - edited Knut, thanks for your advice!
>My preferred solution would be to have a variant of org.apache.derby.iapi.services.io.InputStreamUtil.skipFully() that could take a DataInput argument. That method uses skip until 0 is returned, then it uses read() which is guaranteed to block until there is something to read. If read returns -1, an EOFException is thrown. Currently skipFully() is only implemented for InputStream, I think.
With the util, the problem can be resolved easily. I just doubt, skipPersistent() used in InputStreamUtil.skipFully() will cause a new problem on time efficiency, as it's possible to wait a long time for the block finishing.
>I'm not sure I understand your question about *ImageReader and BlockDataInputStream. Those classes are part of the JDK, aren't they? I didn't find any references to them in the Derby code.
I'm sorry to have seen the two classes in a careless look at 'call hierachy' window in Eclipse, .
I will post a patch on this weekend.
Yun

Knut, I have added a function skipFullyDataInput(DataInput di, int skippedBytes) in InputStreamUtil following by one test case, and applied it to replace the calling to DataInput.skipBytes() in StoredFieldHeader, StoredPage and ClassInvestigator. Also, I have removed unused import clauses and organized the import part too.

Yun Lee
added a comment - 11/Apr/09 09:12 Knut, I have added a function skipFullyDataInput(DataInput di, int skippedBytes) in InputStreamUtil following by one test case, and applied it to replace the calling to DataInput.skipBytes() in StoredFieldHeader, StoredPage and ClassInvestigator. Also, I have removed unused import clauses and organized the import part too.
Please check it! Thanks!
Yun

Knut Anders Hatlen
added a comment - 11/Apr/09 18:30 Thanks for the patch, Yun!
I have some small comments:
1) skipPersistentDataInput() checks if DataInput.readByte() returns -1 to detect that EOF has been reached, but I think that that method will throw EOFException and not return -1 on EOF
2) I think I would have renamed skipPersistentDataInput() and skipFullyDataInput() to skipPersistent() and skipFully().

>1) skipPersistentDataInput() checks if DataInput.readByte() returns -1 to detect that EOF has been reached, but I think that that method will throw EOFException and not return -1 on EOF

I have changed both skipPersistentDataInput() and skipPersistent() to perform like what you need, and changed the document of UTF8Util.internalSkip(final InputStream in, final long charsToSkip) which used the skipPersistent(). I think the new revision can act smartlier. Please check the new patches, thanks!

>2) I think I would have renamed skipPersistentDataInput() and skipFullyDataInput() to skipPersistent() and skipFully().

I have tried this before providing the first patches, however, I found it's not able to use overload here, as it will lead to some compiling-time error on ambiguous use, i.e. InputStreamUtil.skipFully(null, int), and InputStreamUtil.skipFully(dis, int) where dis is an instance of DataInputStream.

Yun Lee
added a comment - 12/Apr/09 12:32 Knut,
>1) skipPersistentDataInput() checks if DataInput.readByte() returns -1 to detect that EOF has been reached, but I think that that method will throw EOFException and not return -1 on EOF
I have changed both skipPersistentDataInput() and skipPersistent() to perform like what you need, and changed the document of UTF8Util.internalSkip(final InputStream in, final long charsToSkip) which used the skipPersistent(). I think the new revision can act smartlier. Please check the new patches, thanks!
>2) I think I would have renamed skipPersistentDataInput() and skipFullyDataInput() to skipPersistent() and skipFully().
I have tried this before providing the first patches, however, I found it's not able to use overload here, as it will lead to some compiling-time error on ambiguous use, i.e. InputStreamUtil.skipFully(null, int), and InputStreamUtil.skipFully(dis, int) where dis is an instance of DataInputStream.
Yun

My point about DataInput.readByte() was that -1 is not an indication that the end of the stream has been reached, so it's not correct to throw an EOFException when readByte() returns -1. I think skipFully() will have to be implemented along the lines of this untested code:

{
// No bytes skipped, read one byte to see if EOF has been
// reached. DataInput.readByte() will throw an EOFException
// if there's nothing more to read.
in.readByte();
// Still more data to read. Account for the byte we just read.
skipped++;
}

bytesToSkip -= skipped;
}
}

The changes to the javadoc for skipPersistent(InputStream,int) also look wrong, as that method is not supposed to throw an EOFException if end of stream is reached before all the bytes have been skipped. If it reaches end of stream prematurely, it returns the number of bytes actually skipped, just as the old javadoc said.

As to the compile errors when using the same name for the different variants of skipFully(), it should be possible to disambiguate the method calls by using a cast, like this:
InputStreamUtil.skipFully((InputStream) null, x);
or
InputStreamUtil.skipFully((DataInput) null, x);

But perhaps the problem here is that we're trying to overload InputStreamUtil too much. Since a DataInput is not (necessarily) an InputStream it might be cleaner to create a new class, DataInputUtil, where we can collect utility methods for DataInputs without conflicting with the utilities for InputStreams.

Knut Anders Hatlen
added a comment - 12/Apr/09 17:37 Hi Yun,
My point about DataInput.readByte() was that -1 is not an indication that the end of the stream has been reached, so it's not correct to throw an EOFException when readByte() returns -1. I think skipFully() will have to be implemented along the lines of this untested code:
public static void skipFully(DataInput in, int bytesToSkip)
throws IOException
{
if (in == null)
{
throw new NullPointerException();
}
while (bytesToSkip > 0) {
int skipped = in.skipBytes(bytesToSkip);
if (skipped == 0)
{
// No bytes skipped, read one byte to see if EOF has been
// reached. DataInput.readByte() will throw an EOFException
// if there's nothing more to read.
in.readByte();
// Still more data to read. Account for the byte we just read.
skipped++;
}
bytesToSkip -= skipped;
}
}
The changes to the javadoc for skipPersistent(InputStream,int) also look wrong, as that method is not supposed to throw an EOFException if end of stream is reached before all the bytes have been skipped. If it reaches end of stream prematurely, it returns the number of bytes actually skipped, just as the old javadoc said.
As to the compile errors when using the same name for the different variants of skipFully(), it should be possible to disambiguate the method calls by using a cast, like this:
InputStreamUtil.skipFully((InputStream) null, x);
or
InputStreamUtil.skipFully((DataInput) null, x);
But perhaps the problem here is that we're trying to overload InputStreamUtil too much. Since a DataInput is not (necessarily) an InputStream it might be cleaner to create a new class, DataInputUtil, where we can collect utility methods for DataInputs without conflicting with the utilities for InputStreams.

Yun Lee
added a comment - 13/Apr/09 14:23 Knut, I think you are right! DataInput.readByte() does act differently from InputStream.read(), and throws a EOF Exception when EOF is met.
I have moved the code into a new util class DataInputUtil, and wrote the corresponding test class. Please check it, thanks!
Yun

Knut Anders Hatlen
added a comment - 14/Apr/09 15:38 Thanks for the new patch. I think it looks good. Two tiny comments:
1) The license header is missing in the new test class.
2) Much of the diff in StoredPage is just reorganizing import statements. It's probably better to do that in a separate patch, if we think those statements need to be tidied up.

Knut. I agree with you, derby-3941-4.diff and derby-3941-4.stat gives change on skipFully(), while derby-3941-5.diff and derby-3941-5.stat gives a pure change on import organize on StoredPage. I think importer reorganization is necessary to keep code clean. Wish for your check.