Hi All,

I am getting the following error when I run trunk with hadoop-2.0.3:

java.io.IOException: Failed read of int length 2
    at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
    at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
    at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
    at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
    at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
    at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)

I am able to reproduce this on the cluster but not with the test cases, even when I run them with 2.0.3.

Regards,
Ram

1) Is there a cause stack?
2) Can you ascertain whether the WAL is truncated at that place? The exception type might have changed, or the exception might have expanded, between Hadoop 1 and 2; WAL replay should ignore EOF, so if this is an EOF problem it would be easy to correct. If it's something more serious, then it's bad.

I will add some logging/catching around this to include the cause (if missing) and useful logs.
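For the record, here is a minimal sketch (hypothetical names, not the actual HBase code) of the kind of catch-and-rethrow I have in mind, so the original exception survives as the cause and the message carries position context:

    import java.io.IOException;
    import java.io.InputStream;

    public class CauseWrapping {
      // Hypothetical decoder step: wrap any parse failure so the original
      // exception is preserved as the cause and the log gains context.
      static void advance(InputStream in, long pos) throws IOException {
        try {
          parseCell(in);
        } catch (IOException ioe) {
          throw new IOException("Failed to parse cell near offset " + pos, ioe);
        }
      }

      // Placeholder for the real cell-decoding logic.
      static void parseCell(InputStream in) throws IOException {
        if (in.read() < 0) {
          throw new IOException("unexpected end of stream");
        }
      }
    }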

On further debugging, we found that this issue happens with the ProtoBuf writer and not with the SequenceFile writer (at least we could not reproduce it across different runs).

We can see that the HLog has more data in it, but the error happens while reading one of the lines in the HLog, so we are pretty sure it is not EOF. We verified the DFS logs but could not find any exceptions there either.

    if (length != intBytes.length)
      throw new IOException("Failed read of int length " + length);

The length here comes from a read() call. This looks pretty suspicious: if the stream is not at EOF, why would it return fewer bytes? I will try to repro today.
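To illustrate the suspicion (a sketch with made-up names, not the actual KeyValue code): a single InputStream.read(byte[]) call is only guaranteed to return some bytes, not to fill the buffer, even when the stream is nowhere near EOF.

    import java.io.IOException;
    import java.io.InputStream;

    public class SingleReadPitfall {
      // Suspect pattern: one read() call whose return value is compared
      // against the buffer length. read() may legitimately return 1..3
      // here (a "short read"), for example at an internal buffer boundary,
      // which would trip the length check without any EOF.
      static int readIntSuspect(InputStream in) throws IOException {
        byte[] intBytes = new byte[4];
        int length = in.read(intBytes);
        if (length != intBytes.length) {
          throw new IOException("Failed read of int length " + length);
        }
        return ((intBytes[0] & 0xff) << 24) | ((intBytes[1] & 0xff) << 16)
             | ((intBytes[2] & 0xff) << 8) | (intBytes[3] & 0xff);
      }
    }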

The issue was not occurring for the same test with the same amount of data using SequenceFileLogWriter & SFLR.

Can FSInputStream#read(byte[]) read fewer bytes even when it is not at EOF? I can see we use IOUtils.readFully(in, byte[], int, int); what is the difference between this and the other? Is there a difference between the two when we read at file block boundaries?
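The difference, as I understand it, is that readFully loops until the requested count arrives or the stream genuinely ends, so a short read at a block boundary is simply retried. A sketch of the idea (not the exact Hadoop implementation):

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public class ReadFullySketch {
      // Keep calling read() until len bytes have arrived or the stream
      // truly hits EOF; a short read is not an error, just a retry.
      static void readFully(InputStream in, byte[] buf, int off, int len)
          throws IOException {
        while (len > 0) {
          int n = in.read(buf, off, len);
          if (n < 0) {
            throw new EOFException("Premature EOF, " + len + " bytes missing");
          }
          off += n;
          len -= n;
        }
      }
    }

By contrast, a plain read(byte[]) is allowed to return after transferring whatever is immediately available, which could explain a short read at a block boundary without any EOF.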

589824 is divisible by 4096, the default buffer size (589824 = 144 x 4096). Cue groundless suspicions. However, this doesn't explain why SequenceFile has no such problems. Do you have any more numbers for this?

I am in meetings now; let me try readFully after I come back from lunch.

What I can see is that we use FSDataInputStream in the case of SequenceFileLogReader (not FSInputStream), and we use FSDataInputStream#readInt() (WALEdit#read...). This in turn reads the int from the stream by calling in.read() 4 times, making sure we read the full int (if it is not an EOF issue).
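FSDataInputStream extends DataInputStream, and DataInputStream#readInt() works roughly like this (a from-memory sketch of the JDK behaviour, not a verbatim copy):

    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public class ReadIntSketch {
      // Four single-byte reads: each read() blocks until one byte arrives
      // or EOF, so either all four bytes are read or an EOFException is
      // thrown; there is no way to get a "short" int.
      static int readInt(InputStream in) throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0) {
          throw new EOFException();
        }
        return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
      }
    }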

Maybe we can try it this way and test tomorrow, Ram.

BTW, it looks like IOUtils.readFully() does not return the actual # of bytes read.
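Right: if I recall correctly, Hadoop's IOUtils.readFully(in, buf, off, len) is declared void and throws on premature EOF, so on success the caller already knows that exactly len bytes arrived. If a count were ever needed, a trivial (hypothetical) wrapper would do:

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.io.IOUtils;

    public class ReadFullyCount {
      // readFully either fills the requested range or throws, so on
      // success the number of bytes read is simply the requested length.
      static int readFullyCount(InputStream in, byte[] buf, int off, int len)
          throws IOException {
        IOUtils.readFully(in, buf, off, len);
        return len;
      }
    }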