Description

FULL PRODUCT VERSION :

A DESCRIPTION OF THE PROBLEM :
As noted in JDK-8049846, the implementation of Java_java_net_SocketInputStream_socketRead0 assumes that read() will not block once poll() has reported the socket as readable. This assumption does not hold. The man page for select(2), which the man page for poll(2) references, states: "Under Linux, select() may report a socket file descriptor as 'ready for reading', while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block."

In production, we hit this about once a day. For testing and reproduction purposes, we can use fault injection to get spurious poll() results on demand.

[This report is probably more appropriate as a comment on JDK-8049846, but commenting requires an account, and obtaining an account does not appear to be an easy task]

Pardeep Sharma
added a comment - 2015-03-18 22:56
1) Attaching the test case (poll.c, OneReaderThread.java).
2) On a 32-bit Linux system, I checked this against JDK 7u80, 8u31, 8u40, 8u60 b06, and 9 ea b54, and could reproduce the issue.
****************************************************************************************
7u80: FAIL
8: FAIL
8u31: FAIL
8u40: FAIL
8u60 ea b05: FAIL
9 ea b54: FAIL
******************************************************************************************
Output of the testcase in 8u40:
$ LD_PRELOAD=./poll.so /net/jre.us.oracle.com/onestop/j2sdk/1.8.0_40/latest/binaries/linux-i586/bin/java -cp . OneReaderThread
Should never happen: A is unresponsive
A tick (0)
A tick (2)
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
A tick (0)
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
...................

Output of the testcase in 7u80:
$ LD_PRELOAD=./poll.so /net/scanas412.us.oracle.com/export/java_re/jdk/7u80/latest/binaries/linux-i586/bin/java -cp . OneReaderThread
Should never happen: A is unresponsive
A tick (0)
A tick (2)
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
A tick (0)
Should never happen: A is unresponsive
A tick (2)
A tick (0)
A tick (0)
A tick (0)
Should never happen: A is unresponsive
..............................

Chris Hegarty
added a comment - 2016-08-09 03:08
The connection between 'poll' and 'select' is tenuous, but I accept that there may be an issue, albeit rare. Some testing should be done to see if the issue can be reproduced in-house.
It seems reasonable to prototype a version of socketRead0 that flips the non-blocking bit before and after, to see what other implications arise from doing that.

Robert Mckenna
added a comment - 2017-07-11 05:23
Critical request, because a) this fix was already in 17_04 and appears to have been pushed out for some reason, and b) we have already committed to providing it in October in a number of escalations.