Description

I often get issues like this:
java.net.SocketTimeoutException: sent ping but didn't receive pong within 1000ms (after 330 successful ping/pongs)

A single failure like this breaks the entire task and makes it hard to even cancel the task. Shouldn't the ping be retried rather than aborting execution? Our Jenkins will run longer-running tasks as well. Any task that stops partway through is a real problem, and I don't see why one network failure after 330 successful ping/pongs (in this case) should be fatal.
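
To make the request concrete, here is a minimal sketch of the tolerance policy being asked for: a connection is only declared dead after several consecutive missed pongs, and any successful pong resets the count. The class name `PongTolerance` and the threshold parameter are hypothetical; this is not how the Kubernetes plugin or its underlying client currently behaves, only an illustration of the idea.

```java
/**
 * Hypothetical policy: tolerate a bounded number of consecutive missed pongs
 * before treating the connection as dead.
 */
final class PongTolerance {
    private final int maxConsecutiveMisses;
    private int consecutiveMisses = 0;

    PongTolerance(int maxConsecutiveMisses) {
        if (maxConsecutiveMisses < 0) {
            throw new IllegalArgumentException("maxConsecutiveMisses must be >= 0");
        }
        this.maxConsecutiveMisses = maxConsecutiveMisses;
    }

    /**
     * Records the outcome of one ping.
     * Returns true while the connection should still be considered alive.
     */
    boolean record(boolean pongReceived) {
        if (pongReceived) {
            consecutiveMisses = 0; // a successful pong resets the streak
            return true;
        }
        consecutiveMisses++;
        return consecutiveMisses <= maxConsecutiveMisses;
    }
}
```

With a threshold of 2, a single missed pong after 330 successful ones would be absorbed, and only a third miss in a row would end the task.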

Tyrone Grech
added a comment - 2019-08-01 06:33
We are also encountering this issue fairly often in our CI system running:
On premises Kubernetes cluster on version 1.14.1
Jenkins version 2.186
Kubernetes Plugin version 1.17.2

Juha Tiensyrjä
added a comment - 2019-09-18 10:04
That option helped for us. But the reason why the pings started to fail was actually the JVM garbage collector, which caused the master to hang for more than 1 second. We switched from the default collector to G1GC to reduce the time the master is blocked, and this helped with other timeouts too.
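
One way to check whether garbage collection is a plausible cause on a given master is to ask the JVM which collectors are active and how much time they have accumulated. The sketch below uses the standard `java.lang.management` API. The class name `GcReport` is hypothetical, and the figures are cumulative totals rather than individual pause lengths, so they only hint at long pauses; GC logs are needed to see single pauses. On a Java 8 HotSpot JVM, G1 is selected with the `-XX:+UseG1GC` flag; how that flag is passed to Jenkins depends on the deployment.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that summarizes the collectors of the running JVM. */
final class GcReport {

    /** Sums the cumulative collection time (ms) reported by all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** One line per collector: name, collection count, cumulative time. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Printing `GcReport.describe()` before and after the switch shows whether the collector names actually changed to the G1 ones, which confirms that the flag took effect.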

Allan BURDAJEWICZ
added a comment - 2019-12-17 01:53
I believe that this issue has been resolved since the release of version 1.19.3, which uses kubernetes-client 4.6.0, in which the default ping interval is 30 seconds:
https://github.com/fabric8io/kubernetes-client/commit/2b1799497f46de81c841ea43808472d3239e7209#diff-7a4b549d7e10b88fbe20ebe680f6b25b
https://github.com/jenkinsci/kubernetes-plugin/commit/464320a012fa0fd47b92f3af3d0403afd22c41a5#diff-600376dffeb79835ede4a0b285078036
https://github.com/jenkinsci/kubernetes-client-api-plugin/blob/kubernetes-client-api-4.6.0-1/pom.xml#L20
Maybe Vincent Latombe can confirm?