NameNode crash after enabling G1GC on a large heap

We've enabled HA for our NameNodes. After increasing the heap size of the standby NameNode and manually failing over to make it active, we encountered a crash. Previously we used CMS on a 120GB JVM heap. When we increased the heap size to 180GB and enabled G1GC, the NameNode crashed because writes to the JournalNodes timed out.
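For context, the change amounts to something like the following in hadoop-env.sh. This is a sketch, not a copy of our actual file: only the 180GB heap and the G1 collector come from the description above. The variable name HADOOP_NAMENODE_OPTS assumes Hadoop 2.x (Hadoop 3.x uses HDFS_NAMENODE_OPTS), and setting -Xms equal to -Xmx is an assumption as well.

```shell
# hadoop-env.sh fragment (illustrative; assumes Hadoop 2.x variable naming)
# Heap raised from 120GB to 180GB, collector switched from CMS to G1.
# -Xms equal to -Xmx is an assumption, not something stated in the post.
export HADOOP_NAMENODE_OPTS="-Xms180g -Xmx180g -XX:+UseG1GC ${HADOOP_NAMENODE_OPTS:-}"
```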

I don't think the timeout was caused by a GC pause. Resource usage (CPU, disk and network I/O) was also low at the time. We're using ShellBasedUnixGroupsMapping, so the bottleneck is not in connections to an LDAP server.

On the quorum JournalNodes, no warnings were logged during that period. It seems the NameNode failed to connect to any of the JournalNodes.
