If the Xmx option is not set explicitly, the JVM defaults to using a quarter of the physical memory it sees, which in a container means the host's memory rather than the container's limit. If JVM memory usage then grows past the cgroup limit defined for the Docker container, the kernel OOM killer kills the Java process. To avoid this, -XX:+UseCGroupMemoryLimitForHeap (an experimental flag, so it also needs -XX:+UnlockExperimentalVMOptions) automatically sets Xmx from the memory limit defined in the cgroup.
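A minimal sketch of enabling this inside a container (the image name and the 256MB limit are assumptions for illustration):

```shell
# assumed image and limit; the JVM derives its heap cap from the 256 MB cgroup limit
docker run -m 256m openjdk:8 \
    java -XX:+UnlockExperimentalVMOptions \
         -XX:+UseCGroupMemoryLimitForHeap \
         -version
```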

For JDK8+, class metadata (class names and fields, method bytecode, the constant pool, etc.) lives in Metaspace, which is allocated outside the JVM heap. The GC and JIT compiler also use non-heap native memory, as do DirectByteBuffers and JNI code. Limit the Metaspace size with -XX:MaxMetaspaceSize. You can track native memory allocation with -XX:NativeMemoryTracking=summary, but this incurs a 5-10% performance hit.
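For example (app.jar and the PID are placeholders), start the JVM with tracking enabled and then query it with jcmd:

```shell
# enable native memory tracking (costs roughly 5-10% performance) and cap Metaspace
java -XX:NativeMemoryTracking=summary -XX:MaxMetaspaceSize=128m -jar app.jar &
# summarize native allocations (heap, Metaspace, GC, code cache, threads, ...)
jcmd <pid> VM.native_memory summary
```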

MinHeapFreeRatio and MaxHeapFreeRatio are manageable flags (from JDK7u60+), so you can change their values at runtime without restarting the Java process. This is useful for minimizing JVM memory in flight.
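Because the flags are manageable, they can be changed on a live process, for example with jinfo (the PID and values here are placeholders):

```shell
# make the running JVM give memory back to the OS more aggressively after GC
jinfo -flag MinHeapFreeRatio=10 <pid>
jinfo -flag MaxHeapFreeRatio=20 <pid>
```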

A Docker image with the full JDK is around 0.5GB. With JDK9+ you can create custom JREs containing just the subset of the JDK you need (the jdeps tool helps identify which modules an application uses). A minimal JRE containing only the java.base module, e.g. 'jlink --module-path /docker-java-home/jmods --strip-debug --compress=2 --output java --add-modules java.base', combined with Alpine Linux and the musl libc library in the image, results in an image of less than 50MB (an order of magnitude smaller). (This is maintained in Project Portola.)
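A sketch of the two steps, assuming the application is packaged as app.jar and JAVA_HOME points at a JDK 9+ install:

```shell
# summarize which modules/jars the application actually depends on
jdeps -s app.jar
# build a runtime image containing only java.base
jlink --module-path "$JAVA_HOME/jmods" --add-modules java.base \
      --strip-debug --compress=2 --output /opt/minimal-jre
```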

Class data sharing - -XX:+UseAppCDS (see http://www.javaperformancetuning.com/news/news207.shtml for details) - works nicely for multiple instances of the same container, if you put the shared archive in the image.
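In JDK 10, AppCDS is driven by three steps (record a class list, dump the archive, run against it); app.jar and the file names are assumptions:

```shell
# 1. record which classes the application loads during a trial run
java -XX:+UseAppCDS -XX:DumpLoadedClassList=app.lst -jar app.jar
# 2. dump them into a shared archive (bake this file into the image)
java -XX:+UseAppCDS -Xshare:dump -XX:SharedClassListFile=app.lst \
     -XX:SharedArchiveFile=app.jsa --class-path app.jar
# 3. run every container instance against the shared archive
java -XX:+UseAppCDS -Xshare:on -XX:SharedArchiveFile=app.jsa -jar app.jar
```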

The (experimental) AOT compiler jaotc can help reduce footprint and speed up startup, but in Java 10 it is probably still too early to use.

Support for honouring the cgroup limits set by the container has been added to Java 10: CPU limits are now respected by Runtime.availableProcessors(), the common ForkJoin pool and VM-internal thread pools, and hence by libraries that size their pools from them, such as core.async, Elasticsearch and Netty.
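A sketch of inspecting what the JVM detected (the image name is an assumption; JDK 10 added the os+container log tag, and -XX:ActiveProcessorCount for overriding detection):

```shell
# log the CPU and memory limits the JVM detected from the cgroup
docker run --cpus=2 -m 512m openjdk:10 \
    java -Xlog:os+container=trace -version
# or override the detected CPU count explicitly
java -XX:ActiveProcessorCount=2 -version
```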

Additional flags (from JDK 10) let you set the heap as a percentage of the RAM available to the container: -XX:InitialRAMPercentage, -XX:MaxRAMPercentage, -XX:MinRAMPercentage.
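For example (image name and limits are assumptions), with a 1GB container limit the heap may grow to roughly three quarters of it:

```shell
docker run -m 1g openjdk:10 \
    java -XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0 -version
```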

A container's guaranteed minimum CPU = the container's shares / total allocated shares. This is the worst case: if container X is above its minimum and another tenant starts taking more CPU, container X can suddenly have less CPU available. This makes analysis tricky, since you tend to notice the decrease in performance of container X while the machine appears stably utilized.
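The numbers here are hypothetical, but the worst-case arithmetic looks like this:

```shell
# hypothetical container with 512 CPU shares on a host with 2048 shares allocated in total
shares=512
total=2048
# guaranteed minimum fraction of host CPU = shares / total
awk "BEGIN { printf \"%.2f\\n\", $shares / $total }"   # prints 0.25
```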

Overlay networks can come with a significant performance cost.

For performance analysis you have one kernel but two perspectives (container vs host), separated by namespaces and cgroups.

Apply the USE method for performance analysis. For every resource, check: Utilization (eg CPU time busy, cgroup % of cap); Saturation (eg CPU run queue or request latency, cgroup number of times throttled); Errors (eg CPU ECC errors). You can use this for both hardware and software resources. The USE methodology cuts down the huge number of available metrics by telling you what you should measure.
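For the cgroup-level saturation and utilization checks, the counters are exposed directly in the cgroup v1 filesystem (the mount point may differ on your distribution):

```shell
# nr_throttled = periods in which the container hit its CPU quota;
# throttled_time = total nanoseconds spent throttled
cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat
# memory utilization vs the container's cap
cat /sys/fs/cgroup/memory/memory.usage_in_bytes \
    /sys/fs/cgroup/memory/memory.limit_in_bytes
```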