High-cardinality alarms (lots of alarms from one event) - cause you to misunderstand the root cause because you're busy wading through so many alarms; Reactive alarms (built to detect a specific issue that was since fixed) - become useless and distracting over time; Tool fatigue - too many tools, not integrated, not particularly well implemented, requiring manual fixes. All seem like the right thing at the time, but are the wrong long-term solution.

Key components of good alarms: Signal - no false positives; Actionability - can I do something about it right now; Relevancy - is this the only alarm relevant to this event (if not, you need to delete those extra alarms)? Eliminate all alarms that do not satisfy these.

Footprint tuning is about making in-memory data structures memory efficient. LRU caches and soft references let you hold just a subset of recomputable data. Per-object overheads mean that if you are minimizing memory, you need to get creative: minimize the number of objects by flattening hierarchical structures and encoding object data into primitive arrays.
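
A sketch of the flattening idea (Point is a made-up example class; the header size is typical for a 64-bit HotSpot JVM):

    class FlattenedPoints {
        // Object-per-element layout: each Point has its own object header
        // (~16 bytes) plus a reference slot in the array.
        static final class Point { int x, y; }

        public static void main(String[] args) {
            int n = 1_000_000;
            Point[] objects = new Point[n];   // n references + n headers once filled
            objects[5] = new Point();
            objects[5].x = 42;

            // Flattened layout: two primitive arrays, no per-element headers.
            int[] xs = new int[n];
            int[] ys = new int[n];
            xs[5] = 42; ys[5] = 7;            // "points[5] = (42, 7)"
            System.out.println(objects[5].x + " " + xs[5] + "," + ys[5]);
        }
    }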

Many performance problems can be fixed by throwing more memory at them.

Compressed oops are automatically used for heaps below 30GB; there is no point in having heaps between 32GB and 48GB, as compressed oops are lost and the larger uncompressed references mean you actually lose usable space in that region.
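
One way to confirm what a given heap size actually does is to dump HotSpot's final flag values (illustrative command):

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops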

Avoid boxing (the primitive wrapper objects), especially if memory is an issue.
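
A sketch of the difference (sizes are typical for a 64-bit HotSpot JVM):

    import java.util.ArrayList;
    import java.util.List;

    class BoxingFootprint {
        public static void main(String[] args) {
            List<Integer> boxed = new ArrayList<>();  // ~16-byte Integer object
            boxed.add(42);                            // + a reference per element
            int[] unboxed = new int[1];               // 4 bytes per element
            unboxed[0] = 42;
            System.out.println(boxed.get(0) + " " + unboxed[0]);
        }
    }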

ThreadLocals stick around: be aware that if you are using them with thread pools they need resetting, or you have a type of memory leak. Often you are better off just creating objects as you need them rather than holding a ThreadLocal instance.
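
A sketch of the reset idiom for pooled threads:

    // When a pooled thread is reused, any ThreadLocal set by a previous task
    // survives; remove() it when the task finishes to avoid the leak.
    class ThreadLocalCleanup {
        private static final ThreadLocal<StringBuilder> BUF =
                ThreadLocal.withInitial(StringBuilder::new);

        static void handleRequest() {
            try {
                BUF.get().append("work");  // use the per-thread instance
            } finally {
                BUF.remove();              // reset before the pooled thread is reused
            }
        }
    }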

Compactness (inverse of memory size) x Responsiveness (inverse of latency) x Throughput == some constant for a particular system. Tuning means you can increase one of these at the expense of one or two of the others. Optimization means you can change the system configuration or algorithms to increase the constant.

The biggest threat to responsiveness (request latency) in the JVM is garbage collection pauses.

Dead objects are free to collect in the young generation. So the young generation should be big enough to hold more than one full set of concurrently generated request-response cycle objects (i.e. they'll all be dead at the end of the cycle, so they're collected efficiently). Each survivor space should be big enough to hold all active request objects plus the ones currently tenuring.

The tenuring threshold should be set so that objects which are going to tenure do so fast.

The adaptive throughput collector can adjust itself to targets specified by MaxGCPauseMillis and GCTimeRatio.
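
Illustrative example (the values are made up and MyApp is a placeholder): GCTimeRatio=19 asks for GC to take at most 1/(1+19) = 5% of total time.

    java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=19 MyApp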

Always start by tuning the young gen. Use PrintGCDetails, PrintHeapAtGC and PrintTenuringDistribution. Keep an eye on survivor sizes; try to make sure the survivor spaces are never 100% full.
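
An illustrative young-gen tuning command line (the sizes are made up, MyApp is a placeholder, and the Print* flags are pre-Java 9 HotSpot):

    java -Xmx4g -Xmn1g -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4 \
         -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution \
         MyApp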

CMS is good if it can stay ahead of object allocation; otherwise you need to tune it. You also have to keep fragmentation low.

CMS typically needs a heap about a third larger than the other collectors need, to give it headroom while it cleans up concurrently with the running application.

If CMS falls behind, it falls back to a stop-the-world collection that also compacts the heap, and this is a very long pause (minutes!). You need to avoid this.

The CMSInitiatingOccupancyFraction for CMS can be lowered to make it more responsive - even down to 0 if you have spare CPU; it would then run continuously.
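
Illustrative example (the value 50 is made up; MyApp is a placeholder): start CMS cycles at 50% old-gen occupancy, and make the JVM honour that value rather than its own adaptive estimate.

    java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 \
         -XX:+UseCMSInitiatingOccupancyOnly MyApp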

If NewSize is large and you have many live objects, you can get a long pause. Reduce the young gen size and the tenuring threshold to reduce the pause time.

If you have too many threads, GC takes more time, because each thread's stack holds GC roots that need to be scanned during the GC.

synchronized blocks scale well with IO, but should not be used in tight loops.

Using synchronized on methods means you are using the instance itself as the lock object. Any code that has access to that same instance can also synchronize on the instance, which is a potential denial of critical section to all other users of that instance. To avoid this, you can use an internal lock object in the instance class to explicitly synchronize the method body, instead of using synchronized on the method.
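
A sketch of the private-lock idiom:

    // A private lock object keeps the critical section under the class's
    // control, so external code cannot synchronize on the instance and
    // starve it.
    class Counter {
        private final Object lock = new Object();
        private int count;

        void increment() {
            synchronized (lock) {  // instead of a synchronized method
                count++;
            }
        }
    }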

ReentrantLock is more sophisticated than synchronized; for example, it is interruptible. It should be used with a try-finally block. Because it is a library lock, the JVM cannot optimize it in the same way as it can optimize synchronized.
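
A sketch of the try-finally idiom (using lockInterruptibly to get the interruptible behaviour):

    import java.util.concurrent.locks.ReentrantLock;

    // Acquire outside the try, release in finally, so the lock is freed even
    // if the critical section throws.
    class LockedCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int count;

        void increment() throws InterruptedException {
            lock.lockInterruptibly();  // a waiting thread can be interrupted
            try {
                count++;
            } finally {
                lock.unlock();
            }
        }
    }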

ReentrantReadWriteLock lets you separate read locking from write locking, typically for one writer and many readers.
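
A sketch (Config and its field are made-up names): many readers can hold the read lock at once, while the write lock is exclusive.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class Config {
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        private String value = "initial";

        String read() {
            rw.readLock().lock();
            try { return value; } finally { rw.readLock().unlock(); }
        }

        void write(String v) {
            rw.writeLock().lock();
            try { value = v; } finally { rw.writeLock().unlock(); }
        }
    }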

Semaphore, CountDownLatch, Phaser and DelayQueue are all based on AbstractQueuedSynchronizer, so they are similar under the covers to ReentrantLock.

The read-copy-update technique uses a separate immutable data object to hold compound data; the mutable field holding the immutable data object is what gets updated atomically. It generates some extra garbage.
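
A minimal sketch of the technique (Range and Bounds are made-up names):

    import java.util.concurrent.atomic.AtomicReference;

    // The compound data lives in an immutable holder; updates build a fresh
    // holder and swap it in atomically, so readers always see a consistent
    // pair (at the cost of extra garbage).
    class Range {
        static final class Bounds {                  // immutable compound data
            final int min, max;
            Bounds(int min, int max) { this.min = min; this.max = max; }
        }

        private final AtomicReference<Bounds> bounds =
                new AtomicReference<>(new Bounds(0, 100));

        void setMax(int newMax) {
            Bounds old, updated;
            do {                                     // retry if another writer won
                old = bounds.get();
                updated = new Bounds(old.min, newMax);
            } while (!bounds.compareAndSet(old, updated));
        }
    }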

volatile on its own is normally only adequate for the simplest concurrency problems, e.g. a boolean flag that needs to be read across threads. It doesn't even provide atomic increments.
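
The canonical case it does handle (sketch):

    class Worker implements Runnable {
        private volatile boolean running = true;  // writes visible to all threads

        public void run() {
            while (running) { /* do work */ }
        }

        void shutdown() { running = false; }      // no lock needed
        // but note: for a volatile int i, i++ is NOT atomic (read-modify-write)
    }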

The downsides of locks: they prevent parallelism; they can cause deadlocks, livelocks and thread starvation; they give no guarantee that a thread can make progress; and they can be used in an unstructured way (e.g. forgetting to unlock).

AtomicReference.compareAndSet is used to speculatively update a value, provided the old value hasn't changed in the meantime. E.g. (State and doStuff are stand-ins; on failure you typically retry in a loop):

    AtomicReference<State> ref = ...;
    State old = ref.get();
    State updated = doStuff(old);
    if (ref.compareAndSet(old, updated)) { /* nothing changed ref while we did stuff */ }

Sequence locking uses an atomically updated long as a sequence lock: you can write compound values to the other shared fields only if the sequence was even and your speculative increment succeeded (the sequence value is then odd, so no other thread can write). This works for a moderate number of concurrent threads, but not for thousands of threads. Sequence locking is available as StampedLock in Java 8+.
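
A sketch of the pattern using StampedLock's optimistic read (essentially the idiom from the StampedLock javadoc):

    import java.util.concurrent.locks.StampedLock;

    // Readers proceed optimistically and only fall back to a real read lock
    // if a writer intervened while they were reading the compound values.
    class OptimisticPoint {
        private final StampedLock sl = new StampedLock();
        private double x, y;

        void move(double dx, double dy) {
            long stamp = sl.writeLock();          // writers are exclusive
            try { x += dx; y += dy; } finally { sl.unlockWrite(stamp); }
        }

        double distanceFromOrigin() {
            long stamp = sl.tryOptimisticRead();  // read the "sequence"
            double cx = x, cy = y;                // read the compound values
            if (!sl.validate(stamp)) {            // sequence changed: a writer ran
                stamp = sl.readLock();
                try { cx = x; cy = y; } finally { sl.unlockRead(stamp); }
            }
            return Math.sqrt(cx * cx + cy * cy);
        }
    }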