Distributed system consensus is hard when: the network has outages, or drops, delays, or duplicates packets; disks fail (and corrupt data); machines fail (and return with stale data or no data)

Paxos and Raft are consensus protocols; Raft was explicitly designed to be understandable

The limit on sorting speed on modern systems is not the comparison computations, it's the data movement; so the old established sorting algorithms are no longer necessarily the best. Sorting is fastest when you get the data to the parallel cores quickly. Mergesort parallelizes well when implemented with a min-heap data structure for the merge, with the heaps sized to fit into the CPU cache.
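
A minimal sketch of that idea: sort cache-sized runs in parallel, then k-way merge the runs through a min-heap. The run size and the plain `parallelStream` scheduling here are illustrative, not tuned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class HeapMergeSort {
    // Illustrative run size, chosen so a run fits comfortably in cache.
    static final int RUN = 1 << 13;

    public static int[] sort(int[] data) {
        // Split into runs and sort each run on its own core.
        List<int[]> runs = new ArrayList<>();
        for (int i = 0; i < data.length; i += RUN) {
            runs.add(Arrays.copyOfRange(data, i, Math.min(i + RUN, data.length)));
        }
        runs.parallelStream().forEach(Arrays::sort);

        // Min-heap of (value, runIndex, offset) drives the k-way merge.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (runs.get(r).length > 0) heap.add(new int[]{runs.get(r)[0], r, 0});
        }
        int[] out = new int[data.length];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();       // smallest remaining element across all runs
            out[n++] = top[0];
            int[] run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.length) heap.add(new int[]{run[next], top[1], next});
        }
        return out;
    }
}
```

The heap holds only one entry per run, so the merge's working set stays tiny regardless of the total data size.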

Log-structured merge trees allow fast bulk inserts into large datasets (by making most disk writes sequential), and provide a hot set that fits into RAM to give fast reads. A Bloom or Cuckoo filter finds items (or tells you an item definitely doesn't exist) faster than searching the on-disk structures directly
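
A toy Bloom filter sketch showing why this helps: a "false" from `mightContain` is definitive (the key was never added), so a lookup can skip a disk run entirely. The sizes and the hash-mixing scheme here are illustrative.

```java
import java.util.BitSet;

public class Bloom {
    private final BitSet bits;
    private final int size, hashes;

    public Bloom(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th probe position from two hash values
    // (Kirsch-Mitzenmacher style double hashing).
    private int probe(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(probe(key, i));
    }

    // false => definitely absent; true => probably present
    // (false positives possible, false negatives never).
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) if (!bits.get(probe(key, i))) return false;
        return true;
    }
}
```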

Hybrid Logical Clocks ensure that timestamps for events linked by causality are correctly ordered. This is done by sending a timestamp with every message, and advancing the clock to the later of the local clock (already NTP-synchronized) or the largest timestamp of any message received.
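
A simplified sketch of the HLC update rules: the state is `(l, c)`, where `l` tracks the largest physical time seen and the counter `c` breaks ties when `l` does not advance. The physical time is passed in as a parameter here so the logic is testable; a real implementation would read an NTP-synchronized system clock.

```java
public class HLC {
    long l = 0;  // largest physical timestamp seen so far
    long c = 0;  // logical counter for events sharing the same l

    // Called for a local event, or to stamp an outgoing message.
    public long[] tick(long physicalNow) {
        long prev = l;
        l = Math.max(l, physicalNow);
        c = (l == prev) ? c + 1 : 0;   // physical clock didn't move: bump counter
        return new long[]{l, c};
    }

    // Called on receiving a message stamped (ml, mc).
    public long[] receive(long ml, long mc, long physicalNow) {
        long prev = l;
        l = Math.max(Math.max(l, ml), physicalNow);
        if (l == prev && l == ml)    c = Math.max(c, mc) + 1;
        else if (l == prev)          c = c + 1;
        else if (l == ml)            c = mc + 1;  // message's clock is ahead of ours
        else                         c = 0;       // our physical clock is ahead
        return new long[]{l, c};
    }
}
```

Even when the receiver's physical clock lags the sender's, the receive rule produces a timestamp that sorts after the message's, preserving causality.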

User, sys and real times in the GC log are the same as the output of the unix 'time' command: real = the elapsed real (wall-clock) time of the GC; user = the CPU time spent in non-kernel user mode by the GC; sys = the CPU time spent in the kernel (system calls) on behalf of the GC.

You would expect real time to be less than user+sys time for any GC executed with parallel threads (most GCs), since each thread gets time on the CPU and these times sum to give the final user and sys values. In fact you can get a good measure of the parallel efficiency of the GC by calculating the ratio '(user+sys)/real'. If the serial collector is used, this ratio should be close to 1.
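
For example, the classic HotSpot GC log line `[Times: user=0.32 sys=0.02, real=0.09 secs]` gives a ratio of (0.32+0.02)/0.09 ≈ 3.8, suggesting roughly four-way parallelism. A small sketch that extracts the ratio from such a line (the regex targets this classic format; unified JDK 9+ logging would need a different pattern):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcRatio {
    // Matches the "user=... sys=..., real=..." portion of a [Times:] entry.
    static final Pattern TIMES =
        Pattern.compile("user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+)");

    public static double ratio(String logLine) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) throw new IllegalArgumentException("no [Times:] data: " + logLine);
        double user = Double.parseDouble(m.group(1));
        double sys  = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        return (user + sys) / real;   // parallel efficiency: ~N for N busy GC threads
    }
}
```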

If the ratio '(user+sys)/real' is significantly less than 1 for a GC (i.e. real time is much larger than the CPU time consumed), this is a good indication of a problem with the system: either IO caused the GC to be blocked for a while, or the CPU was saturated by activity from other processes.

For latency-sensitive Java applications, you should move Java log files to a separate or high-performing disk drive (e.g., SSD, tmpfs)

Streams have a performance cost compared to hand-coded loops (eg increased memory allocations), but their extra clarity reduces errors. Parallel streams should only be used in rare scenarios, and only after you've measured both the parallel and serial operations to confirm the parallel one really is faster.

On smaller data sets the cost of splitting up the work, scheduling it on other threads, and stitching it back together once the stream has been processed will dwarf any speedup from running the computations in parallel.
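
An illustrative harness for the "measure both" advice: the same operation as a serial and a parallel stream, plus a crude timer. Timings vary by machine and JIT warm-up (a real comparison should use a benchmark harness such as JMH), so no expected winner is claimed here; on small inputs the parallel version often loses.

```java
import java.util.stream.LongStream;

public class StreamTiming {
    // Identical work, serial vs parallel.
    public static long sumSerial(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    public static long sumParallel(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Crude single-shot timer; good enough to spot an order-of-magnitude
    // difference, not for rigorous benchmarking.
    public static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }
}
```

Compare `timeNanos(() -> sumSerial(n))` against `timeNanos(() -> sumParallel(n))` at your real data sizes before committing to `.parallel()`.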

Don't underestimate the cost of parsing into objects and formatting objects into strings. If you can avoid the conversions, you can gain 2 orders of magnitude!

Use String formatting rather than concatenation when the formatting is likely to be skipped (eg debug code, where debug mode is usually off), as that avoids unnecessary object creation most of the time; but use concatenation instead of formatting when the message is likely to be produced, because format is much less efficient than concatenation.
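
A sketch of the trade-off. The `debugEnabled` flag and the methods are stand-ins for a real logging framework (which typically offers the same lazy pattern via message suppliers):

```java
public class LogStyle {
    static boolean debugEnabled = false;

    // Debug path: passing a format string plus args is cheap, and the
    // expensive String.format only runs in the rare case debug is on.
    public static String debugMessage(String fmt, Object... args) {
        return debugEnabled ? String.format(fmt, args) : null;
    }

    // Hot path that always emits: plain concatenation beats
    // String.format, which parses the pattern on every call.
    public static String requestLine(String user, int status) {
        return "user=" + user + " status=" + status;
    }
}
```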

With push you need to consider the scenario where the receiver has to process more requests than it can handle. Either the producer waits for the consumer to be ready to receive, or it waits for the result from the consumer. In both cases you have an implicit queue (maybe a set of threads). It's better to make the queue explicit, ideally placed between the producer and the consumer, so that each can operate at its own maximum efficiency.
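
A minimal sketch of making the queue explicit: a bounded BlockingQueue between producer and consumer, where `put()` applies backpressure when the consumer falls behind instead of hiding the backlog in blocked threads or socket buffers. The capacity and element type are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ExplicitQueue {
    public static int runPipeline(int items, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        final int[] consumed = {0};

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < items; i++) {
                    queue.take();      // consumer drains at its own pace
                    consumed[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < items; i++) {
                queue.put(i);          // blocks when the queue is full: backpressure
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed[0];
    }
}
```

The bound is the important design choice: an unbounded queue would just move the implicit backlog somewhere else.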

Queues can decouple services so that they are not reliant on knowing each other's APIs. This allows for many optimizations such as new implementations taking over seamlessly from older ones. Service discovery, load balancing, dynamic system scaling and reliable messaging can be handled by the queue infrastructure instead of the services, making the system more robust and scalable.

A message queue can act as a load balancer with inverted responsibility: a typical load balancer maintains a list of services it sends messages to and continuously checks that they are available, and adding or removing services from the cluster has to go through the load balancer as the controller. A message queue instead lets service instances attach and get/send messages on demand; the services effectively do the load balancing themselves simply by being available.