Comments on Stuff Gil Says: What sort of allocation rates can servers handle?

Posted 2015-09-02

The approach of slicing larger processes into smaller ones in order to reduce the max pause time seen in each is certainly common. It's what I call the "640KB" design style (which is more like 640MB these days, but same concept). The approach does work at some levels, with enough "duct tape" to keep things working. It's a good example of wasted engineering effort and inefficient design driven by the need to work around a single problem: pause times that are too large to be acceptable.

As to what scale the approach is efficient to, this varies dramatically. E.g., in the second tier of distributed cache systems, you can probably see N go to the low tens (within a physical system). But in most actual applications (you mention Tomcat as a container), this approach usually caps out with N in the single digits (on a single system) before getting into problems. Hosting tens of JVMs that are mostly idle, or tens of JVMs that all carry nearly identical work patterns, seems to work on a single system. But going to tens of JVMs that are all active and all have disparate timing and working sets tends to lead to thrashing. A single active JVM is a hungry thing (especially when it is busy doing a multi-second GC), and tens of these hungry processes don't tend to make good neighbors.

When any form of caching is involved, the inefficiency of splitting and replicating the cached data comes into play pretty early, too.
You often end up either with a smaller cache per instance (most common), which results in higher miss rates in the instance-local cache and more actual work that needs to be done as a result, or with the cache replicated in each process (whether it's kept coherent or not), which leads to dramatically increased GC work in the system as a whole and still keeps GC pauses in each instance high.

As to the question about mainstream JVMs that are capable of releasing unused heap back to the OS: there is one that does this very well, Zing. It is completely elastic, with all pages (above a dedicated level) released back to the OS immediately as they are collected. Other HotSpot variants will also dynamically shrink their heap (down to -Xms) if there is no pressure on it. But this behavior adjusts much more slowly and is delayed: it only occurs when the load or working set of the individual JVM drops dramatically for a long period of time (multiple oldgen cycles). Slowly releasing memory when idle doesn't do much for you when all N JVMs are actually active at the same time, which is inherent if they are just split-up portions of what would otherwise be a single active instance.

As to the notion that it will take a long time to get Zing into your production datacenter: I'm often surprised at the sort of re-engineering people are willing to apply to their production datacenter deployments (like splitting their processes up into lots of small pieces, with all the disruptive changes that entails) when the alternative is much simpler and easier to get through even the most rigorous testing and re-qualification processes. Lots of people use Zing in production datacenters, either in place of HotSpot or side by side (some apps use Zing, others use HotSpot).
Shifting to Zing is invariably an easier sell to your datacenter folks than redesigning and re-architecting the deployment of a nicely working (except for pauses) application in a way that would increase process counts by 10x for the same workload, along with the rigorous testing under varying load conditions that would be needed to study the edges of the new load-driven failure modes and load-bearing behaviors that sort of fine-grained splitting creates.

Comment by Gil Tene

Posted 2015-06-02

Hi, I just got here after listening to your brilliant "Understanding Java GC" presentation from 2012 (thank you a TON for compiling and sharing it!!)

As much as I'd like us to try Zing out, it would take a lot of time before it reaches our production datacenter, and for the short term I don't think the miracle of pluggable GCs exists in JVMs (to use C4 instead of CMS).

Since our throughput has grown, we're facing >100 sec GC pauses, but RAM isn't over-utilized at all (live data is less than 5GB).
To mitigate the "length of GC pause exceeds human/software timeouts" problem, I'm planning to try a somewhat counter-intuitive approach:
- REDUCE the heap_size 10x
- deploy 10x Tomcat instances
- reduce live_set_size "almost 10x" (the balancer routes calls by user hash, so we can shrink the per-user object caches 10x, though the static/config caches will remain the same)

If I understood your math correctly, this will cause each instance to have:
- almost the same frequency of full GCs (allocation_rate/10, heap_size/10, live_set_size/almost_10)
- 10x shorter duration of each full GC (heap_size/10)
And overall CPU use should remain the same (the same load spread into smaller buckets).

I understand the approach is not as efficient as C4, but it should still be effective up to some NNN-times scale (100x?), with the following limitations:
1) at some scale, the need to keep NNN copies of the static/config caches (live set) should skew the "almost-NNNx" math into impractical territory
2) non-uniformity of user data (large users) will start to run into the walls of too-small buckets

Just curious: is any mainstream JVM capable of releasing unused heap (above the -Xms level) back to the OS? (That would address #2 above.)

Comment by Vlad D
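The splitting arithmetic discussed in this thread can be checked with a rough model. The Java sketch below is hypothetical, not anyone's actual sizing tool: it assumes a collection is triggered each time (heap_size - live_set_size) has been allocated, so the interval between cycles is (heap_size - live_set_size) / allocation_rate, and it follows the comment's own assumption that a full-GC pause scales with heap size. The workload numbers (40 GB heap, 0.5 GB of shared static/config cache, 4.5 GB of per-user cache, 1 GB/s allocation) and the 2.5 s/GB pause coefficient are made up for illustration, chosen only so that a single instance lands near the >100 sec pauses described.

```java
import java.util.List;

/**
 * Hypothetical back-of-envelope model of splitting one JVM into N smaller ones.
 * Assumptions (illustrative only, not measurements):
 *  - a collection runs each time (heap - live set) GB have been allocated;
 *  - a stop-the-world full-GC pause is proportional to heap size.
 */
public class GcSplitModel {

    // Hypothetical pause cost: 2.5 s per GB of heap, so a 40 GB heap pauses for ~100 s.
    static final double PAUSE_SEC_PER_HEAP_GB = 2.5;

    /** Seconds between collections: the free headroom divided by the allocation rate. */
    static double gcIntervalSec(double heapGb, double liveGb, double allocGbPerSec) {
        return (heapGb - liveGb) / allocGbPerSec;
    }

    /** Full-GC pause under the "pause scales with heap size" assumption. */
    static double pauseSec(double heapGb) {
        return heapGb * PAUSE_SEC_PER_HEAP_GB;
    }

    /** Per-instance live set: per-user caches split N ways, shared static/config caches copied into every instance. */
    static double liveGbPerInstance(double sharedGb, double perUserGb, int n) {
        return sharedGb + perUserGb / n;
    }

    public static void main(String[] args) {
        // Hypothetical single-JVM workload: 40 GB heap, 5 GB live (0.5 shared + 4.5 per-user), 1 GB/s allocation.
        double totalHeapGb = 40.0;
        double sharedGb = 0.5;
        double perUserGb = 4.5;
        double totalAllocGbPerSec = 1.0;

        for (int n : List.of(1, 10, 100)) {
            double heapGb = totalHeapGb / n;                 // heap_size / N
            double liveGb = liveGbPerInstance(sharedGb, perUserGb, n);
            double allocGbPerSec = totalAllocGbPerSec / n;   // load spread evenly by the balancer
            if (liveGb >= heapGb) {
                System.out.printf("N=%d: live set %.3f GB does not fit in a %.3f GB heap%n", n, liveGb, heapGb);
                continue;
            }
            System.out.printf("N=%d: heap %.1f GB, live %.2f GB, GC every %.1f s, pause ~%.1f s%n",
                    n, heapGb, liveGb, gcIntervalSec(heapGb, liveGb, allocGbPerSec), pauseSec(heapGb));
        }
    }
}
```

With these made-up inputs, N = 10 keeps the interval between cycles close to the single-instance value (about 30.5 s versus 35 s) while each pause shrinks from about 100 s to about 10 s; at N = 100 the copied shared cache no longer fits in the shrunken heap, which is limitation 1) from the list above showing up in the arithmetic. The model deliberately ignores the costs raised in the reply, such as cache replication and many active JVMs contending for the same machine.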