Chris Lee's WebLogtag:typepad.com,2003:weblog-833282005-02-11T16:58:49-08:00TypePadYour ISP as an advocate for quality of servicetag:typepad.com,2003:post-35399712005-02-11T16:58:49-08:002005-02-11T16:58:49-08:00Winding down from an adrenaline high after spending the better part of 50 hours troubleshooting our ISPs network for them... The problems - technical perspective: Routing decisions on the internet have a financial component - often, the lowest cost route...Chris Lee
<div xmlns="http://www.w3.org/1999/xhtml"><p>Winding down from an adrenaline high after spending the better part of 50 hours troubleshooting our ISP's network for them...</p>
<p>The problems - technical perspective:</p>
<ol><li>Routing decisions on the internet have a financial component - often, the lowest cost route is chosen (that's lowest cost to the carrier, not you!)</li>
<li>Internet routing is brain-dead when it comes to performance; the quality of the route is not a significant variable in routing decisions, so long as the route is 'alive' (or still warm...)</li>
<li>There are no ubiquitous quality-of-service standards - especially across carriers.</li>
<li>The internet as a whole lacks diagnostic (and self-correcting) capabilities.</li></ol>
<p>The problems - business perspective:</p>
<ol><li>The internet is becoming an increasingly critical service to businesses worldwide, yet little is invested in addressing quality concerns;</li>
<li>Far too many people - including ISPs (big and small) - have a laissez-faire attitude to internet problems.&nbsp; The phrase &quot;it's just the internet&quot; allows ISPs to deliver shoddy service - imagine if your pilot said &quot;it's just the engine&quot;.&nbsp; And ISPs aren't the only ones to blame - it saddens me that most people take this poor service for granted.</li>
<li>The internal culture of ISPs reflects the above point - I was actually told yesterday, by a major backbone ISP, that &quot;that's just the way the internet is&quot;.&nbsp; After inquiring as to the escalation status of the issue, I was informed that the VP of Operations is aware of the problem.&nbsp; When I asked &quot;what is he doing about it?&quot;, the response was less than encouraging - &quot;we have to trust in our employees' evaluation of the situation&quot; - what kind of escalation is this?&nbsp; How does this help the problem get resolved?</li>
<li>ISPs' internal support protocols are flawed; they are self-serving, missing the target of resolving the problem.&nbsp; The image of &quot;trained monkeys&quot; came to mind during this exercise - they can do A, B, C, D - and if you don't gift wrap it and hand it to them on a silver platter (i.e. solve the problem for them) they can't deal with it.&nbsp; What happened to &quot;taking a problem and running with it&quot;?&nbsp; Perhaps if there was more emphasis on quality there would be fewer support incidents, reducing the number of support staff and allowing for a higher calibre of support (a vicious cycle...).</li></ol>
<p>So what is the solution?&nbsp; Well, you could have multiple ISPs to mitigate the problem - but this seems to me to be spending money to solve a problem that shouldn't exist in the first place.&nbsp; Unfortunately, if you require a certain level of quality you are forced into this.</p>
<p>No, the real solution is to ensure your ISP feels the pain of poor quality every time it occurs.&nbsp; If you have a capable ISP, they will be an advocate for you with downstream problems.</p>
<p>Make sure that you:</p>
<ol><li>Have a well defined escalation procedure, preferably worked out as part of the initial agreement;</li>
<li>Have an SLA in place to cover the oft-neglected dimensions of performance and incident response time, in addition to availability;</li>
<li>Work with other customers of the ISP - there is power in numbers;</li>
<li>Don't be shy about keeping quality of service front and center - it should always be the primary goal;</li>
<li>Stay away from shoddy service providers that claim to be Tier 1; they may have backbones, but can they keep them up and running?</li></ol>
<p>Additionally, it would be nice to see the internet quality of service regulated - perhaps to the point where the Internet is classified as an &quot;essential service&quot;.&nbsp; The standards and enforcement associated with this will weed out the (far too many) deadbeat ISPs.</p></div>
Announcing Cheetahtag:typepad.com,2003:post-33428502005-01-24T19:22:00-08:002005-01-24T19:22:00-08:00Well, since no one else is up to the challenge of creating an open-source NIO servlet container, we've launched the Cheetah HTTP Engine project. The primary goal of this project is to provide a high-performance, scalable connector for the Tomcat...Chris Lee
<div xmlns="http://www.w3.org/1999/xhtml"><p>Well, since no one else is up to the challenge of creating an open-source NIO servlet container, we've launched the <a href="http://cheetahweb.sourceforge.net">Cheetah HTTP Engine</a> project.&nbsp; </p>
<p>The primary goal of this project is to provide a high-performance, scalable connector for the Tomcat servlet container; the current implementation (still pre-alpha) provides support for NIO (Java 5+).</p>
<p>Just finished implementing &quot;pluggable strategies&quot; for tuning concurrency aspects around NIO; for example, one can choose to have dedicated threads to accept connections.&nbsp; This is an important aspect of Cheetah, as cross-platform support for NIO often fails to address performance semantics.&nbsp; Other pluggable strategies include: single or multiple selectors; pooled or synchronous event dispatching (for those interested in low concurrency, whoever you may be...).</p><br /></div>
Tomcat & NIOtag:typepad.com,2003:post-28451512004-11-28T09:31:50-08:002004-11-28T09:31:50-08:00So how come Tomcat doesn't use NIO? At first glance, the use of NIO seems an obvious choice. After digging further, the water becomes quite muddied... Threads do use system resources. Modern versions of Java map each thread 1:1 onto...Chris Lee
<div xmlns="http://www.w3.org/1999/xhtml"><p>So how come Tomcat doesn't use NIO?&nbsp; </p>
<p>At first glance, the use of NIO seems an obvious choice.&nbsp; After digging further, the water becomes quite muddied...</p>
<ol><li>Threads do use system resources.&nbsp; Modern versions of Java map each thread 1:1 onto an OS-level thread (or LWP - lightweight process - for Solaris).&nbsp; What are the implications of this?&nbsp; Two major factors: a) context switching between threads and b) per-thread stack size cuts into available JVM memory.</li></ol>
<p>So how serious are these problems?&nbsp; For a light to moderately loaded Tomcat server, these are not major concerns.&nbsp; When you start talking about many hundreds of concurrent connections (by concurrent I mean truly concurrent - active connections), these factors can degrade performance (at best) or cause system outages (at worst).</p>
<p>Consider the following scenario (this is based on a recent real-world experience):</p>
<p>a) A Full GC runs - say it takes several seconds;<br />b) Incoming connections are 'paused' as the Full GC completes;<br />c) Once the Full GC is complete, the incoming connections spawn new threads (assumes the existing threads are all connected - not necessarily busy);<br />d) The new threads each allocate memory for their stack; on a Solaris server, the default stack (for 32 bit mode) is 512k (for 64 bit mode it is 1MB);<br />e) If the memory to be allocated is not readily available, a Full GC will be performed to find the memory.</p>
<p>You can see the cyclic / cascade effect this will have - in our real-world situation, this caused the server to eventually do nothing but Full GCs.</p>
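<p>To get a feel for the numbers, here is a rough back-of-the-envelope sketch.&nbsp; The figure of 1000 threads is purely an assumption for illustration (it is not from the incident above), and the class and method names are made up; only the stack sizes come from this post.</p>

```java
// Rough arithmetic only: how much memory a burst of new threads asks for
// at a given per-thread stack size. Names here are hypothetical.
class StackBudget {
    /** Total bytes requested for thread stacks. */
    static long stackBytes(int threads, long stackKb) {
        return threads * stackKb * 1024L;
    }

    /** Same figure expressed in whole megabytes. */
    static long stackMegabytes(int threads, long stackKb) {
        return stackBytes(threads, stackKb) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        int threads = 1000; // assumed burst size, not a measured value
        System.out.println("512k stacks: " + stackMegabytes(threads, 512) + " MB");
        System.out.println("1MB stacks:  " + stackMegabytes(threads, 1024) + " MB");
        System.out.println("128k stacks: " + stackMegabytes(threads, 128) + " MB");
    }
}
```

<p>Under that assumption the 512k default asks for roughly 500 MB of stack space, which is a large bite for a 32 bit process.</p>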
<p>So how do we work around this?&nbsp; </p>
<p>a) Reduce stack size; 128k seems to work quite well with JBoss/Tomcat, though the exact size depends on the nature of your application.<br />b) Limit the maximum number of concurrent threads; this means that some connections cannot be established, but the server will stay running;<br />c) Tune garbage collection to reduce pauses; have had good luck with the concurrent mark-sweep GC with a 2GB heap.&nbsp; </p>
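<p>Points a) and b) can be sketched in plain Java, outside of any container.&nbsp; This is not how Tomcat or JBoss implement their thread limits; it is a minimal illustration using <code>java.util.concurrent</code>, and the class and method names (<code>BoundedWorkers</code>, <code>tryHandle</code>) are hypothetical.&nbsp; Note that the stack size passed to the <code>Thread</code> constructor is only a hint that the JVM may round up or ignore.</p>

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: a worker pool with a hard thread cap and small stacks.
class BoundedWorkers {
    /** Builds a pool that never grows past maxThreads and requests small stacks. */
    static ThreadPoolExecutor create(int maxThreads, long stackBytes) {
        // The stackSize argument is a hint; the JVM is free to adjust it.
        ThreadFactory smallStacks = task -> new Thread(null, task, "worker", stackBytes);
        return new ThreadPoolExecutor(
                0, maxThreads,
                30, TimeUnit.SECONDS,                 // idle threads are released
                new SynchronousQueue<>(),             // no queueing: hand off or reject
                smallStacks,
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false once the cap is reached, so the caller can refuse the connection. */
    static boolean tryHandle(ThreadPoolExecutor pool, Runnable connectionHandler) {
        try {
            pool.execute(connectionHandler);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

<p>With a cap of, say, 2 threads, a third simultaneous connection is turned away instead of forcing yet another stack allocation - the connection fails, but the server stays up.</p>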
<p>Note that these are only work-arounds - not happy with the fragility of the system under the noted circumstances.</p>
<p>The second point - context switching - is a minor concern.&nbsp; This will certainly cause degraded performance, but as to whether the degradation is evident / material in delivery speed is debatable.&nbsp; Certainly at extreme system load this will cause a problem - but if you routinely run your systems at/near capacity, you are asking for problems anyway.</p>
<p>&nbsp; &nbsp;&nbsp; &nbsp; 2.&nbsp; Integration of NIO and Tomcat / Servlet API seems challenging at best.</p>
<p>The event-driven model of NIO seems to be at odds with the blocking model that the Servlet API and Tomcat implementation depend on.</p>
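<p>To make the contrast concrete, here is a minimal sketch of the event-driven style: one thread, one <code>Selector</code>, and callbacks driven by readiness events rather than a thread parked in a blocking <code>read()</code> per connection.&nbsp; It is a toy echo server, not Tomcat code, and the class name <code>SelectorEcho</code> is made up for this example.&nbsp; It also cuts corners a real connector could not (for instance, it spins on <code>write()</code> instead of registering for write readiness).</p>

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical toy server: a single thread services every connection.
class SelectorEcho implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEcho() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // wait until some channel is ready
                if (!selector.isOpen()) break;
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        int n = client.read(buf); // returns at once, never blocks
                        if (n < 0) { client.close(); continue; }
                        buf.flip();
                        // Simplification: spin until the echo is written out.
                        while (buf.hasRemaining()) client.write(buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // Selector closed or I/O failure: leave the loop.
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }
}
```

<p>A servlet's <code>service()</code> method, by contrast, expects to read its input stream to completion on the calling thread, which is where the two models collide.</p>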
<p>&nbsp; &nbsp;&nbsp; &nbsp; 3.&nbsp; Other vendors use forms of non-blocking IO! WebLogic, for example, has had native performance packs for years that use poll() and a small number of threads to read HTTP requests.&nbsp; Recent versions (8.1) contain an NIO multiplexer as well.</p>
<p>Additionally, there are add-on products - such as <a href="http://842technology.com/">Engine/J</a> - that add NIO capability to Tomcat.</p>
<p>I can't speak for the effectiveness of Tomcat add-ons, but have had a lot of success with WebLogic in the past.</p>
<p>See <a href="http://www-106.ibm.com/developerworks/library/j-nioserver/">IBM - Servlet API and NIO</a> for a good article on how this can be accomplished.</p>
<p>So where does this leave us?&nbsp; Certainly it would be challenging to retrofit Tomcat to use NIO - but clearly not impossible.&nbsp; I believe that adding NIO support to Tomcat would make it a true contender in the enterprise arena.</p>
<p>&nbsp;</p></div>