Node.js is a server-side JavaScript platform "for easily building fast, scalable network applications". It's built on Google's V8 JavaScript engine and uses an (almost) entirely async event-driven processing model, running in a single thread. If you're new to Node and your reaction is "why would I want to run JavaScript on the server side?", this is the headline answer: in 150 lines of JavaScript you can build a Node.js app which works as an accelerator for WCF REST services*. Compared with calling the WCF services directly, it can double your messages-per-second throughput, halve your CPU workload and use one-fifth of the memory footprint.

Well, it can if: 1) your WCF services are first-class HTTP citizens, honouring client cache ETag headers in request and response; 2) your services do a reasonable amount of work to build a response; 3) your data is read more often than it's written. In one of my projects I have a set of REST services in WCF which deal with data that only gets updated weekly, but which can be read hundreds of times an hour. The services issue ETags and will return a 304 if the client sends a request with the current ETag, which means in the most common scenario the client uses its local cached copy. But when the weekly update happens, then all the client caches are invalidated and they all need the same new data. Then the service will get hundreds of requests with old ETags, and they go through the full service stack to build the same response for each, taking up threads and processing time. Part of that processing means going off to a database on a separate cloud, which introduces more latency and downtime potential.

We can use ASP.NET output caching with WCF to solve the repeated processing problem, but the server will still be thread-bound on incoming requests, and getting the current ETags reliably needs a database call per request. The accelerator solves that by running as a proxy - all client calls come into the proxy, and the proxy routes calls to the underlying REST service. We could use Node as a straight passthrough proxy and expect some benefit, as the server would be less thread-bound, but we would still have one WCF and one database call per proxy call. But add some smart caching logic to the proxy, and share ETags between Node and WCF (so the proxy doesn't even need to call the service to get the current ETag), and the underlying service will only be invoked when data has changed, and then only once - all subsequent client requests will be served from the proxy cache.
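For reference, the straight passthrough case might look something like the sketch below. It is not the sample's code: the function name createPassthroughProxy and the target address are made up for illustration, and it uses only Node's built-in http module. Every request still reaches the service, so this is the baseline that the caching logic improves on.

```javascript
const http = require('http');

// Hypothetical address of the underlying WCF service host.
const TARGET = { host: 'localhost', port: 80 };

// Builds a server that forwards every request to the target unchanged
// and streams the response straight back to the client.
function createPassthroughProxy(target) {
  return http.createServer((clientReq, clientRes) => {
    const upstream = http.request(
      {
        host: target.host,
        port: target.port,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      },
      (serviceRes) => {
        // Copy status and headers (including any ETag) from the service.
        clientRes.writeHead(serviceRes.statusCode, serviceRes.headers);
        serviceRes.pipe(clientRes);
      }
    );

    // If the service is unreachable, answer with 502 Bad Gateway.
    upstream.on('error', () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });

    // Stream the request body (if any) through to the service.
    clientReq.pipe(upstream);
  });
}

// To run it on the port used in this post:
// createPassthroughProxy(TARGET).listen(8010);
```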

The code is very simple. The Node proxy runs on port 8010 and all client requests target the proxy. If the client request has an ETag header then the proxy looks up the ETag in the tag cache to see if it is current - the sample uses memcached to share ETags between .NET and Node. If the ETag from the client matches the current server tag, the proxy sends a 304 response with an empty body to the client, telling it to use its own cached version of the data. If the ETag from the client is stale, the proxy looks for a local cached version of the response, checking for a file named after the current ETag. If that file exists, its contents are returned to the client as the body in a 200 response, which includes the current ETag in the header. If the proxy does not have a local cached file for the service response, it calls the service, and writes the WCF response to the local cache file, and to the body of a 200 response for the client. So the WCF service is only troubled if both client and proxy have stale (or no) caches.
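The decision flow above can be sketched as a single function. This is an illustration rather than the sample's actual code: handleRequest, tagCache, bodyCache and callService are hypothetical names, the caches can be anything with get/set (a Map works for experimenting; the sample uses memcached and files), and it assumes the client sends its tag in the standard If-None-Match request header.

```javascript
// Sketch of the proxy's decision flow. All three collaborators are
// hypothetical stand-ins:
//   tagCache    - URL path -> current ETag (memcached in the sample)
//   bodyCache   - ETag -> cached response body (files in the sample)
//   callService - performs the real HTTP call to the WCF service
async function handleRequest(req, { tagCache, bodyCache, callService }) {
  const clientTag = req.headers['if-none-match']; // assumed header name
  const currentTag = await tagCache.get(req.path);

  // 1. The client already holds the current version: empty 304.
  if (currentTag && clientTag === currentTag) {
    return { status: 304, headers: { etag: currentTag }, body: '' };
  }

  // 2. The proxy holds the current version: serve it without touching WCF.
  if (currentTag) {
    const cached = await bodyCache.get(currentTag);
    if (cached !== undefined) {
      return { status: 200, headers: { etag: currentTag }, body: cached };
    }
  }

  // 3. Both caches are stale or empty: call the service and keep the result.
  const response = await callService(req);
  const tag = response.headers.etag;
  if (tag) await bodyCache.set(tag, response.body);
  return { status: 200, headers: response.headers, body: response.body };
}
```

Passing the caches and the service call in as arguments keeps the flow easy to exercise without memcached, a file system or a running service.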

The only (vaguely) clever bit in the sample is the ETag cache, which lets the proxy serve cached requests without any communication with the underlying service. It does this completely generically, so the proxy has no notion of what it is serving or what the services it proxies are doing. The relative path from the URL is used as the lookup key, so there's no shared key-generation logic between .NET and Node. When WCF stores a tag it also stores the "read" URL against the ETag so it can be used for a reverse lookup, e.g.:

| Key | Value |
| --- | --- |
| /WcfSampleService/PersonService.svc/rest/fetch/3 | "28cd4796-76b8-451b-adfd-75cb50a50fa6" |
| "28cd4796-76b8-451b-adfd-75cb50a50fa6" | /WcfSampleService/PersonService.svc/rest/fetch/3 |

In Node we read the cache using the incoming URL path as the key and we know that "28cd4796-76b8-451b-adfd-75cb50a50fa6" is the current ETag; we look for a local cached response in /caches/28cd4796-76b8-451b-adfd-75cb50a50fa6.body (and the corresponding .header file which contains the original service response headers, so the proxy response is exactly the same as the underlying service). When the data is updated, we need to invalidate the ETag cache – which is why we need the reverse lookup in the cache. In the WCF update service, we don't need to know the URL of the related read service - we fetch the entity from the database, do a reverse lookup on the tag cache using the old ETag to get the read URL, update the new ETag against the URL, store the new reverse lookup and delete the old one.
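The forward and reverse entries, and the swap that happens on update, can be sketched as below. In the sample this work is done on the WCF side against memcached; the version here is JavaScript purely for illustration, with an in-memory Map standing in for the shared cache and storeTag/replaceTag as hypothetical names.

```javascript
// In-memory stand-in for the shared tag cache (memcached in the sample).
const tagCache = new Map();

// Called when the read service issues a tag: store both directions.
function storeTag(readUrl, etag) {
  tagCache.set(readUrl, etag); // forward entry: URL -> current ETag
  tagCache.set(etag, readUrl); // reverse entry: ETag -> URL
}

// Called by the update service, which knows the entity's old ETag
// but not the URL of the related read service.
function replaceTag(oldEtag, newEtag) {
  const readUrl = tagCache.get(oldEtag);   // reverse lookup
  if (readUrl === undefined) return false; // nothing cached for this entity
  tagCache.set(readUrl, newEtag);          // update the new ETag against the URL
  tagCache.set(newEtag, readUrl);          // store the new reverse lookup
  tagCache.delete(oldEtag);                // delete the old one
  return true;
}
```

After replaceTag runs, the next proxy lookup by URL path returns the new ETag, so any cached file named after the old tag is simply never matched again.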

Running Apache Bench against the two endpoints gives the headline performance comparison. Making 1000 requests with concurrency of 100, and not sending any ETag headers in the requests, with the Node proxy I get 102 requests handled per second, mean response time of 975 milliseconds, with 90% of responses served within 850 milliseconds; going direct to WCF with the same parameters, I get 53 requests handled per second, mean response time of 1853 milliseconds, with 90% of responses served within 3260 milliseconds. Informally monitoring server usage during the tests, Node maxed at 20% CPU and 20MB memory; IIS maxed at 60% CPU and 100MB memory.
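To put those figures side by side, here is a small sketch that works out the proxy-to-direct ratios. The numbers are exactly the ones quoted above; the variable names are just for illustration. It comes out at roughly 1.9 times the throughput, about half the mean response time and one-fifth of the memory, while the informal CPU peaks work out nearer a third than a half.

```javascript
// Figures quoted above (1000 requests, concurrency 100, no ETag headers).
const proxy = { reqPerSec: 102, meanMs: 975, cpuPct: 20, memMB: 20 };
const direct = { reqPerSec: 53, meanMs: 1853, cpuPct: 60, memMB: 100 };

// Ratio of proxy to direct, rounded to two decimal places.
const ratio = (a, b) => Number((a / b).toFixed(2));

const comparison = {
  throughput: ratio(proxy.reqPerSec, direct.reqPerSec),
  meanLatency: ratio(proxy.meanMs, direct.meanMs),
  cpu: ratio(proxy.cpuPct, direct.cpuPct),
  memory: ratio(proxy.memMB, direct.memMB),
};

console.log(comparison);
```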

Note that the sample WCF service does a database read and sleeps for 250 milliseconds to simulate a moderate processing load, so this is *not* a baseline Node-vs-WCF comparison, but for similar scenarios where the service call is expensive but applicable to numerous clients for a long timespan, the performance boost from the accelerator is considerable.

* - actually, the accelerator will work nicely for any HTTP request, where the URL (path + querystring) uniquely identifies a resource. In the sample, there is an assumption that the ETag is a GUID wrapped in double-quotes (e.g. "28cd4796-76b8-451b-adfd-75cb50a50fa6") – which is the default for WCF services. I use that assumption to name the cache files uniquely, but it is a trivial change to adapt to other ETag formats.
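As a rough sketch of that naming assumption, the function below turns a quoted-GUID ETag into the .body and .header cache file paths described earlier. The name cachePaths and the default directory argument are hypothetical, and the regular expression is where you would adapt to other ETag formats.

```javascript
const path = require('path');

// Assumes the ETag is a GUID wrapped in double quotes, as in the sample.
function cachePaths(etag, cacheDir = '/caches') {
  const match = /^"([0-9a-fA-F-]{36})"$/.exec(etag);
  if (!match) throw new Error(`Unexpected ETag format: ${etag}`);
  const base = path.posix.join(cacheDir, match[1]);
  // One file for the response body, one for the original response headers.
  return { body: `${base}.body`, header: `${base}.header` };
}
```

Rejecting anything that doesn't match the expected shape also stops an odd ETag value from being used as a file name.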
Posted on Wednesday, April 4, 2012 12:12 AM
Tags: WCF, github

So you're telling me that, having spent years keeping up with the Microsoft stack, learning how to optimise it, abiding by its constraints and sticking to IIS, http.sys, ISAPI, ASP.NET, WCF and so on ad nauseam ad infinitum, up pops node.js and it is the best thing since sliced bread? I mean why bother using any of the .NET platforms at all? And it's server-side JavaScript? So the whole Microsoft approach has been a bag of balls and the future is this kind of approach, circumvent everything in WCF etc etc and use node.js.

Nice post Elton. I'm interested to know why you didn't look at AppFabric Cache for this solution; pre-loading of the cache could have been completed once a week when you uploaded your DB. I really like what you have done here and it's a great explanation, but I can't help but find the solution to be convoluted, and I would be shot down if I proposed it as an architecture to a client (no offense). As James D states - you can't argue with the facts and figures of Node.js. If the solution was a Node.js front end service and memcached then do we really need the WCF layer for DB reads? Thanks

James D - Node.js does play in the Microsoft stack, so you can host it in IIS and run natively with it in Azure. Using it as an accelerator means it's a very thin front end to existing WCF services. You get the big performance benefits, but you keep all your complex stuff in .NET, which is where it belongs, having far richer functionality and tooling.

CJB - I'm not sure the solution is particularly convoluted, it doesn't add more complication than an out-of-the-box accelerator would. And being shot down would depend on the client; enterprise clients who are MS-only wouldn't want to touch it, but smaller budget-focused clients don't tend to care so much about the technology as about the cost. AppFabric cache is an option if you're scaling out, but in the case where you have an accelerator sat on every server, then memcached saves you network latency.

I've worked on WCF solutions that were optimized but still struggled to better 100 concurrent requests per second per server (on better hardware than my dev laptop...). Adding an accelerator if the scenario benefits could be much more attractive than scaling up by another server for every extra 100 concurrent requests you want to support.

And I wouldn't abandon the MS stack just yet... But a lot of the exciting technologies at the moment (node, AMQP, HTML5) are platform-independent *and* heavily MS-supported. Having them in your toolkit will give you some more interesting delivery options. Example - do the heavy lifting in WCF hosted in Azure, and get the scalability by running a Node accelerator on a bunch of AWS Linux instances. Or use Node as your REST front end, queuing async calls to your Azure service via RabbitMQ - etc. etc.

Hi, really good post. I have been considering using BizTalk and node.js but feel like I haven't done enough research. I need BizTalk because our supply chain consists of too many systems that keep getting changed or upgraded. However we wish to provide the customers with a uniform front end even if they interface with multiple systems. We are predominantly a Microsoft addict organisation but I have found some leverage to migrate to any solution that keeps our customers happy. What do you think the best solution would be? I would have to use WCF to run the integration but don't have much experience in this area so not sure what the method would be.