TCP: Where the Network Meets the Application, Part 1

This post is written by Eric Thomas, Principal Solutions Architect at ExtraHop Networks.

Many organizations struggle to find the right level of instrumentation to monitor the performance of their business-critical applications. It's often a compromise: on one hand, you expect a certain level of detail or granularity, and on the other, you know that you will incur some level of overhead—whether in system resource consumption, time spent managing your instrumentation, or both.

Risks of Agent-Based Monitoring Tools

Agent-based approaches can provide a level of detail that is, at first blush, very impressive. You can literally tally up the processing time and memory consumption of every single line of code in the application!

The cost of that detail, though, is in the CPU cycles required to gather the data and—perhaps more importantly—in the introduction of a new and potentially unstable layer to your application delivery stack. Bugs or misconfigurations in the instrumentation can cause unpredictable behavior, such as abnormally high CPU loads or even application crashes. You'll know you've fallen into this particular pit when you're spending more time managing your monitoring tool than managing your application—or when your monitoring tool takes down your production infrastructure during peak load.

Passive end-user experience (EUE) monitors such as Tealeaf from IBM and Coradiant from BMC Software offer a no-impact approach to HTTP monitoring. While we give them due credit for leaving production systems undisturbed, they fall short in the level of detail offered and in many cases raise more questions than they answer. Was that bad request the result of an under-provisioned web stack, or was the database to blame? What about storage, or the network, or the myriad other dependencies in the chain? Knowing that a particular transaction is slow is of limited utility if you can't explain why.

Let's indulge in a little thought experiment and try to find the optimal approach to application performance monitoring. First, our ideal instrumentation would add no extra overhead to the transaction. (This goal seems obvious, since monitoring should take lower priority than actually delivering valuable content to customers, but hey, software vendors have been getting this detail wrong for years.) Our imaginary monitoring system would also require very little maintenance, ideally none at all. Finally (and this is the most important bit), our perfect system should give us actionable intelligence about every dependency across every transaction.

What about a dedicated application management layer transparent to both applications and the networks that deliver them? What if we could wrap every transaction in a seamless framework that provides insight into the entire application delivery stack with no impact on the application?

We would insert this management layer between the host operating system and the network infrastructure. From this unique vantage point, we could simultaneously see down to the lowest levels of the network and up into the internals of the application code. We could find out when a physical host is running low on processing resources. We could see when a core switch or router is dropping packets. We could expose database deadlocks, code-level exceptions, and OS misconfiguration.

While we're at it, what if this management layer actually improved application delivery? What if it automatically adjusted to congested networks, for example, or compensated for underpowered servers? What if we could guarantee delivery of every one of our transactions?

And here's the best part: this magical framework would incur no overhead, because your applications are already using it in production. No, it's not ExtraHop's latest addition to the product line. It's the Transmission Control Protocol, or TCP.
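Skeptical? Ask your kernel. On Linux, every TCP socket carries a live struct tcp_info that you can read with a single getsockopt() call. Here's a minimal sketch that pulls out a few of the measurements we'll discuss in Part 2. It's purely illustrative (the field offsets assume a reasonably recent Linux kernel, and the layout does vary across versions), not a peek at how any product works under the hood:

```python
# A rough sketch of the telemetry TCP already keeps per connection.
# Linux-only: field offsets below assume a reasonably recent kernel's
# struct tcp_info layout, which does vary across versions.
import socket
import struct

TCP_INFO = 11  # getsockopt option number on Linux

def tcp_snapshot(sock: socket.socket) -> dict:
    """Pull a few interesting counters out of the kernel's tcp_info."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 192)
    # The struct opens with eight bytes of small fields (state, ca_state,
    # retransmits, probes, backoff, options, packed window scales), then
    # a run of u32 counters that we unpack here.
    u32 = struct.unpack_from("21I", raw, 8)
    return {
        "rto_us": u32[0],       # current retransmission timeout
        "retrans": u32[7],      # segments currently being retransmitted
        "rtt_us": u32[15],      # smoothed round-trip time
        "rttvar_us": u32[16],   # round-trip time variance
        "snd_cwnd": u32[18],    # congestion window, in segments
    }

if __name__ == "__main__":
    # example.com is just a placeholder; any reachable HTTP server works.
    s = socket.create_connection(("example.com", 80), timeout=5)
    s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    s.recv(4096)
    print(tcp_snapshot(s))
    s.close()
```

Run it against any server you can reach and you'll see the smoothed round-trip time, retransmission counts, and congestion window your stack has been tracking all along, at zero extra cost to the application.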

A Healthy Obsession With Transmission Control Protocol (TCP)

TCP is where the application meets the network, and the interaction between the two allows us to answer two fundamental questions about application performance:

How well is the network delivering the application?

How well is the application using the network? (The sketch just below shows this one going wrong.)
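That second question is the one that surprises people, so let's make it concrete. In the toy sketch below (the loopback address and port are arbitrary), the network is flawless, yet the transfer grinds to a halt because the receiving application simply stops reading its socket. TCP dutifully reports the stall by advertising a zero window:

```python
# Toy demonstration of a TCP Zero Window: the network delivers
# perfectly, but the receiving application stops reading, so the
# sender stalls. Address, port, and buffer sizes are arbitrary.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9099

def slow_receiver():
    # An application that accepts a connection and then gets "busy":
    # it never calls recv(), so its receive buffer fills up.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)  # small window
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    time.sleep(30)   # pretend we're stuck waiting on the database...
    conn.close()

threading.Thread(target=slow_receiver, daemon=True).start()
time.sleep(0.5)      # give the listener a moment to start

sender = socket.create_connection((HOST, PORT))
sender.settimeout(5)
sent = 0
try:
    while True:
        sent += sender.send(b"x" * 4096)
except socket.timeout:
    # send() can no longer make progress: the peer's advertised
    # window has collapsed to zero because nobody is reading.
    print(f"sender stalled after {sent} bytes; receiver stopped reading")
```

Point tcpdump or Wireshark at the loopback interface while this runs and you'll watch the receiver's advertised window collapse to zero without a single packet being lost: an application problem wearing a network costume.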

We will explore this concept further in Part 2 of this two-part series, when we examine TCP measurements in detail, including key TCP metrics such as Retransmissions, Retransmission Timeouts (RTOs), Round-Trip Time (RTT), Aborts, Throttling, and Zero Windows. We'll also share some war stories from the field, such as how TCP Zero Windows pointed to an application architecture that guaranteed database deadlocks. Please join us and see what dramatic insights you can gather from this magical but not-at-all-imaginary performance monitoring framework!
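Until then, here's one last bit of intuition for how TCP "automatically adjusts to congested networks." The toy model below strips away slow start, fast recovery, and modern congestion-control algorithms such as CUBIC, leaving only the classic additive-increase/multiplicative-decrease response; every number in it is invented for illustration:

```python
# Toy model of TCP's classic congestion response: additive increase,
# multiplicative decrease (AIMD). Real stacks layer slow start, fast
# recovery, and modern algorithms on top; parameters here are invented.
cwnd = 1.0          # congestion window, in segments
capacity = 10       # pretend the path carries 10 segments per round trip

for rtt in range(1, 26):
    if cwnd > capacity:
        # Pushing past capacity drops a segment; halve the window.
        cwnd = max(1.0, cwnd / 2)
        note = "   <- loss, backing off"
    else:
        # Each healthy round trip, probe for more bandwidth.
        cwnd += 1.0
        note = ""
    print(f"RTT {rtt:2d}: cwnd = {cwnd:4.1f} segments{note}")
```

The sawtooth in the output is TCP probing for bandwidth and backing off when the path pushes back, the same quiet adjustment that protects your applications in production every day.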