To gather insights on the current and future state of performance testing and tuning, we talked to 14 executives involved in the field. We asked them, "What are the technical solutions you use for performance testing and tuning?" Here's what they told us:

Open Source

We develop a lot of open source tools for developers to use. The toolkits are easy to use for summaries and queries, and there are a lot of good reference books for learning and training. Monitoring and management are available through a GUI. When it comes to database design, no tool tells you what schema to build for a particular application (e.g., e-commerce, financial, healthcare), so use best practices. Run EXPLAIN on new queries to see how they can be optimized.
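As a rough illustration of checking a new query's plan, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` because SQLite ships with Python. This is an assumption made for convenience: the respondent did not name a database, and the EXPLAIN syntax and output differ between engines. The `orders` table, the index name, and the `query_plan` helper are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each row holds the readable plan step.
    return [row[3] for row in rows]

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

sql = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan the table.
plan_before = query_plan(conn, sql, (42,))

# After adding an index, the same query can be answered with an index search.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the plan before and after a schema or index change is a cheap way to confirm that the change had the intended effect before running a full load test.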

There are a lot of items to check in a test and tuning phase. Everything from logging, monitoring, alerting, instrumenting, and profiling to, ultimately, testing needs to be covered. We use a lot of open source products such as JMeter or Gatling to simulate workload. As a load testing platform, we have the luxury of being able to eat our own dog food, which helps us identify our own performance bottlenecks or issues in production. For tuning, we vary between commercial platforms that assist with the identification and analysis of defects and tools that are native to the system we are working on.
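To make the idea of simulating workload concrete, here is a minimal sketch of a concurrent load driver. It is not how JMeter or Gatling work internally, and it leaves out ramp-up, think time, and reporting. The `fake_request` function is a hypothetical stand-in for one user action against the system under test, and the request count and concurrency are arbitrary.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request():
    """Hypothetical stand-in for one user action (e.g. an HTTP call)."""
    time.sleep(random.uniform(0.001, 0.005))

def timed(operation):
    """Run one operation and return (succeeded, elapsed seconds)."""
    start = time.perf_counter()
    try:
        operation()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start

def run_load(operation, total_requests=200, concurrency=10):
    """Run the operation total_requests times across a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed, [operation] * total_requests))
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "errors": sum(1 for ok, _ in results if not ok),
        "median_s": statistics.median(latencies),
        # Simple rank-based approximation of the 95th percentile.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }

summary = run_load(fake_request)
print(summary)
```

Reporting a high percentile alongside the median matters because a workload can have a healthy median while a small share of requests is far slower.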

Other

Think about what you can shift left for quality and performance. Can you automate the real conditions users will see? Run tests in the lab for your benchmark, but build capabilities in the lab to replicate the real-life conditions of different personas (e.g., a frequent airline traveler). Provide a “wind tunnel” test that measures the responsiveness of the apps. Enable logs (HAR files) to be handed off to developers with information on what needs to be fixed, what was downloaded from the network, and whether it was downloaded incorrectly.
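A HAR file is JSON, so a first-pass triage of what was downloaded and what went wrong can be scripted. The sketch below assumes the HAR 1.2 layout (`log.entries`, each with `request.url`, `response.status`, and a total `time` in milliseconds). The sample entries, the `example.com` URLs, and the 1000 ms threshold are made up, and treating status `0` as a failure is an assumption based on how browsers commonly record requests that never received a response.

```python
import json

def load_har(path):
    """Load a HAR file from disk (HAR files are UTF-8 encoded JSON)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def triage_har(har, slow_ms=1000):
    """Split HAR entries into failed requests and slow-but-successful ones."""
    failed, slow = [], []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        status = entry["response"]["status"]
        elapsed_ms = entry["time"]
        if status == 0 or status >= 400:
            failed.append((url, status))
        elif elapsed_ms > slow_ms:
            slow.append((url, elapsed_ms))
    return failed, slow

# Made-up sample data in the HAR 1.2 shape, trimmed to the fields used above.
sample_har = {
    "log": {
        "version": "1.2",
        "entries": [
            {"request": {"url": "https://example.com/index.html"},
             "response": {"status": 200}, "time": 120.5},
            {"request": {"url": "https://example.com/app.js"},
             "response": {"status": 404}, "time": 35.0},
            {"request": {"url": "https://example.com/hero.jpg"},
             "response": {"status": 200}, "time": 2400.0},
            {"request": {"url": "https://cdn.example.com/font.woff2"},
             "response": {"status": 0}, "time": 0},
        ],
    }
}

failed, slow = triage_har(sample_har)
print("failed:", failed)
print("slow:  ", slow)
```

For a real capture, `triage_har(load_har("session.har"))` would produce the same two lists, where `session.har` is a placeholder file name.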

We use a combination of active synthetic monitoring (HTTP, VoIP) plus network-related passive and active monitoring. Active monitoring is associated with the application layer. We add passive monitoring of the internet and active monitoring of network paths or topology. Network monitoring is typically based on passive data, but passive monitoring of network infrastructure means you have to own the infrastructure. Organizations must shift to active monitoring because they own less infrastructure. To understand what makes up the UX, you need to do active monitoring.
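The distinction can be illustrated with the simplest possible active check: instead of observing traffic that already exists, the probe generates its own and measures the result. The sketch below times a TCP connection and is only a minimal example under stated assumptions. Commercial synthetic monitoring scripts full HTTP or VoIP transactions from many vantage points. The `tcp_probe` helper is hypothetical, and the target is a local listener created purely so the example is self-contained.

```python
import socket
import time

def tcp_probe(host, port, timeout=2.0):
    """Actively open a TCP connection and report reachability and connect time."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return {
        "host": host,
        "port": port,
        "reachable": reachable,
        "connect_ms": (time.perf_counter() - start) * 1000.0,
    }

# Stand-in target: a listener on an OS-assigned local port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]

result = tcp_probe("127.0.0.1", port)
listener.close()
print(result)
```

Because the probe needs nothing from the network except the ability to send traffic, it works across infrastructure the organization does not own, which is the point the respondent makes about shifting to active monitoring.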

Mainly Splunk log monitoring and alerting.

Our average customer has 3.2 testing solutions. We provide REST APIs so we’re able to integrate with everything. We see Open XML, Jenkins is everywhere, and there’s some Bamboo. We see Chef and Puppet in the DevOps pipeline, but only in 15% of cases; others are using homegrown solutions. 75% of our clients are on their DevOps journey, but only 10% are fully implementing a DevOps methodology.

During the development of the product, automation of reproducible workloads is the most important aspect of testing. A battery of tests is executed on each beta release, but we also collect data from a limited subset of tests more frequently. These are executed on a range of machines, and where regressions are found, a deeper analysis is conducted. QA conducts independent tests on a defined schedule, as the development team may halt continual testing to debug a specific problem. My team's priority is to resolve regressions as they are found, whereas QA's priority is to identify and report as many existing regressions as possible. In terms of the tools used for analysis, it depends on the workload. Usually, the first step is to identify what resources are most important for a workload and then use monitors for that area. For example, a CPU-bound workload may begin with a high-level view using top, an analysis of individual CPU usage using mpstat, and an analysis of CPU frequency usage and C-state residency using turbostat. It does not stop there. For latency issues, we may use information from /proc/ to get a high-level view of how long workloads take to be scheduled on a CPU, and ftrace to get a more detailed view of the chain of events involved when waking a thread to run on a CPU. To get an idea of where time is being spent, we would use perf, but on occasion we'd also use perf to determine how, when, and why a workload is not running on a CPU. Depending on the situation, ftrace may be a more appropriate solution for answering that question. The tools used vary depending on the resources that are most important to a workload. IO-intensive workloads may start with iostat but can require blktrace in some instances. We avoid enabling all monitors in all situations, as excessive monitoring can disrupt the workload and mask problems.
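One way to get the high-level scheduling view from /proc/ is sketched below. It assumes the respondent means something like `/proc/<pid>/schedstat`, which is our guess rather than something stated. On Linux kernels built with scheduler statistics, that file holds three numbers: time spent running on a CPU (ns), time spent waiting on a run queue (ns), and the number of timeslices run. The sample line is made up so the parser can be exercised on any platform.

```python
def parse_schedstat(text):
    """Parse the three fields of a /proc/<pid>/schedstat line."""
    run_ns, wait_ns, timeslices = (int(field) for field in text.split()[:3])
    return {
        "run_ms": run_ns / 1e6,
        "wait_ms": wait_ns / 1e6,
        "timeslices": timeslices,
        # Average run-queue delay per timeslice, in microseconds.
        "avg_wait_us": (wait_ns / timeslices / 1e3) if timeslices else 0.0,
    }

def read_schedstat(pid="self"):
    """Read scheduling statistics for a process (Linux only)."""
    with open(f"/proc/{pid}/schedstat") as fh:
        return parse_schedstat(fh.read())

# Made-up sample: 1.5 s on CPU, 0.25 s waiting to be scheduled, 5000 timeslices.
stats = parse_schedstat("1500000000 250000000 5000\n")
print(stats)
```

Sampling `read_schedstat(pid)` before and after a test run and taking the difference gives the wait time accumulated during that run. If the average wait looks high, a tracer such as ftrace can then show the individual wakeup events.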
Using the data, we then form a hypothesis as to why performance may be low, propose potential solutions and then validate them in a loop until the desired performance is achieved.
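The validation step of that loop can be sketched as a comparison between a baseline run and a run with the proposed change applied. The example below is a simplification under stated assumptions: it compares medians against a fixed 5% tolerance rather than applying a statistical test, lower timings are taken to be better, and all sample numbers are made up.

```python
import statistics

def compare_runs(baseline, candidate, tolerance=0.05):
    """Compare median timings of two runs (lower is better)."""
    base = statistics.median(baseline)
    cand = statistics.median(candidate)
    change = (cand - base) / base
    if change > tolerance:
        verdict = "regression"
    elif change < -tolerance:
        verdict = "improvement"
    else:
        verdict = "no significant change"
    return {
        "baseline_median": base,
        "candidate_median": cand,
        "relative_change": change,
        "verdict": verdict,
    }

# Made-up timings in seconds for the same workload.
baseline = [10.2, 10.0, 10.1, 9.9, 10.3]
with_fix = [9.0, 9.1, 8.9, 9.2, 9.0]
with_bad_patch = [11.5, 11.4, 11.6]

fix_result = compare_runs(baseline, with_fix)
bad_result = compare_runs(baseline, with_bad_patch)
print(fix_result["verdict"], round(fix_result["relative_change"], 3))
print(bad_result["verdict"], round(bad_result["relative_change"], 3))
```

Repeating each run several times and comparing medians reduces the chance that a single noisy measurement confirms or rejects a hypothesis by accident.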