August 8, 2016

What if I turn it up to eleven?

This is Spinal Tap is a classic in a lot of ways, but one thing the fictional band knew was how to turn up the volume.

Have you cranked up your load testing? In my last large project, there were several critical components that had to be load tested individually and as part of the overall system. We all know that it just takes one bottleneck to spoil an otherwise perfectly good transaction. In our case, there were so many components and handoffs inside of a single synchronous transaction that one bottleneck could easily kill the entire system. But where were they?
To ensure we tackled each one, we first had to model all the components of the user transaction from top to bottom. Each sub-transaction needed to be instrumented and measured to help identify those bottlenecks. And there were plenty. The transaction volume (i.e., the "load" per time interval, for this discussion) is sized for a "busy hour," using either projections or actual production data. Then all the different scenarios need to be played out through "what if" brainstorming.
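
To make the busy-hour number concrete, here is a rough sketch of turning an hourly volume into an arrival rate, and then into the number of transactions in flight that a load script would need to sustain, using Little's law. The function names, the volume, the response time and the surge multiplier are all made up for illustration; they are not figures from the project described here.

```python
"""Back-of-the-envelope sizing for a busy-hour load test (illustrative only)."""


def arrival_rate_per_sec(transactions_per_hour: float) -> float:
    """Average transactions per second if the hourly volume were spread evenly."""
    return transactions_per_hour / 3600.0


def in_flight_transactions(rate_per_sec: float, avg_response_sec: float) -> float:
    """Little's law: average number in the system = arrival rate * time in system."""
    return rate_per_sec * avg_response_sec


if __name__ == "__main__":
    busy_hour_volume = 90_000   # hypothetical busy-hour transaction count
    avg_response_sec = 0.8      # hypothetical end-to-end response time
    surge_factor = 2.0          # hypothetical "what if" multiplier for a surge

    rate = arrival_rate_per_sec(busy_hour_volume)
    baseline = in_flight_transactions(rate, avg_response_sec)
    print(f"arrival rate: {rate:.1f} tx/s")
    print(f"baseline concurrency: {baseline:.1f}")
    print(f"surge concurrency: {baseline * surge_factor:.1f}")
```

Real traffic is rarely spread evenly across the hour, so treat the even-spread rate as a floor and let the scenarios push it higher.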
What if a busy hour hits and a heavy batch process is still running? What if one or more of the servers in the cluster fail? What if a database query is constraining the SAN? What if everyone tries to log in before the data reorg is complete? You know the drill; the scenario brainstorm is the fun part. Remember, it's not necessarily the obvious set of transactions that are the killers. The mixing and matching of transactions is critical too.

Then it comes down to building the scripts that let you pull the levers to match the scenarios. Don't skimp here. Your test environment should match production as closely as possible. Otherwise, while you'll certainly discover bottlenecks, they won't be the same ones you hit in production. In my experience, the negative impact on a transaction can be severe when a production system performs faster than the test system, because the bottleneck has moved. A transaction component ends up waiting in line for something completely different, an outcome that didn't exist in the test environment.

After the tests are documented, the analysis starts. In those scenarios, does the system perform within the SLA? Sometimes you have to answer the question: does it need to? Does your SLA dictate a busy-hour response time? Or an overall average? Smart negotiators will agree on SLAs that include volume estimates as well as performance metrics. No one can guarantee response time for an unknown quantity.

So, while functional testing is important, you also need to know how your system will perform under load. A properly constructed load test framework will pay dividends by showing the impact of any change to the system, especially upgrades. Crank it up to eleven!

Originally published on LinkedIn