Here at Capacitas we are often asked by clients about the feasibility of validating the scalability of a service by running a performance test against the live (production) service. Having delivered testing against production a number of times, I’d like to present the advantages and disadvantages of this approach.
Advantages
- Test environments may not have the exact same configuration as production and thus may produce misleading test results
- This is a common issue, especially with services based on complex infrastructure and software
- So testing against production ensures you are testing against a valid configuration
- Testing against production provides an environment at full capacity as opposed to a scaled-down test environment
- In turn this allows you to test to high levels of concurrency, which may not be possible in a scaled-down environment
- If you don’t need a test environment you avoid all of the associated costs: hardware, licensing, personnel, etc.
- Testing against production allows you to avail of ‘pay as you go’ load injection test tools and services
- Testing against production ensures you test the entire end-to-end technology stack including network access points, firewalls, load balancers etc.
- Code is already live so performance issues may be causing problems to real users already!
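To make the load-injection idea concrete, here is a minimal sketch of ramping concurrent load against a service. The `send_request` function is a stand-in for one real scripted transaction (an actual HTTP call via a library or a dedicated tool such as JMeter or Locust); the function names and parameters are illustrative, not a real tool’s API.

```python
# Minimal sketch of driving concurrent load; send_request is a placeholder
# for a real scripted transaction against the service under test.
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(user_id: int) -> float:
    """Stand-in for one scripted transaction; returns its duration in seconds."""
    start = time.perf_counter()
    # ... a real HTTP call to the production endpoint would go here ...
    return time.perf_counter() - start

def run_load(concurrency: int, requests_per_user: int) -> list[float]:
    """Drive `concurrency` simulated users, each issuing several requests."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(send_request, user)
            for user in range(concurrency)
            for _ in range(requests_per_user)
        ]
        return [f.result() for f in futures]

timings = run_load(concurrency=50, requests_per_user=4)
print(len(timings))  # 200 response-time samples collected
```

In a real engagement the concurrency would be ramped in stages and the response-time samples fed into percentile reporting, which is exactly what a full-capacity production environment lets you push to realistic levels.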
Disadvantages
- Testing must be conducted out of hours during non-peak periods.
- This precludes the option of soak testing where we observe the performance over a prolonged period to detect issues that manifest themselves over time
- Clearly there will be limited opportunity for out-of-hours testing in services serving a global user base, e.g. bank trade processing systems
- Often system maintenance (backups, defragmentation, etc.) takes place out of hours, distorting observations of the system’s performance
- The test window offered is typically very narrow as the disruption to the business must be minimised.
- Invariably, tests fail due to unexpected test script or data errors.
- This leaves little time to rerun tests when failures occur
- Real users may experience degraded service performance while the test runs!
- You cannot impact the integrity of production data
- The most obvious example is the purchase step on an e-commerce service. This is business-critical and typically the most capacity-intensive step
- It is rarely practical to generate a large number of purchases during a performance test and then remove them from the databases once testing is complete
- Another example: there may be resistance to creating thousands of test accounts on a production system for the purposes of testing
- Misleading results are sometimes observed with cloud load injection from a small pool of client IP addresses
- I’ve seen cases where demand is concentrated on a subset of the server farm due to affinity settings
- There may be less instrumentation available to turn on in the live environment, making it more difficult to diagnose issues
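One common mitigation for the data-integrity problem above is to tag every synthetic transaction with a marker so it can be excluded from business reports and purged after the run. The sketch below shows the idea; the field names and marker value are illustrative assumptions, not a real schema.

```python
# Sketch: tag synthetic orders so they can be filtered and purged later.
# TEST_MARKER and the order fields are hypothetical, for illustration only.
TEST_MARKER = "PERFTEST"

def make_order(customer_id: int, amount: float, is_test: bool = False) -> dict:
    order = {"customer_id": customer_id, "amount": amount}
    if is_test:
        order["reference"] = TEST_MARKER  # downstream jobs filter on this tag
    return order

def purge_test_orders(orders: list[dict]) -> list[dict]:
    """Remove tagged synthetic orders once the test window closes."""
    return [o for o in orders if o.get("reference") != TEST_MARKER]

orders = [make_order(1, 9.99), make_order(2, 5.00, is_test=True)]
live_only = purge_test_orders(orders)
print(len(live_only))  # the synthetic order has been removed
```

Even with tagging, teams often still object to synthetic purchases touching payment and fulfilment systems, which is why this remains a genuine disadvantage rather than a solved problem.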
What is the best approach to take?
Well, it depends on the following factors:
- Size of testing budgets
- Width of available production test windows
- The risk to service performance during test execution
- Service workload profile
- Service complexity
Discover more about improving performance during trading peaks: download the ebook here.