Top 5 tips for running a production load test

More and more organisations with applications or websites that serve a large user base are running load tests against their production systems, while others prefer to test only in their test environments to avoid any impact on their end users or the production infrastructure.

Running tests in production is as important as running them in the test environment. Many defects that pre-release testing misses are only found in production. The reasons for not finding these defects in the test environments include:

Scaled down test environments
Missing components in the test environment, e.g. load balancers, interfaces to third parties, etc.
Different application configuration
Lower levels of concurrency

It is not always straightforward to run a production load test, as there are many factors to be taken into consideration.

However, here at Capacitas we have a strong track record of using production load testing to deliver Service Assurance.

In this article, we share our top five elements for a successful production load test.

1. Test Time Window

Running a load test on a production system can affect the real end users interacting with the application, because the infrastructure (from the servers to the network) will be more heavily utilised than usual.

Therefore, we recommend running production load tests when real user activity is very low. Running the load test at a quieter time will ensure that fewer users are affected by any slow response.

However, note that good monitoring should be in place so the test load can be reduced if a slowdown is detected (see point 3).
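As a rough illustration of picking a quiet period, the sketch below scans 24 hourly request counts and returns the start hour of the least busy window. The function name quietest_window and the sample figures are hypothetical; real counts would come from your own production analytics.

```python
def quietest_window(hourly_requests, window_hours):
    """Return (start_hour, total_requests) for the least busy window.

    hourly_requests: list of 24 request counts, index = hour of day (0-23).
    The window is allowed to wrap past midnight.
    """
    if len(hourly_requests) != 24:
        raise ValueError("expected 24 hourly counts")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        # Sum the traffic in the window beginning at this hour.
        total = sum(hourly_requests[(start + i) % 24] for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical hourly request counts for one typical day.
sample_day = [120, 80, 40, 30, 35, 60, 150, 400, 900, 1200, 1300, 1250,
              1400, 1350, 1200, 1100, 1000, 950, 900, 800, 600, 400, 250, 180]
start_hour, total = quietest_window(sample_day, 3)
print(f"Quietest 3-hour window starts at {start_hour:02d}:00 ({total} requests)")
```

In practice it is worth checking several days of data, since the quietest hour on a weekday may not be the quietest at the weekend.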

2. Workload Model

User behaviour in production has to be captured so that the test design and throughput are as realistic as possible. Production transaction rates can usually be obtained from data sources such as application logs or transaction analytics.

This data helps to identify the parts of the application that are most heavily used and therefore need more load sent to them. If testing for peaks, the test should target the peak throughput, and any forecast business growth should be factored into the model.
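A minimal sketch of that calculation is shown below, assuming the peak hourly rates per transaction have already been extracted from logs or analytics. The transaction names, the rates, and the 20% growth factor are made-up illustrations, not figures from a real system.

```python
def target_rates(peak_rates_per_hour, growth_factor):
    """Scale each observed peak rate by the forecast growth factor."""
    return {name: round(rate * growth_factor)
            for name, rate in peak_rates_per_hour.items()}


def load_mix(rates):
    """Return each transaction's share of the total throughput (0.0 to 1.0)."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


# Hypothetical peak-hour transaction counts observed in production.
observed_peak = {"search": 6000, "view_product": 3000, "checkout": 1000}

# Assumed business growth of 20% on top of the observed peak.
targets = target_rates(observed_peak, growth_factor=1.2)
mix = load_mix(targets)

for name, rate in targets.items():
    print(f"{name}: {rate} per hour ({mix[name]:.0%} of load)")
```

The mix shows which transactions should receive the most test load, while the scaled rates give the throughput the test should aim for.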

3. Monitoring

Monitoring is one of the most important tasks for a production load test, as all the data being monitored is useful both during and after the test. It is advisable to capture as many metrics as possible, as they could come in handy when investigating issues. Some of the key areas to monitor are:

End user experience – response time & application errors
Infrastructure – server utilisation & network utilisation
Application – system threads & memory consumption

It is critical to keep an eye on the monitoring during the test, in case the application under test develops problems that require the test to be stopped. Consider this chart:

It is easy to see response times rising with each stage of the test, but they are kept safely below the level where users will notice the degradation.
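The decision to continue, back off, or stop can be made explicit ahead of the test. The sketch below is one possible rule, not a prescribed method: it stops the test if the error rate is too high, reduces load if the 95th percentile response time crosses a limit, and otherwise continues. The function names and all thresholds are assumptions chosen for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def next_action(response_times_ms, errors, requests,
                rt_limit_ms=1000, error_rate_limit=0.02):
    """Return 'stop', 'reduce' or 'continue' for the next test stage.

    The limits are hypothetical defaults; agree real ones before the test.
    """
    if requests and errors / requests > error_rate_limit:
        return "stop"       # users are seeing failures, end the test
    if percentile(response_times_ms, 95) > rt_limit_ms:
        return "reduce"     # slowdown detected, lower the injected load
    return "continue"       # safe to move on to the next stage


# Hypothetical samples from one monitoring interval.
healthy = [200, 210, 220, 230, 240, 250, 260, 270, 280, 290]
slow = [200, 210, 220, 230, 240, 250, 260, 270, 280, 1500]
print(next_action(healthy, errors=1, requests=1000))   # continue
print(next_action(slow, errors=1, requests=1000))      # reduce
print(next_action(healthy, errors=30, requests=1000))  # stop
```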

The monitoring requirements should be decided well in advance of the production load test, as it is costly to run one. If any data is missing then it might not be straightforward to run another test and recreate similar issues.

4. Load Distribution

Applications that have large user traffic (e.g. several thousand concurrent users during busy periods) will require a similar or greater number of test users during the production load test. To run these high-volume load tests you will need several load injectors running at the same time.

If a CDN (Content Delivery Network) is being tested as well, it is a good idea to distribute the load injectors across different regions. This will ensure that the test does not overload the edge servers.

It is also beneficial to distribute the test scripts evenly across the load injectors. This will prevent any particular load injector from being constrained by its computing resources due to high script activity.
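One simple way to spread scripts evenly is to split each script's virtual users across all injectors and rotate which injector receives any remainder. The sketch below assumes identical injectors; the script names, user counts, and injector names are hypothetical.

```python
def distribute_users(scripts, injectors):
    """Split each script's virtual users evenly across the injectors.

    scripts:   dict of script name -> number of virtual users
    injectors: list of injector names
    Returns a dict of injector -> {script name: users}.
    """
    plan = {inj: {} for inj in injectors}
    count = len(injectors)
    offset = 0  # which injector receives the next leftover user
    for script, users in scripts.items():
        base, extra = divmod(users, count)
        for i, inj in enumerate(injectors):
            # Rotate the remainder so leftovers do not pile up on one injector.
            share = base + (1 if (i - offset) % count < extra else 0)
            if share:
                plan[inj][script] = share
        offset = (offset + extra) % count
    return plan


# Hypothetical test design: three scripts across three injectors.
scripts = {"search": 100, "checkout": 50, "browse": 25}
injectors = ["inj-a", "inj-b", "inj-c"]
plan = distribute_users(scripts, injectors)
for inj, allocation in plan.items():
    print(inj, allocation, "total:", sum(allocation.values()))
```

If some scripts are much heavier than others, or the injectors differ in size, a weighted split would be more appropriate than an even one.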

5. Maintenance Jobs

Production load tests tend to happen during quieter times, which is also when many organisations run housekeeping tasks such as application restarts and database indexing. This has to be taken into consideration when running the tests, as housekeeping jobs will almost certainly affect the capacity and performance of the system.

It is best if these maintenance tasks can be suspended during the window when the test is running and enabled again once all the testing is complete.
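A quick check for clashes between scheduled jobs and the test window can be sketched as follows. The job names and times are invented for illustration; in practice the schedule would come from your scheduler or operations runbook.

```python
from datetime import datetime


def overlapping_jobs(jobs, test_start, test_end):
    """Return the names of jobs whose run time overlaps the test window.

    jobs: list of (name, start, end) tuples using datetime values.
    """
    return [name for name, start, end in jobs
            if start < test_end and end > test_start]


# Hypothetical test window: 02:00 to 04:00.
test_start = datetime(2024, 1, 15, 2, 0)
test_end = datetime(2024, 1, 15, 4, 0)

# Hypothetical housekeeping schedule for the same night.
jobs = [
    ("database reindex", datetime(2024, 1, 15, 1, 30), datetime(2024, 1, 15, 2, 30)),
    ("application restart", datetime(2024, 1, 15, 3, 0), datetime(2024, 1, 15, 3, 10)),
    ("log archive", datetime(2024, 1, 15, 5, 0), datetime(2024, 1, 15, 5, 30)),
]

to_suspend = overlapping_jobs(jobs, test_start, test_end)
print("Suspend during the test:", to_suspend)
```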

