A Guide to Cloud Cost Optimisation in AWS

Introduction

The global public cloud market will hit $178 billion this year, up from $146 billion in 2017 (Forrester Research), and public cloud adoption in enterprise is expected to exceed 50 percent for the first time this year. This whitepaper refers specifically to AWS, which is by some way the leading provider of public cloud; however, the principles and approaches apply equally to all public cloud solutions.

Innovating at breakneck speed and driving the pace of change through multiple major new releases each year, AWS have a range of products and services spanning not only infrastructure and storage, but also databases, container and serverless technologies, AI/ML, IoT and more. This breadth, depth and pace of offering brings enormous benefit to enterprise organisations in driving forward transformation, but also significant challenges.

Adopters of AWS need to change often ‘hard-wired’ internal processes (forcing organisational change), both in how software is designed, built and run and in how infrastructure is provisioned, managed and paid for. They must adopt different skills and ways of thinking across the organisation (requiring new resource and staff training), and adapt to new procurement and commercial models (i.e. pay as you go, OPEX not CAPEX).

This whitepaper deals specifically with the challenges organisations face in managing and optimising AWS costs. Our aim in writing this is to share best practices for the most efficient and cost-effective ways to run AWS. The principles and methodologies we outline are intended to reinforce and complement AWS’s own best practice guidance, detailed in their Well-Architected Framework.

Key Takeaways

In this whitepaper we will:

Record the experiences and insights gained from optimising cloud costs with our clients
Provide a set of recommendations and best-practices for cloud cost optimisation

The following are required in order to take advantage of the flexibility of the cloud to enable cost savings:

Measurement and understanding of the efficiency of your systems and knowing what good looks like. NB. efficiency is more than just using rightsizing, autoscaling or other cloud technologies
A process to deliver ongoing rightsizing into live without service risk
The ability to remove technical constraints to enable rightsizing
Ongoing validation of live performance to look for early warning signs of risk by looking beyond just response times and throughput
A deep understanding of workloads and their inter-dependencies in complex eco-systems

Drivers of Overspending in AWS

There are four common contributors to overspending in AWS:

Oversizing
Software Inefficiency
Application Inelasticity
Sub-Optimal Architecture

Bonus: Forecasting complexity, three demand drivers.

Oversizing

This is the most common reason for overspend and the simplest to solve. There are multiple factors that contribute to this, and they are often rooted in the design process:

Capacity added even when there is sufficient headroom
Inaccurate sizing due to weak performance testing methodologies
Inaccurate demand forecasts
Excess capacity put in place to compensate for software bottlenecks

The last of these is a particular problem if the ‘temporary fix’ becomes a permanent solution.

Software Inefficiency

Software efficiency is one of Capacitas’ 7 Pillars of Performance. For transient resources, efficiency is defined as the amount of compute resource required per transaction and is a critical lever in the control of your cloud costs. This is particularly pertinent for high-volume systems. There is a similar calculation for persistent resources, e.g. storage.
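As a minimal sketch of the efficiency measure for transient resources, the function below divides the CPU-seconds consumed over an interval by the number of transactions processed in that interval. The function name, parameter names and figures are illustrative assumptions rather than a description of any particular tooling.

```python
def cpu_seconds_per_transaction(avg_cpu_utilisation, vcpus, interval_seconds, transactions):
    """Compute resource consumed per transaction, in CPU-seconds.

    avg_cpu_utilisation is a fraction between 0 and 1, averaged over the interval.
    """
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    cpu_seconds_used = avg_cpu_utilisation * vcpus * interval_seconds
    return cpu_seconds_used / transactions


# Hypothetical figures: 8 vCPUs at 40% utilisation for one hour,
# processing 576,000 transactions in that hour.
efficiency = cpu_seconds_per_transaction(0.40, 8, 3600, 576_000)
print(f"{efficiency:.3f} CPU-seconds per transaction")
```

Tracking this figure release by release is one way to see whether a code change has made the software more or less efficient, independently of how much traffic it happened to serve.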

Where does software inefficiency stem from?

Not measuring efficiency
Not knowing what ‘good’ efficiency looks like
Not having efficiency targets (non-functional requirements)
Other priorities on developers’ time

How important is this? In one client engagement, we identified (and worked with their developers to implement) a series of software optimisations which reduced their IT service opex costs from $3.3M to $0.3M per year.

Application Inelasticity

AWS autoscaling is one of many great ways to control cloud cost by adjusting capacity to meet changing demand.

However, applications which are inefficient and/or require long warm-up times do not autoscale quickly enough and typically are not able to scale to use all the available capacity. Applications which tend to be inelastic include databases, caches and inefficient applications. This leads to organisations keeping extra capacity headroom because they are not confident that their applications can scale up quickly enough to meet demand.

In one example, a customer had an embedded practice to autoscale their systems at 50% CPU utilization. For one particular application, this resulted in $1M per year in unnecessary spend.

Sub-Optimal Architecture

Choosing a sub-optimal architecture for your workload will lead to higher cloud costs. We’ve seen this occur when teams are under time pressure to simply ‘lift and shift’ to the cloud. Typical examples of this are:

Large amounts of unnecessary storage being ported over to the cloud
The use of on-demand instances for non-time critical jobs, e.g. batch, which could be run on cheaper compute such as spot instances
Large workloads moved to expensive premium or managed cloud services such as DynamoDB or Cassandra, where the additional performance provided by these solutions is not essential for the business or system requirement

How Can We Optimise AWS Costs?

Our high-level process for achieving cost optimisation is shown in Figure 2. The solution below focuses on addressing the cost inefficiency challenges 1 – 4 which account for 70%+ of the cost optimisation opportunities in the cloud.

Identify Over-Supply & Software Inefficiency

The first step is a diagnostic to identify two symptoms of high cost: over-supply and software inefficiency.

Over-supply is when the capacity provisioned exceeds demand over the IT service’s demand cycle. The concept of over-supply may be applied to any AWS component.

For example, in the case of EC2 instances we would employ standard measures of CPU, memory and disk. For serverless components, such as Lambda, we would use time as a measure of resource consumption; this isn’t a precise measure of efficiency but can be a useful proxy.
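To make the idea of over-supply concrete, the sketch below takes utilisation samples for one resource dimension across a full demand cycle and estimates what fraction of the provisioned capacity could be removed while keeping the observed peak at a target utilisation. The 70% target and the sample values are assumptions chosen for illustration, and the result says nothing on its own about whether the reduction is safe.

```python
def over_supply_fraction(utilisation_samples, target_peak_utilisation=0.7):
    """Fraction of provisioned capacity that could be removed so that the
    observed peak would sit at the target utilisation.

    Samples are fractions between 0 and 1 covering the full demand cycle.
    Returns 0.0 when the peak already meets or exceeds the target.
    """
    if not utilisation_samples:
        raise ValueError("need at least one utilisation sample")
    peak = max(utilisation_samples)
    return max(0.0, 1.0 - peak / target_peak_utilisation)


# Hypothetical CPU utilisation samples for an EC2 fleet over its demand cycle.
cpu_samples = [0.12, 0.18, 0.35, 0.30, 0.22, 0.15]
print(f"Potential CPU over-supply: {over_supply_fraction(cpu_samples):.0%}")
```

The same calculation would need to be repeated for memory and disk, since the tightest dimension is the one that limits how far capacity can be reduced.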

Once over-supply is identified, we need to qualify that capacity can be reduced safely. When capacity is reduced, the performance of the application should not degrade and there should definitely be no service incidents. In order to make this assessment, we model the expected performance after downsizing, using our Seven Pillars of Software Performance (below).

This enables us to quantify the performance risk associated with reducing supply capacity and thus prioritise which systems should be addressed.

Software efficiency is defined as the amount of compute resource required to process an application request or transaction, where compute resource includes processor, memory, disk space or I/O (network and disk). How do we decide whether software is efficient or inefficient? The business function of the software will determine its compute requirements. For example, we would expect e-commerce software to have a lower processing footprint (per request) than encryption software.

A big-data analytics platform will have a larger memory footprint (per request) than a document management service, and so on. As enterprise cloud environments typically have tens or hundreds of thousands of servers and components, we use automated software to harvest the following data:

Cloud configuration
Supply and utilisation data
Demand data
Cost data

Capacitas uses a library of software efficiency benchmarks, built up over hundreds of customer engagements and dimensioned by software type, to assess whether the measured compute cost is efficient or not. This is likely to be more difficult for non-specialists whose experience of measuring efficiency is limited to a handful of systems.

To get around that, your organisation needs to build its own library of efficiency benchmarks; over a number of years it will become clearer what good looks like (Figure 3). Just remember to keep the older benchmarks up to date as technology changes.

Cloud cost monitoring tools such as Cloudability and Cloudyn provide great information on where there is over-supply in the estate. However, they cannot identify software inefficiency as a driver of cloud capacity consumption.

As an output from this phase we will have a list of the candidate systems which could be downsized and the potential cost reduction opportunity.

Identify Optimisations and Associated Risk & Cost

The next stage is to quantify how much optimisation we can realistically achieve, given the constraints. What should our $ cost optimisation target be?

The goals of this phase are:

Define what right-sizing is required
Define architectural optimisations
Define what software efficiency improvements are required
Define what configuration change is required to increase application elasticity
Quantify the performance risk of changes [1-4]
Quantify the $ cost optimisation that changes [1-4] will achieve

Once we have quantified the performance risk of the changes, we can build a picture of what changes can be realistically delivered, without impacting the operation and reliability of production services (Figure 5).
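One simple way to picture this step is to filter the candidate changes by an acceptable level of performance risk and then rank what remains by the saving on offer. The sketch below does exactly that; the three-level risk scale, the system names and the savings are all hypothetical.

```python
from dataclasses import dataclass

# Assumed three-level risk scale, ordered from least to most risky.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Candidate:
    system: str
    annual_saving_usd: float
    performance_risk: str  # "low", "medium" or "high"


def deliverable_changes(candidates, max_risk="medium"):
    """Keep changes at or below the accepted risk, largest saving first."""
    limit = RISK_ORDER[max_risk]
    accepted = [c for c in candidates if RISK_ORDER[c.performance_risk] <= limit]
    return sorted(accepted, key=lambda c: c.annual_saving_usd, reverse=True)


candidates = [
    Candidate("checkout-api", 120_000, "high"),
    Candidate("batch-reports", 45_000, "low"),
    Candidate("search", 80_000, "medium"),
]
shortlist = deliverable_changes(candidates)
for c in shortlist:
    print(c.system, c.annual_saving_usd, c.performance_risk)
```

The high-risk change is not discarded for good; it is a candidate for proving in test or for constraint removal before it is attempted in live.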

Cloud cost monitoring tools provide great information on over-supply in the estate. However, the key question remains: can I safely reduce capacity without impacting the performance of the application? Unfortunately, these tools cannot inform this decision.

Plan Optimisation

In this phase we produce the detailed low-level designs for each type of optimisation:

Right-sizing
Architectural optimisations
Software efficiency improvements
Changes to increase application elasticity

Right Sizing

Right sizing needs to take into account:

The multiple dimensions of capacity planning
Performance characteristics of the workload
Business requirements
Upstream and downstream dependencies
Performance risk

The multiple dimensions of capacity are CPU, Memory, Network, Disk IOPS and Storage.

The performance characteristics of the workload include long and short timeframe demands, e.g. synchronous, asynchronous or batch.

The business requirements are really what matters most to the business: it may be that a batch job can take a bit longer to run, if it has no immediate business impact.

The rightsizing decision-making process needs to also take into account upstream and downstream dependencies: you don’t want to make a rightsizing decision that leads to unintended consequences in other parts of the eco-system.

In addition, you need to assess how risky it is to rightsize the service. This is less of an issue in smaller, less complex environments, but in larger, more complex eco-systems with a large number of dependencies it can be a major drag on rightsizing safely in live. This is where using the scalability and stability pillars from Capacitas’ 7 Pillars of Performance really helps to make a better-informed decision on the risk associated with implementing rightsizing.

Metrics for making these decisions are typically spread over multiple toolsets, so you need to gather all this data into a single point to make informed decisions. Unfortunately, APM tools tend to be weaker in terms of depth and coverage of cloud infrastructure metrics. In order to get around these limitations you may need to deploy additional tooling, especially where the decision process is not clear cut.
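The sketch below illustrates why rightsizing has to consider several capacity dimensions at once: it picks the cheapest option that covers peak demand plus headroom on every dimension it is given. The option names, sizes and prices are invented for the example and are not real AWS instance types or prices; only three of the five dimensions are shown to keep it short, and the 30% headroom is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOption:
    name: str
    vcpus: float
    memory_gib: float
    network_gbps: float
    hourly_cost_usd: float


def cheapest_fit(options, peak_demand, headroom=0.3):
    """Return the cheapest option covering peak demand plus headroom on
    every dimension, or None if nothing fits.

    peak_demand is a dict with keys 'vcpus', 'memory_gib' and 'network_gbps'.
    """
    scale = 1.0 + headroom
    fitting = [
        o for o in options
        if o.vcpus >= peak_demand["vcpus"] * scale
        and o.memory_gib >= peak_demand["memory_gib"] * scale
        and o.network_gbps >= peak_demand["network_gbps"] * scale
    ]
    return min(fitting, key=lambda o: o.hourly_cost_usd) if fitting else None


# Hypothetical catalogue and measured peak demand.
options = [
    InstanceOption("small", 2, 8, 1, 0.10),
    InstanceOption("medium", 4, 16, 2, 0.20),
    InstanceOption("large", 8, 32, 5, 0.40),
]
peak = {"vcpus": 1.2, "memory_gib": 10, "network_gbps": 0.5}
choice = cheapest_fit(options, peak)
print(choice.name if choice else "no option fits")
```

In this example CPU alone would suggest the smallest option, but memory is the binding dimension, so the smallest option is rejected. The sketch does not capture dependencies or performance risk, which still need the assessment described above.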

Architectural Optimisations

In order to select the appropriate AWS supply architecture to support your demand, you first need to understand your demand. To do this, we use a technique called Workload Modelling. Workload Modelling is the process of analysing the demand on an IT system and characterising its intensity, size and synchronicity.

In practical terms, Workload Modelling involves characterisation of the workload over multiple dimensions. This is similar to what is carried out when performing rightsizing:

Persistent vs Transient
Synchronous vs Asynchronous
Transaction size
Read vs Write workload
Processor service time
Disk service demand
Network service demand
Disk storage demand
Memory service demand

These are all key considerations before you make your cloud technology choices. The AWS Well-Architected Framework mentions this as a pre-requisite to making the right design choices.
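The dimensions listed above can be captured as a simple record per workload, which then feeds technology choices. The sketch below is one possible shape for such a record; the field names and units are our own assumptions, and the single rule shown (asynchronous, transient work is a candidate for cheaper interruptible compute such as spot instances) is deliberately simplistic.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    name: str
    persistent: bool            # persistent vs transient
    synchronous: bool           # synchronous vs asynchronous
    transaction_size_kb: float  # transaction size
    read_fraction: float        # read vs write mix, 0..1
    cpu_ms_per_txn: float       # processor service time
    disk_io_per_txn: float      # disk service demand
    network_kb_per_txn: float   # network service demand
    storage_gb: float           # disk storage demand
    memory_mb_per_txn: float    # memory service demand


def spot_candidate(profile):
    """Asynchronous, transient workloads can usually tolerate interruption."""
    return not profile.synchronous and not profile.persistent


# Hypothetical workloads.
nightly_batch = WorkloadProfile("nightly-batch", False, False, 512, 0.8, 900, 40, 600, 0, 256)
checkout = WorkloadProfile("checkout", False, True, 4, 0.6, 35, 2, 12, 0, 8)
print(spot_candidate(nightly_batch), spot_candidate(checkout))
```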

Software Efficiency Improvements

Improving software efficiency is easier now with the wide availability of APM tools such as New Relic and AppDynamics. These tools will allow you to determine the software efficiency, at code level. This analysis should be done in conjunction with development teams who can provide the context of what functions the software is fulfilling.

When looking to improve efficiency, it is worth remembering not to confuse time with resource consumption in these tools. This is a mistake that non-specialists make, and it leads to wasted time trying to fix the wrong thing.
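The difference between time and resource consumption can be seen with a few lines of Python. In the sketch below, one function spends its time waiting (a sleep stands in for a slow downstream call) and the other spends it computing; elapsed time is measured with time.perf_counter and CPU time with time.process_time. The function names and workload sizes are illustrative.

```python
import time


def measure(func):
    """Return (elapsed wall-clock seconds, CPU seconds) for one call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting():
    time.sleep(0.2)  # stands in for waiting on a slow downstream call


def computing():
    sum(i * i for i in range(300_000))  # CPU-bound work


wait_wall, wait_cpu = measure(waiting)
work_wall, work_cpu = measure(computing)
print(f"waiting:   wall={wait_wall:.3f}s cpu={wait_cpu:.3f}s")
print(f"computing: wall={work_wall:.3f}s cpu={work_cpu:.3f}s")
```

The waiting function is slow but consumes almost no CPU, so optimising it would improve response time without reducing the compute the system needs. It is the CPU figure, not the elapsed time, that relates to efficiency as defined in this paper.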

Changes to Increase Application Elasticity

There may be limitations on how elastic parts of your application are, e.g. caches and databases. However, reducing the application’s resource footprint in general will enable it to take advantage of the elasticity of the cloud.

If you are not confident in using autoscaling, or have very low thresholds for autoscaling, then these are signs that the application has some constraint which prevents it from being elastic.
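The cost of a low autoscaling threshold can be estimated with simple arithmetic. The sketch below computes how many instances are needed to serve a given peak at different target utilisations. It assumes throughput scales linearly with CPU, which real applications only approximate, and the demand and per-instance figures are hypothetical.

```python
import math


def instances_required(peak_demand_rps, rps_per_instance_at_full_cpu, target_utilisation_pct):
    """Instances needed to serve the peak while each runs at the target CPU.

    Assumes throughput scales linearly with CPU utilisation.
    """
    usable_rps = rps_per_instance_at_full_cpu * target_utilisation_pct / 100
    return math.ceil(peak_demand_rps / usable_rps)


# Hypothetical service: 7,000 requests/s at peak, 100 requests/s per instance at 100% CPU.
at_50 = instances_required(7000, 100, 50)
at_70 = instances_required(7000, 100, 70)
print(f"50% target: {at_50} instances, 70% target: {at_70} instances")
```

Under these assumptions, raising the threshold from 50% to 70% removes 40 of 140 instances at peak. Whether the application can actually run safely at the higher threshold is the elasticity question discussed above.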

Prove Optimisation in Test

Where optimisations carry appreciable performance risk, we should first check performance in a test environment before pushing to production.

We use the 7 Pillars of Performance to design and target performance tests in the right area. For example, downsizing EC2 capacity may present a risk to the throughput and response time of an application when under peak load.

Conversely, changing the memory footprint of a database service may present a stability risk.

To scale this capability and avoid false-positive test results, it is critical to use automated performance test analysis. This will speed the route to optimisation and minimise the risk of service-impacting incidents when the optimisations go into live.
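A very small example of automated test analysis is shown below: it compares a candidate test run against a baseline and flags a regression only when the 95th-percentile response time or the throughput moves by more than a tolerance. The 10% tolerance, the nearest-rank percentile method and the data are assumptions for illustration; a real analysis would look at many more metrics than these two.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling
    return ordered[max(rank, 1) - 1]


def regressions(baseline, candidate, tolerance_pct=10):
    """Each run is a dict with 'response_ms' (list) and 'throughput_rps'."""
    findings = []
    base_p95 = percentile(baseline["response_ms"], 95)
    cand_p95 = percentile(candidate["response_ms"], 95)
    if cand_p95 * 100 > base_p95 * (100 + tolerance_pct):
        findings.append("p95 response time")
    if candidate["throughput_rps"] * 100 < baseline["throughput_rps"] * (100 - tolerance_pct):
        findings.append("throughput")
    return findings


# Hypothetical test runs before and after downsizing.
baseline = {"response_ms": list(range(100, 120)), "throughput_rps": 500}
downsized = {"response_ms": [ms + 30 for ms in range(100, 120)], "throughput_rps": 490}
print(regressions(baseline, downsized))
```

Applying the same rule to every run removes the judgement calls that creep in when results are compared by eye.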

Implement Optimisation in Live

This is done in close partnership with the DevOps teams responsible for the system. The DevOps engineer will typically implement the rightsizing in conjunction with a capacity and performance expert. The DevOps engineer is familiar with the system, while the capacity and performance expert knows what good looks like over a wider range of metrics than the DevOps engineer will typically look at. This combination enables the optimisations to be delivered into live safely.

The important thing to remember is that these optimisations can be backed out almost immediately in a cloud environment. Any early warning signs allow the DevOps and capacity engineers to fine-tune the level of optimisation they want to implement in live.

Validate Performance in Live

When the optimisation goes live, it is important to ensure the change has been delivered successfully over the wider demand cycle. Success criteria will include:

Has the change delivered the expected cost-optimisation?
Has the change resulted in the expected performance behaviour – either modelled or from performance test results?

It is important to note that the last point relates to both the system itself and the upstream and downstream systems it is integrated with.

A key tenet of performance engineering is that capacity changes to a system can have adverse performance impact on coupled systems.
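The two success criteria, applied to the changed system and its coupled systems, can be expressed as a simple check. The sketch below is one possible form; the function and field names, the 10% tolerance on the saving and the figures are all assumptions.

```python
def validate_change(expected_saving_usd, actual_saving_usd,
                    p95_ms_by_system, p95_target_ms_by_system,
                    saving_tolerance=0.1):
    """Check the cost criterion and list every system breaching its target.

    The per-system dicts should cover the changed system and the upstream
    and downstream systems it is integrated with.
    """
    cost_ok = actual_saving_usd >= expected_saving_usd * (1 - saving_tolerance)
    breaches = sorted(
        system for system, p95 in p95_ms_by_system.items()
        if p95 > p95_target_ms_by_system[system]
    )
    return {"cost_ok": cost_ok, "performance_breaches": breaches}


# Hypothetical post-change measurements over the demand cycle.
result = validate_change(
    expected_saving_usd=100_000,
    actual_saving_usd=95_000,
    p95_ms_by_system={"orders": 180, "payments (downstream)": 420, "web (upstream)": 250},
    p95_target_ms_by_system={"orders": 200, "payments (downstream)": 400, "web (upstream)": 300},
)
print(result)
```

Here the change meets its cost target and the changed system is within its own target, yet a downstream system has breached, which is exactly the kind of coupled impact this step is meant to catch.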

Production Validation is the process of measuring and reporting against these success criteria. Production Validation will take as an input multiple data sources, including:

Capacity monitoring (AWS CloudWatch, etc.)
Cost monitoring (Cloudability, etc.)
Application Performance Management Tools (New Relic, AppDynamics, etc.)
Application integration design
NFRs/SLAs

Remove Technical Constraints

What do we do if the change is unsuccessful and adversely impacts performance of the system? Most likely the excess capacity provisioned is masking a fundamental bottleneck in the application code. We term these generically as technical constraints.

In this phase we define the technical constraints through a problem definition statement. Targeted performance testing may be required to define the problem.

Next we define a risk mitigation plan to fix the constraints and plan what investment is required to redesign the code. As we know what the cost-reduction opportunity is for each change, we can build a business case for implementing the mitigation plan.
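One simple measure that could support such a business case is the payback period: how many months of saving are needed to recover the cost of fixing the constraint. The sketch below shows the arithmetic with hypothetical figures; payback is our choice of illustration, and other measures may suit your organisation better.

```python
def payback_months(remediation_cost_usd, annual_saving_usd):
    """Months of saving needed to recover the one-off remediation cost."""
    if annual_saving_usd <= 0:
        raise ValueError("annual saving must be positive")
    return remediation_cost_usd / (annual_saving_usd / 12)


# Hypothetical: $150,000 of redesign work unlocks a $600,000 per year saving.
months = payback_months(150_000, 600_000)
print(f"Payback in {months:.1f} months")
```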

Next Steps

If you find this whitepaper relevant and interesting, you might also find these resources helpful:

Webinar
How MetaPack Gained Control of their Cloud Costs
Using Auto Scaling to Control AWS Costs

Infographic
The Seven Pillars of Performance

Capacitas Blog
http://www.capacitas.co.uk/blog

Dr. Manzoor Mohammed

Director

About the Author

Dr. Manzoor Mohammed has worked in the area of capacity and performance management for over 20 years. He started his career as a performance engineer at BT Research Laboratories. He co-founded Capacitas, a consulting company which reduces cost and risk in business-critical IT systems through capacity and performance management.

He has worked on numerous large complex projects for customers such as BT Global Services, HP, Skype/Microsoft, easyJet, Nokia etc.

Many of these engagements have resulted in multi-million-dollar savings in datacentre and cloud platform costs, as well as better performing and more stable systems.

Dr. Mohammed leads the R&D function at Capacitas, having developed a ‘shapes-based’ methodology for the automated analysis of performance and capacity issues, which forms the basis of a suite of data analytics tooling.
