Estimated read time: 5 Minutes
Author: Manzoor Mohammed
Reference available on request
You've given teams accountability and platform ownership, enabling them to build great cloud-native applications. You have a standardised toolchain and delivery process allowing teams to focus on innovation, alongside tooling to improve visibility. But cloud hosting costs are high. What is missing?
We worked with the CTO of a leading San Francisco-based tech firm with more than 3 million subscribers, over 1 million international customers, and rapidly growing revenue of $1.5+ billion.
The challenge: Reduce cloud cost and increase productivity in 2 years
The board and their investors had set the senior leadership team a challenge to reduce their cloud costs by 50 percent over two years, while simultaneously increasing productivity.
It was clear that meeting this ambitious target while also accelerating product development required new thinking beyond the standard FinOps housekeeping techniques. Big changes were required in both the organisation and its technologies. The previous data centre environment had hidden inefficiencies that were deeply embedded in the organisational culture as well as in the technology stacks.
The approach: Changing the culture and ownership of technology
The CTO started by making two major changes to the culture and ownership of technology:
- Give all the engineering teams visibility and ownership of their costs (and performance). Once teams owned their costs and could see them (along with everyone else), there was no place to hide. The teams became accountable for their costs and performance.
- Bring about a cultural change in the use of shared services. The engineering teams were used to working independently, and there were no shared platforms to enable delivery. To build more efficiently, the leadership put in place common approaches to monitoring, logging, governance, and CI/CD pipelines. This meant that teams could focus on product innovation rather than spending their time supporting toolchains.
However, costs kept increasing. The CTO realised that what was needed was a change in thinking about performance. Savings on this scale were not going to be driven by the existing way of working. This is where we were brought in to support their engineering teams and to provide our view on how great performance reduces cost.
This contrarian view was ideal for getting engineering teams to reduce costs. Teams typically add capacity either to keep the show on the road or when they don’t actually need it (for example, because of poor requirements or a misunderstanding). Great end-to-end performance means less capacity, which leads to reduced costs and ultimately turns teams into high-performing teams.
We collaborated with the engineering teams to help them think with a capacity and performance focussed mindset. The best way to do this is to start from end-to-end performance rather than from cost.
The underlying principle is: great end-to-end performance = less capacity = less cost = high-performing teams.
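To make the link between performance and capacity concrete, here is a minimal sketch of a capacity estimate based on Little's law (average requests in flight = arrival rate x time per request). It is an illustration only, not the method used in the engagement: the function name, the request rate, the service times, the workers per instance, and the target utilisation are all hypothetical.

```python
import math


def instances_needed(requests_per_sec, service_time_sec,
                     workers_per_instance, target_utilisation):
    """Estimate the instance count for a steady request rate.

    All inputs are hypothetical illustration values, not figures
    from the case study.
    """
    # Little's law: average requests in flight = arrival rate x time per request.
    in_flight = requests_per_sec * service_time_sec
    # Each instance serves this many concurrent requests at the target utilisation.
    usable_workers = workers_per_instance * target_utilisation
    return math.ceil(in_flight / usable_workers)


# Same traffic, before and after halving the end-to-end service time.
before = instances_needed(2000, 0.200, 8, 0.6)
after = instances_needed(2000, 0.100, 8, 0.6)
print(f"instances before: {before}, after: {after}")
```

Under these made-up numbers, halving the time each request takes halves the number of instances needed for the same traffic, and if instances are billed per hour the bill falls in the same proportion.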
Their logging platform was a great concept driven by the cultural shift towards visibility and ownership. However, the platform itself was beset with the same cost issues as the rest of the architecture.
It gave the 70+ teams a shared platform for running diagnostics on their systems and reducing downtime. It was designed and built in collaboration with AWS architects using cloud-native architectures, including AWS services such as Lambda, Kinesis, and Elasticsearch.
The architecture was subject to a Well-Architected Framework (WAF) review and signed off by Amazon. According to AWS, at the time it was one of the most innovative uses of their services they had seen from any of their customers. The engineers were highly skilled and capable.
However, once the platform was live and started taking traffic, it experienced frequent stability incidents. At the same time, the costs of the platform were far higher than anyone expected: the monthly bills ran to hundreds of thousands of dollars. The management and engineering teams knew they needed to do something different.
We brought our new way of thinking about performance to the team. We combined new ideas about application performance with cost and architecture to make recommendations that would reduce the platform's costs by 70%+.
This collaborative approach was replicated across the other 70+ teams, leading to the 50 percent reduction in cloud costs over the two years while traffic volumes grew by 30 percent.
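Taken together, the two headline figures imply an even larger fall in the cost of serving each unit of traffic. The short calculation below works this out, assuming (this is not stated in the text) that both percentages are measured against the same starting baseline over the same two years; the variable names are illustrative.

```python
# Headline figures from the case study.
cost_ratio = 1 - 0.50      # cloud cost at the end, relative to the start
traffic_ratio = 1 + 0.30   # traffic at the end, relative to the start

# Cost per unit of traffic at the end, relative to the start.
unit_cost_ratio = cost_ratio / traffic_ratio
unit_cost_reduction = 1 - unit_cost_ratio

print(f"Cost per unit of traffic fell by about {unit_cost_reduction:.1%}")
```

Under that assumption, the cost per unit of traffic fell by roughly 61.5 percent, not just 50 percent.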
The hard part: doing something different
Their CTO talked about how implementing this radical change in approach was not easy. While he was trying to change the shape of the cost curve, he was still tasked with improving the speed of innovation.
Capacitas supported the CTO and his team in changing the shape of the curve from the ground up. By engaging at different levels of the organisation, we helped embed this new view on performance.
The result was a 50% reduction in cloud spend.
The high cloud costs were symptoms of underlying inefficiencies and poor working practices. Not only were these costing money, they were also slowing down the teams' productivity, as the teams had to keep the show on the road on bloated cloud infrastructure. Taking a contrarian view of both performance and cost helped build a scalable, cost-effective system for the future and a team culture aligned on what great performance and cost look like.
The world's largest technology investor and the board recognised the CTO and his team as world class.