

Thought of the Week: Top 10 Tips on Getting Cloud Observability Right

In May 2023, current and former Coinbase software engineers confirmed that the company spent $65M with Datadog in 2021.


About Datadog

Datadog, Inc. provides a monitoring and analytics platform for developers, information technology operations teams, and business users in the cloud in North America and internationally. The company's SaaS platform integrates and automates infrastructure monitoring, application performance monitoring, log management, and security monitoring to provide real-time observability of its customers' technology stack.

About Coinbase

Coinbase is building the cryptoeconomy – a more fair, accessible, efficient, and transparent financial system enabled by crypto. The company started in 2012 with the radical idea that anyone, anywhere, should be able to easily and securely send and receive Bitcoin. Today, Coinbase offers a trusted and easy-to-use platform for accessing the broader cryptoeconomy.


 

Now, to be fair, I don't know the detail, and I hope the company got a good ROI from the tool. Either way, here are my top 10 tips on getting observability right and keeping costs under control.

 

  1. After buying a tool, create an ecosystem of people and integrations to get observability that provides value.
  2. Make sure the tool fits your technology stack (OK, this may seem obvious, but be careful: the tool needs to work with legacy tech as well as the new and shiny tech).
  3. Spend time configuring the tool to make it readable and relevant to its users, e.g. rename IIS application pool AGKP_04520756 to UKAGKPPortal.
  4. Remove noise from the tool, e.g. if volume /dev/u001/fred is always full, configure the tool not to show it in red!
  5. Try to avoid alerting on the underlying infrastructure. It doesn't matter what your hardware is doing; it is your users' experience that matters. Set alerts at the UX level.
  6. Configure alerts that align with the business. For example, set alerts for critical user transactions such as time to generate a quote or complete payment rather than a generic alert across all user transactions.
  7. Decide who the consumers of the tool are and work with them to make sure they are trained; the tool should be configured for their needs. I am not talking about just pointing them at a dashboard. The aim is to teach them how to fish!
  8. These tools do love data sources. The more you give them, the more likely you are to be able to correlate the source of problems, e.g. poor performance could be due to waiting for a VM to be scheduled on the hypervisor. You may never know this unless you are monitoring the hypervisor. Of course, the more you monitor, the more you pay!
  9. Don't be afraid to have a dedicated monitoring team, but ensure they are skilled in using the tool and not just configuring it, i.e. when production goes down, they are in the thick of it trying to resolve the issue.
  10. Keep an eye on the costs!
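
As a rough illustration of tips 3, 4 and 6, here is a minimal Python sketch of the kind of rules you might configure. It is not any vendor's API: the names `Sample`, `display_name` and `should_alert`, the check names, and every threshold value are hypothetical placeholders chosen for the example. Only the application pool name and the volume path come from the tips above.

```python
from dataclasses import dataclass

# Tip 3: friendly names for opaque identifiers (mapping taken from the tip above).
DISPLAY_NAMES = {"AGKP_04520756": "UKAGKPPortal"}

# Tip 4: (source, check) pairs that are known noise and should never be flagged.
SUPPRESSED = {("/dev/u001/fred", "disk_used_pct")}

# Tip 6: limits keyed by (check, source); "*" is a catch-all for that check.
# All numbers are hypothetical placeholders, not recommendations.
LIMITS = {
    ("response_time_s", "generate_quote"): 3.0,
    ("response_time_s", "complete_payment"): 5.0,
    ("disk_used_pct", "*"): 95.0,
}


@dataclass(frozen=True)
class Sample:
    source: str   # resource or business transaction identifier
    check: str    # name of the measurement, e.g. "response_time_s"
    value: float  # measured value in the unit implied by the check name


def display_name(source: str) -> str:
    """Return the readable name for a source, or the raw identifier if unmapped."""
    return DISPLAY_NAMES.get(source, source)


def should_alert(sample: Sample) -> bool:
    """Flag a sample only if it is not suppressed and exceeds its limit."""
    if (sample.source, sample.check) in SUPPRESSED:
        return False
    # Prefer a limit specific to this source, then fall back to the catch-all.
    limit = LIMITS.get((sample.check, sample.source))
    if limit is None:
        limit = LIMITS.get((sample.check, "*"))
    # With no limit configured at all, stay quiet rather than guess.
    return limit is not None and sample.value > limit
```

With these rules, a 4.2-second quote generation would be flagged while the permanently full /dev/u001/fred would not, and a transaction with no explicit limit stays quiet instead of inheriting a generic threshold.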

 

Many businesses turn to a range of tools expecting an easy solution to their challenges. And if a company has extra money to spend, it might not worry about the costs… at first! However, costs start piling up, and before you realise it, the damage has already been done.

This is something we see time and time again. Our team of consultants works to ensure teams get the most value out of their tools while controlling costs, spending only on what they need.

At Capacitas, we have an unparalleled understanding of cloud architecture, infrastructure, and applications. Reach out to speak to an expert via contact@capacitas.co.uk. You can also just click below.

Speak to one of our observability experts


About the Author

Andrew Lee

Andrew Lee is a Consultant specialising in Performance Engineering strategy and management, working with high-profile SaaS and Private Equity clients. Andrew ensures projects are delivered on time and within budget and meet their performance requirements.

It is also worth having a look at some of our recent case studies, where we have saved our clients millions of pounds in cloud spend.

Cegid and Capacitas case study