Thought of the Week: Key Drivers for an SRE Practice

1st September 2023 by Frank Warren

Cloud SRE

From conversations with customers and network peers, it is clear that many companies are considering setting up a dedicated SRE team or realigning existing responsibilities. According to a report from Catchpoint, 50% of organisations have dedicated SRE teams or roles, and the number of vacancies for Site Reliability Engineers has increased dramatically.

This supports the view that system reliability, performance, and availability remain the leading drivers for establishing a stronger foundation of SRE practices.

Key drivers for an SRE practice

  1. The scale and complexity of IT systems are key determinants: as both increase, so does exposure to risk.
  2. Operational risks are not mitigated proactively during development and tend to be resolved reactively.
  3. The impact of operational failure on the business is substantial in terms of revenue loss and reputation.
  4. The frequency and severity of production incidents are high. Development teams spend too much time firefighting, and incident management does not resolve issues fully.
  5. Service-Level Objectives (SLOs) for high-priority systems either do not exist or are not measured. Actionable insights are not generated, operational issues are not exposed proactively, and SLOs are not actively managed.
  6. Production monitoring and alerting are poorly configured, which gives little insight into performance, availability and reliability risk. Reporting is very weak. There is little or no observability in test environments.
  7. Development teams miss chances to improve time to market and do not take advantage of transformative activities such as automation frameworks, testing frameworks, automated deployment, and Infrastructure as Code. Releases often overrun and the release cycle is slow.
  8. Non-functional testing (performance/scalability/efficiency, resilience/recovery, security) is executed poorly, if at all, and is not underpinned by testing frameworks.
  9. Cross-functional collaboration between Service Management, Operations, and Development teams is poor and the benefits of close cooperation are not realised.
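
To make driver 5 concrete, measuring an SLO can start very simply. The sketch below is illustrative only and not a method taken from this article: it assumes a request-based availability indicator (successful requests divided by total requests) and a hypothetical 99.9% target, and the names `SloReport` and `evaluate_slo` are invented for the example. It reports measured availability and how much of the error budget has been consumed.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability: float      # measured indicator: successful / total requests
    allowed_failures: float  # failures the SLO tolerates at this request volume
    budget_consumed: float   # fraction of the error budget used (may exceed 1.0)
    slo_met: bool            # True when availability is at or above the target


def evaluate_slo(total_requests: int, failed_requests: int, slo_target: float) -> SloReport:
    """Compare measured availability against an SLO target (hypothetical helper)."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")

    availability = (total_requests - failed_requests) / total_requests
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1 - slo_target) * total_requests
    budget_consumed = failed_requests / allowed_failures
    return SloReport(
        availability=availability,
        allowed_failures=allowed_failures,
        budget_consumed=budget_consumed,
        slo_met=availability >= slo_target,
    )


if __name__ == "__main__":
    # Example figures are invented for illustration.
    report = evaluate_slo(total_requests=1_000_000, failed_requests=250, slo_target=0.999)
    print(f"availability={report.availability:.5f}")
    print(f"budget consumed={report.budget_consumed:.0%}, SLO met={report.slo_met}")
```

Tracking budget consumption over a rolling window, rather than only a pass/fail flag, is what turns the measurement into an actionable insight: a team that has used most of its budget early in the window has a clear signal to prioritise reliability work over new releases.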

If any of these factors describes an operational challenge you are experiencing, it may be time to examine your organisational capability and implement a remediation plan to close the key gaps.

 

About the Author

Frank Warren

Frank is a Principal Consultant specialising in capacity planning, performance engineering and cloud cost optimisation. Frank leads work for numerous high-profile ecommerce clients, helping them achieve their business peaks while saving on cloud costs and improving performance.

If you would like to talk about optimising your cloud bill, feel free to reach out for a no-commitment chat. You can contact us via the website at https://www.capacitas.co.uk/book-a-diagnostic-session or by email at contact@capacitas.co.uk.

It is also worth looking at some of our recent case studies, where we have saved our clients millions of pounds in cloud spend.
