Scrum Master Toolbox Podcast: Agile storytelling from the trenches: BONUS: Establishing SRE / Site reliability engineering Foundations: A Step-by-Step Guide

The many aspects of Site Reliability Engineering

In this segment, we discuss the many different aspects that we must take into account when operating live cloud-based services. Vlad describes aspects ranging from change leadership (who to involve and how to involve them) to technical decisions and processes (on-call service, etc.).

In this respect, it is critical to develop and deploy the appropriate methodology, including the HR-related aspects (e.g. on-call service) that enable the use of SRE in your organization. Implementing SRE is one example of the end-to-end processes that Scrum Masters must be familiar with and, in some cases, help deploy in their organizations.

Product Manager (Product Owner) - Scrum team collaboration is key, also when implementing Site Reliability Engineering

As the team started implementing SRE, it quickly became clear that the Product Manager (Product Owner) role would have a critical impact on the success of the teams. This led Vlad to develop an approach to align the organization, from development to operations, and to establish SRE as an organizational initiative.

In this segment, we also discuss what happens when operations are kept apart from the product development cycle, a problem that SRE tries to eliminate. The topic of organizational alignment is also extensively discussed in the episode with János Csorvási and Jeff Campbell.

When to start with Site Reliability Engineering?

Vlad shares with us that many organizations try to adopt SRE after a major incident. At that time, everyone is in a hurry and the results will be heavily influenced by that urgency. It’s important for organizations operating live systems to be able to prepare for the transition. In the book, Vlad discusses how to prepare your organization for the adoption of SRE before any major problem happens. In this segment, we explore some of the key change leadership topics that determine the level of readiness for adopting SRE.

If you want to explore more, Vlad also discusses these topics in an interview with InfoQ.

You can purchase the book on Amazon: Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations by Vlad Ukis

About Vlad Ukis

Vlad is an R&D leader and reliability lead at Siemens Healthineers. In this capacity, he drives Continuous Delivery, SRE, and DevRel transformation, helping this large distributed development organization evolve its architecture, deployment, testing, operations, and culture to implement these new processes at scale.

You can link with Vlad Ukis on LinkedIn.

