Delete more

June 8, 2024

After watching a video about a useful executive functioning hack, I decided to take more control and started clearing my inbox of all those newsletters in which I still had one link or another to read.

Poor people on my work’s Slack: I just bombarded them with all the links I wanted to share. :D

Two in particular made me want to write a few words.

Lessons Learnt from Twenty Years of Site Reliability Engineering

The article has no date, other than a mention of the time before Gmail existed, so the SRE function at Google predates at least April 2004.

They are all very interesting lessons, and I can’t fault any of the 11.

What I’m curious about is whether SRE will continue to be “a thing” in 2024 and beyond. I’m not alone in questioning this: 5 SRE Predictions for 2024 makes a much stronger argument than I can.

The economic headwinds and the rise of Platform Engineering suggest that fewer companies will keep chanting the SRE mantra. And, without kidding ourselves, it’s very rare to see SRE really take its full shape.

I strongly believe that, for most companies, the effort of getting better at deploying software, making it easier to test in production, and putting those metrics and “observability” (to use the wider term) in developers’ hands will matter more than parachuting in an engineer to look at SLIs/SLOs/SLAs.

Why don’t you status?

First, the annoying bit: making “status” a verb. This was fairly confusing for me as a non-native speaker.

Nonetheless, the reality of large-scale operations is well explained in this post.

It becomes harder and harder to get an “up vs down” view of the world.

Something I noticed all the way back when working at an ISP: “any outage is the largest outage in the eye of the beholder”, if you’ll excuse my poor attempt at verse.

In other words, even if every other customer of your ISP is online, for the single customer who is offline, for whatever reason, this is the biggest problem.