Consistency
In Naming Conventions I mentioned the downside of not having certain conventions, which ended up causing an outage.
While writing the Post Incident Review of said outage, the one good point that stood out was consistency.
By using the same tools across our group, the team responding to the incident had fewer unknowns to dabble with. This made troubleshooting much easier and allowed us to focus on the things we didn’t know, then turn around quickly from a full (service) outage back into business, as fast as the automation job could run.
Delete more
After watching a video about a useful Executive Functioning hack, I decided to take more control and started clearing my inbox of all those newsletters I had kept around to read one link or another.
Pity the people on my work’s Slack, as I just bombarded them with links I wanted to share. :D
Two in particular made me want to write a few words.
Lessons Learnt from Twenty Years of Site Reliability Engineering
The article doesn’t have a date, other than a mention of the time before Gmail existed, so the SRE function at Google dates back to before April 2004 at least.
Practical Doomsday
Finished reading Practical Doomsday by Michal Zalewski (lcamtuf).
Having previously enjoyed his Disaster planning for regular folks, I found this a good read.
Michal does a great job of setting the scene: everyone should have at least a minimal plan for what to do. Disasters can happen at any time, from a multi-day power outage to record-shattering floods.
Some of the suggestions around planning reminded me of Christopher Burgess’s Red Folder: the idea is that you keep a red folder at home with all relevant data about yourself and close family members, plus practical information about the house, car, insurance, etc., should the need arise.
Naming conventions
Today’s outage was brought to you by having the same API token replicated in several places. Of course, someone forgot about that one place that runs only once a week, on a Tuesday, in a job that doesn’t alert if it fails.
Naming things is hard; it always has been and always will be.
But the name doesn’t matter.
Next time, ask yourself: maybe we already have another copy of this token somewhere else?
Then you won’t need to visit several places to update every copy. Because, truth be told, you won’t.
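One way to answer that question before rotating a token is to go looking for every copy of it. A minimal sketch in Python, assuming the configs and job definitions live in a directory tree you can read; `find_token_copies` is a hypothetical helper, and a brute-force scan like this is no substitute for keeping a single copy in a secrets store.

```python
from pathlib import Path


def find_token_copies(root: str, token: str) -> list[str]:
    """Return the paths of files under `root` whose contents contain `token`.

    Hypothetical helper for illustration: every file is read as bytes,
    so binary files are searched too. Unreadable files are skipped.
    """
    needle = token.encode()
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # directories, dangling symlinks, etc.
        try:
            data = path.read_bytes()
        except OSError:
            continue  # permission denied or vanished mid-scan
        if needle in data:
            matches.append(str(path))
    return matches
```

Passing the token in through an environment variable, e.g. `find_token_copies("/srv/jobs", os.environ["OLD_TOKEN"])`, keeps it out of shell history; the path and variable name are made up for illustration. It will only find copies under the tree you point it at, which is exactly why the weekly Tuesday job tends to hide somewhere else.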
Burger Joint
As a Platform team at work, we split our offering into three tiers:
- alpha
- beta
- prod
Yes, yes, naming things is hard, why do you ask?
Today we were discussing how broken alpha can be.
I think a burger joint analogy works here.
If prod is your main street venue, the one where you take your wife and kids, you want the food to taste good, to be consistent every time you go, and probably to be good value for money.
Fail slow to recover fast
Back to work after a relaxing long weekend.
Today’s topic: intercontinental active/active deployments for API traffic, naturally on top of HTTP.
The keywords already hint that you have multiple load balancers, at least a pair. What happens behind each load balancer (say, traffic entering in the Americas and being served by a European back-end) is out of scope here.
You want to control how quickly the API clients stop hitting the Americas load balancer if that Region goes down.
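The post doesn’t say which mechanism steers clients between regions. Assuming DNS-based failover, where a health check pulls the Americas record after a number of consecutive failed checks, a back-of-the-envelope bound is the detection time (check interval × failure threshold) plus the record’s TTL. A minimal Python sketch; the setting names are hypothetical, not any provider’s API.

```python
from dataclasses import dataclass


@dataclass
class FailoverSettings:
    """Hypothetical knobs for DNS-based failover between regional load balancers."""
    check_interval_s: int   # seconds between health checks
    failure_threshold: int  # consecutive failed checks before marking unhealthy
    dns_ttl_s: int          # TTL on the record the API clients resolve


def worst_case_failover_s(s: FailoverSettings) -> int:
    """Rough upper bound on how long clients keep hitting a dead Region.

    Detection takes up to interval * threshold; after the record changes,
    resolvers and clients may keep the old answer for up to one TTL.
    Ignores check timeouts, propagation delay, and clients that cache
    beyond the TTL, so real-world numbers can be worse.
    """
    detection_s = s.check_interval_s * s.failure_threshold
    return detection_s + s.dns_ttl_s
```

With a 10-second interval, 3 failed checks and a 60-second TTL, `worst_case_failover_s(FailoverSettings(10, 3, 60))` gives 90 seconds of clients still hitting the dead Region. Lowering any of the three moves clients sooner, at the cost of failing over on transient blips.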
AI first time
Doing the work at the core of what I planned to do this week took half a day, and was lots of fun. Doing the scaffolding around the code so that it could be deployed and used took four times that. Not so fun. Not hard, just … a lot of moving parts?
Anyway, it works as intended and I’m pretty happy.
At almost 6pm I realised I needed something and, following a recent conversation with a colleague, gave the day job’s AI helper a try. I asked it to help me with a for loop in Terraform.
It did.
Jenkins, love and hate
A moment of love/hate with Jenkins today.
Hate first, as I was reminded how brittle it is as the world around it moves forward. I’m reasonably sure some core functionality was moved to a plugin, and I’ll have to figure out which one.
But lots of love: it has been running (patched and updated…) for over four years now, and it still does its job.
From time to time I look back and think we could’ve done it so differently. I’m becoming more accustomed to realising that this view, hindsight, makes us forget that many of the nice things we have now did not exist four years ago.
The past
Interesting day to try and put a positive spin on things.
Looking at legacy and reminding oneself that things that have been running “like this” for a long time are a good thing. It means they are working as intended.
Is there always a way to make it better? Possibly. But nothing beats “done, for what we needed right now”.
RYOMS
It felt good to sponsor the Run Your Own Mail Server Kickstarter.
I hadn’t read mwl’s work before, other than the first chapters of Absolute FreeBSD available to Prime readers.
I liked his style and followed the writing of the book through his posts on Mastodon.
As someone who used to run mail servers professionally, I haven’t (yet?) been excited enough to try running my own. Maybe the book could change that? Hope not, mail servers are an endless amount of pain. :D