Vol. II · No. 156
Established 2025

smallweb

Friday, June 5, 2026
160 writers in the library
Tech · 1 shelf

Surfing Complexity.

Lorin Hochstein on complex systems, resilience engineering, and human adaptability in software.

Recent essays

Form may follow function, but use doesn’t follow design

At this point, you have no doubt heard about GitHub’s availability woes over the past several months. In April, Mitchell Hashimoto (of HashiCorp fame) wrote a post about how he is moving his Ghostty project off of GitHub: …for the past month I’ve kept a journa…

The coming coordination calamity

We cut middle managers across the organization because AI allows us to have more direct reports per manager while still measuring and mentoring our teams effectively. – Matthew Prince, How I Choose Which Cloudflare Employees to Replace With AI My PhD research…

Reliability as a game of improving the odds

I’m a betting man; I just enjoy making bets, even when there are no stakes at all. And when you talk about bets, you end up talking about odds. It turns out that reliability is also about odds, even though we don’t use the language of odds in our domain. Consi…

Flipping the bozo bit on flips the learning off

“Flipping the bozo bit” is an expression from the software world. Think about a time when you reached a point where you simply stopped respecting the opinion of a particular person, most likely a co-worker. From that point on, you disregarded what they said. T…

How incidents can teach us about what’s already working well

Here’s a famous optical illusion, which was developed by the American neuroscientist Edward H. Adelson. Even though square A appears darker than square B, the two are, in fact, the exact same shade of gray. It’s such a powerful illusion that, even knowing the…

My SREcon talk

My SREcon talk is now up.

Life comes at you fast

Now, here, you see, it takes all the running you can do, to keep in the same place. – Lewis Carroll, Through the Looking-Glass, and What Alice Found There LLM coding may be revolutionizing software development productivity, but it doesn’t seem to be generating…

The normal work of creating reliability

Here’s a recent comment on LinkedIn from John Allspaw, on a post by Gandhi Mathi Nathan Kumar about availability. Allspaw’s comment is a succinct description of a safety model proposed by the Danish resilience engineering researcher Erik Hollnagel: Safety-II.…

Thoughts on the Bluesky public incident write-up

Back on April 4, the social media site Bluesky suffered a pretty big outage. I was delighted to discover that one of their engineers, Jim Calabro, published a public writeup about it: April 2026 Outage Post-Mortem. Calabro’s post goes into a lot of technical d…

References from my SREcon talk on stories

This past week at SREcon 2026 Americas, I gave a plenary talk titled The Power of Stories. I referenced several books and papers in that talk, which are linked below. Books From Novice to Expert: Excellence and Power in Clinical Nursing Practice by Patricia Be…

Quick thoughts on GitHub CTO’s post on availability

GitHub’s been taking it on the chin on the availability front lately. Yesterday, their CTO, Vlad Fedorov, wrote a post on their blog about their recent incidents: Addressing GitHub’s recent availability issues. This post shares some additional details about th…

Grow fast and overload things

The general vibe I see online is that the AI companies have not been doing particularly well in the reliability department. Both OpenAI and Anthropic publish reliability statistics on their status pages. Now, I’m not a fan of using the nines as a meaningful i…

Saturation

I wrote a blog post on saturation for the Resilience in Software foundation. Check it out!

Quick takes on Feb 20 Cloudflare outage

Cloudflare just posted a public write-up of an incident that they experienced on Feb. 20, 2026. While it was large enough for them to write it up like this, it looks like the impact is smaller than the previous Cloudflare incidents I’ve written about here. Giv…

Poor Deming never stood a chance

This post is an elaboration of a shorter post I wrote about five years ago. The two management giants of the mid-twentieth century were Peter Drucker and W. Edwards Deming. Ironically, while Drucker hails from Austria-Hungary (like me, Drucker emigrated to the…

Lots of AI SRE, no AI incident management

With the value of AI coding tools now firmly established in the software industry, the next frontier is AI SRE tools. There are a number of AI SRE vendors. In some cases, vendors are adding AI SRE functionality to extend their existing product lineup, a quick…

Nobody knows how the whole system works

One of the surprising (at least to me) consequences of the fall of Twitter is the rise of LinkedIn as a social media site. I saw some interesting posts I wanted to call attention to: First, Simon Wardley on building things without understanding how they work:…

On variability

I was listening to Todd Conklin’s Pre-Accident Investigation Podcast the other day, to the episode titled When Normal Variability Breaks: The ReDonda Story. The name ReDonda in the title refers to ReDonda Vaught, an American registered nurse. In 2017, she was…

Ashby taught us we have to fight fire with fire

There’s an old saying in software engineering, originally attributed to David Wheeler: We can solve any problem by introducing an extra level of indirection. The problem is that indirection adds complexity to a system. Just ask anybody who is learning C and is…

Because coordination is expensive

If you’ve ever worked at a larger organization, stop me if you’ve heard (or asked!) any of these questions: (As an aside, my favorite “multiple solutions” example is workflow management systems. I suspect that every senior-level engineer has contributed code t…

Amdahl, Gustafson, coding agents, and you

In the software operations world, if your service is successful, then eventually the load on it is going to increase to the point where you’ll need to give that service more resources. There are two strategies for increasing resources: scale up and scale out.…

From Rasmussen to Moylan

I hadn’t heard of James Moylan until I read a story about him in the Wall Street Journal after he passed away in December, but it turns out my gaze had fallen on one of his designs almost every day of my adult life. Moylan was the designer at Ford who came up wit…

Telling the wrong story

In last Sunday’s New York Times Book Review, there was an essay by Jennifer Szalai titled Hannah Arendt Is Not Your Icon. I was vaguely aware of Arendt as a public intellectual of the mid-twentieth century, someone who was both philosopher and journalist. The…

Verizon outage report predictions

Yesterday, Verizon experienced a major outage. The company hasn’t released any details about how the outage happened yet, so there are no quick takes to be had. And I have no personal experience in the telecom industry, and I’m not a network engineer, so I can’t…

On intuition and anxiety

Over at Aeon, there’s a thoughtful essay written by the American anesthesiologist Ronald Dworkin about how he unexpectedly began suffering from anxiety after returning to work from a long vacation. During surgeries he became plagued with doubt, where he experi…

The dangers of SSL certificates

Yesterday, the Bazel team at Google did not have a very Merry Boxing Day. An SSL certificate expired for https://bcr.bazel.build and https://releases.bazel.build, as shown in this screenshot from the GitHub issue. This expired certificate apparently broke the…