Running 200 TPS on Lambda for an hour and learning humility
We load-tested our middleware at 200 transactions per second for a full hour. Lambda held. But the SQS queue depth graph looked like a heartbeat monitor, and I spent three hours convinced we had a bug. It was just auto-scaling warming up. Observability without context is noise.
Converting SOAP to JSON at the government’s pace
Legacy government systems speak SOAP. Modern microservices speak JSON. Building the translator layer in between — with WAF, private API Gateway, and Route 53 inbound resolvers — was the most unglamorous, most impactful work I have ever done. Nobody sees the middleware until it breaks.
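A minimal sketch of the translation step itself, assuming a SOAP 1.1 envelope and using only the Python standard library. The sample message, the `urn:example:permits` namespace, and the function names are hypothetical illustrations, not the production translator. XML attributes, SOAP faults, and type coercion are deliberately ignored here, so every leaf value comes out as a string.

```python
import json
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace; SOAP 1.2 uses a different URI.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def strip_ns(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def element_to_value(elem):
    """Recursively turn an element into a dict (has children) or a string (leaf)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        key = strip_ns(child.tag)
        value = element_to_value(child)
        if key in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


def soap_to_json(soap_xml: str) -> str:
    """Extract the SOAP Body and serialise its contents as JSON."""
    root = ET.fromstring(soap_xml)
    body = root.find(f"{{{SOAP_ENV_NS}}}Body")
    if body is None:
        raise ValueError("no SOAP Body element found")
    payload = {strip_ns(child.tag): element_to_value(child) for child in body}
    return json.dumps(payload)


# Hypothetical sample message, for illustration only.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetPermitResponse xmlns="urn:example:permits">
      <PermitId>42</PermitId>
      <Status>APPROVED</Status>
    </GetPermitResponse>
  </soap:Body>
</soap:Envelope>"""

if __name__ == "__main__":
    print(soap_to_json(SAMPLE))
```

Stripping namespaces keeps the JSON readable for downstream services, at the cost of losing the distinction between identically named elements from different schemas.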
What DLQs taught me about trust
The first time a dead-letter queue caught a batch of 3,000 failed messages in production, I felt sick. Then I realised: this is the system working. The DLQ is proof that we planned for failure. Reliability is not the absence of errors — it is knowing where they go.
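For readers who have not wired one up, "knowing where they go" comes down to a redrive policy on the source queue. The sketch below builds the SQS attribute map for that policy. The helper name, the example ARN, and the default of five receives are assumptions for illustration, not the values we ran in production.

```python
import json


def queue_attributes_with_dlq(dlq_arn: str, max_receive_count: int = 5) -> dict:
    """Build the SQS attribute map that attaches a dead-letter queue.

    After a message has been received max_receive_count times without
    being deleted, SQS moves it to the queue identified by dlq_arn.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": str(max_receive_count),
            }
        )
    }


# Hypothetical ARN, for illustration only.
attrs = queue_attributes_with_dlq("arn:aws:sqs:eu-west-1:123456789012:orders-dlq")

# With boto3 installed and credentials configured, the map could be passed as:
#   boto3.client("sqs").create_queue(QueueName="orders", Attributes=attrs)
```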
Playwright + Python: bridging two worlds
Writing E2E tests for a TypeScript frontend using Python felt wrong at first. But the QA team owned Python, and the goal was automation, not purity. Two weeks in, we had 60% regression coverage on critical flows. Pragmatism beats elegance when deadlines are real.
The Teams webhook that became the most important endpoint
We built a Lambda that posts CloudWatch alarm summaries to a Teams channel. Took half a day. It became the first thing the on-call engineer checks in the morning. Sometimes the highest-ROI engineering is a webhook and a well-formatted message.
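A rough sketch of what such a Lambda can look like, not the one we deployed. It assumes the alarms reach the function through an SNS topic, that the webhook URL lives in a hypothetical `TEAMS_WEBHOOK_URL` environment variable, and that the webhook accepts a simple `{"text": ...}` JSON body. Depending on how the Teams webhook is configured, a different payload shape may be required.

```python
import json
import os
import urllib.request


def format_alarm(alarm: dict) -> str:
    """Render one CloudWatch alarm notification as a single summary line."""
    name = alarm.get("AlarmName", "unknown alarm")
    state = alarm.get("NewStateValue", "UNKNOWN")
    reason = alarm.get("NewStateReason", "no reason given")
    when = alarm.get("StateChangeTime", "unknown time")
    return f"[{state}] {name} at {when}: {reason}"


def post_to_teams(webhook_url: str, text: str) -> int:
    """POST a plain-text message to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def handler(event, context):
    """Lambda entry point: one SNS record per alarm state change (assumed)."""
    webhook_url = os.environ["TEAMS_WEBHOOK_URL"]  # hypothetical variable name
    summaries = [
        format_alarm(json.loads(record["Sns"]["Message"]))
        for record in event.get("Records", [])
    ]
    if summaries:
        post_to_teams(webhook_url, "\n\n".join(summaries))
    return {"posted": len(summaries)}
```

Keeping `format_alarm` separate from the HTTP call means the message layout, which is what the on-call engineer actually reads, can be unit-tested without touching the network.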
At what scale does serverless stop being the right answer?
I keep telling teams that serverless is good for bursty, unpredictable workloads. But where is the real inflection point? At 10,000 RPS? At sustained constant load? I have heard very different answers from very credible people and I still do not have mine.
How do you actually measure ‘code quality’?
Code coverage? Cyclomatic complexity? Pull request review time? I have used all of these. None of them capture the thing I actually care about: can a new engineer understand and safely modify this code in six months? I am still looking for a metric that gets close.
Is event-driven architecture making distributed systems harder to reason about?
EventBridge gives you beautiful decoupling. But ‘who published this event and why did it trigger that Lambda?’ becomes genuinely hard to answer in a post-mortem. I wonder if we are solving coupling by creating complexity of a different kind.
What does ‘senior’ actually mean in engineering?
I have seen senior engineers who write brilliant code but cannot bring a junior along. And I have seen others who write mediocre code but build teams that consistently deliver. Which one is more valuable? I think I know the answer, but it makes me uncomfortable.
Will AI write the boilerplate, or will it write the architecture?
Right now AI is great at plumbing — CRUD endpoints, test scaffolding, repetitive config. But the hard decisions — what to build, how services should talk, what to NOT build — still feel very human. I wonder how long that distinction holds.