Traditional software gives you a deterministic contract: same input, same output. AI systems break it -- same input, different output. Here is how that shift broke my mental model of engineering, and why it was the best thing to happen to my career.
Hello, World: Why I Stopped Building Software and Started Building Systems
I spent eight years building software. APIs, data pipelines,
event-driven platforms at Eventbrite-scale. Input A produced
Output B. If it didn't, you wrote a test, found the bug, and
fixed it. Deterministic. Predictable. Comfortable.
Then I shipped my first LLM-powered feature to production. The
same prompt produced different outputs on consecutive runs.
My unit tests were meaningless. My debugging instincts were
wrong. And the monthly invoice was four times what I'd modeled.
That moment broke my mental model of software engineering. And
it was the best thing that happened to my career.
The Shift: From Deterministic to Probabilistic
Traditional software engineering gives you a contract. Call
this function with these arguments, get this result. Every
time. AI engineering breaks that contract. You call the same
function with the same arguments and get a different result.
Sometimes better. Sometimes hallucinated. Sometimes expensive.
This is not a bug. It is a fundamental property of the medium.
And it requires a fundamentally different engineering
discipline:
- Testing becomes evaluation. You cannot assert exact
outputs. You measure faithfulness, relevance, and
completeness across distributions. Evals are the unit
tests of AI.
- Error handling becomes graceful degradation. Your LLM
provider will go down. Your model will hallucinate. Your
costs will spike. The question is whether your system
absorbs those failures or passes them to your users.
- Performance tuning becomes cost engineering. In
traditional software, you optimize for latency. In AI, you
also optimize for dollars-per-request. A feature that works
but costs $0.15 per query on $0.08 margins is not a feature.
It is a liability.
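Here is a minimal sketch of what "evals as unit tests" can
look like. Everything in it is illustrative: call_llm stands
in for a real provider call (here it just cycles canned
answers so the sketch runs offline), and keyword_score is a
deliberately crude relevance metric. The point is the shape:
score a distribution of outputs against a threshold instead
of asserting one exact string.

```python
import statistics

# Hypothetical stand-in for a real model call: same input,
# different output. In production this would hit your LLM
# provider; here it cycles canned answers so the sketch runs.
_FAKE_OUTPUTS = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "France's capital city is Paris.",
]

def call_llm(prompt: str, run: int) -> str:
    return _FAKE_OUTPUTS[run % len(_FAKE_OUTPUTS)]

def keyword_score(output: str, required: list[str]) -> float:
    """Crude relevance metric: fraction of required keywords present."""
    hits = sum(1 for kw in required if kw.lower() in output.lower())
    return hits / len(required)

def eval_gate(prompt: str, required: list[str],
              runs: int = 10, threshold: float = 0.9) -> bool:
    """Pass/fail gate over a distribution of outputs, not one run."""
    scores = [keyword_score(call_llm(prompt, i), required)
              for i in range(runs)]
    return statistics.mean(scores) >= threshold

# Wired into CI, this is the assert that replaces the exact-match test.
assert eval_gate("What is the capital of France?", ["Paris", "capital"])
```

A real harness would swap keyword_score for a faithfulness or
relevance metric, but the gate stays the same: a threshold on
aggregate scores, enforced in CI.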
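Graceful degradation can be as simple as a retry-then-fallback
wrapper. This is a sketch under assumptions: flaky_llm_call is
a hypothetical provider call (here it always fails, to show
the degradation path), and the retry counts and backoff are
placeholders you would tune.

```python
import time

class ProviderDown(Exception):
    pass

def flaky_llm_call(prompt: str) -> str:
    # Hypothetical provider call; always fails in this sketch.
    raise ProviderDown("upstream 503")

def answer(prompt: str, retries: int = 2, backoff: float = 0.01) -> str:
    """Absorb provider failures: retry with exponential backoff,
    then return a canned fallback instead of surfacing the error."""
    for attempt in range(retries):
        try:
            return flaky_llm_call(prompt)
        except ProviderDown:
            time.sleep(backoff * (2 ** attempt))
    return "Sorry, I can't answer that right now. Please try again shortly."
```

The user sees a polite degraded response; your error budget,
not your users, absorbs the outage.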
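The cost-engineering arithmetic is worth making explicit. The
token counts and per-1K-token prices below are illustrative
assumptions, chosen so the numbers land on the $0.15-per-query,
$0.08-margin example above; plug in your own model's pricing.

```python
def cost_per_request(prompt_tokens: int, completion_tokens: int,
                     input_price_per_1k: float,
                     output_price_per_1k: float) -> float:
    """Dollars per request from token counts and per-1K-token prices."""
    return ((prompt_tokens / 1000) * input_price_per_1k
            + (completion_tokens / 1000) * output_price_per_1k)

# Illustrative: a long RAG prompt (4,000 tokens in, 500 out)
# on a premium model priced at $0.03/$0.06 per 1K tokens.
cost = cost_per_request(4000, 500, 0.03, 0.06)  # 0.12 + 0.03 = 0.15
margin = 0.08
assert cost > margin  # loses roughly $0.07 on every single query
```

Tracking this number per feature, per model, is what turns
"the invoice is four times the model" from a surprise into a
dashboard alert.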
What I Write About Here
This blog is the playbook I wish existed when I made the
shift. Not theory. Not hype. Engineering patterns forged in
production:
- Evals as Unit Tests -- How to build evaluation
harnesses that make AI systems reliable enough to trust.
Real code, real pass/fail gates, wired into CI/CD.
- Unit Economics of AI -- RAG systems that bleed money,
vendor lock-in that kills margins, and the architecture
patterns that fix both. Specific numbers, specific savings.
- Human-Centric Interfaces -- Voice agents that respond
in under a second. Chat interfaces that guide users instead
of leaving them to "prompt engineer." The UX patterns that
drive adoption.
- Systems Thinking -- Why your LLM is not your system.
It is one component. And the infrastructure around it
determines whether your product survives production.
Every post comes from something I built, broke, or fixed. If
it has not survived contact with real users and real invoices,
it does not belong here.
Talk to my AI to see these principles in action, or explore my work for the full portfolio.