Essays · Updated monthly
Writing
On building production ML: the four layers before the LLM, the routing decisions upstream of the model, and field reports from a three-agent intelligent document processing (IDP) pipeline.
8 min
Four layers before the LLM: the gatekeeper pattern
How a stack of cheap classifiers cuts an IDP bill by an order of magnitude without ever waking the model.
6 min
The cheapest token is the one you don't send
Routing, retrieval, and a small set of decisions about which questions a model should never see.
7 min
Production ML is mostly not ML
A field report from a three-agent IDP pipeline: what breaks first, what's worth automating, and where humans still earn their seat.