About | The Blog

This site is a place for careful notes: performance investigations, implementation details, reading notes, experimental results, and unfinished questions from ML systems work.

The recurring themes are practical: how to make models faster, cheaper, more reliable, and easier to reason about when they run on real hardware and real clusters.

Current interests

LLM systems
HPC
inference
CUDA
distributed training
ML infrastructure