Reflections on triaging hundreds of production bugs over two years: what patterns emerge, how to write defensive code, and how to build E2E test coverage that actually works.