I shared a short talk on what I learned from implementing LLM architectures from scratch in Python and PyTorch.

The practical part is the workflow. When a new open-weight model comes out, I usually start from a compact reference implementation, trace the architecture changes, and compare those details against model cards, config files, and released code. This is often the fastest way to separate naming differences from actual design changes.
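
As a rough illustration of the config-comparison step, here is a minimal sketch in plain Python. It is not the code from the talk: the `ALIASES` table, the helper names `normalize` and `diff_configs`, and the two example configs are all assumptions made up for this example. The idea is to rename known aliases first, so that whatever still differs is a candidate design change rather than a naming difference.

```python
# Hypothetical alias table: maps alternative key names to one canonical name.
# The pairs below are illustrative; extend them for the models being compared.
ALIASES = {
    "n_embd": "hidden_size",
    "n_layer": "num_hidden_layers",
    "n_head": "num_attention_heads",
}


def normalize(config):
    """Rename aliased keys so that pure naming differences disappear."""
    return {ALIASES.get(key, key): value for key, value in config.items()}


def diff_configs(reference, candidate):
    """Return {key: (reference_value, candidate_value)} for remaining differences.

    A value of None means the key is absent on that side.
    """
    ref, cand = normalize(reference), normalize(candidate)
    changes = {}
    for key in sorted(ref.keys() | cand.keys()):
        ref_value, cand_value = ref.get(key), cand.get(key)
        if ref_value != cand_value:
            changes[key] = (ref_value, cand_value)
    return changes


# Made-up example configs (not taken from any specific released model).
reference = {"n_embd": 768, "n_layer": 12, "n_head": 12}
candidate = {
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "num_key_value_heads": 4,
}

print(diff_configs(reference, candidate))
```

In this toy case the three renamed keys drop out, and the only entry left is `num_key_value_heads`, which the reference config does not have. That is the kind of difference worth tracing back to the model card and the released code.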

The talk is here: What I Learned From Implementing LLM Architectures From Scratch.

For related reading, see the recent LLM architecture developments article and the LLM Architecture Gallery.

Thumbnail for the YouTube talk on implementing LLM architectures from scratch.

Source: lightly edited website version of my Substack note.