Implementing LLM Architectures From Scratch
I shared a short talk on what I learned from implementing LLM architectures from scratch in Python and PyTorch.
The practical part is the workflow. When a new open-weight model comes out, I usually start from a compact reference implementation, trace the architecture changes, and compare those details against model cards, config files, and released code. This is often the fastest way to separate naming differences from actual design changes.
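As a minimal sketch of the comparison step, the snippet below normalizes key names in two config dictionaries and then reports what actually differs. Everything in it is an assumption made for illustration: the `ALIASES` table, the `normalize` and `compare_configs` helpers, and the toy config values are hypothetical and are not taken from the talk or from any specific model.

```python
# Hypothetical alias table: maps naming variants onto one canonical key.
# In practice this would be extended by hand while reading configs and code.
ALIASES = {
    "emb_dim": "hidden_size",
    "n_embd": "hidden_size",
    "n_layers": "num_hidden_layers",
    "n_layer": "num_hidden_layers",
    "n_heads": "num_attention_heads",
    "n_head": "num_attention_heads",
    "n_kv_groups": "num_key_value_heads",
}


def normalize(config):
    """Rename known aliases so both configs use the same key names."""
    return {ALIASES.get(key, key): value for key, value in config.items()}


def compare_configs(reference, released):
    """Return (changed values, keys only in reference, keys only in released)."""
    ref, rel = normalize(reference), normalize(released)
    shared = ref.keys() & rel.keys()
    changed = {k: (ref[k], rel[k]) for k in sorted(shared) if ref[k] != rel[k]}
    only_ref = sorted(ref.keys() - rel.keys())
    only_rel = sorted(rel.keys() - ref.keys())
    return changed, only_ref, only_rel


# Toy values for illustration only; they do not describe a real model.
reference = {"emb_dim": 1024, "n_layers": 12, "n_heads": 16, "n_kv_groups": 16}
released = {
    "hidden_size": 1024,
    "num_hidden_layers": 12,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "sliding_window": 512,
}

changed, only_ref, only_rel = compare_configs(reference, released)
print("changed values:", changed)
print("only in reference:", only_ref)
print("only in released:", only_rel)
```

With these toy inputs, the renamed keys drop out of the comparison, which leaves one changed value (`num_key_value_heads`) and one new key (`sliding_window`) as candidates for actual design changes worth tracing in the released code.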
The talk is here: What I Learned From Implementing LLM Architectures From Scratch.
For related reading, see the recent LLM architecture developments article and the LLM Architecture Gallery.
Source: lightly edited website version of my Substack note.
