Latest Posts
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs

My Workflow for Understanding LLM Architectures

A learning-oriented workflow for understanding new open-weight model releases

Components of A Coding Agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

A Visual Guide to Attention Variants in Modern LLMs

From MHA and GQA to MLA, sparse attention, and hybrid architectures

New LLM Architecture Gallery

I put together a new LLM Architecture Gallery that collects the architecture figures from my recent comparison articles in one place, together with compact fact sheets and links.