Latest Articles
LLM Research Papers: The 2026 List (January to May)

A curated roundup of notable LLM research papers that came out this year

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs

My Workflow for Understanding LLM Architectures

A learning-oriented workflow for understanding new open-weight model releases

Components of A Coding Agent

How coding agents use tools, memory, and repo context to make LLMs work better in practice

Quick Notes
VibeThinker-3B and the Strength of Post-Training

Short note on VibeThinker-3B, a 3B model based on Qwen2.5-Coder-3B whose reported coding and reasoning results point to strong post-train...

North Mini Code and Agentic Coding Benchmarks

Short note on North Mini Code, Cohere's 30B total and 3B active open-weight MoE model for agentic coding tasks.

Nemotron 3 Ultra and Latent MoE Scaling

Short note on Nemotron 3 Ultra, NVIDIA's 550B total and 55B active hybrid Mamba-Transformer Latent MoE model.

MiniMax M2 and Production-Oriented Model Design

Short note on the MiniMax-M2 technical report, including full attention, fine-grained MoE, agent pipelines, speed rewards, and self-evolu...

DeepSeek Sparse Attention From Scratch

Short note on a DeepSeek Sparse Attention from-scratch implementation added to the LLMs-from-scratch repository.