Quick Paper and Model Notes

Jul 26, 2026 Quick Model Note

A Few Notable Open-Weight Models This Week

Short note on the architectures of six new open-weight models: Nanbeige 4.2, Laguna S 2.1, Motif-3-Beta, Solar Open 2, Antares 1B, and BTL-3.

Substack Note

Jul 25, 2026 Quick Book Note

Correction for Listing 6.5 in Build a Reasoning Model From Scratch

Short correction note for the random seed in Listing 6.5 on page 198 of Build a Reasoning Model From Scratch.

Substack Note

Jul 16, 2026 Quick Model Note

Inkling: A New Open-Weight 975B MoE with a Few Surprises

Short note on Thinking Machines Lab's 975B Inkling model, including benchmarks, sparse MoE design, short convolutions, RMSNorm, and position bias.

Jul 12, 2026 Quick Blog Note

200,000 Subscribers

Short note celebrating Ahead of AI reaching 200,000 subscribers.

Blog

Jul 9, 2026 Quick Model Note

GPT 5.6 Has 72 Possible Configurations. What's A Good Default?

Short note on how GPT 5.6's model and effort choices map onto training-time and inference-time scaling, yielding 72 possible configurations.

Jun 30, 2026 Quick Book Note

Build a Reasoning Model From Scratch Is Out

Short note announcing the release of Build a Reasoning Model From Scratch and linking the publisher and Amazon pages.

Substack Note

Jun 29, 2026 Quick Article Note

Using Local Coding Agents

Short note linking a new article on setting up local coding agents with open-weight models.

Substack Note

Jun 26, 2026 Quick Benchmark Note

Local Open-Weight LLMs in Coding Harnesses

Short note on trying local open-weight LLMs across Qwen-Code, Codex, and Claude Code harnesses.

Substack Note

Jun 18, 2026 Quick Model Note

GLM-5.2 and IndexShare for Long-Context Sparse Attention

Short note on GLM-5.2, an open-weight GLM update that keeps the GLM-5 sparse MoE backbone and adds IndexShare for cheaper 1M-token DSA inference.

Substack Note

Jun 17, 2026 Quick Model Note

VibeThinker-3B and the Strength of Post-Training

Short note on VibeThinker-3B, a 3B model based on Qwen2.5-Coder-3B whose reported coding and reasoning results point to strong post-training.

Substack Note

Jun 12, 2026 Quick Model Note

North Mini Code and Agentic Coding Benchmarks

Short note on North Mini Code, Cohere's 30B total and 3B active open-weight MoE model for agentic coding tasks.

Substack Note

Jun 4, 2026 Quick Architecture Note

Nemotron 3 Ultra and Latent MoE Scaling

Short note on Nemotron 3 Ultra, NVIDIA's 550B total and 55B active hybrid Mamba-Transformer Latent MoE model.

Substack Note

May 27, 2026 Quick Paper Note

MiniMax M2 and Production-Oriented Model Design

Short note on the MiniMax-M2 technical report, including full attention, fine-grained MoE, agent pipelines, speed rewards, and self-evolution.

Substack Note

May 23, 2026 Quick Architecture Note

DeepSeek Sparse Attention From Scratch

Short note on a DeepSeek Sparse Attention from-scratch implementation added to the LLMs-from-scratch repository.

Substack Note

May 14, 2026 Quick Architecture Note

Implementing LLM Architectures From Scratch

Short note linking a talk on implementing LLM architectures from scratch and comparing new open-weight model implementations against references.

Substack Note

Apr 2, 2026 Quick Benchmark Note

Gemma 4 Architecture and Benchmark Notes

Short note on Gemma 4 31B, including its local-global attention recipe, benchmark jump over Gemma 3, and Apache 2.0 release.

Substack Note

Mar 26, 2026 Quick Architecture Note

LLM Architecture Gallery Diff Tool

Short note on the LLM Architecture Gallery diff tool for comparing two model architecture stacks side by side.

Substack Note

Mar 12, 2026 Quick Model Note

Nemotron 3 Super Throughput Notes

Short note on NVIDIA Nemotron 3 Super 120B-A12B, a hybrid Mamba-Transformer MoE model with latent experts and shared-weight MTP.

Substack Note