Large Language Model Articles | Sebastian Raschka, PhD

A running index of my articles and tutorials on large language models — covering architecture, finetuning, tokenization, attention mechanisms, evaluation, and more. Research notes live in the Ahead of AI newsletter.

2026

Jul 28 Kimi K3 Architecture Notes Short architecture note on Kimi K3, including LatentMoE, Kimi Delta Attention, Attention Residuals, NoPE, multimodality, and inference-efficiency choices.
Jul 26 A Few Notable Open-Weight Models This Week Short note on the architectures of six new open-weight models, including Nanbeige 4.2, Laguna S 2.1, Motif-3-Beta, Solar Open 2, Antares 1B, and BTL-3.
Jul 25 Correction for Listing 6.5 in Build a Reasoning Model From Scratch Short correction note for the random seed in Listing 6.5 on page 198 of Build a Reasoning Model From Scratch.
Jul 16 Inkling: A New Open-Weight 975B MoE with a Few Surprises Short note on Thinking Machines Lab's 975B Inkling model, including benchmarks, sparse MoE design, short convolutions, RMSNorm, and position bias.
Jul 9 GPT 5.6 Has 72 Possible Configurations. What's A Good Default? Short note on how GPT 5.6 model and effort choices map onto training-time and inference-time scaling, producing 72 configurations.
Jun 30 Build a Reasoning Model From Scratch Is Out Short note announcing the release of Build a Reasoning Model From Scratch and linking the publisher and Amazon pages.
Jun 26 Local Open-Weight LLMs in Coding Harnesses Short note on trying local open-weight LLMs across Qwen-Code, Codex, and Claude Code harnesses.
May 23 DeepSeek Sparse Attention From Scratch Short note on a DeepSeek Sparse Attention from-scratch implementation added to the LLMs-from-scratch repository.
May 16 Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention How Gemma 4, DeepSeek V4, and other recent open-weight LLMs reduce long-context costs through KV sharing, compressed attention, and new hybrid designs.
May 14 Implementing LLM Architectures From Scratch Short note linking a talk on implementing LLM architectures from scratch and comparing new open-weight model implementations against references.
Mar 26 LLM Architecture Gallery Diff Tool Short note on the LLM Architecture Gallery diff tool for comparing two model architecture stacks side by side.
Mar 14 New LLM Architecture Gallery Visual gallery of LLM architecture variants: attention mechanisms, positional encodings, MoE, and more — with comparison figures and compact reference sheets.
Feb 25 A Dream of Spring for Open-Weight LLMs: 10 Architectures from Jan-Feb 2026 A Roundup and Comparison of 10 Open-Weight LLM Releases in Spring 2026
Jan 24 Categories of Inference-Time Scaling for Improved LLM Reasoning Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea is straightforward. If we are...

2025

Dec 30 The State Of LLMs 2025: Progress, Problems, and Predictions A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.
Dec 30 LLM Research Papers: The 2025 List (July to December) A curated list of LLM research papers from July–December 2025, organized by reasoning models, inference-time scaling, architectures, training efficiency...
Dec 3 From DeepSeek V3 to V3.2: Architecture, Sparse Attention, and RL Updates Similar to DeepSeek V3, the team released their new flagship model over a major US holiday weekend. Given DeepSeek V3.2's really good performance (on GPT-5...
Nov 4 Beyond Standard LLMs After I shared my Big LLM Architecture Comparison a few months ago, which focused on the main transformer-based LLMs, I received a lot of questions with...
Oct 5 Understanding the 4 Main Approaches to LLM Evaluation (From Scratch) Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
Sep 6 Understanding and Implementing Qwen3 From Scratch Previously, I compared the most notable open-weight architectures of 2025 in The Big LLM Architecture Comparison. Then, I zoomed in and discussed the...
Aug 9 From GPT-2 to gpt-oss: Analyzing the Architectural Advances OpenAI just released their new open-weight LLMs this week: gpt-oss-120b and gpt-oss-20b, their first open-weight models since GPT-2 in 2019. And yes, thanks...
Jul 19 The Big LLM Architecture Comparison It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and...
Jul 1 LLM Research Papers: The 2025 List (January to June) The latest in LLM research with a hand-curated, topic-organized list of over 200 research papers from 2025.
Jun 17 Understanding and Coding the KV Cache in LLMs from Scratch KV caches are one of the most critical techniques for efficient inference in LLMs in production. KV caches are an important component for compute-efficient...
May 10 Coding LLMs from the Ground Up: A Complete Course Why build an LLM from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot...
Apr 19 The State of Reinforcement Learning for LLM Reasoning A lot has happened this month, especially with the releases of new flagship models like GPT-4.5 and Llama 4. But you might have noticed that reactions to...
Mar 29 First Look at Reasoning From Scratch: Chapter 1 As you know, I've been writing a lot lately about the latest research on reasoning in LLMs. Before my next research-focused blog post, I wanted to offer...
Mar 8 Inference-Time Compute Scaling Methods to Improve Reasoning Models This article explores recent research advancements in reasoning-optimized LLMs, with a particular focus on inference-time compute scaling methods that have emerged...
Feb 5 Understanding Reasoning LLMs Overview of four ways to build reasoning-capable LLMs, including inference-time scaling, supervised finetuning, reinforcement learning, and search-based meth...
Jan 23 Noteworthy LLM Research Papers of 2024 This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.
Jan 17 Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch Implements byte pair encoding (BPE) tokenization from scratch: tokenizer training, GPT-style merge rules, and step-by-step Python examples.

2024

Dec 29 LLM Research Papers: The 2024 List I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will come...
Nov 3 Understanding Multimodal LLMs There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural...
Sep 21 Building A GPT-Style LLM Classifier From Scratch This article shows you how to transform pretrained large language models (LLMs) into strong text classifiers. But why focus on classification? First...
Sep 1 Building LLMs from the Ground Up: A 3-hour Coding Workshop Three-hour coding workshop that builds the core pieces of a GPT-style large language model from the ground up for developers who want implementation intuition.
Aug 17 New LLM Pre-training and Post-training Paradigms There are hundreds of LLM papers each month proposing new techniques and approaches. However, one of the best ways to see what actually works well in...
Jul 20 Instruction Pretraining LLMs This article covers a new, cost-effective method for generating data for instruction finetuning LLMs; instruction finetuning from scratch; pretraining LLMs...
Jun 2 LLM Research Insights: Instruction Masking and New LoRA Finetuning Experiments? This article covers three new papers related to instruction finetuning and parameter-efficient finetuning with LoRA in large language models (LLMs). I work...
Jun 2 Developing an LLM: Building, Training, Finetuning This is an overview of the LLM development process. This one-hour talk focuses on the essential three stages of developing an LLM: coding the architecture...
May 12 How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? What a month! We had four major open LLM releases: Mixtral, Meta AI's Llama 3, Microsoft's Phi-3, and Apple's OpenELM. In my new article, I review and...
Apr 20 Using and Finetuning Pretrained Transformers Guide to using and finetuning pretrained transformers, comparing feature extraction, prompt-based use, full finetuning, and parameter-efficient LLM adaptation.
Feb 18 Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch Technical tutorial on LoRA and DoRA for parameter-efficient finetuning, with from-scratch PyTorch code and intuition for weight-decomposed low-rank adaptation.

2023

Sep 15 Optimizing LLMs From a Dataset Perspective Practical guide to improving LLM finetuning with better instruction datasets, covering data curation, prompt-output pairs, synthetic data, and experiment ideas.
Jun 14 Finetuning Falcon LLMs More Efficiently With LoRA and Adapters Finetuning allows us to adapt pretrained LLMs in a cost-efficient manner. But which method should we use? This article compares different...
May 11 Accelerating Large Language Models with Mixed-Precision Techniques Training and using large language models (LLMs) is expensive due to their large compute requirements and memory footprints. This article will explore how...
Apr 26 Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them as a...
Apr 12 Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters In the rapidly evolving field of artificial intelligence, utilizing large language models in an efficient and effective manner has become increasingly...
Mar 28 Finetuning Large Language Models On A Single GPU Using Gradient Accumulation Previously, I shared an article using multi-GPU training strategies to speed up the finetuning of large language models. Several of these strategies include...
Feb 9 Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch Step-by-step tutorial explaining scaled dot-product self-attention for large language models, with Python code that builds the mechanism from scratch.
Feb 9 Understanding and Coding Self-Attention, Multi-Head Attention, Causal Attention, and Cross-Attention in LLMs A deep-dive implementation of self-attention, multi-head attention, causal attention, and cross-attention — the mechanisms behind modern transformer LLMs.
Feb 7 Understanding Large Language Models -- A Transformative Reading List Curated reading list for understanding large language models, from attention and transformers to BERT, GPT, scaling laws, instruction tuning, and RLHF.
Jan 16 Curated Resources and Trustworthy Experts: The Key Ingredients for Finding Accurate Answers to Technical Questions in the Future Conversational chatbots such as ChatGPT probably will not be able to replace traditional search engines and expert knowledge anytime soon. With the vast...

2018

Nov 10 Model evaluation, model selection, and algorithm selection in machine learning Part 4 of the model evaluation series explaining statistical tests, algorithm comparisons, corrected resampled tests, and nested cross-validation.

2016

Oct 2 Model evaluation, model selection, and algorithm selection in machine learning Part 3 of the model evaluation series covering hyperparameter tuning, model selection, validation sets, k-fold cross-validation, and nested workflows.
Aug 13 Model evaluation, model selection, and algorithm selection in machine learning Part 2 of the model evaluation series explaining bootstrap methods, holdout validation, resampling variance, uncertainty estimates, and model stability.
Jun 11 Model evaluation, model selection, and algorithm selection in machine learning Part 1 of a practical model evaluation series covering generalization performance, train-test splits, bias, variance, and supervised learning workflow basics.

2015

Jan 11 Implementing a Weighted Majority Rule Ensemble Classifier Here, I want to present a simple and conservative approach to implementing a weighted majority rule ensemble classifier in scikit-learn that yielded...

2014

Feb 23 Using OpenEye software for substructure alignments This is a quick guide showing how to use OpenEye software command-line tools to align target molecules to a query based on substructure matches and how to...