Hello, I'm Sebastian Raschka, PhD
I am an LLM Research Engineer with over a decade of experience in artificial intelligence. My work bridges academia and industry, including roles as a senior engineer at Lightning AI and as a statistics professor at the University of Wisconsin-Madison.
I am also the author of Build a Large Language Model (From Scratch).
My expertise lies in LLM research and the development of high-performance AI systems, with a deep focus on practical, code-driven implementations. (For my most up-to-date CV details, please visit my LinkedIn profile.)
Recent Articles and Notes
Jun 6, 2026
A curated roundup of notable LLM research papers that came out this year
May 16, 2026
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs
Apr 18, 2026
A learning-oriented workflow for understanding new open-weight model releases
Apr 4, 2026
How coding agents use tools, memory, and repo context to make LLMs work better in practice
Short note on VibeThinker-3B, a 3B model based on Qwen2.5-Coder-3B whose reported coding and reasoning results point to strong post-train...
Short note on North Mini Code, Cohere's 30B total and 3B active open-weight MoE model for agentic coding tasks.
Short note on Nemotron 3 Ultra, NVIDIA's 550B total and 55B active hybrid Mamba-Transformer Latent MoE model.
Short note on the MiniMax-M2 technical report, including full attention, fine-grained MoE, agent pipelines, speed rewards, and self-evolu...
Short note on a from-scratch implementation of DeepSeek Sparse Attention added to the LLMs-from-scratch repository.