Sebastian Raschka

    Quick Paper and Model Notes

    • Jun 17, 2026 Quick Model Note
      VibeThinker-3B and the Strength of Post-Training

      Short note on VibeThinker-3B, a 3B model based on Qwen2.5-Coder-3B whose reported coding and reasoning results point to strong post-training.

      Substack Note
    • Jun 12, 2026 Quick Model Note
      North Mini Code and Agentic Coding Benchmarks

      Short note on North Mini Code, Cohere's 30B total and 3B active open-weight MoE model for agentic coding tasks.

      Substack Note
    • Jun 4, 2026 Quick Architecture Note
      Nemotron 3 Ultra and Latent MoE Scaling

      Short note on Nemotron 3 Ultra, NVIDIA's 550B total and 55B active hybrid Mamba-Transformer Latent MoE model.

      Substack Note
    • May 27, 2026 Quick Paper Note
      MiniMax M2 and Production-Oriented Model Design

      Short note on the MiniMax-M2 technical report, including full attention, fine-grained MoE, agent pipelines, speed rewards, and self-evolution.

      Substack Note
    • May 23, 2026 Quick Architecture Note
      DeepSeek Sparse Attention From Scratch

      Short note on a DeepSeek Sparse Attention from-scratch implementation added to the LLMs-from-scratch repository.

      Substack Note
    • May 14, 2026 Quick Architecture Note
      Implementing LLM Architectures From Scratch

      Short note linking to a talk on implementing LLM architectures from scratch and comparing new open-weight model implementations against references.

      Substack Note
    • Apr 2, 2026 Quick Benchmark Note
      Gemma 4 Architecture and Benchmark Notes

      Short note on Gemma 4 31B, including its local-global attention recipe, benchmark jump over Gemma 3, and Apache 2.0 release.

      Substack Note
    • Mar 26, 2026 Quick Architecture Note
      LLM Architecture Gallery Diff Tool

      Short note on the LLM Architecture Gallery diff tool for comparing two model architecture stacks side by side.

      Substack Note
    • Mar 12, 2026 Quick Model Note
      Nemotron 3 Super Throughput Notes

      Short note on NVIDIA Nemotron 3 Super 120B-A12B, a hybrid Mamba-Transformer MoE model with latent experts and shared-weight MTP.

      Substack Note

    © 2013-2026 Sebastian Raschka