Hello, I'm Sebastian Raschka
I'm an LLM Research Engineer with over a decade of experience in artificial intelligence. My work bridges academia and industry, with prior roles as a senior staff member at an AI company and as a statistics professor.
My expertise lies in LLM research and the development of high-performance AI systems, with a deep focus on practical, code-driven implementations. (For my most up-to-date CV details, please visit my LinkedIn profile.)

Recent Notes and Blog Entries

Understanding Reasoning LLMs
Feb 5, 2025
In this article, I will describe the four main approaches to building reasoning models, or how we can enhance LLMs with reasoning capabilities. I hope this provides valuable insights and helps you navigate the rapidly evolving literature and hype surrounding this topic.

Noteworthy LLM Research Papers of 2024
Jan 23, 2025
This article covers 12 influential AI research papers of 2024, ranging from mixture-of-experts models to new LLM scaling laws for precision.

Implementing A Byte Pair Encoding (BPE) Tokenizer From Scratch
Jan 17, 2025
This is a standalone notebook implementing the popular byte pair encoding (BPE) tokenization algorithm from scratch for educational purposes. BPE is the tokenization scheme used in models from GPT-2 through GPT-4, Llama 3, and others.
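
To give a flavor of what the notebook covers, here is a minimal sketch of the core BPE training loop: repeatedly find the most frequent pair of adjacent tokens and merge it into a new vocabulary entry. (This is an illustrative simplification, not the notebook's full implementation; the function names and the tiny vocabulary size are made up for the example.)

```python
from collections import Counter

def get_pair_counts(tokens):
    # Count occurrences of each pair of adjacent token ids
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair, new_id):
    # Replace every occurrence of `pair` with the new token id
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

text = "the cat sat on the mat"
tokens = list(text.encode("utf-8"))  # start from raw bytes (ids 0-255)
vocab_size = 260                     # tiny vocab: just 4 merges for the demo
merges = {}

for new_id in range(256, vocab_size):
    counts = get_pair_counts(tokens)
    if not counts:
        break
    pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
    tokens = merge_pair(tokens, pair, new_id)
    merges[pair] = new_id

print(merges)  # learned merge rules
print(tokens)  # text re-encoded with the merged vocabulary
```

Real tokenizers like GPT-2's add details on top of this loop (pre-splitting on whitespace, special tokens, and so on), but the merge procedure above is the heart of the algorithm.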

LLM Research Papers: The 2024 List
Dec 29, 2024
I want to share my running bookmark list of many fascinating (mostly LLM-related) papers I stumbled upon in 2024. It's just a list, but maybe it will come in handy for those who are interested in finding some gems to read for the holidays.

Understanding Multimodal LLMs
Nov 3, 2024
There has been a lot of new research on the multimodal LLM front, including the latest Llama 3.2 vision models, which employ diverse architectural strategies to integrate various data types like text and images. For instance, the decoder-only method uses a single stack of decoder blocks to process all modalities sequentially. Cross-attention methods (used in Llama 3.2, for example) instead pair separate encoders for different modalities with a cross-attention layer that lets these encoders interact. This article explains how these different types of multimodal LLMs function. Additionally, I review and summarize roughly a dozen other multimodal papers and models published in recent weeks to compare their approaches.
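
To make the cross-attention idea concrete, here is a minimal PyTorch sketch: the text tokens provide the queries, and the image-encoder outputs provide the keys and values. (This is a toy illustration of the general mechanism, not Llama 3.2's actual implementation; the class name and all dimensions are made up for the example.)

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    # Text hidden states attend to image-encoder outputs: queries come from
    # the text side, keys/values from the image side.
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_hidden, image_features):
        attn_out, _ = self.attn(
            query=text_hidden,    # from the language model's decoder
            key=image_features,   # from a separate image encoder
            value=image_features,
        )
        return self.norm(text_hidden + attn_out)  # residual + layer norm

# Toy usage: 10 text tokens attending to 49 image patches, 512-dim embeddings
block = CrossAttentionBlock()
text = torch.randn(1, 10, 512)
image = torch.randn(1, 49, 512)
print(block(text, image).shape)  # torch.Size([1, 10, 512])
```

A decoder-only multimodal model skips this extra layer entirely: it projects image patches into the same embedding space as the text tokens and feeds everything through one shared decoder stack.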