sebastianraschka.com/llm-architecture-gallery/
Percent Active Parameters per Token
| # | Model | Active % | Active params | Total params | Type | Release date | Attention |
|---|---|---|---|---|---|---|---|
| 1 | DeepSeek V4-Pro | 3.1% | 49B active | 1.6T | MoE | 2026-04-24 | CSA/HCA |
| 2 | Kimi K2 | 3.2% | 32B active | 1T | MoE | 2025-07-10 | MLA |
| 3 | Kimi K2.5 | 3.2% | 32B active | 1T | MoE | 2026-01-27 | MLA |
| 4 | Kimi K2.6 | 3.2% | 32B active | 1T | MoE | 2026-04-20 | MLA |
| 5 | Arcee AI Trinity Large 400B | 3.3% | 13B active | 400B | MoE | 2026-01-27 | 3:1 sliding-window/global gated GQA |
| 6 | Qwen3 Next 80B-A3B | 3.8% | 3B active | 80B | Hybrid | 2025-09-09 | 3:1 Gated DeltaNet and Gated Attention |
| 7 | Llama 4 Maverick | 4.3% | 17B active | 400B | MoE | 2025-04-05 | GQA |
| 8 | MiniMax M2 230B | 4.3% | 10B active | 230B | MoE | 2025-10-23 | GQA |
| 9 | MiniMax M2.5 230B | 4.3% | 10B active | 230B | MoE | 2026-02-12 | GQA |
| 10 | Qwen3.5 397B | 4.3% | 17B active | 397B | Hybrid | 2026-02-16 | 3:1 Gated DeltaNet and Gated Attention |
| 11 | MiniMax M2.7 230B | 4.3% | 10B active | 230B | MoE | 2026-03-18 | GQA |
| 12 | GPT-OSS 120B | 4.4% | 5.1B active | 117B | MoE | 2025-08-04 | Alternating sliding-window/global GQA |
| 13 | LongCat-Flash-Lite 68.5B-A3B | 4.4% | 3B active | 68.5B | MoE | 2026-01-28 | MLA |
| 14 | DeepSeek V4-Flash | 4.6% | 13B active | 284B | MoE | 2026-04-24 | CSA/HCA |
| 15 | Xiaomi MiMo-V2.5 310B | 4.8% | 15B active | 310B | MoE | 2026-04-22 | 5:1 sliding-window/global attention |
| 16 | Xiaomi MiMo-V2-Flash 309B | 4.9% | 15B active | 309B | MoE | 2025-12-16 | 5:1 sliding-window/global attention |
| 17 | GLM-5 744B | 5.4% | 40B active | 744B | MoE | 2026-02-11 | MLA with DeepSeek Sparse Attention |
| 18 | GLM-5.1 | 5.4% | 40B active | 744B | MoE | 2026-04-07 | MLA with DeepSeek Sparse Attention |
| 19 | DeepSeek V3 | 5.5% | 37B active | 671B | MoE | 2024-12-26 | MLA |
| 20 | DeepSeek R1 | 5.5% | 37B active | 671B | MoE | 2025-01-20 | MLA |
| 21 | DeepSeek V3.2 | 5.5% | 37B active | 671B | MoE | 2025-12-01 | MLA with DeepSeek Sparse Attention |
| 22 | Step 3.5 Flash 196B | 5.6% | 11B active | 196B | MoE | 2026-02-01 | 3:1 sliding-window GQA |
| 23 | Mistral Small 4 | 5.6% | 6.63B active | 119B | MoE | 2026-03-16 | MLA |
| 24 | Mistral Large 3 | 6.1% | 41B active | 673B | MoE | 2025-12-02 | MLA |
| 25 | Kimi Linear 48B-A3B | 6.3% | 3B active | 48B | Hybrid | 2025-10-30 | 3:1 Kimi Delta Attention and MLA |
| 26 | Ling 2.5 1T | 6.3% | 63B active | 1T | Hybrid | 2026-02-15 | Lightning Attention plus MLA |
| 27 | Ling 2.6 1T | 6.3% | 63B active | 1T | Hybrid | 2026-04-23 | Lightning Attention plus MLA |
| 28 | Tencent Hy3-preview 295B-A21B | 7.1% | 21B active | 295B | MoE | 2026-04-23 | GQA |
| 29 | Sarvam 30B | 8.0% | 2.4B active | 30B | MoE | 2026-03-03 | GQA |
| 30 | Qwen3.6 35B-A3B | 8.6% | 3B active | 35B | Hybrid | 2026-04-15 | 3:1 Gated DeltaNet and Gated Attention |
| 31 | GLM-4.5 355B | 9.0% | 32B active | 355B | MoE | 2025-07-28 | GQA |
| 32 | GLM-4.7 355B | 9.0% | 32B active | 355B | MoE | 2025-12-22 | GQA |
| 33 | ZAYA1-8B | 9.0% | 760M active | 8.4B | MoE | 2026-05-06 | CCA with 4:1 GQA |
| 34 | Laguna XS.2 | 9.1% | 3B active | 33B | MoE | 2026-04-28 | 3:1 sliding-window/global gated GQA |
| 35 | Qwen3 235B-A22B | 9.4% | 22B active | 235B | MoE | 2025-04-28 | GQA |
| 36 | Sarvam 105B | 9.8% | 10.3B active | 105B | MoE | 2026-03-03 | MLA |
| 37 | Qwen3 30B-A3B | 10.0% | 3B active | 30B | MoE | 2025-04-28 | GQA |
| 38 | Nemotron 3 Nano 30B-A3B | 10.0% | 3B active | 30B | Hybrid MoE | 2025-12-04 | Mamba-2 + GQA |
| 39 | Nemotron 3 Super 120B-A12B | 10.0% | 12B active | 120B | Hybrid MoE | 2026-03-11 | Mamba-2 + GQA |
| 40 | Qwen3 Coder Flash 30B-A3B | 11.0% | 3.3B active | 30B | MoE | 2025-07-31 | GQA |
| 41 | GLM-4.5-Air | 11.3% | 12B active | 106B | MoE | 2025-07-28 | GQA |
| 42 | INTELLECT-3 | 11.3% | 12B active | 106B | MoE | 2025-11-26 | GQA |
| 43 | Gemma 4 26B-A4B | 15.1% | 3.8B active | 25.2B | MoE | 2026-04-02 | 5:1 sliding-window/global GQA |
| 44 | GPT-OSS 20B | 17.1% | 3.6B active | 21B | MoE | 2025-08-04 | Alternating sliding-window/global GQA |
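The "Active %" column is simply active parameters divided by total parameters. A minimal sketch that reproduces a few rows of the ranking (parameter counts in billions, taken directly from the table above):

```python
# Active-parameter share = active params / total params (per token).
# Counts are in billions and come straight from the table above.
models = {
    "Kimi K2":       (32.0, 1000.0),
    "GPT-OSS 120B":  (5.1, 117.0),
    "DeepSeek V3":   (37.0, 671.0),
    "Qwen3 30B-A3B": (3.0, 30.0),
    "GPT-OSS 20B":   (3.6, 21.0),
}

# Sort ascending by share, matching the table's ordering.
for name, (active, total) in sorted(models.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name:<14} {active / total:5.1%}  ({active:g}B active / {total:g}B total)")
```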
Caveat: active-parameter share is only one lens. It does not capture KV cache size, attention pattern, context length, routing overhead, hardware efficiency, or training quality. Still, it is a helpful quick check when comparing sparse models.
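To make the KV cache point concrete: two models can have identical active-parameter shares yet differ by an order of magnitude in cache footprint. A back-of-the-envelope sketch using the standard per-token KV size formula, with hypothetical layer/head counts that are not the specs of any model above:

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """One key and one value vector per KV head per layer; fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical configs, for illustration only (not real model specs):
gqa = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)   # grouped-query
mha = kv_bytes_per_token(n_layers=60, n_kv_heads=64, head_dim=128)  # full multi-head

ctx = 128 * 1024  # 128K-token context
for label, b in [("8 KV heads (GQA)", gqa), ("64 KV heads (MHA)", mha)]:
    print(f"{label:<18} {b / 1024:5.0f} KiB/token -> {b * ctx / 2**30:5.1f} GiB at 128K")
```

MLA, sliding-window, and linear-attention hybrids all attack this per-token term, which is why the Attention column belongs next to Active % when comparing the models above.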