GLM-5.2 is a recent open-weight model release from Z.ai. My first impression is that it is the best open-weight model today. As usual for fresh releases, I would treat the release-time leaderboard position as date-sensitive.

Architecturally, it builds on the earlier GLM-5 and GLM-5.1 designs. In particular, it reuses Multi-head Latent Attention (MLA) and DeepSeek Sparse Attention (DSA), the sparse-attention mechanism from DeepSeek V3.2 that I covered in the DeepSeek V3 to V3.2 article.

What’s new is IndexShare, a cross-layer reuse trick for DSA. Instead of recomputing the sparse-attention top-k indexer in every layer, GLM-5.2 runs the full indexer only once every four layers. The layers in between then reuse the token indices selected by the most recent indexer pass.
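To make the reuse pattern concrete, here is a minimal toy sketch in Python. It is not GLM-5.2's implementation: the random scores stand in for a real indexer, and the function names, the `share_every` parameter, and the choice to start each group at layer 0 are my own assumptions for illustration.

```python
import numpy as np


def select_topk(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring tokens, sorted ascending."""
    k = max(1, min(k, scores.shape[-1]))
    idx = np.argpartition(scores, -k)[-k:]
    return np.sort(idx)


def run_layers(num_layers: int, seq_len: int, k: int,
               share_every: int = 4, seed: int = 0):
    """Toy IndexShare schedule: one indexer pass per group of layers.

    Assumption: groups start at layer 0, so layers 0, 4, 8, ... run the
    indexer and the layers in between reuse its output.
    """
    rng = np.random.default_rng(seed)
    indexer_calls = 0
    shared_indices = None
    per_layer_indices = []
    for layer in range(num_layers):
        if layer % share_every == 0:
            # Stand-in for the real indexer: random relevance scores
            # over all earlier tokens.
            scores = rng.standard_normal(seq_len)
            shared_indices = select_topk(scores, k)
            indexer_calls += 1
        # Every layer attends only to the currently shared token indices.
        per_layer_indices.append(shared_indices)
    return per_layer_indices, indexer_calls


indices, calls = run_layers(num_layers=8, seq_len=64, k=4)
print(calls)  # number of full indexer passes across the 8 layers
```

With eight layers and sharing every four, the indexer runs twice, and layers 0 to 3 all attend to the same selected tokens before layer 4 refreshes the selection.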

This keeps the same DSA idea but makes 1M-token inference cheaper. The attention pattern is still adaptive, but the model spends less work repeatedly deciding which earlier tokens to attend to.
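A rough back-of-the-envelope count shows where the saving comes from. The sketch below assumes that, when decoding one token, the indexer scores every earlier token once in each layer where it runs. The layer count is a hypothetical placeholder, not GLM-5.2's real depth, and the function name is mine.

```python
import math


def indexer_scores_per_token(context_len: int, num_layers: int,
                             share_every: int = 1) -> int:
    """Count token-score evaluations the indexer performs to decode one token.

    Assumption: the indexer scores each earlier token once per layer in
    which it actually runs.
    """
    indexer_layers = math.ceil(num_layers / share_every)
    return indexer_layers * context_len


NUM_LAYERS = 64        # hypothetical depth, chosen only for illustration
CONTEXT = 1_000_000    # the 1M-token setting mentioned above

baseline = indexer_scores_per_token(CONTEXT, NUM_LAYERS, share_every=1)
shared = indexer_scores_per_token(CONTEXT, NUM_LAYERS, share_every=4)
print(baseline, shared, shared / baseline)
```

Under these assumptions, sharing every four layers cuts the indexer's scoring work to a quarter. The attention over the selected tokens is unchanged, so only the selection step gets cheaper.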

The local GLM-5.2 architecture card has the current summary, config links, and benchmark references.

Figure: GLM-5.2 architecture and benchmark overview. Composite figure from the original Substack note, summarizing the GLM-5.2 architecture and release-time benchmark snapshot.

Source: lightly edited website version of my Substack note.