North Mini Code and Agentic Coding Benchmarks

North Mini Code is a new open-weight model by Cohere for agentic coding tasks.

According to the release post, it is a 30B-parameter Mixture-of-Experts (MoE) model with 3B active parameters per token, available under Apache 2.0. Architecturally, the interesting part is the 30B-A3B tradeoff: 128 experts, of which 8 are active per token, combined with interleaved sliding-window and global attention.
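To make the 30B-A3B numbers concrete, here is a minimal sketch of top-k expert routing for a single token, in plain Python. The expert counts and parameter totals come from the release post. Everything else is an assumption for illustration: the release post does not describe the router, so the random logits, the `route_token` helper, and the softmax over the selected logits are a generic MoE pattern, not Cohere's implementation.

```python
import math
import random

# Figures quoted from the release post.
NUM_EXPERTS = 128
TOP_K = 8
TOTAL_PARAMS_B = 30.0
ACTIVE_PARAMS_B = 3.0


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Hypothetical router: keep the top_k experts and renormalize their gates."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


random.seed(0)
# Stand-in router scores for one token (assumption: real scores come from a learned layer).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

expert_fraction = TOP_K / NUM_EXPERTS              # share of experts used per token
param_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B  # share of parameters used per token

print(f"experts used per token: {len(assignment)} of {NUM_EXPERTS} ({expert_fraction:.2%})")
print(f"parameters active per token: {param_fraction:.0%}")
```

Only 8 of 128 experts (6.25%) run for each token, while about 10% of the parameters are active. The gap is presumably because the non-expert parts of the network, such as attention and embeddings, run for every token.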

The important detail is the evaluation setup. The release emphasizes agentic coding, where the model has to work inside a tool loop instead of only returning a code answer for a prompt:

On Terminal-Bench, the model has to use a terminal, inspect the environment, run commands, read outputs, and continue from the observed state.
On SWE-Bench, the model works on GitHub-style software issues. It has to understand the repository, find relevant files, make a patch, and pass tests.
SciCode and LiveCodeBench are closer to traditional code-generation benchmarks. They still require reasoning, but the interaction loop is much shorter.
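The difference between a tool loop and a single-shot code answer can be sketched in a few lines of Python. This is a toy illustration, not the harness used by Terminal-Bench or SWE-Bench: `agent_loop`, `run_command`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the model, which would normally choose each command from the history so far.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (command issued, output observed)


def run_command(command: str, timeout_s: float = 10.0) -> str:
    """Run one shell command and return its combined output as the observation."""
    try:
        done = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout_s)
        return (done.stdout + done.stderr).strip()
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout_s}s]"


def agent_loop(policy: Callable[[List[Step]], Optional[str]],
               max_steps: int = 8) -> List[Step]:
    """Alternate between asking the policy for a command and observing its output."""
    history: List[Step] = []
    for _ in range(max_steps):
        command = policy(history)
        if command is None:  # the policy decides it is finished
            break
        history.append((command, run_command(command)))
    return history


def scripted_policy(history: List[Step]) -> Optional[str]:
    """Hypothetical stand-in for the model: the next step depends on the last output."""
    if not history:
        return "echo ready"
    if history[-1][1] == "ready":
        return "echo done"
    return None


history = agent_loop(scripted_policy)
for command, output in history:
    print(f"$ {command}\n{output}")
```

In a real benchmark, the final state of the environment (for example, whether the repository's tests pass) decides the score, so every knob in this loop, including `timeout_s` and `max_steps`, can affect the result.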

That focus on agentic coding is probably why North Mini Code looks far ahead of Gemma 4 on the workflow-heavy rows of the benchmark table, such as Terminal-Bench and SWE-Bench. On the more traditional code-generation rows, it is still competitive, although not quite at Qwen3.6 level.

As usual, I would treat these as release-time benchmark numbers as of June 12, 2026. For agentic coding, harness details, tool APIs, timeouts, and prompt templates can move results substantially.

Figure: North Mini Code architecture and benchmark overview. Composite figure from the original Substack note, summarizing the North Mini Code architecture and a release-time benchmark snapshot.

Source: lightly edited website version of my Substack note.
