Artificial Analysis Intelligence Index

This page explains the AA Intelligence Index section shown in the gallery fact sheets.

The gallery copies the displayed Total score from the matching model page on Artificial Analysis. It then uses Profile as gallery shorthand for the four category groups behind that score, based on the same model-page payload plus Artificial Analysis’ published category weights.


Source: https://artificialanalysis.ai/
AA Intelligence Profile label: Benchmark-based capability scores for general, scientific reasoning, coding, and agent performance
AA Intelligence Index label: Weighted benchmark score combining the general, scientific reasoning, coding, and agent capability scores
Last pull: 2026-07-25

What The Fields Mean

Artificial Analysis defines the Intelligence Index as a composite benchmark score spanning reasoning, knowledge, science, coding, and agentic tasks.

In the gallery, that appears as:

Total score: the copied overall Artificial Analysis Intelligence Index
Profile: the four category groups behind the score

The profile groups are:

general
scientific reasoning
coding
agents

Figure: The Artificial Analysis Intelligence Index as four equally weighted capability groups feeding into one combined score. The Intelligence Index is a weighted composite, not a native architecture property like attention type or KV-cache geometry.
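As a rough illustration of the equal weighting shown in the figure, the sketch below averages four group scores with 25% weight each. The group keys and the example numbers are hypothetical, and the gallery itself copies the Total score from Artificial Analysis rather than recomputing it, so the published value may not match this arithmetic exactly.

```python
# Minimal sketch of an equal-weight composite over the four profile groups.
# Assumption: group keys and example scores are illustrative only; they are
# not taken from Artificial Analysis.
GROUPS = ("general", "scientific_reasoning", "coding", "agents")


def combined_score(profile: dict[str, float]) -> float:
    """Average the four group scores, giving each group a 25% weight."""
    missing = [group for group in GROUPS if group not in profile]
    if missing:
        raise KeyError(f"missing group scores: {missing}")
    return sum(profile[group] for group in GROUPS) / len(GROUPS)


# Made-up scores, chosen only to keep the arithmetic easy to follow.
example = {
    "general": 60.0,
    "scientific_reasoning": 40.0,
    "coding": 50.0,
    "agents": 30.0,
}
print(combined_score(example))  # (60 + 40 + 50 + 30) / 4 = 45.0
```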

How The Score Works

Artificial Analysis publishes the detailed methodology and version history on its methodology page.

Artificial Analysis updates this methodology over time, so the gallery should treat the total score and profile as a snapshot rather than permanent architectural constants.

If you want to inspect a concrete example, see DeepSeek V3.2 on Artificial Analysis.

For the current gallery pull on 2026-07-25, the profile uses the category structure shown above, with the weighting illustrated in the figure:

Agents: GDPval-AA and τ²-Bench Telecom
Coding: Terminal-Bench Hard and SciCode
General: AA-LCR, AA-Omniscience, and IFBench
Scientific reasoning: HLE, GPQA Diamond, and CritPt

At the gallery display level, the split is handled as follows:

Total score is copied directly from the Artificial Analysis model page
Agents and Coding use the category scores exposed on that page
General and Scientific reasoning are derived from the same page’s published benchmark components using Artificial Analysis’ methodology weights
If the required page fields are missing, the gallery shows N/A
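The display rules above can be sketched roughly as follows. This is not the gallery's actual code: the payload keys (`intelligence_index`, `categories`, `benchmarks`) and the benchmark identifiers are hypothetical, and the equal per-benchmark weights are a placeholder for Artificial Analysis' methodology weights, which are not reproduced here.

```python
from typing import Optional

# Assumption: hypothetical payload keys and benchmark identifiers.
COPIED_GROUPS = ("agents", "coding")
DERIVED_GROUPS = {
    "general": ("aa_lcr", "aa_omniscience", "ifbench"),
    "scientific_reasoning": ("hle", "gpqa_diamond", "critpt"),
}


def derive_group(benchmarks: dict, names, weights=None) -> Optional[float]:
    """Weighted mean of the named benchmarks, or None if any one is missing."""
    if any(benchmarks.get(name) is None for name in names):
        return None
    # Placeholder: equal weights unless real methodology weights are passed in.
    weights = weights or {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    return sum(benchmarks[name] * weights[name] for name in names) / total_weight


def show(value):
    """Display rule: a missing value is rendered as N/A."""
    return "N/A" if value is None else value


def fact_sheet_fields(page: dict) -> dict:
    """Assemble the Total score and the four Profile groups for display."""
    categories = page.get("categories", {})
    benchmarks = page.get("benchmarks", {})
    fields = {"total": show(page.get("intelligence_index"))}  # copied as-is
    for group in COPIED_GROUPS:  # category scores exposed on the page
        fields[group] = show(categories.get(group))
    for group, names in DERIVED_GROUPS.items():  # derived from components
        fields[group] = show(derive_group(benchmarks, names))
    return fields


# Made-up payload with one scientific-reasoning component (critpt) missing.
page = {
    "intelligence_index": 45.0,
    "categories": {"agents": 30.0, "coding": 50.0},
    "benchmarks": {"aa_lcr": 60.0, "aa_omniscience": 50.0, "ifbench": 70.0,
                   "hle": 20.0, "gpqa_diamond": 80.0},
}
print(fact_sheet_fields(page))
```

In this made-up payload, General comes out as the mean of its three components, while Scientific reasoning falls back to N/A because one required component is absent.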

Caveats

This is not an architecture-intrinsic metric. Two models with very similar stacks can still have very different scores due to training data, post-training, and reasoning behavior.
Coverage is incomplete. Some gallery models may not have a clean Artificial Analysis score.
Benchmark revisions can move scores over time even if the architecture itself does not change.
