Artificial Analysis Intelligence Index
This page explains the AA Intelligence Index section shown in the gallery fact sheets.
The gallery copies the displayed Total score from the matching model page on Artificial Analysis. "Profile" is gallery shorthand for the four category groups behind that score. The gallery fills it from the same model-page payload plus Artificial Analysis’ published category weights.
- AA Intelligence Profile label: Benchmark-based capability scores for general, scientific reasoning, coding, and agent performance
- AA Intelligence Index label: Weighted benchmark score combining the general, scientific reasoning, coding, and agent capability scores
- Last pull: 2026-03-27
What The Fields Mean
Artificial Analysis defines the Intelligence Index as a composite benchmark score spanning reasoning, knowledge, science, coding, and agentic tasks.
In the gallery, that appears as:
- Total score: the copied overall Artificial Analysis Intelligence Index
- Profile: the four category groups behind the score
The profile groups are:
- general
- scientific reasoning
- coding
- agents
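The Total score is a weighted combination of these four category scores. The sketch below shows that kind of weighted composite. It assumes each category score is already on a 0 to 100 scale. The equal 0.25 weights and the key names are hypothetical placeholders, not Artificial Analysis’ published values. The gallery itself copies the total from the model page rather than recomputing it.

```python
from typing import Mapping

# Hypothetical weights for illustration only. Artificial Analysis publishes
# the real category weights, and they can change between index versions.
CATEGORY_WEIGHTS: dict[str, float] = {
    "general": 0.25,
    "scientific_reasoning": 0.25,
    "coding": 0.25,
    "agents": 0.25,
}


def composite_score(category_scores: Mapping[str, float],
                    weights: Mapping[str, float] = CATEGORY_WEIGHTS) -> float:
    """Weighted average of category scores, normalized by the total weight."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise KeyError(f"missing category scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * category_scores[c] for c in weights) / total_weight
```

Dividing by the total weight keeps the result on the same scale as the inputs, even if a weight set does not sum to 1.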
How The Score Works
Artificial Analysis publishes the detailed methodology and version history here:
Because Artificial Analysis updates this methodology over time, the gallery treats the total score and profile as a snapshot, not as permanent architectural constants.
If you want to inspect a concrete example, see DeepSeek V3.2 on Artificial Analysis.
For the current gallery pull on 2026-03-27, the profile uses the category structure shown above, with the weighting illustrated in the figure:
- Agents: GDPval-AA and τ²-Bench Telecom
- Coding: Terminal-Bench Hard and SciCode
- General: AA-LCR, AA-Omniscience, and IFBench
- Scientific reasoning: HLE, GPQA Diamond, and CritPt
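The category list is a small piece of configuration. The sketch below records it as a mapping and builds a reverse lookup from benchmark to category, rejecting any benchmark listed under two categories. The identifier names (such as `scientific_reasoning`) are hypothetical. The structure reflects only the 2026-03-27 pull and may change with later index versions.

```python
# Category -> benchmark components for the 2026-03-27 gallery pull.
# Artificial Analysis may change this structure between index versions.
CATEGORY_BENCHMARKS: dict[str, tuple[str, ...]] = {
    "agents": ("GDPval-AA", "τ²-Bench Telecom"),
    "coding": ("Terminal-Bench Hard", "SciCode"),
    "general": ("AA-LCR", "AA-Omniscience", "IFBench"),
    "scientific_reasoning": ("HLE", "GPQA Diamond", "CritPt"),
}


def invert(mapping: dict[str, tuple[str, ...]]) -> dict[str, str]:
    """Build a benchmark -> category lookup, rejecting benchmarks listed twice."""
    lookup: dict[str, str] = {}
    for category, benchmarks in mapping.items():
        for bench in benchmarks:
            if bench in lookup:
                raise ValueError(
                    f"{bench!r} appears in both {lookup[bench]!r} and {category!r}"
                )
            lookup[bench] = category
    return lookup


BENCHMARK_TO_CATEGORY = invert(CATEGORY_BENCHMARKS)
```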
At the gallery display level, the split is handled as follows:
- Total score is copied directly from the Artificial Analysis model page.
- Agents and Coding use the category scores exposed on that page.
- General and Scientific reasoning are derived from the same page’s published benchmark components using Artificial Analysis’ methodology weights.
- If the required page fields are missing, the gallery shows N/A.
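The display rules can be sketched as follows. The sketch copies the total, Agents, and Coding values. It derives General and Scientific reasoning as weighted averages of their components and falls back to "N/A" when any required field is missing. It makes several assumptions, all hypothetical: the page payload is a flat dict, the key names are invented, and components are weighted equally within a category. The gallery uses Artificial Analysis’ published weights instead.

```python
from typing import Mapping, Optional

# Hypothetical component weights: equal weighting within each category.
# The gallery uses the weights Artificial Analysis publishes; substitute them here.
COMPONENT_WEIGHTS: dict[str, dict[str, float]] = {
    "general": {"AA-LCR": 1.0, "AA-Omniscience": 1.0, "IFBench": 1.0},
    "scientific_reasoning": {"HLE": 1.0, "GPQA Diamond": 1.0, "CritPt": 1.0},
}


def derive_category(page: Mapping[str, Optional[float]], category: str) -> Optional[float]:
    """Weighted average of one category's benchmark components.

    Returns None when any required component is absent or null in the payload.
    """
    weights = COMPONENT_WEIGHTS[category]
    values = [page.get(name) for name in weights]
    if any(v is None for v in values):
        return None
    weighted = sum(w * v for w, v in zip(weights.values(), values))
    return weighted / sum(weights.values())


def display(score: Optional[float]) -> str:
    """Render a score for the fact sheet, using "N/A" for missing values."""
    return "N/A" if score is None else f"{score:.1f}"


def build_profile(page: Mapping[str, Optional[float]]) -> dict[str, str]:
    """Assemble displayed fields from a flat page payload (hypothetical keys)."""
    scores = {
        "total": page.get("total"),    # copied directly from the page
        "agents": page.get("agents"),  # category score exposed on the page
        "coding": page.get("coding"),  # category score exposed on the page
        "general": derive_category(page, "general"),
        "scientific_reasoning": derive_category(page, "scientific_reasoning"),
    }
    return {field: display(score) for field, score in scores.items()}
```

For example, a payload with all three General components but no CritPt entry yields a General value and N/A for Scientific reasoning.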
Caveats
- This is not an architecture-intrinsic metric. Two models with very similar stacks can still have very different scores due to training data, post-training, and reasoning behavior.
- Coverage is incomplete. Some gallery models may not have a clean Artificial Analysis score.
- Benchmark revisions can move scores over time even if the architecture itself does not change.