AI BENCHY

AI BENCHY Compare

Compared models

Last updated at: 2026-03-12

| Metric | Grok 4.20 Beta (medium; release 2026-03-12) | Grok 4.1 Fast (medium; release 2025-11-19) | Hunter Alpha (medium; release date unknown) |
| --- | --- | --- | --- |
| Rank | #24 | #32 | #35 |
| Avg Score | 7.0 | 6.2 | 5.9 |
| Consistency | 9.0 | 7.9 | 7.6 |
| Cost per result | 5.989 | 0.563 | 0.000 |
| Total Cost | $0.599 | $0.051 | $0.000 |
| Tests Correct | | | |
| Attempt pass rate | 70.8% | 66.7% | 68.8% |
| Flaky tests | 2 | 4 | 5 |
| Total Runs | 48 | 48 | 48 |
| Output Tokens | 1,481 | 1,183 | 4,686 |
| Reasoning Tokens | 86,628 | 83,875 | 17,821 |
| Response Time (avg) | 8.89s | 26.35s | 10.71s |
| Response Time (max) | 24.21s | 121.79s | 30.53s |
| Response Time (total) | 142.18s | 237.11s | 171.41s |
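
The percentages and token counts above can be turned into more comparable figures with a little arithmetic. The following is a minimal sketch, not the site's own method: it assumes that "Attempt pass rate" is the share of "Total Runs" that passed, and it treats reasoning tokens and output tokens as separately reported counts. The dictionary keys and function names are hypothetical, introduced only for illustration; the numbers are copied from the table.

```python
# Figures copied from the comparison table. The formulas are assumptions,
# not the benchmark's documented definitions.
MODELS = {
    "Grok 4.20 Beta": {"pass_rate_pct": 70.8, "total_runs": 48,
                       "output_tokens": 1481, "reasoning_tokens": 86628},
    "Grok 4.1 Fast": {"pass_rate_pct": 66.7, "total_runs": 48,
                      "output_tokens": 1183, "reasoning_tokens": 83875},
    "Hunter Alpha": {"pass_rate_pct": 68.8, "total_runs": 48,
                     "output_tokens": 4686, "reasoning_tokens": 17821},
}


def implied_passes(pass_rate_pct: float, total_runs: int) -> int:
    """Estimate passing attempts, assuming pass rate = passes / total runs."""
    return round(pass_rate_pct / 100 * total_runs)


def reasoning_to_output_ratio(output_tokens: int, reasoning_tokens: int) -> float:
    """Reasoning tokens spent per visible output token."""
    return reasoning_tokens / output_tokens


if __name__ == "__main__":
    for name, m in MODELS.items():
        passes = implied_passes(m["pass_rate_pct"], m["total_runs"])
        ratio = reasoning_to_output_ratio(m["output_tokens"], m["reasoning_tokens"])
        print(f"{name}: ~{passes}/{m['total_runs']} attempts passed, "
              f"{ratio:.1f} reasoning tokens per output token")
```

Under these assumptions the two Grok models spend far more reasoning tokens per output token than Hunter Alpha, while the implied number of passing attempts differs by only one or two across the three models.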

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg); Total Output Tokens; Avg Score vs Total Output Tokens.

Category Breakdown

| Anti-AI Tricks | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 7.0 | 7.2 | 88.9% | 1 | | 3.19s | 262 | 6,289 |
| Grok 4.1 Fast | 10.0 | 10.0 | 100.0% | 0 | | 5.65s | 102 | 4,021 |
| Hunter Alpha | 7.0 | 7.2 | 88.9% | 1 | | 4.93s | 441 | 1,003 |

| Combined | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 10.0 | 10.0 | 100.0% | 0 | | 20.93s | 227 | 12,212 |
| Grok 4.1 Fast | 10.0 | 10.0 | 100.0% | 0 | | 37.64s | 261 | 12,272 |
| Hunter Alpha | 10.0 | 1.6 | 66.7% | 1 | | 30.53s | 792 | 3,456 |

| Data parsing and extraction | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 9.9 | 10.0 | 100.0% | 0 | | 4.01s | 180 | 5,281 |
| Grok 4.1 Fast | 9.9 | 10.0 | 100.0% | 0 | | 6.63s | 180 | 5,409 |
| Hunter Alpha | 9.9 | 10.0 | 100.0% | 0 | | 23.16s | 1,488 | 8,017 |

| Domain specific | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 4.0 | 10.0 | 33.3% | 0 | | 21.33s | 251 | 40,255 |
| Grok 4.1 Fast | 4.0 | 4.4 | 66.7% | 2 | | 121.79s | 11 | 37,657 |
| Hunter Alpha | 10.0 | 10.0 | 0.0% | 0 | | 10.52s | 892 | 2,406 |

| General Intelligence | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 10.0 | 10.0 | 100.0% | 0 | | 5.78s | 72 | 3,440 |
| Grok 4.1 Fast | 3.0 | 9.9 | 0.0% | 0 | | 16.25s | 127 | 3,456 |
| Hunter Alpha | 8.0 | 3.7 | 66.7% | 1 | | 6.44s | 116 | 260 |

| Instructions following | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 9.0 | 10.0 | 50.0% | 0 | | 4.97s | 57 | 7,107 |
| Grok 4.1 Fast | 5.5 | 10.0 | 50.0% | 0 | | 5.30s | 55 | 3,489 |
| Hunter Alpha | 9.5 | 10.0 | 100.0% | 0 | | 4.18s | 208 | 465 |

| Puzzle Solving | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 7.0 | 7.2 | 88.9% | 1 | | 3.85s | 249 | 6,660 |
| Grok 4.1 Fast | 4.0 | 7.2 | 44.4% | 1 | | 8.08s | 187 | 6,086 |
| Hunter Alpha | 4.3 | 4.7 | 66.7% | 2 | | 5.36s | 441 | 1,310 |

| Tool Calling | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Grok 4.20 Beta | 10.0 | 10.0 | 0.0% | 0 | | 12.39s | 183 | 5,384 |
| Grok 4.1 Fast | 10.0 | 1.6 | 33.3% | 1 | | 27.71s | 260 | 11,485 |
| Hunter Alpha | 10.0 | 10.0 | 100.0% | 0 | | 17.33s | 308 | 904 |
