AI BENCHY

AI BENCHY Compare

Anthropic: Claude Sonnet 4.6 vs Grok 4.20 Beta

Last updated at: 2026-04-04

Metric                  Claude Sonnet 4.6 (medium)   Grok 4.20 Beta (medium)
Release                 2026-02-17                   2026-03-12
Score                   7.9                          7.9
Rank                    #25                          #27
Consistency             9.5                          9.0
Tests Correct           -                            -
Attempt pass rate       72.6%                        72.6%
Flaky tests             1                            2
Total Runs              51                           51
Cost per result         8.531                        5.525
Total Cost              $1.024                       $0.608
Input Price             $3.000 / 1M                  $0.000 / 1M
Output Price            $15.000 / 1M                 $0.000 / 1M
Output Tokens           35,174                       1,487
Reasoning Tokens        24,687                       87,922
Response Time (avg)     10.09s                       8.54s
Response Time (max)     46.35s                       24.21s
Response Time (total)   90.85s                       145.26s
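
The summary figures above can be combined into a few derived ratios. The following is a minimal Python sketch, not part of the benchmark itself: the `Summary` class and its property names are hypothetical, and it assumes that Output Tokens and Reasoning Tokens are counted separately (the page does not say whether one includes the other).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Figures copied from the summary table (hypothetical container)."""
    total_cost_usd: float
    output_tokens: int
    reasoning_tokens: int

    @property
    def generated_tokens(self) -> int:
        # Assumption: output and reasoning tokens are disjoint counts.
        return self.output_tokens + self.reasoning_tokens

    @property
    def reasoning_share(self) -> float:
        # Fraction of generated tokens spent on reasoning.
        return self.reasoning_tokens / self.generated_tokens

    @property
    def cost_per_1k_generated(self) -> float:
        # Total cost spread over generated tokens, in USD per 1,000 tokens.
        return self.total_cost_usd / self.generated_tokens * 1000


SUMMARY = {
    "Claude Sonnet 4.6": Summary(1.024, 35_174, 24_687),
    "Grok 4.20 Beta": Summary(0.608, 1_487, 87_922),
}

for name, s in SUMMARY.items():
    print(
        f"{name}: {s.generated_tokens:,} generated tokens, "
        f"{s.reasoning_share:.1%} reasoning, "
        f"${s.cost_per_1k_generated:.4f} per 1k tokens"
    )
```

Under that assumption, roughly 98% of Grok 4.20 Beta's generated tokens are reasoning tokens, compared with roughly 41% for Claude Sonnet 4.6.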

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Model                Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens

Anti-AI Tricks
  Claude Sonnet 4.6  6.5    10.0         50.0%              0            -              2.98s                1,046          1,093
  Grok 4.20 Beta     8.7    7.9          91.7%              1            -              3.16s                268            7,583

Combined
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              46.35s               5,871          3,962
  Grok 4.20 Beta     10.0   10.0         100.0%             0            -              20.93s               227            12,212

Data parsing and extraction
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              13.90s               649            742
  Grok 4.20 Beta     10.0   10.0         100.0%             0            -              4.01s                180            5,281

Domain specific
  Claude Sonnet 4.6  2.9    7.2          11.1%              1            -              0ms                  25,790         16,919
  Grok 4.20 Beta     5.3    10.0         33.3%              0            -              21.33s               251            40,255

General Intelligence
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              4.94s                256            433
  Grok 4.20 Beta     10.0   10.0         100.0%             0            -              5.78s                72             3,440

Instructions following
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              2.61s                318            552
  Grok 4.20 Beta     8.3    10.0         50.0%              0            -              4.97s                57             7,107

Puzzle Solving
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              4.80s                589            635
  Grok 4.20 Beta     8.2    7.2          88.9%              1            -              3.85s                249            6,660

Tool Calling
  Claude Sonnet 4.6  10.0   10.0         100.0%             0            -              7.48s                655            351
  Grok 4.20 Beta     3.0    10.0         0.0%               0            -              12.39s               183            5,384
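
Both models share an overall Score of 7.9, so the per-category scores above are where they differ. As a rough illustration, the Python sketch below ranks the categories by score gap and computes a plain unweighted mean per model. The function names are hypothetical, and the unweighted mean is an assumption made for illustration only: the page does not state how the overall Score is aggregated, and the unweighted mean does not reproduce Claude Sonnet 4.6's 7.9, so the site presumably weights categories differently.

```python
# Category scores copied from the breakdown: (Claude Sonnet 4.6, Grok 4.20 Beta).
CATEGORY_SCORES = {
    "Anti-AI Tricks": (6.5, 8.7),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (2.9, 5.3),
    "General Intelligence": (10.0, 10.0),
    "Instructions following": (10.0, 8.3),
    "Puzzle Solving": (10.0, 8.2),
    "Tool Calling": (10.0, 3.0),
}


def unweighted_means(scores):
    """Return (claude_mean, grok_mean), weighting every category equally."""
    n = len(scores)
    claude = sum(c for c, _ in scores.values()) / n
    grok = sum(g for _, g in scores.values()) / n
    return claude, grok


def score_gaps(scores):
    """Return (category, claude - grok) pairs, largest absolute gap first."""
    gaps = {cat: round(c - g, 1) for cat, (c, g) in scores.items()}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)


claude_mean, grok_mean = unweighted_means(CATEGORY_SCORES)
print(f"Unweighted mean: Claude {claude_mean:.2f}, Grok {grok_mean:.2f}")
for category, gap in score_gaps(CATEGORY_SCORES):
    print(f"{category}: {gap:+.1f}")
```

A positive gap favors Claude Sonnet 4.6 and a negative gap favors Grok 4.20 Beta. The largest gap is Tool Calling (+7.0), followed by Domain specific (-2.4) and Anti-AI Tricks (-2.2).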
