
AI BENCHY Compare

Anthropic: Claude Sonnet 5 vs Google: Gemma 4 31B

Summary

Claude Sonnet 5 vs Gemma 4 31B benchmark comparison: Gemma 4 31B leads on average score (6.1 vs 5.7) and has the lower benchmark cost ($0.004 vs $0.287). Gemma 4 31B is also faster on average response time (4.05s vs 4.74s) and has the higher attempt pass rate (47.6% vs 42.9%).

Recommended model: Gemma 4 31B. It has the higher score (6.1 vs 5.7) while costing about 95.6x less than Claude Sonnet 5.
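The 95.6x figure does not follow from the rounded totals ($0.287 / $0.004 is about 72x). It appears to come from multiplying the token counts in the metrics table by the listed per-1M-token prices. A minimal sketch, assuming cost is simply input tokens times input price plus output tokens times output price (reasoning tokens are 0 for both models, and any other charges are ignored):

```python
# Sketch: estimate benchmark cost from token counts and per-1M-token prices.
# Assumption: cost = input_tokens * input_price + output_tokens * output_price,
# with no reasoning-token or other charges (reasoning tokens are 0 for both models).

PRICES_PER_1M = {
    # model: (input USD per 1M tokens, output USD per 1M tokens)
    "Claude Sonnet 5": (2.000, 10.000),
    "Gemma 4 31B": (0.120, 0.350),
}

TOKENS = {
    # model: (total input tokens, total output tokens)
    "Claude Sonnet 5": (76_797, 13_325),
    "Gemma 4 31B": (20_911, 1_407),
}


def estimated_cost(model: str) -> float:
    """Return the estimated USD cost of one model's benchmark run."""
    in_price, out_price = PRICES_PER_1M[model]
    in_tok, out_tok = TOKENS[model]
    return (in_tok * in_price + out_tok * out_price) / 1_000_000


claude = estimated_cost("Claude Sonnet 5")
gemma = estimated_cost("Gemma 4 31B")
print(f"Claude Sonnet 5: ${claude:.4f}")
print(f"Gemma 4 31B:     ${gemma:.4f}")
print(f"Cost ratio:      {claude / gemma:.1f}x")
```

This reproduces the listed $0.287 for Claude Sonnet 5 and a ratio of about 95.6x. For Gemma 4 31B the estimate is about $0.0030, a little under the listed $0.004 total, so the listed figure may include charges or rounding that this simple formula does not capture.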

Last updated: 2026-06-30

Metric | Claude Sonnet 5 | Gemma 4 31B
Release date | 2026-06-30 | 2026-04-02 (free variant available)
Score | 5.7 | 6.1
Rank | #117 | #100
Reliability | 10.0 | 10.0
Consistency | 8.6 | 10.0
Attempt pass rate | 42.9% | 47.6%
Flaky tests | 4 | 0
Total runs | 63 | 63
Cost per result | 4.098 | 0.034
Total cost | $0.287 | $0.004
Input price | $2.000 / 1M tokens | $0.120 / 1M tokens
Output price | $10.000 / 1M tokens | $0.350 / 1M tokens
Input tokens (total) | 76,797 | 20,911
Output tokens (total) | 13,325 | 1,407
Reasoning tokens | 0 | 0
Response time (avg) | 4.74s | 4.05s
Response time (max) | 29.46s | 26.13s
Response time (total) | 99.46s | 76.87s
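Assuming the attempt pass rate is the number of passing attempts divided by the 63 total runs, the rounded percentages correspond to whole numbers of passes. A small sketch that recovers them (the helper name `implied_passes` is hypothetical, not part of the benchmark):

```python
# Sketch: recover the number of passing attempts implied by a rounded pass rate.
# Assumption: "Attempt pass rate" = passing attempts / total runs (63 per model).

def implied_passes(pass_rate_pct: float, total_runs: int) -> int:
    """Return the whole number of passes closest to the reported rate."""
    return round(pass_rate_pct / 100 * total_runs)


for model, rate in [("Claude Sonnet 5", 42.9), ("Gemma 4 31B", 47.6)]:
    passes = implied_passes(rate, 63)
    # Re-derive the percentage to confirm it rounds back to the reported value.
    check = round(passes / 63 * 100, 1)
    print(f"{model}: {passes}/63 passing attempts -> {check}%")
```

Under that assumption, Claude Sonnet 5 passed 27 of 63 attempts and Gemma 4 31B passed 30.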

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#117 Claude Sonnet 5: cost $0.061, time 53.7s, 6,172 tokens

#100 Gemma 4 31B: cost $0.001, time 12.8s, 795 tokens

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response time (avg) | Input tokens | Output tokens | Reasoning tokens
Anti-AI Tricks | Claude Sonnet 5 | 5.3 | 10.0 | 25.0% | 0 | 3.60s | 834 | 1,813 | 0
Anti-AI Tricks | Gemma 4 31B | 6.5 | 10.0 | 50.0% | 0 | 1.85s | 852 | 45 | 0
Coding | Claude Sonnet 5 | 4.6 | 7.9 | 22.2% | 1 | 3.67s | 10,590 | 1,864 | 0
Coding | Gemma 4 31B | 5.5 | 10.0 | 33.3% | 0 | 11.19s | 8,381 | 735 | 0
Combined | Claude Sonnet 5 | 3.0 | 10.0 | 0.0% | 0 | 29.46s | 38,775 | 6,340 | 0
Combined | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 | 0
Data parsing and extraction | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | 3.01s | 10,503 | 309 | 0
Data parsing and extraction | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 2.25s | 8,352 | 285 | 0
Domain specific | Claude Sonnet 5 | 5.3 | 7.2 | 44.4% | 1 | 3.28s | 975 | 933 | 0
Domain specific | Gemma 4 31B | 7.7 | 10.0 | 66.7% | 0 | 3.22s | 903 | 27 | 0
General Intelligence | Claude Sonnet 5 | 4.7 | 3.1 | 33.3% | 1 | 2.81s | 708 | 272 | 0
General Intelligence | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 2.09s | 576 | 117 | 0
Instructions following | Claude Sonnet 5 | 6.4 | 10.0 | 50.0% | 0 | 2.58s | 909 | 103 | 0
Instructions following | Gemma 4 31B | 6.5 | 10.0 | 50.0% | 0 | 2.84s | 795 | 78 | 0
Puzzle Solving | Claude Sonnet 5 | 6.0 | 7.4 | 55.6% | 1 | 3.22s | 894 | 778 | 0
Puzzle Solving | Gemma 4 31B | 6.5 | 10.0 | 33.3% | 0 | 4.23s | 828 | 108 | 0
Tool Calling | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | 6.80s | 12,351 | 522 | 0
Tool Calling | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 | 0
Trivia | Claude Sonnet 5 | 3.0 | 10.0 | 0.0% | 0 | 4.31s | 258 | 391 | 0
Trivia | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 1.25s | 224 | 12 | 0
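To see where each model's lead comes from, the category scores can be tallied per category. A minimal sketch, with scores copied by hand from the Category Breakdown and a hypothetical `category_leaders` helper (ties are reported separately):

```python
# Sketch: group benchmark categories by which model has the higher score.
# Scores are copied from the Category Breakdown: (Claude Sonnet 5, Gemma 4 31B).

CATEGORY_SCORES: dict[str, tuple[float, float]] = {
    "Anti-AI Tricks": (5.3, 6.5),
    "Coding": (4.6, 5.5),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.3, 7.7),
    "General Intelligence": (4.7, 10.0),
    "Instructions following": (6.4, 6.5),
    "Puzzle Solving": (6.0, 6.5),
    "Tool Calling": (10.0, 3.0),
    "Trivia": (3.0, 3.0),
}


def category_leaders(scores: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Return categories grouped by leading model, or 'Tie' when scores are equal."""
    leaders: dict[str, list[str]] = {"Claude Sonnet 5": [], "Gemma 4 31B": [], "Tie": []}
    for category, (claude, gemma) in scores.items():
        if claude > gemma:
            leaders["Claude Sonnet 5"].append(category)
        elif gemma > claude:
            leaders["Gemma 4 31B"].append(category)
        else:
            leaders["Tie"].append(category)
    return leaders


for who, cats in category_leaders(CATEGORY_SCORES).items():
    print(f"{who}: {', '.join(cats)}")
```

With these scores, Claude Sonnet 5 leads only in Tool Calling, Gemma 4 31B leads in six categories, and Combined, Data parsing and extraction, and Trivia are ties.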
