AI BENCHY


Anthropic: Claude Sonnet 5 vs Google: Gemma 4 31B

Summary

Claude Sonnet 5 vs Gemma 4 31B benchmark comparison: Gemma 4 31B leads on average score (6.3 vs 5.7) and has the lower total benchmark cost ($0.033 vs $0.287). Claude Sonnet 5 is faster, with an average response time of 4.74s vs 56.55s. Attempt pass rates are 42.9% for Claude Sonnet 5 and 69.8% for Gemma 4 31B.

Recommended model: Gemma 4 31B. It has the higher score in this comparison (6.3), and its total benchmark cost is about 8.8x lower than Claude Sonnet 5's.
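The headline numbers can be cross-checked with a few lines of arithmetic. This is a minimal sketch using the rounded values shown on the page, assuming that "attempt pass rate" means passing runs divided by the 63 total runs reported per model; the variable names are illustrative, not part of AI Benchy.

```python
# Headline figures as displayed on this page (already rounded).
TOTAL_RUNS = 63
total_cost = {"Claude Sonnet 5": 0.287, "Gemma 4 31B": 0.033}
pass_rate = {"Claude Sonnet 5": 0.429, "Gemma 4 31B": 0.698}

# Cost ratio from the rounded totals; the page's "about 8.8x" may be based on
# unrounded costs, so this value can differ slightly.
cost_ratio = total_cost["Claude Sonnet 5"] / total_cost["Gemma 4 31B"]
print(f"cost ratio: {cost_ratio:.1f}x")

# Assumption: attempt pass rate = passing runs / total runs.
passed_runs = {model: round(rate * TOTAL_RUNS) for model, rate in pass_rate.items()}
for model, passed in passed_runs.items():
    print(f"{model}: {passed}/{TOTAL_RUNS} runs passed ({passed / TOTAL_RUNS:.1%})")
```

With the rounded totals the ratio comes out at about 8.7x, close to the 8.8x quoted above, and the implied pass counts (27 and 44 of 63) reproduce both displayed rates.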

Last updated: 2026-06-30

| Metric | Claude Sonnet 5 | Gemma 4 31B |
|---|---|---|
| Setting | none | medium |
| Release | 2026-06-30 | 2026-04-02 |
| Free | - | Available |
| Score | 5.7 | 6.3 |
| Rank | #117 | #90 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.6 | 9.4 |
| Attempt pass rate | 42.9% | 69.8% |
| Flaky tests | 4 | 1 |
| Total Runs | 63 | 63 |
| Cost per result | 4.098 | 0.257 |
| Total Cost | $0.287 | $0.033 |
| Input Price | $2.000 / 1M tokens | $0.120 / 1M tokens |
| Output Price | $10.000 / 1M tokens | $0.350 / 1M tokens |
| Total Input Tokens | 76,797 | 17,957 |
| Total Output Tokens | 13,325 | 22,356 |
| Reasoning Tokens | 0 | 65,726 |
| Response Time (avg) | 4.74s | 56.55s |
| Response Time (max) | 29.46s | 437.40s |
| Response Time (total) | 99.46s | 1074.41s |
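Total Cost can be reproduced from the token counts and the per-million-token prices in the table. The sketch below assumes that reasoning tokens are billed at the output price; with that assumption both totals match the table to the cent. The `estimate_cost` helper is hypothetical, not an AI Benchy API.

```python
def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimated USD cost. Assumption: reasoning tokens are billed as output."""
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000

# Token counts and prices copied from the table.
claude_cost = estimate_cost(76_797, 13_325, 0, 2.000, 10.000)
gemma_cost = estimate_cost(17_957, 22_356, 65_726, 0.120, 0.350)
print(f"Claude Sonnet 5: ${claude_cost:.3f}")  # $0.287
print(f"Gemma 4 31B:     ${gemma_cost:.3f}")   # $0.033
```

Without its reasoning tokens, Gemma 4 31B's estimate would be about $0.010, so under this assumption most of its cost comes from reasoning.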

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

| Model | Rank | Setting | Cost | Time | Tokens |
|---|---|---|---|---|---|
| Claude Sonnet 5 | #117 | none | $0.061 | 53.7s | 6,172 |
| Gemma 4 31B | #90 | medium | $0.002 | 45.7s | 2,696 |

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Claude Sonnet 5 | 5.3 | 10.0 | 25.0% | 0 | 3.60s | 834 | 1,813 | 0 |
| Anti-AI Tricks | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 12.89s | 816 | 962 | 2,046 |
| Coding | Claude Sonnet 5 | 4.6 | 7.9 | 22.2% | 1 | 3.67s | 10,590 | 1,864 | 0 |
| Coding | Gemma 4 31B | 4.3 | 5.8 | 22.2% | 1 | 219.76s | 5,568 | 11,098 | 33,212 |
| Combined | Claude Sonnet 5 | 3.0 | 10.0 | 0.0% | 0 | 29.46s | 38,775 | 6,340 | 0 |
| Combined | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 | 0 |
| Data parsing and extraction | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | 3.01s | 10,503 | 309 | 0 |
| Data parsing and extraction | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 21.11s | 8,334 | 1,822 | 2,951 |
| Domain specific | Claude Sonnet 5 | 5.3 | 7.2 | 44.4% | 1 | 3.28s | 975 | 933 | 0 |
| Domain specific | Gemma 4 31B | 7.7 | 10.0 | 66.7% | 0 | 38.48s | 876 | 4,349 | 8,985 |
| General Intelligence | Claude Sonnet 5 | 4.7 | 3.1 | 33.3% | 1 | 2.81s | 708 | 272 | 0 |
| General Intelligence | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 9.57s | 567 | 105 | 888 |
| Instructions following | Claude Sonnet 5 | 6.4 | 10.0 | 50.0% | 0 | 2.58s | 909 | 103 | 0 |
| Instructions following | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | 12.76s | 777 | 533 | 2,035 |
| Puzzle Solving | Claude Sonnet 5 | 6.0 | 7.4 | 55.6% | 1 | 3.22s | 894 | 778 | 0 |
| Puzzle Solving | Gemma 4 31B | 9.9 | 10.0 | 100.0% | 0 | 26.91s | 801 | 1,795 | 5,595 |
| Tool Calling | Claude Sonnet 5 | 10.0 | 10.0 | 100.0% | 0 | 6.80s | 12,351 | 522 | 0 |
| Tool Calling | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 0ms | 0 | 0 | 0 |
| Trivia | Claude Sonnet 5 | 3.0 | 10.0 | 0.0% | 0 | 4.31s | 258 | 391 | 0 |
| Trivia | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | 90.14s | 218 | 1,692 | 10,014 |
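The category breakdown can be summarized programmatically. This sketch copies the per-category score and token columns into a dictionary, counts which model has the higher score in each category (treating equal scores as a tie, a definition chosen here for illustration), and sums the token columns. The `BREAKDOWN` structure and the "claude"/"gemma" keys are hypothetical names, not an AI Benchy export format.

```python
from collections import Counter

# Per-category values copied from the breakdown:
# (score, input tokens, output tokens, reasoning tokens)
BREAKDOWN = {
    "Anti-AI Tricks": {"claude": (5.3, 834, 1_813, 0), "gemma": (10.0, 816, 962, 2_046)},
    "Coding": {"claude": (4.6, 10_590, 1_864, 0), "gemma": (4.3, 5_568, 11_098, 33_212)},
    "Combined": {"claude": (3.0, 38_775, 6_340, 0), "gemma": (3.0, 0, 0, 0)},
    "Data parsing and extraction": {"claude": (10.0, 10_503, 309, 0), "gemma": (10.0, 8_334, 1_822, 2_951)},
    "Domain specific": {"claude": (5.3, 975, 933, 0), "gemma": (7.7, 876, 4_349, 8_985)},
    "General Intelligence": {"claude": (4.7, 708, 272, 0), "gemma": (10.0, 567, 105, 888)},
    "Instructions following": {"claude": (6.4, 909, 103, 0), "gemma": (10.0, 777, 533, 2_035)},
    "Puzzle Solving": {"claude": (6.0, 894, 778, 0), "gemma": (9.9, 801, 1_795, 5_595)},
    "Tool Calling": {"claude": (10.0, 12_351, 522, 0), "gemma": (3.0, 0, 0, 0)},
    "Trivia": {"claude": (3.0, 258, 391, 0), "gemma": (3.0, 218, 1_692, 10_014)},
}

# Count the model with the higher score per category; equal scores are ties.
wins = Counter()
for row in BREAKDOWN.values():
    c, g = row["claude"][0], row["gemma"][0]
    wins["claude" if c > g else "gemma" if g > c else "tie"] += 1
print(dict(wins))

# Sum (input, output, reasoning) tokens across categories for each model.
totals = {
    model: tuple(sum(row[model][i] for row in BREAKDOWN.values()) for i in (1, 2, 3))
    for model in ("claude", "gemma")
}
print(totals)
```

Gemma 4 31B scores higher in five categories, Claude Sonnet 5 in two (Coding and Tool Calling), and three are tied. The summed token columns equal the Total Input Tokens, Total Output Tokens and Reasoning Tokens figures in the main comparison table, so the category rows account for all reported tokens.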
