
AI BENCHY Compare

DeepSeek: DeepSeek V4 Pro vs Google: Gemma 4 31B

Summary

DeepSeek V4 Pro vs Gemma 4 31B benchmark comparison: Gemma 4 31B leads on average score, 6.3 vs 6.2. DeepSeek V4 Pro has the lower total benchmark cost ($0.025 vs $0.033) and the faster average response time (12.38s vs 56.55s), but its attempt pass rate is lower: 42.9% vs 69.8% for Gemma 4 31B.

Recommended model: Gemma 4 31B. It has the strongest score in this comparison (6.3) and the best overall balance of cost and response time across both models.
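To put the summary figures side by side, the short sketch below computes how much slower and more expensive one model is relative to the other, and the gap in attempt pass rate. The numbers are copied from the summary; the dictionary layout and the `ratio` helper are illustrative names, not part of the benchmark.

```python
# Headline figures copied from the summary above.
summary = {
    "DeepSeek V4 Pro": {"score": 6.2, "total_cost_usd": 0.025, "avg_time_s": 12.38, "pass_rate": 0.429},
    "Gemma 4 31B": {"score": 6.3, "total_cost_usd": 0.033, "avg_time_s": 56.55, "pass_rate": 0.698},
}


def ratio(metric, numerator, denominator):
    """Return summary[numerator][metric] divided by summary[denominator][metric]."""
    return summary[numerator][metric] / summary[denominator][metric]


# How many times slower / more expensive Gemma 4 31B is than DeepSeek V4 Pro.
time_ratio = ratio("avg_time_s", "Gemma 4 31B", "DeepSeek V4 Pro")
cost_ratio = ratio("total_cost_usd", "Gemma 4 31B", "DeepSeek V4 Pro")

# Difference in attempt pass rate, expressed as a fraction of 1.
pass_gap = summary["Gemma 4 31B"]["pass_rate"] - summary["DeepSeek V4 Pro"]["pass_rate"]

print(f"Average response time ratio: {time_ratio:.2f}x")
print(f"Total cost ratio: {cost_ratio:.2f}x")
print(f"Attempt pass rate gap: {pass_gap * 100:.1f} percentage points")
```

Read this way, the trade-off is a higher pass rate on one side against lower cost and latency on the other; which matters more depends on the workload.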

Last updated at: 2026-06-12

Metric                 DeepSeek V4 Pro (none)  Gemma 4 31B (medium)
Release                2026-04-24              2026-04-02 (Free Available)
Score                  6.2                     6.3
Rank                   #91                     #87
Reliability            8.5                     10.0
Consistency            8.5                     9.4
Tests Correct          -                       -
Attempt pass rate      42.9%                   69.8%
Flaky tests            4                       1
Total Runs             63                      63
Cost per result        0.660                   0.257
Total Cost             $0.025                  $0.033
Input Price            $0.435 / 1M             $0.120 / 1M
Output Price           $0.870 / 1M             $0.350 / 1M
Total Input Tokens     44,845                  17,957
Output Tokens          5,349                   22,356
Reasoning Tokens       0                       65,726
Response Time (avg)    12.38s                  56.55s
Response Time (max)    58.65s                  437.40s
Response Time (total)  260.06s                 1074.41s
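As a rough cross-check of the Total Cost row, the sketch below re-estimates each model's cost from the token counts and per-million prices in the table above. It assumes reasoning tokens are billed at the output price, which the page does not state; `estimate_cost` is an illustrative helper, not the benchmark's own formula.

```python
PER_MILLION = 1_000_000


def estimate_cost(input_tokens, output_tokens, reasoning_tokens, input_price, output_price):
    """Estimate USD cost from token counts; prices are USD per 1M tokens.

    Assumption (not confirmed by the source): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / PER_MILLION


# Token counts and prices copied from the comparison table.
deepseek = estimate_cost(44_845, 5_349, 0, 0.435, 0.870)
gemma = estimate_cost(17_957, 22_356, 65_726, 0.120, 0.350)

print(f"DeepSeek V4 Pro: ${deepseek:.4f} (reported $0.025)")
print(f"Gemma 4 31B:     ${gemma:.4f} (reported $0.033)")
```

Under this assumption the Gemma 4 31B estimate rounds to the reported $0.033, while the DeepSeek V4 Pro estimate comes out at about $0.024, slightly under the reported $0.025, so the page may apply a pricing detail that is not shown here.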

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#91 DeepSeek V4 Pro (none): Invalid SVG
Cost: $0.000, Time: 300.0s, Tokens: 0 tok

#87 Gemma 4 31B (medium)
Cost: $0.002, Time: 45.7s, Tokens: 2,696 tok

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Model            Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens

Anti-AI Tricks
DeepSeek V4 Pro  3.5    8.0          16.7%              1            -              14.02s               540           704            0
Gemma 4 31B      10.0   10.0         100.0%             0            -              12.89s               816           962            2,046

Coding
DeepSeek V4 Pro  4.6    7.9          22.2%              1            -              6.11s                7,279         531            0
Gemma 4 31B      4.3    5.8          22.2%              1            -              219.76s              5,568         11,098         33,212

Combined
DeepSeek V4 Pro  9.5    10.0         100.0%             0            -              25.49s               20,773        1,911          0
Gemma 4 31B      3.0    10.0         0.0%               0            -              0ms                  0             0              0

Data parsing and extraction
DeepSeek V4 Pro  6.9    5.8          66.7%              1            -              30.54s               5,633         170            0
Gemma 4 31B      10.0   10.0         100.0%             0            -              21.11s               8,334         1,822          2,951

Domain specific
DeepSeek V4 Pro  5.3    10.0         33.3%              0            -              3.17s                666           18             0
Gemma 4 31B      7.7    10.0         66.7%              0            -              38.48s               876           4,349          8,985

General Intelligence
DeepSeek V4 Pro  4.3    9.9          0.0%               0            -              3.75s                471           132            0
Gemma 4 31B      10.0   10.0         100.0%             0            -              9.57s                567           105            888

Instructions following
DeepSeek V4 Pro  6.3    10.0         50.0%              0            -              8.23s                627           64             0
Gemma 4 31B      10.0   10.0         100.0%             0            -              12.76s               777           533            2,035

Puzzle Solving
DeepSeek V4 Pro  7.6    7.2          77.8%              1            -              15.95s               594           173            0
Gemma 4 31B      9.9    10.0         100.0%             0            -              26.91s               801           1,795          5,595

Tool Calling
DeepSeek V4 Pro  10.0   10.0         100.0%             0            -              5.92s                8,079         219            0
Gemma 4 31B      3.0    10.0         0.0%               0            -              0ms                  0             0              0

Trivia
DeepSeek V4 Pro  3.0    10.0         0.0%               0            -              15.59s               183           1,427          0
Gemma 4 31B      3.0    10.0         0.0%               0            -              90.14s               218           1,692          10,014
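The per-category scores can be tallied to see where each model leads. The sketch below copies the Score column from the category breakdown and counts category leaders; the `scores` mapping and `leader` helper are illustrative names, and the tally says nothing about how the site derives its overall score.

```python
MODELS = ("DeepSeek V4 Pro", "Gemma 4 31B")

# Category scores copied from the breakdown: (DeepSeek V4 Pro, Gemma 4 31B).
scores = {
    "Anti-AI Tricks": (3.5, 10.0),
    "Coding": (4.6, 4.3),
    "Combined": (9.5, 3.0),
    "Data parsing and extraction": (6.9, 10.0),
    "Domain specific": (5.3, 7.7),
    "General Intelligence": (4.3, 10.0),
    "Instructions following": (6.3, 10.0),
    "Puzzle Solving": (7.6, 9.9),
    "Tool Calling": (10.0, 3.0),
    "Trivia": (3.0, 3.0),
}


def leader(pair):
    """Return the name of the higher-scoring model, or "tie" on equal scores."""
    first, second = pair
    if first == second:
        return "tie"
    return MODELS[0] if first > second else MODELS[1]


leaders = {category: leader(pair) for category, pair in scores.items()}
wins = {name: sum(1 for w in leaders.values() if w == name) for name in (*MODELS, "tie")}

for category, name in leaders.items():
    print(f"{category}: {name}")
print(wins)
```

Note that Gemma 4 31B's Combined and Tool Calling rows show 0 tokens and 0ms, so its scores there may reflect runs that did not execute rather than poor answers.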
