AI BENCHY Compare

Google: Gemma 4 31B vs Qwen: Qwen3.5 Plus 2026-02-15

Last updated at: 2026-04-02

| Metric | Gemma 4 31B (none) | Qwen3.5 Plus 2026-02-15 (medium) |
| --- | --- | --- |
| Release | 2026-04-02 | 2026-02-15 |
| Score | 6.7 | 8.4 |
| Rank | #47 | #11 |
| Consistency | 10.0 | 9.0 |
| Tests Correct | | |
| Attempt pass rate | 52.9% | 82.4% |
| Flaky tests | 0 | 2 |
| Total Runs | 51 | 51 |
| Cost per result | 0.023 | 1.448 |
| Total Cost | $0.002 | $0.189 |
| Input Price | $0.140 / 1M | $0.260 / 1M |
| Output Price | $0.400 / 1M | $1.560 / 1M |
| Output Tokens | 660 | 1,754 |
| Reasoning Tokens | 0 | 92,522 |
| Response Time (avg) | 2.55s | 39.13s |
| Response Time (max) | 4.68s | 81.20s |
| Response Time (total) | 38.20s | 391.29s |
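
The pass rates and token prices in the summary table can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that reasoning tokens are billed at the output price, which the page does not state. Under that assumption it recovers the number of passing attempts implied by each pass rate and the output-side portion of each model's cost.

```python
# Figures copied from the summary table; the helper names are hypothetical.
TOTAL_RUNS = 51


def implied_passes(pass_rate_percent: float, total_runs: int = TOTAL_RUNS) -> int:
    """Return the whole number of passing attempts closest to the reported rate."""
    return round(pass_rate_percent / 100 * total_runs)


def output_side_cost(output_tokens: int, reasoning_tokens: int,
                     price_per_million: float) -> float:
    """Cost of generated tokens in dollars.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same per-token rate as output tokens.
    """
    return (output_tokens + reasoning_tokens) * price_per_million / 1_000_000


gemma_passes = implied_passes(52.9)  # 27 of 51 attempts
qwen_passes = implied_passes(82.4)   # 42 of 51 attempts

gemma_output_cost = output_side_cost(660, 0, 0.400)
qwen_output_cost = output_side_cost(1_754, 92_522, 1.560)

print(f"Gemma 4 31B: {gemma_passes}/{TOTAL_RUNS} passes, "
      f"output-side cost ${gemma_output_cost:.6f}")
print(f"Qwen3.5 Plus: {qwen_passes}/{TOTAL_RUNS} passes, "
      f"output-side cost ${qwen_output_cost:.6f}")
```

Both output-side figures come out below the reported totals ($0.002 and $0.189). That is consistent with the remainder being input-token cost, although the page does not list input token counts, so this cannot be confirmed from the data shown.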

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Gemma 4 31B | 6.5 | 10.0 | 50.0% | 0 | | 1.85s | 45 | 0 |
| Anti-AI Tricks | Qwen3.5 Plus 2026-02-15 | 8.2 | 7.9 | 83.3% | 1 | | 45.78s | 205 | 21,236 |
| Combined | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Combined | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 46.85s | 421 | 7,906 |
| Data parsing and extraction | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | | 2.25s | 285 | 0 |
| Data parsing and extraction | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 46.91s | 270 | 14,916 |
| Domain specific | Gemma 4 31B | 7.7 | 10.0 | 66.7% | 0 | | 3.22s | 27 | 0 |
| Domain specific | Qwen3.5 Plus 2026-02-15 | 5.3 | 10.0 | 33.3% | 0 | | 17.50s | 35 | 16,680 |
| General Intelligence | Gemma 4 31B | 10.0 | 10.0 | 100.0% | 0 | | 2.09s | 117 | 0 |
| General Intelligence | Qwen3.5 Plus 2026-02-15 | 4.7 | 1.6 | 66.7% | 1 | | 79.86s | 73 | 8,675 |
| Instructions following | Gemma 4 31B | 6.5 | 10.0 | 50.0% | 0 | | 2.84s | 78 | 0 |
| Instructions following | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 31.93s | 101 | 7,704 |
| Puzzle Solving | Gemma 4 31B | 5.5 | 10.0 | 33.3% | 0 | | 2.95s | 108 | 0 |
| Puzzle Solving | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 34.57s | 340 | 14,496 |
| Tool Calling | Gemma 4 31B | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Tool Calling | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 7.54s | 309 | 909 |
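
To read the category breakdown at a glance, the per-category scores can be tallied by which model scored higher. This is a minimal sketch rather than anything the site itself computes: the `SCORES` list and the `winner` helper are hypothetical names, and the numbers are copied by hand from the Score column above.

```python
from collections import Counter

GEMMA = "Gemma 4 31B"
QWEN = "Qwen3.5 Plus 2026-02-15"

# (category, Gemma score, Qwen score), copied from the Score column above.
SCORES = [
    ("Anti-AI Tricks", 6.5, 8.2),
    ("Combined", 3.0, 10.0),
    ("Data parsing and extraction", 10.0, 10.0),
    ("Domain specific", 7.7, 5.3),
    ("General Intelligence", 10.0, 4.7),
    ("Instructions following", 6.5, 10.0),
    ("Puzzle Solving", 5.5, 10.0),
    ("Tool Calling", 3.0, 10.0),
]


def winner(gemma_score: float, qwen_score: float) -> str:
    """Name the model with the higher category score, or 'tie'."""
    if gemma_score > qwen_score:
        return GEMMA
    if qwen_score > gemma_score:
        return QWEN
    return "tie"


# Count how many categories each model leads.
tally = Counter(winner(g, q) for _, g, q in SCORES)

for name, count in tally.most_common():
    print(f"{name}: {count}")
```

On these figures Qwen3.5 Plus 2026-02-15 leads in five categories, Gemma 4 31B leads in two (Domain specific and General Intelligence), and Data parsing and extraction is a tie.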
