AI BENCHY Compare

Google: Gemma 4 31B vs OpenAI: GPT-5.3-Codex

Last updated at: 2026-05-22

Gemma 4 31B: none, Release: 2026-04-02, Free Available
GPT-5.3-Codex: medium, Release: 2026-02-05

Metric                 Gemma 4 31B    GPT-5.3-Codex
Score                  6.7            8.3
Rank                   #76            #15
Reliability            10.0           10.0
Consistency            10.0           8.4
Tests Correct
Attempt pass rate      50.0%          81.7%
Flaky tests            0              4
Total Runs             60             60
Cost per result        0.030          4.891
Total Cost             $0.003         $0.685
Input Price            $0.120 / 1M    $1.750 / 1M
Output Price           $0.370 / 1M    $14.000 / 1M
Output Tokens          1,398          2,332
Reasoning Tokens       0              42,616
Response Time (avg)    3.84s          15.97s
Response Time (max)    26.13s         100.93s
Response Time (total)  69.13s         319.30s
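
The token and price rows can be loosely cross-checked against Total Cost. The sketch below is only an illustration, not the site's own cost formula. It assumes that reasoning tokens are billed at the output-token rate and that the token counts in the table are totals across all runs; neither is stated on the page. The helper name `output_side_cost` is hypothetical.

```python
# Hypothetical helper, not part of the AI BENCHY page.
def output_side_cost(output_tokens: int,
                     reasoning_tokens: int,
                     output_price_per_million: float) -> float:
    """Estimate the output-side spend in dollars.

    Assumption: reasoning tokens are billed at the output-token rate.
    """
    billed_tokens = output_tokens + reasoning_tokens
    return billed_tokens * output_price_per_million / 1_000_000


# Figures copied from the comparison table above.
gemma_cost = output_side_cost(1_398, 0, 0.370)
codex_cost = output_side_cost(2_332, 42_616, 14.000)

print(f"Gemma 4 31B output-side estimate:   ${gemma_cost:.6f}")
print(f"GPT-5.3-Codex output-side estimate: ${codex_cost:.6f}")
```

Under these assumptions both estimates stay below the reported Total Cost ($0.003 and $0.685). The remainder would come from input tokens, which the page does not list.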

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    6.5    10.0         50.0%              0                           1.85s                45             0
GPT-5.3-Codex  8.7    7.9          91.7%              1                           4.16s                240            1,722

Coding
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    6.8    10.0         50.0%              0                           14.84s               726            0
GPT-5.3-Codex  10.0   10.0         100.0%             0                           18.45s               514            7,266

Combined
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    3.0    10.0         0.0%               0                           0ms                  0              0
GPT-5.3-Codex  10.0   10.0         100.0%             0                           19.56s               364            2,731

Data parsing and extraction
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    10.0   10.0         100.0%             0                           2.25s                285            0
GPT-5.3-Codex  10.0   10.0         100.0%             0                           3.07s                234            728

Domain specific
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    7.7    10.0         66.7%              0                           3.22s                27             0
GPT-5.3-Codex  5.9    7.2          55.6%              1                           64.31s               64             25,308

General Intelligence
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    10.0   10.0         100.0%             0                           2.09s                117            0
GPT-5.3-Codex  4.6    10.0         0.0%               0                           4.87s                187            331

Instructions following
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    6.5    10.0         50.0%              0                           2.84s                78             0
GPT-5.3-Codex  10.0   10.0         100.0%             0                           3.04s                93             693

Puzzle Solving
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    6.5    10.0         33.3%              0                           2.95s                108            0
GPT-5.3-Codex  9.0    7.9          88.9%              1                           5.12s                352            1,644

Tool Calling
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    3.0    10.0         0.0%               0                           0ms                  0              0
GPT-5.3-Codex  10.0   10.0         100.0%             0                           6.37s                254            492

Trivia
Model          Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Gemma 4 31B    3.0    10.0         0.0%               0                           1.25s                12             0
GPT-5.3-Codex  2.8    1.6          33.3%              1                           14.43s               30             1,701
