
AI BENCHY Compare

Google: Gemini 3.5 Flash vs OpenAI: GPT-5.4

Summary

GPT-5.4 leads on average score, 8.5 vs 7.0. Gemini 3.5 Flash has the lower total benchmark cost ($1.079 vs $1.210), the faster average response time (9.93s vs 22.35s), and a slightly higher attempt pass rate (77.8% vs 76.2%).

Recommended model: Gemini 3.5 Flash. It offers the better overall trade-off: a competitive score (7.0), lower cost than GPT-5.4 ($1.079 vs $1.210), and a faster average response time (9.93s vs 22.35s).

Last updated at: 2026-06-12

| Metric | Gemini 3.5 Flash (none) | GPT-5.4 (medium) |
| --- | --- | --- |
| Release | 2026-05-19 | 2026-03-05 |
| Score | 7.0 | 8.5 |
| Rank | #66 | #20 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.9 | 8.6 |
| Tests Correct | | |
| Attempt pass rate | 77.8% | 76.2% |
| Flaky tests | 3 | 4 |
| Total Runs | 63 | 63 |
| Cost per result | 7.190 | 8.640 |
| Total Cost | $1.079 | $1.210 |
| Input Price | $1.500 / 1M | $2.500 / 1M |
| Output Price | $9.000 / 1M | $15.000 / 1M |
| Total Input Tokens | 13,843 | 34,108 |
| Output Tokens | 117,518 | 2,242 |
| Reasoning Tokens | 0 | 72,707 |
| Response Time (avg) | 9.93s | 22.35s |
| Response Time (max) | 64.36s | 100.41s |
| Response Time (total) | 178.68s | 469.29s |
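
The Total Cost figures can be roughly reproduced from the token counts and per-million prices listed above. The sketch below is only an estimate under one assumption that the page does not state: reasoning tokens are billed at the output price. The function name `estimate_cost` is hypothetical and is not part of any AI BENCHY tooling.

```python
PER_MILLION = 1_000_000


def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price, output_price):
    """Estimate total cost in USD from token counts and per-1M-token prices.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / PER_MILLION


# Token counts and prices copied from the comparison table.
gemini_cost = estimate_cost(13_843, 117_518, 0, 1.50, 9.00)
gpt_cost = estimate_cost(34_108, 2_242, 72_707, 2.50, 15.00)

print(f"Gemini 3.5 Flash: ${gemini_cost:.3f}")
print(f"GPT-5.4:          ${gpt_cost:.3f}")
```

Under that assumption both estimates land within about $0.001 of the reported $1.079 and $1.210. The small gap for Gemini 3.5 Flash may come from rounding or billing details that the page does not show.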

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

| Model | Cost | Time | Tokens |
| --- | --- | --- | --- |
| #66 Gemini 3.5 Flash (none) | $0.225 | 125.5s | 25,004 |
| #20 GPT-5.4 (medium) | $0.214 | 199.6s | 14,349 |

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 2.53s | 492 | 5,101 | 0 |
| Anti-AI Tricks | GPT-5.4 | 8.3 | 10.0 | 75.0% | 0 | | 4.11s | 606 | 240 | 1,511 |
| Coding | Gemini 3.5 Flash | 8.8 | 7.8 | 88.9% | 1 | | 34.69s | 8,122 | 75,927 | 0 |
| Coding | GPT-5.4 | 8.8 | 7.8 | 88.9% | 1 | | 44.36s | 7,305 | 433 | 24,216 |
| Combined | Gemini 3.5 Flash | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 | 0 |
| Combined | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 20.57s | 11,019 | 301 | 3,543 |
| Data parsing and extraction | Gemini 3.5 Flash | 6.5 | 10.0 | 50.0% | 0 | | 8.10s | 2,781 | 5,895 | 0 |
| Data parsing and extraction | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 5.32s | 7,140 | 234 | 804 |
| Domain specific | Gemini 3.5 Flash | 7.6 | 7.2 | 77.8% | 1 | | 10.64s | 633 | 17,910 | 0 |
| Domain specific | GPT-5.4 | 5.3 | 7.2 | 44.4% | 1 | | 74.27s | 619 | 61 | 34,748 |
| General Intelligence | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.46s | 486 | 1,620 | 0 |
| General Intelligence | GPT-5.4 | 4.7 | 3.1 | 33.3% | 1 | | 4.92s | 477 | 145 | 321 |
| Instructions following | Gemini 3.5 Flash | 9.8 | 10.0 | 100.0% | 0 | | 3.38s | 615 | 3,928 | 0 |
| Instructions following | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 3.11s | 660 | 93 | 897 |
| Puzzle Solving | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.13s | 558 | 4,640 | 0 |
| Puzzle Solving | GPT-5.4 | 8.2 | 7.2 | 88.9% | 1 | | 9.14s | 642 | 441 | 3,815 |
| Tool Calling | Gemini 3.5 Flash | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 | 0 |
| Tool Calling | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 13.28s | 5,445 | 264 | 1,031 |
| Trivia | Gemini 3.5 Flash | 2.8 | 1.6 | 33.3% | 1 | | 4.87s | 156 | 2,497 | 0 |
| Trivia | GPT-5.4 | 3.0 | 10.0 | 0.0% | 0 | | 13.95s | 195 | 30 | 1,821 |
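
As a consistency check, the per-category token counts above can be summed and compared with the headline totals in the main comparison table. The sketch below copies the figures from this page. The names `GEMINI`, `GPT` and `totals` are hypothetical helpers introduced for illustration only.

```python
# (input, output, reasoning) tokens per category, in the order the page lists them:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction, Domain specific,
# General Intelligence, Instructions following, Puzzle Solving, Tool Calling, Trivia.
GEMINI = [
    (492, 5_101, 0), (8_122, 75_927, 0), (0, 0, 0), (2_781, 5_895, 0),
    (633, 17_910, 0), (486, 1_620, 0), (615, 3_928, 0), (558, 4_640, 0),
    (0, 0, 0), (156, 2_497, 0),
]
GPT = [
    (606, 240, 1_511), (7_305, 433, 24_216), (11_019, 301, 3_543),
    (7_140, 234, 804), (619, 61, 34_748), (477, 145, 321),
    (660, 93, 897), (642, 441, 3_815), (5_445, 264, 1_031),
    (195, 30, 1_821),
]


def totals(rows):
    """Sum each column: returns (input, output, reasoning) token totals."""
    return tuple(sum(column) for column in zip(*rows))


print("Gemini 3.5 Flash:", totals(GEMINI))  # (13843, 117518, 0)
print("GPT-5.4:         ", totals(GPT))     # (34108, 2242, 72707)
```

Both sums match the Total Input Tokens, Output Tokens and Reasoning Tokens rows of the main table. The Combined and Tool Calling rows for Gemini 3.5 Flash contribute zero tokens and show a 0ms response time.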
