
AI BENCHY Compare

Google: Gemini 2.5 Flash vs OpenAI: GPT-5.4 Mini

Summary

Gemini 2.5 Flash vs GPT-5.4 Mini benchmark comparison: Gemini 2.5 Flash leads on average score (8.2 vs 8.0), has the lower total benchmark cost ($0.379 vs $0.526), and responds faster on average (15.49s vs 22.34s). GPT-5.4 Mini has the higher attempt pass rate (73.0% vs 69.8%).

Recommended model: Gemini 2.5 Flash. It has the higher score in this comparison (8.2) and the better overall balance of cost and response time of the two models.
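
The headline figures in the summary are easier to compare as relative differences. The following is a minimal sketch that uses only the numbers quoted above; the helper name `relative_change` and the variable names are illustrative and not part of the benchmark.

```python
def relative_change(new: float, baseline: float) -> float:
    """Return the change of `new` relative to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Figures copied from the summary, ordered (Gemini 2.5 Flash, GPT-5.4 Mini).
total_cost_usd = (0.379, 0.526)
avg_response_s = (15.49, 22.34)
attempt_pass_rate_pct = (69.8, 73.0)

# How much cheaper and faster Gemini 2.5 Flash is, relative to GPT-5.4 Mini.
cost_saving_pct = -relative_change(total_cost_usd[0], total_cost_usd[1])
time_saving_pct = -relative_change(avg_response_s[0], avg_response_s[1])

# Pass rates are already percentages, so the gap is in percentage points.
pass_rate_gap_points = attempt_pass_rate_pct[1] - attempt_pass_rate_pct[0]

print(f"Gemini 2.5 Flash total cost is {cost_saving_pct:.1f}% lower")
print(f"Gemini 2.5 Flash average response time is {time_saving_pct:.1f}% lower")
print(f"GPT-5.4 Mini attempt pass rate is {pass_rate_gap_points:.1f} points higher")
```

On these inputs the cost and latency advantages of Gemini 2.5 Flash are both close to 30%, while the pass-rate advantage of GPT-5.4 Mini is about three percentage points.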

Last updated at: 2026-06-12

Metric                 Gemini 2.5 Flash (medium)  GPT-5.4 Mini (medium)
Release                2025-06-17                 2026-03-17
Score                  8.2                        8.0
Rank                   #27                        #30
Reliability            10.0                       10.0
Consistency            9.6                        8.0
Tests Correct          -                          -
Attempt pass rate      69.8%                      73.0%
Flaky tests            1                          5
Total Runs             63                         63
Cost per result        2.701                      4.381
Total Cost             $0.379                     $0.526
Input Price            $0.300 / 1M tokens         $0.750 / 1M tokens
Output Price           $2.500 / 1M tokens         $4.500 / 1M tokens
Total Input Tokens     34,476                     34,116
Output Tokens          1,930                      2,181
Reasoning Tokens       145,145                    108,937
Response Time (avg)    15.49s                     22.34s
Response Time (max)    95.48s                     138.75s
Response Time (total)  325.39s                    469.20s
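
The Total Cost row can be roughly reproduced from the token counts and per-token prices in the same table. The sketch below assumes that reasoning tokens are billed at the output price; the page does not state this, and the `Usage` class and `estimated_cost_usd` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Prices (USD per 1M tokens) and token counts copied from the table."""
    input_price_per_m: float
    output_price_per_m: float
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int


def estimated_cost_usd(usage: Usage) -> float:
    """Estimate total cost in USD.

    Assumption (not stated on the page): reasoning tokens are charged
    at the same rate as output tokens.
    """
    billed_output = usage.output_tokens + usage.reasoning_tokens
    micro_usd = (usage.input_tokens * usage.input_price_per_m
                 + billed_output * usage.output_price_per_m)
    return micro_usd / 1_000_000


gemini = Usage(0.300, 2.500, 34_476, 1_930, 145_145)
gpt_mini = Usage(0.750, 4.500, 34_116, 2_181, 108_937)

print(f"Gemini 2.5 Flash: ${estimated_cost_usd(gemini):.3f}")
print(f"GPT-5.4 Mini:     ${estimated_cost_usd(gpt_mini):.3f}")
```

Under that assumption the estimates come out at roughly $0.378 and $0.526, within about a tenth of a cent of the reported totals. It also shows that reasoning tokens, not visible output, account for most of the spend for both models.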

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#27 Gemini 2.5 Flash (medium)
Result: Invalid SVG
Cost: $0.000
Time: 274.0s
Tokens: 0 tok

#30 GPT-5.4 Mini (medium)
Cost: $0.056
Time: 95.5s
Tokens: 12,464 tok

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Anti-AI Tricks
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  8.4    10.0         75.0%              0            -              6.30s                492           255            10,233
GPT-5.4 Mini      8.6    7.9          91.7%              1            -              4.05s                606           296            2,876

Coding
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  7.8    10.0         66.7%              0            -              41.01s               6,669         543            32,303
GPT-5.4 Mini      8.4    7.4          88.9%              1            -              57.87s               7,305         467            40,902

Combined
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              28.44s               12,522        303            11,922
GPT-5.4 Mini      10.0   10.0         100.0%             0            -              17.81s               11,019        317            4,317

Data parsing and extraction
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              4.06s                7,257         279            2,325
GPT-5.4 Mini      10.0   10.0         100.0%             0            -              2.43s                7,140         234            650

Domain specific
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  5.9    7.2          55.6%              1            -              37.34s               633           18             80,702
GPT-5.4 Mini      4.1    4.4          44.5%              2            -              65.31s               619           60             43,286

General Intelligence
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  4.8    10.0         0.0%               0            -              4.86s                486           92             1,899
GPT-5.4 Mini      4.5    10.0         0.0%               0            -              3.72s                477           150            510

Instructions following
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  9.8    10.0         100.0%             0            -              2.62s                615           69             1,203
GPT-5.4 Mini      9.8    10.0         100.0%             0            -              2.13s                660           96             1,185

Puzzle Solving
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  7.7    10.0         66.7%              0            -              3.18s                558           126            2,499
GPT-5.4 Mini      7.8    10.0         66.7%              0            -              4.37s                642           278            2,443

Tool Calling
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  10.0   10.0         100.0%             0            -              6.20s                5,088         234            1,140
GPT-5.4 Mini      4.7    1.6          66.7%              1            -              9.62s                5,453         251            2,594

Trivia
Model             Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Gemini 2.5 Flash  3.0    10.0         0.0%               0            -              2.76s                156           11             919
GPT-5.4 Mini      3.0    10.0         0.0%               0            -              30.10s               195           32             10,174
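
To see where the overall gap between the two models comes from, the per-category scores can be tallied. This is a minimal sketch using only the Score values from the category breakdown; `CATEGORY_SCORES`, `tally_wins`, and `largest_gap` are illustrative names, and counting wins is just one possible way to read the table, not the site's scoring method.

```python
# Scores copied from the category breakdown,
# ordered (Gemini 2.5 Flash, GPT-5.4 Mini).
CATEGORY_SCORES = {
    "Anti-AI Tricks": (8.4, 8.6),
    "Coding": (7.8, 8.4),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.9, 4.1),
    "General Intelligence": (4.8, 4.5),
    "Instructions following": (9.8, 9.8),
    "Puzzle Solving": (7.7, 7.8),
    "Tool Calling": (10.0, 4.7),
    "Trivia": (3.0, 3.0),
}


def tally_wins(scores: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count the categories each model wins outright, plus ties."""
    tally = {"Gemini 2.5 Flash": 0, "GPT-5.4 Mini": 0, "tie": 0}
    for gemini, gpt_mini in scores.values():
        if gemini > gpt_mini:
            tally["Gemini 2.5 Flash"] += 1
        elif gpt_mini > gemini:
            tally["GPT-5.4 Mini"] += 1
        else:
            tally["tie"] += 1
    return tally


def largest_gap(scores: dict[str, tuple[float, float]]) -> str:
    """Return the category with the biggest absolute score difference."""
    return max(scores, key=lambda name: abs(scores[name][0] - scores[name][1]))


wins = tally_wins(CATEGORY_SCORES)
widest = largest_gap(CATEGORY_SCORES)
print(wins)
print(f"Largest gap: {widest}")
```

By this count each model wins three categories and four are tied, with Tool Calling showing by far the widest difference (10.0 vs 4.7 in favor of Gemini 2.5 Flash).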
