❤️ Made by XCS

AI BENCHY Compare

Google: Gemini 3 Flash Preview vs OpenAI: GPT-5.4

Last updated at: 2026-03-05

| Metric | Google: Gemini 3 Flash Preview (low) | OpenAI: GPT-5.4 (medium) |
| --- | --- | --- |
| Release | 2025-12-17 | 2026-03-05 |
| Rank | #8 | #7 |
| Avg Score | 81 | 82 |
| Consistency | 94 | 89 |
| Cost per result | 0.627 | 6.533 |
| Total Cost | $0.076 | $0.784 |
| Tests Correct | | |
| Attempt pass rate | 82.2% | 86.7% |
| Flaky tests | 1 | 2 |
| Output Tokens | 1,466 | 1,611 |
| Reasoning Tokens | 18,969 | 46,321 |
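
The headline figures above are easier to weigh as ratios. The following is a minimal sketch that derives them from the listed values only; the dictionary layout and the `ratio` helper are illustrative assumptions, not part of AI BENCHY, and nothing here reproduces how the site itself computes its metrics.

```python
# Values copied from the summary metrics above.
# The structure and the names below are hypothetical, chosen for illustration.
GEMINI = "Google: Gemini 3 Flash Preview"
GPT = "OpenAI: GPT-5.4"

summary = {
    GEMINI: {"avg_score": 81, "total_cost_usd": 0.076, "reasoning_tokens": 18_969},
    GPT: {"avg_score": 82, "total_cost_usd": 0.784, "reasoning_tokens": 46_321},
}


def ratio(metric: str, numerator: str, denominator: str) -> float:
    """Return how many times larger `metric` is for one model than for the other."""
    return summary[numerator][metric] / summary[denominator][metric]


cost_ratio = ratio("total_cost_usd", GPT, GEMINI)
token_ratio = ratio("reasoning_tokens", GPT, GEMINI)
score_gap = summary[GPT]["avg_score"] - summary[GEMINI]["avg_score"]

print(f"Total cost ratio (GPT-5.4 / Gemini): {cost_ratio:.1f}x")
print(f"Reasoning token ratio (GPT-5.4 / Gemini): {token_ratio:.2f}x")
print(f"Avg score gap (GPT-5.4 - Gemini): {score_gap} point(s)")
```

On these numbers, GPT-5.4 costs roughly 10.3 times as much in total and uses about 2.44 times the reasoning tokens for a one-point higher average score.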

[Chart: Top Models by Score]

[Chart: Score vs Total Cost]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Google: Gemini 3 Flash Preview | 100 | 100 | 100.0% | 0 | | 275 | 2,476 |
| Anti-AI Tricks | OpenAI: GPT-5.4 | 100 | 100 | 100.0% | 0 | | 216 | 1,466 |
| Combined | Google: Gemini 3 Flash Preview | 100 | 100 | 0.0% | 0 | | 326 | 0 |
| Combined | OpenAI: GPT-5.4 | 100 | 100 | 100.0% | 0 | | 301 | 3,543 |
| Data parsing and extraction | Google: Gemini 3 Flash Preview | 99 | 100 | 100.0% | 0 | | 279 | 3,656 |
| Data parsing and extraction | OpenAI: GPT-5.4 | 99 | 100 | 100.0% | 0 | | 234 | 804 |
| Domain specific | Google: Gemini 3 Flash Preview | 40 | 72 | 44.4% | 1 | | 12 | 6,410 |
| Domain specific | OpenAI: GPT-5.4 | 40 | 72 | 44.4% | 1 | | 61 | 34,748 |
| Instructions following | Google: Gemini 3 Flash Preview | 95 | 100 | 100.0% | 0 | | 71 | 2,752 |
| Instructions following | OpenAI: GPT-5.4 | 100 | 100 | 100.0% | 0 | | 93 | 897 |
| Puzzle Solving | Google: Gemini 3 Flash Preview | 100 | 100 | 100.0% | 0 | | 269 | 3,260 |
| Puzzle Solving | OpenAI: GPT-5.4 | 70 | 72 | 88.9% | 1 | | 442 | 3,832 |
| Tool Calling | Google: Gemini 3 Flash Preview | 100 | 100 | 100.0% | 0 | | 234 | 415 |
| Tool Calling | OpenAI: GPT-5.4 | 100 | 100 | 100.0% | 0 | | 264 | 1,031 |
