
AI BENCHY Compare

Google: Gemini 3.5 Flash vs OpenAI: GPT-5.5

Last updated: 2026-05-19

Metric                  Gemini 3.5 Flash (low)    GPT-5.5 (low)
Release date            2026-05-19                2026-04-24
Score                   9.6                       8.9
Rank                    #2                        #10
Reliability             10.0                      10.0
Consistency             10.0                      10.0
Attempt pass rate       94.7%                     84.2%
Flaky tests             0                         0
Total Runs              57                        57
Cost per result         1.359                     4.412
Total Cost              $0.245                    $0.706
Input Price             $1.500 / 1M tokens        $5.000 / 1M tokens
Output Price            $9.000 / 1M tokens        $30.000 / 1M tokens
Output Tokens           2,003                     2,008
Reasoning Tokens        20,245                    16,914
Response Time (avg)     2.84s                     8.80s
Response Time (max)     6.44s                     56.19s
Response Time (total)   54.00s                    167.26s
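Several of the summary figures appear to be derived from one another. The sketch below checks this under two assumptions that the page does not state: that the 57 runs are 19 tests attempted 3 times each, and that reasoning tokens are billed at the output-token price. The `Summary` type and the helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    name: str
    attempt_pass_rate: float   # fraction, e.g. 0.947 for 94.7%
    total_runs: int
    cost_per_result: float     # as published; the page does not state its unit
    total_cost_usd: float
    output_price_per_m: float  # USD per 1M output tokens
    output_tokens: int
    reasoning_tokens: int
    avg_time_s: float
    total_time_s: float


# Figures copied from the summary table above.
MODELS = [
    Summary("Gemini 3.5 Flash (low)", 0.947, 57, 1.359, 0.245, 9.0, 2_003, 20_245, 2.84, 54.00),
    Summary("GPT-5.5 (low)", 0.842, 57, 4.412, 0.706, 30.0, 2_008, 16_914, 8.80, 167.26),
]

# ASSUMPTION (inferred from the numbers, not stated by the site):
# each test is attempted 3 times, so 57 runs = 19 distinct tests.
ATTEMPTS_PER_TEST = 3


def passed_attempts(m: Summary) -> int:
    # Recover the integer count hidden behind the rounded percentage.
    return round(m.attempt_pass_rate * m.total_runs)


def implied_cost_per_result(m: Summary) -> float:
    # ASSUMPTION: "cost per result" = total cost in US cents
    # divided by the number of tests solved.
    tests_solved = passed_attempts(m) / ATTEMPTS_PER_TEST
    return m.total_cost_usd * 100 / tests_solved


def implied_avg_time(m: Summary) -> float:
    # ASSUMPTION: the average is taken per test, not per run.
    tests = m.total_runs // ATTEMPTS_PER_TEST
    return m.total_time_s / tests


def output_side_cost(m: Summary) -> float:
    # ASSUMPTION: reasoning tokens are billed at the output-token price.
    return (m.output_tokens + m.reasoning_tokens) * m.output_price_per_m / 1_000_000


for m in MODELS:
    print(
        f"{m.name}: {passed_attempts(m)}/{m.total_runs} attempts passed, "
        f"cost/result ~{implied_cost_per_result(m):.3f} (published {m.cost_per_result}), "
        f"avg time ~{implied_avg_time(m):.2f}s (published {m.avg_time_s}s), "
        f"output-side cost ~${output_side_cost(m):.3f} of ${m.total_cost_usd}"
    )
```

With these inputs, the implied cost per result (about 1.36 and 4.41) and implied average response time (about 2.84 s and 8.80 s) land within rounding of the published values. That is consistent with, though not proof of, "cost per result" meaning total cost in US cents per solved test. Output and reasoning tokens together account for roughly $0.200 of Gemini 3.5 Flash's $0.245 and $0.568 of GPT-5.5's $0.706; the remainder is presumably input-token cost.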

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.
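In the score-versus-cost and score-versus-response-time charts each model is a point, and one common way to read such plots is to ask whether any model beats another on every axis at once (Pareto dominance). A minimal sketch for this pair, using values copied from the summary table, where a higher score is better and a lower total cost and average response time are better; `Point`, `dominates` and `pareto_front` are illustrative names, not part of the site.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    score: float
    total_cost_usd: float
    avg_time_s: float


# Values copied from the summary table above.
POINTS = [
    Point("Gemini 3.5 Flash (low)", 9.6, 0.245, 2.84),
    Point("GPT-5.5 (low)", 8.9, 0.706, 8.80),
]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every axis and strictly better on at least one."""
    no_worse = (
        a.score >= b.score
        and a.total_cost_usd <= b.total_cost_usd
        and a.avg_time_s <= b.avg_time_s
    )
    strictly_better = (
        a.score > b.score
        or a.total_cost_usd < b.total_cost_usd
        or a.avg_time_s < b.avg_time_s
    )
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


print([p.name for p in pareto_front(POINTS)])
```

For this pair, Gemini 3.5 Flash (low) scores higher, costs less and responds faster, so it is the only point on the front. Output tokens are left out because the page does not say whether the "Total Output Tokens" chart includes reasoning tokens.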

Category Breakdown

Category                     Model             Score  Consistency  Attempt pass rate  Flaky tests  Response Time (avg)  Output Tokens  Reasoning Tokens
Anti-AI Tricks               Gemini 3.5 Flash  10.0   10.0         100.0%             0            2.52s                209            2,536
                             GPT-5.5           10.0   10.0         100.0%             0            4.43s                246            1,011
Coding                       Gemini 3.5 Flash  10.0   10.0         100.0%             0            5.49s                428            3,146
                             GPT-5.5           10.0   10.0         100.0%             0            7.79s                369            936
Combined                     Gemini 3.5 Flash  10.0   10.0         100.0%             0            6.44s                351            3,050
                             GPT-5.5           10.0   10.0         100.0%             0            9.56s                303            717
Data parsing and extraction  Gemini 3.5 Flash  10.0   10.0         100.0%             0            1.81s                279            1,164
                             GPT-5.5           10.0   10.0         100.0%             0            3.28s                228            157
Domain specific              Gemini 3.5 Flash  7.7    10.0         66.7%              0            3.39s                12             4,538
                             GPT-5.5           5.3    10.0         33.3%              0            27.57s               69             11,731
General Intelligence         Gemini 3.5 Flash  10.0   10.0         100.0%             0            2.27s                119            916
                             GPT-5.5           10.0   10.0         100.0%             0            7.14s                146            170
Instructions following       Gemini 3.5 Flash  9.9    10.0         100.0%             0            1.86s                71             1,652
                             GPT-5.5           9.9    10.0         100.0%             0            2.98s                93             356
Puzzle Solving               Gemini 3.5 Flash  10.0   10.0         100.0%             0            2.35s                288            2,150
                             GPT-5.5           10.0   10.0         100.0%             0            4.94s                274            895
Tool Calling                 Gemini 3.5 Flash  10.0   10.0         100.0%             0            3.27s                234            403
                             GPT-5.5           10.0   10.0         100.0%             0            4.96s                250            101
Trivia                       Gemini 3.5 Flash  10.0   10.0         100.0%             0            1.88s                12             690
                             GPT-5.5           3.0    10.0         0.0%               0            10.06s               30             840
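To compare the two models category by category, the rows above can be loaded into a small script. This is an illustrative sketch only: the `Row` type and the helper functions are hypothetical, and it keeps just three of the published columns (score, average response time, reasoning tokens).

```python
from typing import NamedTuple


class Row(NamedTuple):
    score: float
    avg_time_s: float
    reasoning_tokens: int


# Category -> (Gemini 3.5 Flash row, GPT-5.5 row), copied from the table above.
BREAKDOWN: dict[str, tuple[Row, Row]] = {
    "Anti-AI Tricks": (Row(10.0, 2.52, 2_536), Row(10.0, 4.43, 1_011)),
    "Coding": (Row(10.0, 5.49, 3_146), Row(10.0, 7.79, 936)),
    "Combined": (Row(10.0, 6.44, 3_050), Row(10.0, 9.56, 717)),
    "Data parsing and extraction": (Row(10.0, 1.81, 1_164), Row(10.0, 3.28, 157)),
    "Domain specific": (Row(7.7, 3.39, 4_538), Row(5.3, 27.57, 11_731)),
    "General Intelligence": (Row(10.0, 2.27, 916), Row(10.0, 7.14, 170)),
    "Instructions following": (Row(9.9, 1.86, 1_652), Row(9.9, 2.98, 356)),
    "Puzzle Solving": (Row(10.0, 2.35, 2_150), Row(10.0, 4.94, 895)),
    "Tool Calling": (Row(10.0, 3.27, 403), Row(10.0, 4.96, 101)),
    "Trivia": (Row(10.0, 1.88, 690), Row(3.0, 10.06, 840)),
}


def score_gaps(data: dict[str, tuple[Row, Row]]) -> dict[str, float]:
    """Categories where the scores differ, as Gemini score minus GPT-5.5 score."""
    return {
        cat: round(g.score - o.score, 1)
        for cat, (g, o) in data.items()
        if g.score != o.score
    }


def time_ratios(data: dict[str, tuple[Row, Row]]) -> dict[str, float]:
    """GPT-5.5 average response time divided by Gemini 3.5 Flash average response time."""
    return {cat: o.avg_time_s / g.avg_time_s for cat, (g, o) in data.items()}


def fewer_reasoning_tokens(data: dict[str, tuple[Row, Row]]) -> list[str]:
    """Categories where Gemini 3.5 Flash used fewer reasoning tokens than GPT-5.5."""
    return [cat for cat, (g, o) in data.items() if g.reasoning_tokens < o.reasoning_tokens]


print("Score gaps (Gemini - GPT-5.5):", score_gaps(BREAKDOWN))
for cat, ratio in sorted(time_ratios(BREAKDOWN).items(), key=lambda kv: -kv[1]):
    print(f"{cat:<28} GPT-5.5 / Gemini avg time: {ratio:.2f}x")
print("Gemini used fewer reasoning tokens in:", fewer_reasoning_tokens(BREAKDOWN))
```

With the figures above, the scores differ only in Domain specific and Trivia, which are also the only two categories where Gemini 3.5 Flash used fewer reasoning tokens than GPT-5.5. GPT-5.5's average response time is higher in every category, from about 1.4x (Coding) to about 8.1x (Domain specific).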
