AI BENCHY Compare

Google: Gemini 3.1 Flash Lite vs OpenAI: GPT-5.5

Last updated at: 2026-05-08

Metric                 Gemini 3.1 Flash Lite (medium)  GPT-5.5 (low)
Release                2026-05-08                      2026-04-24
Score                  7.9                             8.9
Rank                   #27                             #6
Reliability            10.0                            10.0
Consistency            9.1                             10.0
Tests Correct          -                               -
Attempt pass rate      71.9%                           84.2%
Flaky tests            2                               0
Total Runs             57                              57
Cost per result        0.452                           4.412
Total Cost             $0.059                          $0.706
Input Price            $0.250 / 1M                     $5.000 / 1M
Output Price           $1.500 / 1M                     $30.000 / 1M
Output Tokens          2,224                           2,008
Reasoning Tokens       32,034                          16,914
Response Time (avg)    3.14s                           8.80s
Response Time (max)    10.87s                          56.19s
Response Time (total)  59.62s                          167.26s
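
The price and token rows can be cross-checked against Total Cost with a little arithmetic. The sketch below is only illustrative: it assumes reasoning tokens are billed at the output price, which the page does not state, and it treats whatever is left of Total Cost as input-side spend. The dictionary keys and the function name `output_side_cost` are made up for this example.

```python
# Figures copied from the comparison table.
MODELS = {
    "Gemini 3.1 Flash Lite": {
        "output_price_per_m": 1.50,   # USD per 1M output tokens
        "output_tokens": 2_224,
        "reasoning_tokens": 32_034,
        "total_cost": 0.059,          # USD
    },
    "GPT-5.5": {
        "output_price_per_m": 30.00,  # USD per 1M output tokens
        "output_tokens": 2_008,
        "reasoning_tokens": 16_914,
        "total_cost": 0.706,          # USD
    },
}


def output_side_cost(model: dict) -> float:
    """Estimate USD spent on generated tokens.

    ASSUMPTION: reasoning tokens are billed at the output price.
    """
    tokens = model["output_tokens"] + model["reasoning_tokens"]
    return tokens * model["output_price_per_m"] / 1_000_000


for name, model in MODELS.items():
    estimate = output_side_cost(model)
    remainder = model["total_cost"] - estimate  # presumably input-side spend
    print(f"{name}: output side ~${estimate:.3f}, remainder ~${remainder:.3f}")

# How many times more the GPT-5.5 run cost in total.
cost_ratio = (
    MODELS["GPT-5.5"]["total_cost"]
    / MODELS["Gemini 3.1 Flash Lite"]["total_cost"]
)
print(f"GPT-5.5 total cost is about {cost_ratio:.1f}x Gemini 3.1 Flash Lite's")
```

Under that assumption, generated tokens would account for most of each model's Total Cost, and the price gap outweighs the fact that GPT-5.5 used roughly half as many reasoning tokens.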

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Category                     Model                  Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Anti-AI Tricks               Gemini 3.1 Flash Lite  9.1    10.0         75.0%              0            -              2.39s                604            4,201
Anti-AI Tricks               GPT-5.5                10.0   10.0         100.0%             0            -              4.43s                246            1,011
Coding                       Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0            -              3.26s                429            2,712
Coding                       GPT-5.5                10.0   10.0         100.0%             0            -              7.79s                369            936
Combined                     Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0            -              10.87s               327            7,401
Combined                     GPT-5.5                10.0   10.0         100.0%             0            -              9.56s                303            717
Data parsing and extraction  Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0            -              2.60s                279            2,845
Data parsing and extraction  GPT-5.5                10.0   10.0         100.0%             0            -              3.28s                228            157
Domain specific              Gemini 3.1 Flash Lite  2.9    7.2          11.1%              1            -              3.16s                15             5,165
Domain specific              GPT-5.5                5.3    10.0         33.3%              0            -              27.57s               69             11,731
General Intelligence         Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0            -              2.60s                84             1,142
General Intelligence         GPT-5.5                10.0   10.0         100.0%             0            -              7.14s                146            170
Instructions following       Gemini 3.1 Flash Lite  9.9    10.0         100.0%             0            -              2.59s                75             3,320
Instructions following       GPT-5.5                9.9    10.0         100.0%             0            -              2.98s                93             356
Puzzle Solving               Gemini 3.1 Flash Lite  7.6    7.2          77.8%              1            -              1.95s                165            2,450
Puzzle Solving               GPT-5.5                10.0   10.0         100.0%             0            -              4.94s                274            895
Tool Calling                 Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0            -              4.55s                234            921
Tool Calling                 GPT-5.5                10.0   10.0         100.0%             0            -              4.96s                250            101
Trivia                       Gemini 3.1 Flash Lite  3.0    10.0         0.0%               0            -              3.08s                12             1,877
Trivia                       GPT-5.5                3.0    10.0         0.0%               0            -              10.06s               30             840
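
Most categories are tied, so it can help to isolate the ones where the two scores differ. The following is a minimal sketch that copies the per-category Score values from the breakdown and lists the gaps from largest to smallest. The names `SCORES` and `score_gaps` are illustrative and not part of any AI Benchy tooling.

```python
# (category, Gemini 3.1 Flash Lite score, GPT-5.5 score), copied from the breakdown.
SCORES = [
    ("Anti-AI Tricks", 9.1, 10.0),
    ("Coding", 10.0, 10.0),
    ("Combined", 10.0, 10.0),
    ("Data parsing and extraction", 10.0, 10.0),
    ("Domain specific", 2.9, 5.3),
    ("General Intelligence", 10.0, 10.0),
    ("Instructions following", 9.9, 9.9),
    ("Puzzle Solving", 7.6, 10.0),
    ("Tool Calling", 10.0, 10.0),
    ("Trivia", 3.0, 3.0),
]


def score_gaps(rows):
    """Return (category, GPT-5.5 minus Gemini) for categories where scores differ.

    Results are sorted by absolute gap, largest first; ties keep table order.
    """
    gaps = [
        (category, round(gpt - gemini, 1))
        for category, gemini, gpt in rows
        if gpt != gemini
    ]
    return sorted(gaps, key=lambda gap: abs(gap[1]), reverse=True)


for category, gap in score_gaps(SCORES):
    print(f"{category}: {gap:+.1f}")
```

With the values above, only three of the ten categories show a difference (Domain specific, Puzzle Solving, and Anti-AI Tricks), all in GPT-5.5's favour.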