AI BENCHY Compare

Google: Gemini 3.5 Flash vs Inception: Mercury 2

Last updated at: 2026-06-03

| Metric | Gemini 3.5 Flash (high) | Mercury 2 (none) |
|---|---|---|
| Release | 2026-05-19 | 2026-02-24 |
| Score | 9.6 | 4.6 |
| Rank | #2 | #153 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.6 | 9.1 |
| Tests Correct | | |
| Attempt pass rate | 96.7% | 25.0% |
| Flaky tests | 1 | 2 |
| Total Runs | 60 | 60 |
| Cost per result | 5.231 | 0.216 |
| Total Cost | $0.994 | $0.009 |
| Input Price (per 1M tokens) | $1.500 | $0.250 |
| Output Price (per 1M tokens) | $9.000 | $0.750 |
| Total Input Tokens | 34,591 | 25,515 |
| Output Tokens | 1,969 | 3,001 |
| Reasoning Tokens | 102,679 | 0 |
| Response Time (avg) | 8.30s | 614ms |
| Response Time (max) | 34.82s | 1.27s |
| Response Time (total) | 165.92s | 12.28s |
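
The Total Cost row can be cross-checked against the token counts and per-1M-token prices listed with it. The sketch below is a minimal illustration, assuming that reasoning tokens are billed at the output price (the page does not state this); `estimated_cost` and the dictionary layout are hypothetical names used only for this example.

```python
# Token counts and per-1M-token prices copied from the comparison table.
MODELS = {
    "Gemini 3.5 Flash": {
        "input_tokens": 34_591,
        "output_tokens": 1_969,
        "reasoning_tokens": 102_679,
        "input_price": 1.500,   # USD per 1M input tokens
        "output_price": 9.000,  # USD per 1M output tokens
    },
    "Mercury 2": {
        "input_tokens": 25_515,
        "output_tokens": 3_001,
        "reasoning_tokens": 0,
        "input_price": 0.250,
        "output_price": 0.750,
    },
}


def estimated_cost(model: dict) -> float:
    """Estimate total cost in USD for one model's benchmark run.

    Assumption (not confirmed by the page): reasoning tokens are
    billed at the same rate as output tokens.
    """
    billed_output = model["output_tokens"] + model["reasoning_tokens"]
    total = (
        model["input_tokens"] * model["input_price"]
        + billed_output * model["output_price"]
    )
    return total / 1_000_000


for name, model in MODELS.items():
    print(f"{name}: ${estimated_cost(model):.3f}")
```

Under that assumption the estimates round to $0.994 and $0.009, which agrees with the Total Cost values reported for the two models.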

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 2.57s | 492 | 174 | 4,997 |
| Anti-AI Tricks | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 483ms | 631 | 286 | 0 |
| Coding | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 24.62s | 5,115 | 450 | 34,170 |
| Coding | Mercury 2 | 3.5 | 9.4 | 0.0% | 0 | | 831ms | 4,631 | 1,650 | 0 |
| Combined | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 22.37s | 12,873 | 351 | 16,323 |
| Combined | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 606ms | 4,821 | 131 | 0 |
| Data parsing and extraction | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 6.43s | 7,548 | 279 | 8,466 |
| Data parsing and extraction | Mercury 2 | 7.3 | 5.9 | 83.3% | 1 | | 667ms | 6,362 | 180 | 0 |
| Domain specific | Gemini 3.5 Flash | 7.6 | 7.2 | 77.8% | 1 | | 14.09s | 633 | 12 | 24,721 |
| Domain specific | Mercury 2 | 5.3 | 7.2 | 44.4% | 1 | | 534ms | 784 | 46 | 0 |
| General Intelligence | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.63s | 486 | 115 | 1,650 |
| General Intelligence | Mercury 2 | 4.8 | 10.0 | 0.0% | 0 | | 628ms | 495 | 159 | 0 |
| Instructions following | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.35s | 615 | 70 | 3,799 |
| Instructions following | Mercury 2 | 6.5 | 10.0 | 50.0% | 0 | | 551ms | 691 | 82 | 0 |
| Puzzle Solving | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.23s | 558 | 241 | 4,940 |
| Puzzle Solving | Mercury 2 | 3.1 | 10.0 | 0.0% | 0 | | 535ms | 694 | 251 | 0 |
| Tool Calling | Gemini 3.5 Flash | 9.8 | 10.0 | 100.0% | 0 | | 4.96s | 6,115 | 265 | 1,608 |
| Tool Calling | Mercury 2 | 10.0 | 10.0 | 100.0% | 0 | | 1.27s | 6,193 | 197 | 0 |
| Trivia | Gemini 3.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.94s | 156 | 12 | 2,005 |
| Trivia | Mercury 2 | 3.0 | 10.0 | 0.0% | 0 | | 548ms | 213 | 19 | 0 |
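
As a consistency check, the per-category token counts can be summed and compared with the Total Input Tokens, Output Tokens, and Reasoning Tokens reported in the headline comparison. This is a small sketch using only numbers from this page; `column_sums` and the variable names are hypothetical helpers for illustration.

```python
# (input, output, reasoning) token counts per category, in page order:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction,
# Domain specific, General Intelligence, Instructions following,
# Puzzle Solving, Tool Calling, Trivia.
CATEGORY_TOKENS = {
    "Gemini 3.5 Flash": [
        (492, 174, 4_997), (5_115, 450, 34_170), (12_873, 351, 16_323),
        (7_548, 279, 8_466), (633, 12, 24_721), (486, 115, 1_650),
        (615, 70, 3_799), (558, 241, 4_940), (6_115, 265, 1_608),
        (156, 12, 2_005),
    ],
    "Mercury 2": [
        (631, 286, 0), (4_631, 1_650, 0), (4_821, 131, 0),
        (6_362, 180, 0), (784, 46, 0), (495, 159, 0),
        (691, 82, 0), (694, 251, 0), (6_193, 197, 0),
        (213, 19, 0),
    ],
}

# Totals as reported in the headline comparison.
REPORTED_TOTALS = {
    "Gemini 3.5 Flash": (34_591, 1_969, 102_679),
    "Mercury 2": (25_515, 3_001, 0),
}


def column_sums(rows):
    """Sum each column of a list of equal-length tuples."""
    return tuple(sum(column) for column in zip(*rows))


for model, rows in CATEGORY_TOKENS.items():
    sums = column_sums(rows)
    verdict = "match" if sums == REPORTED_TOTALS[model] else "do not match"
    print(f"{model}: category sums {sums} {verdict} the reported totals")
```

For both models the category sums equal the reported totals, so the headline token figures appear to be plain sums over the ten categories.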
