AI BENCHY Compare

Google: Gemini 2.5 Flash vs xAI: Grok 4.20

Last updated at: 2026-06-01

Metric | Gemini 2.5 Flash (none, release 2025-06-17) | Grok 4.20 (medium, release 2026-03-31)
Score | 6.4 | 7.0
Rank | #95 | #79
Reliability | 10.0 | 10.0
Consistency | 9.6 | 8.4
Tests Correct | |
Attempt pass rate | 48.3% | 63.3%
Flaky tests | 1 | 4
Total Runs | 60 | 60
Cost per result | 0.159 | 7.616
Total Cost | $0.015 | $0.450
Input Price | $0.300 / 1M | $1.250 / 1M
Output Price | $2.500 / 1M | $2.500 / 1M
Output Tokens | 1,764 | 1,816
Reasoning Tokens | 0 | 157,251
Response Time (avg) | 889ms | 19.08s
Response Time (max) | 4.39s | 105.80s
Response Time (total) | 17.79s | 381.60s
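
The summary figures lend themselves to a few quick ratios. The following is a minimal Python sketch with values copied by hand from the summary table; the dictionary layout and the helper names (`summary`, `ratio`, `implied_sample_count`) are illustrative assumptions and not part of AI BENCHY. Dividing total response time by average response time gives roughly 20 for both models, which suggests the average is taken over about 20 tests rather than the 60 runs, though the page does not state this.

```python
# Values copied from the summary table; times in seconds, cost in USD.
# The structure and names here are hypothetical, chosen for illustration only.
summary = {
    "Gemini 2.5 Flash": {
        "score": 6.4,
        "total_cost": 0.015,
        "avg_time": 0.889,
        "total_time": 17.79,
    },
    "Grok 4.20": {
        "score": 7.0,
        "total_cost": 0.450,
        "avg_time": 19.08,
        "total_time": 381.60,
    },
}


def ratio(metric, numerator="Grok 4.20", denominator="Gemini 2.5 Flash"):
    """Return how many times larger `metric` is for one model than the other."""
    return summary[numerator][metric] / summary[denominator][metric]


def implied_sample_count(model):
    """Total time divided by average time: the number of samples the
    average appears to be taken over (an inference, not a documented figure)."""
    return summary[model]["total_time"] / summary[model]["avg_time"]


if __name__ == "__main__":
    print(f"Cost ratio (Grok / Gemini): {ratio('total_cost'):.1f}x")
    print(f"Avg latency ratio (Grok / Gemini): {ratio('avg_time'):.1f}x")
    print(f"Score difference (Grok - Gemini): "
          f"{summary['Grok 4.20']['score'] - summary['Gemini 2.5 Flash']['score']:.1f}")
    for model in summary:
        print(f"{model}: implied sample count ~ {implied_sample_count(model):.1f}")
```

Keeping the raw numbers in one dictionary makes it easy to swap in another pair of models without touching the helpers.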

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 3.0 | 10.0 | 0.0% | 0 | | 582ms | 102 | 0
Grok 4.20 | 8.2 | 7.9 | 83.3% | 1 | | 3.95s | 287 | 8,312

Coding
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 6.8 | 10.0 | 50.0% | 0 | | 810ms | 477 | 0
Grok 4.20 | 4.1 | 1.8 | 50.0% | 2 | | 65.07s | 265 | 40,877

Combined
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 3.0 | 10.0 | 0.0% | 0 | | 4.39s | 366 | 0
Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 17.40s | 232 | 9,556

Data parsing and extraction
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 652ms | 279 | 0
Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 4.17s | 180 | 5,333

Domain specific
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 5.9 | 7.2 | 55.6% | 1 | | 495ms | 12 | 0
Grok 4.20 | 5.3 | 10.0 | 33.3% | 0 | | 27.03s | 375 | 49,339

General Intelligence
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 5.0 | 10.0 | 0.0% | 0 | | 615ms | 78 | 0
Grok 4.20 | 3.9 | 2.6 | 33.3% | 1 | | 24.48s | 65 | 6,440

Instructions following
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 590ms | 72 | 0
Grok 4.20 | 9.8 | 10.0 | 100.0% | 0 | | 4.26s | 57 | 6,419

Puzzle Solving
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 7.7 | 10.0 | 66.7% | 0 | | 604ms | 132 | 0
Grok 4.20 | 7.7 | 10.0 | 66.7% | 0 | | 6.22s | 149 | 7,913

Tool Calling
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 10.0 | 10.0 | 100.0% | 0 | | 1.91s | 234 | 0
Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 13.68s | 197 | 6,620

Trivia
Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Gemini 2.5 Flash | 3.0 | 10.0 | 0.0% | 0 | | 1.15s | 12 | 0
Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 63.48s | 9 | 16,442
