
AI BENCHY Compare

Google: Gemini 3.1 Flash Lite vs xAI: Grok Build 0.1

Last updated at: 2026-05-21

Metric                 Gemini 3.1 Flash Lite (low)  Grok Build 0.1 (medium)
Release                2026-05-08                   2026-05-21
Score                  7.6                          7.8
Rank                   #50                          #41
Reliability            10.0                         10.0
Consistency            9.2                          8.9
Tests Correct
Attempt pass rate      68.4%                        71.9%
Flaky tests            2                            3
Total Runs             57                           57
Cost per result        0.203                        4.064
Total Cost             $0.025                       $0.488
Input Price            $0.250 / 1M                  $1.000 / 1M
Output Price           $1.500 / 1M                  $2.000 / 1M
Output Tokens          2,702                        1,947
Reasoning Tokens       8,596                        223,372
Response Time (avg)    1.92s                        22.28s
Response Time (max)    5.66s                        88.28s
Response Time (total)  36.49s                       423.30s
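
The cost and timing rows above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the site's own calculation: it assumes that reasoning tokens are billed at the listed output price, which the page does not state, and the function names `output_side_cost` and `implied_timed_items` are made up for illustration. All numbers are copied from the table.

```python
# Figures copied from the comparison table above.
MODELS = {
    "Gemini 3.1 Flash Lite": {
        "output_price_per_1m": 1.500,  # USD per 1M output tokens
        "output_tokens": 2_702,
        "reasoning_tokens": 8_596,
        "total_cost": 0.025,           # USD
        "avg_time_s": 1.92,
        "total_time_s": 36.49,
    },
    "Grok Build 0.1": {
        "output_price_per_1m": 2.000,
        "output_tokens": 1_947,
        "reasoning_tokens": 223_372,
        "total_cost": 0.488,
        "avg_time_s": 22.28,
        "total_time_s": 423.30,
    },
}


def output_side_cost(model: dict) -> float:
    """Estimated USD spent on generated tokens.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    tokens = model["output_tokens"] + model["reasoning_tokens"]
    return tokens * model["output_price_per_1m"] / 1_000_000


def implied_timed_items(model: dict) -> float:
    """Total response time divided by the average response time."""
    return model["total_time_s"] / model["avg_time_s"]


if __name__ == "__main__":
    for name, m in MODELS.items():
        print(
            f"{name}: output-side cost ~${output_side_cost(m):.4f} "
            f"of ${m['total_cost']:.3f} total, "
            f"total/avg time = {implied_timed_items(m):.2f}"
        )
```

Under that billing assumption, generated tokens would account for roughly $0.017 of Gemini's $0.025 and roughly $0.451 of Grok's $0.488. Total time divided by average time comes out close to 19 for both models, which would fit 57 runs being 19 tests attempted three times each, though the page does not confirm either reading.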

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Model                  Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens

Anti-AI Tricks
Gemini 3.1 Flash Lite  7.3    6.2          75.0%              2                           1.84s                1,013          1,548
Grok Build 0.1         10.0   10.0         100.0%             0                           5.46s                195            9,825

Coding
Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0                           1.46s                441            408
Grok Build 0.1         7.3    3.7          66.7%              1                           30.98s               354            17,734

Combined
Gemini 3.1 Flash Lite  3.0    10.0         0.0%               0                           4.48s                348            975
Grok Build 0.1         10.0   10.0         100.0%             0                           30.81s               231            18,779

Data parsing and extraction
Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0                           1.44s                291            697
Grok Build 0.1         10.0   10.0         100.0%             0                           7.76s                180            10,343

Domain specific
Gemini 3.1 Flash Lite  5.3    10.0         33.3%              0                           1.52s                15             1,214
Grok Build 0.1         5.3    10.0         33.3%              0                           77.75s               501            111,807

General Intelligence
Gemini 3.1 Flash Lite  4.0    10.0         0.0%               0                           1.37s                69             438
Grok Build 0.1         3.8    2.5          33.3%              1                           10.14s               78             5,386

Instructions following
Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0                           1.52s                72             760
Grok Build 0.1         9.8    10.0         100.0%             0                           9.62s                57             12,436

Puzzle Solving
Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0                           1.40s                210            1,191
Grok Build 0.1         6.2    7.5          55.6%              1                           8.67s                161            15,476

Tool Calling
Gemini 3.1 Flash Lite  10.0   10.0         100.0%             0                           5.66s                234            945
Grok Build 0.1         10.0   10.0         100.0%             0                           9.40s                180            5,319

Trivia
Gemini 3.1 Flash Lite  3.0    10.0         0.0%               0                           1.46s                9              420
Grok Build 0.1         3.0    10.0         0.0%               0                           26.07s               10             16,267
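
To see at a glance which model leads where, the per-category Score values above can be tallied. This is a small illustrative sketch: the scores are copied from the breakdown, while the `SCORES` layout and the `tally` helper are hypothetical names, and comparing raw Score values is only one way to read the table.

```python
# Per-category Score values copied from the breakdown above:
# (Gemini 3.1 Flash Lite, Grok Build 0.1)
SCORES = {
    "Anti-AI Tricks": (7.3, 10.0),
    "Coding": (10.0, 7.3),
    "Combined": (3.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.3, 5.3),
    "General Intelligence": (4.0, 3.8),
    "Instructions following": (10.0, 9.8),
    "Puzzle Solving": (10.0, 6.2),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(scores: dict) -> dict:
    """Group category names by which model has the higher Score."""
    result = {"gemini": [], "grok": [], "tie": []}
    for category, (gemini, grok) in scores.items():
        if gemini > grok:
            result["gemini"].append(category)
        elif grok > gemini:
            result["grok"].append(category)
        else:
            result["tie"].append(category)
    return result


if __name__ == "__main__":
    for leader, categories in tally(SCORES).items():
        print(f"{leader}: {len(categories)} -> {', '.join(categories)}")
```

By this count Gemini 3.1 Flash Lite has the higher Score in four categories (Coding, General Intelligence, Instructions following, Puzzle Solving), Grok Build 0.1 in two (Anti-AI Tricks, Combined), and the remaining four are tied.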
