AI BENCHY Compare

OpenAI: GPT-5.2 vs xAI: Grok 4.20

Last updated at: 2026-05-10

| Metric | GPT-5.2 medium (Release: 2025-12-11) | Grok 4.20 medium (Release: 2026-03-31) |
|---|---|---|
| Score | 7.2 | 6.9 |
| Rank | #60 | #68 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.2 | 8.3 |
| Tests Correct | | |
| Attempt pass rate | 68.4% | 63.2% |
| Flaky tests | 4 | 4 |
| Total Runs | 57 | 57 |
| Cost per result | 3.609 | 7.559 |
| Total Cost | $0.397 | $0.756 |
| Input Price | $1.750 / 1M | $1.250 / 1M |
| Output Price | $14.000 / 1M | $2.500 / 1M |
| Output Tokens | 2,731 | 1,784 |
| Reasoning Tokens | 22,200 | 128,233 |
| Response Time (avg) | 15.22s | 14.53s |
| Response Time (max) | 77.80s | 63.48s |
| Response Time (total) | 182.59s | 276.06s |
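
The token counts and output prices above allow a rough estimate of how much of each model's Total Cost comes from generated tokens. The sketch below rests on two assumptions the page does not state: that reasoning tokens are billed at the Output Price, and that the token counts are totals across all 57 runs. The function name `output_side_cost` is hypothetical.

```python
def output_side_cost(output_tokens: int, reasoning_tokens: int,
                     output_price_per_million: float) -> float:
    """Estimated USD cost of generated tokens.

    Assumption: reasoning tokens are billed at the same rate as output tokens.
    """
    billed_tokens = output_tokens + reasoning_tokens
    return billed_tokens * output_price_per_million / 1_000_000


# Figures copied from the comparison table.
gpt_cost = output_side_cost(2_731, 22_200, 14.00)
grok_cost = output_side_cost(1_784, 128_233, 2.50)

print(f"GPT-5.2 output-side estimate:   ${gpt_cost:.3f} of $0.397 total")
print(f"Grok 4.20 output-side estimate: ${grok_cost:.3f} of $0.756 total")
```

Under these assumptions the two models spend a similar amount on generated tokens, even though Grok 4.20 produces far more reasoning tokens, because its Output Price is much lower. Whatever remains of each Total Cost would come from input tokens, which the page does not report.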

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | GPT-5.2 | 6.5 | 8.0 | 58.3% | 1 | | 7.81s | 567 | 2,002 |
| Anti-AI Tricks | Grok 4.20 | 8.2 | 7.9 | 83.3% | 1 | | 3.95s | 287 | 8,312 |
| Coding | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 15.12s | 467 | 2,166 |
| Coding | Grok 4.20 | 4.3 | 1.1 | 66.7% | 1 | | 24.33s | 250 | 12,804 |
| Combined | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 14.06s | 291 | 1,757 |
| Combined | Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 17.40s | 232 | 9,556 |
| Data parsing and extraction | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 3.15s | 234 | 420 |
| Data parsing and extraction | Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 4.17s | 180 | 5,333 |
| Domain specific | GPT-5.2 | 5.9 | 7.2 | 55.6% | 1 | | 77.80s | 42 | 10,342 |
| Domain specific | Grok 4.20 | 5.3 | 10.0 | 33.3% | 0 | | 27.03s | 375 | 49,339 |
| General Intelligence | GPT-5.2 | 3.7 | 9.7 | 0.0% | 0 | | 4.32s | 162 | 269 |
| General Intelligence | Grok 4.20 | 3.9 | 2.6 | 33.3% | 1 | | 24.48s | 65 | 6,440 |
| Instructions following | GPT-5.2 | 9.9 | 10.0 | 100.0% | 0 | | 3.12s | 94 | 614 |
| Instructions following | Grok 4.20 | 7.3 | 6.0 | 83.3% | 1 | | 4.42s | 40 | 5,474 |
| Puzzle Solving | GPT-5.2 | 7.6 | 7.3 | 77.8% | 1 | | 5.47s | 609 | 938 |
| Puzzle Solving | Grok 4.20 | 7.7 | 10.0 | 66.7% | 0 | | 6.20s | 149 | 7,913 |
| Tool Calling | GPT-5.2 | 4.7 | 1.6 | 66.7% | 1 | | 10.30s | 239 | 469 |
| Tool Calling | Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 13.68s | 197 | 6,620 |
| Trivia | GPT-5.2 | 3.0 | 10.0 | 0.0% | 0 | | 28.18s | 26 | 3,223 |
| Trivia | Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 63.48s | 9 | 16,442 |
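
The per-category token counts can be cross-checked against the headline Output Tokens and Reasoning Tokens figures. The sketch below sums the category values copied from the breakdown and compares them with the headline totals. The names `CATEGORY_TOKENS`, `HEADLINE_TOTALS`, and `summed_tokens` are hypothetical, and treating the headline figures as plain sums over categories is an assumption that the check itself tests.

```python
# (output_tokens, reasoning_tokens) per category, in the order listed in the
# Category Breakdown: Anti-AI Tricks, Coding, Combined, Data parsing and
# extraction, Domain specific, General Intelligence, Instructions following,
# Puzzle Solving, Tool Calling, Trivia.
CATEGORY_TOKENS = {
    "GPT-5.2": [
        (567, 2_002), (467, 2_166), (291, 1_757), (234, 420), (42, 10_342),
        (162, 269), (94, 614), (609, 938), (239, 469), (26, 3_223),
    ],
    "Grok 4.20": [
        (287, 8_312), (250, 12_804), (232, 9_556), (180, 5_333), (375, 49_339),
        (65, 6_440), (40, 5_474), (149, 7_913), (197, 6_620), (9, 16_442),
    ],
}

# Headline (Output Tokens, Reasoning Tokens) from the main comparison table.
HEADLINE_TOTALS = {
    "GPT-5.2": (2_731, 22_200),
    "Grok 4.20": (1_784, 128_233),
}


def summed_tokens(rows):
    """Return (total output tokens, total reasoning tokens) over all categories."""
    return sum(o for o, _ in rows), sum(r for _, r in rows)


for model, rows in CATEGORY_TOKENS.items():
    totals = summed_tokens(rows)
    print(model, totals, "matches headline:", totals == HEADLINE_TOTALS[model])
```

Both models' category sums match their headline totals, which suggests the token figures are totals across runs rather than per-run averages.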
