AI BENCHY Compare

OpenAI: GPT-5.3 Chat vs xAI: Grok 4.20

Last updated at: 2026-06-04

| Metric | GPT-5.3 Chat (none, Release: 2026-03-03) | Grok 4.20 (medium, Release: 2026-03-31) |
| --- | --- | --- |
| Score | 7.2 | 7.1 |
| Rank | #63 | #65 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.1 | 8.8 |
| Tests Correct | | |
| Attempt pass rate | 66.7% | 63.5% |
| Flaky tests | 5 | 3 |
| Total Runs | 63 | 63 |
| Cost per result | 3.605 | 8.309 |
| Total Cost | $0.433 | $0.609 |
| Input Price | $1.750 / 1M | $1.250 / 1M |
| Output Price | $14.000 / 1M | $2.500 / 1M |
| Total Input Tokens | 34,209 | 44,433 |
| Output Tokens | 26,617 | 1,819 |
| Reasoning Tokens | 0 | 219,524 |
| Response Time (avg) | 6.34s | 27.68s |
| Response Time (max) | 18.33s | 199.66s |
| Response Time (total) | 133.13s | 581.26s |
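The Total Cost figures appear to follow directly from the token counts and per-million prices listed above. The sketch below reproduces them under one assumption that the page does not state: reasoning tokens are billed at the output price. The function name `total_cost` and its parameters are hypothetical and exist only for illustration.

```python
def total_cost(input_tokens, output_tokens, reasoning_tokens,
               input_price_per_m, output_price_per_m):
    """Estimate total USD cost from token counts and per-1M-token prices.

    Assumption (not stated on the page): reasoning tokens are charged
    at the same rate as output tokens.
    """
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = (output_tokens + reasoning_tokens) * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Values copied from the comparison table.
gpt_cost = total_cost(34_209, 26_617, 0, 1.750, 14.000)
grok_cost = total_cost(44_433, 1_819, 219_524, 1.250, 2.500)

print(f"GPT-5.3 Chat: ${gpt_cost:.3f}")  # $0.433
print(f"Grok 4.20:    ${grok_cost:.3f}")  # $0.609
```

Both results match the reported Total Cost values to three decimals. Under this assumption, most of Grok 4.20's cost comes from its 219,524 reasoning tokens rather than its 1,819 visible output tokens.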

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | GPT-5.3 Chat | 6.7 | 8.1 | 58.3% | 1 | | 3.86s | 606 | 3,167 | 0 |
| Anti-AI Tricks | Grok 4.20 | 8.2 | 7.9 | 83.3% | 1 | | 3.95s | 2,010 | 287 | 8,312 |
| Coding | GPT-5.3 Chat | 5.6 | 4.7 | 55.6% | 2 | | 10.52s | 7,302 | 6,632 | 0 |
| Coding | Grok 4.20 | 6.3 | 6.6 | 55.6% | 1 | | 109.93s | 8,307 | 268 | 103,150 |
| Combined | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | | 11.96s | 11,019 | 2,614 | 0 |
| Combined | Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 17.40s | 12,909 | 232 | 9,556 |
| Data parsing and extraction | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | | 2.21s | 7,140 | 942 | 0 |
| Data parsing and extraction | Grok 4.20 | 10.0 | 10.0 | 100.0% | 0 | | 4.17s | 7,761 | 180 | 5,333 |
| Domain specific | GPT-5.3 Chat | 3.5 | 4.4 | 33.3% | 2 | | 13.01s | 723 | 8,264 | 0 |
| Domain specific | Grok 4.20 | 5.3 | 10.0 | 33.3% | 0 | | 27.03s | 1,764 | 375 | 49,339 |
| General Intelligence | GPT-5.3 Chat | 4.6 | 10.0 | 0.0% | 0 | | 1.99s | 477 | 319 | 0 |
| General Intelligence | Grok 4.20 | 3.9 | 2.6 | 33.3% | 1 | | 24.48s | 825 | 65 | 6,440 |
| Instructions following | GPT-5.3 Chat | 9.8 | 10.0 | 100.0% | 0 | | 3.51s | 660 | 1,491 | 0 |
| Instructions following | Grok 4.20 | 9.8 | 10.0 | 100.0% | 0 | | 4.26s | 1,362 | 57 | 6,419 |
| Puzzle Solving | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | | 2.99s | 642 | 1,758 | 0 |
| Puzzle Solving | Grok 4.20 | 7.7 | 10.0 | 66.7% | 0 | | 6.22s | 1,689 | 149 | 7,913 |
| Tool Calling | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | | 8.36s | 5,445 | 861 | 0 |
| Tool Calling | Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 13.68s | 7,275 | 197 | 6,620 |
| Trivia | GPT-5.3 Chat | 3.0 | 10.0 | 0.0% | 0 | | 4.38s | 195 | 569 | 0 |
| Trivia | Grok 4.20 | 3.0 | 10.0 | 0.0% | 0 | | 63.48s | 531 | 9 | 16,442 |
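As a consistency check, the per-category token counts can be summed and compared with the totals reported in the main comparison table. The sketch below does this for Grok 4.20 using the figures from the Category Breakdown rows; the variable names are hypothetical, and the check only illustrates that the category rows appear to partition the overall totals.

```python
# Grok 4.20 tokens per category: (input, output, reasoning),
# copied from the Category Breakdown rows.
grok_tokens = {
    "Anti-AI Tricks": (2_010, 287, 8_312),
    "Coding": (8_307, 268, 103_150),
    "Combined": (12_909, 232, 9_556),
    "Data parsing and extraction": (7_761, 180, 5_333),
    "Domain specific": (1_764, 375, 49_339),
    "General Intelligence": (825, 65, 6_440),
    "Instructions following": (1_362, 57, 6_419),
    "Puzzle Solving": (1_689, 149, 7_913),
    "Tool Calling": (7_275, 197, 6_620),
    "Trivia": (531, 9, 16_442),
}

# Sum each column across all categories.
totals = tuple(sum(column) for column in zip(*grok_tokens.values()))
print(totals)  # (44433, 1819, 219524)

# Share of all reasoning tokens spent in the Coding category.
coding_share = grok_tokens["Coding"][2] / totals[2]
print(f"Coding share of reasoning tokens: {coding_share:.1%}")  # 47.0%
```

The sums equal the reported Total Input Tokens, Output Tokens, and Reasoning Tokens for Grok 4.20, and Coding alone accounts for roughly 47% of its reasoning tokens.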
