
AI BENCHY Compare

OpenAI: GPT-5.3 Chat vs xAI: Grok Build 0.1

Last updated at: 2026-05-21

| Metric | GPT-5.3 Chat (none) | Grok Build 0.1 (medium) |
| --- | --- | --- |
| Release | 2026-03-03 | 2026-05-21 |
| Score | 7.6 | 7.8 |
| Rank | #52 | #41 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.7 | 8.9 |
| Attempt pass rate | 70.2% | 71.9% |
| Flaky tests | 3 | 3 |
| Total Runs | 57 | 57 |
| Cost per result | 2.895 | 4.064 |
| Total Cost | $0.348 | $0.488 |
| Input Price | $1.750 / 1M | $1.000 / 1M |
| Output Price | $14.000 / 1M | $2.000 / 1M |
| Output Tokens | 21,353 | 1,947 |
| Reasoning Tokens | 0 | 223,372 |
| Response Time (avg) | 5.80s | 22.28s |
| Response Time (max) | 18.33s | 88.28s |
| Response Time (total) | 110.27s | 423.30s |
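
The token counts and per-token prices listed above allow a rough check of how much of each model's total cost comes from generated tokens. The sketch below is illustrative only: it assumes reasoning tokens are billed at the output price, which the page does not state, and the names `MODELS` and `output_side_cost` are hypothetical. Input token counts are not listed, so the input share of the cost is not estimated.

```python
# Figures copied from the comparison table; prices are USD per 1M tokens.
MODELS = {
    "GPT-5.3 Chat": {
        "output_price": 14.000,
        "output_tokens": 21_353,
        "reasoning_tokens": 0,
        "total_cost": 0.348,
    },
    "Grok Build 0.1": {
        "output_price": 2.000,
        "output_tokens": 1_947,
        "reasoning_tokens": 223_372,
        "total_cost": 0.488,
    },
}


def output_side_cost(model: dict) -> float:
    """Estimate spend on generated tokens.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    tokens = model["output_tokens"] + model["reasoning_tokens"]
    return tokens * model["output_price"] / 1_000_000


for name, m in MODELS.items():
    est = output_side_cost(m)
    print(f"{name}: ~${est:.3f} of ${m['total_cost']:.3f} total")
# GPT-5.3 Chat: ~$0.299 of $0.348 total
# Grok Build 0.1: ~$0.451 of $0.488 total
```

Under that assumption, generated tokens account for most of the total cost of both models, and Grok Build 0.1's lower output price is offset by its much larger reasoning token count.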

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | GPT-5.3 Chat | 6.7 | 8.1 | 58.3% | 1 | 3.86s | 3,167 | 0 |
| Anti-AI Tricks | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | 5.46s | 195 | 9,825 |
| Coding | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | 9.32s | 1,436 | 0 |
| Coding | Grok Build 0.1 | 7.3 | 3.7 | 66.7% | 1 | 30.98s | 354 | 17,734 |
| Combined | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | 11.96s | 2,614 | 0 |
| Combined | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | 30.81s | 231 | 18,779 |
| Data parsing and extraction | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | 2.21s | 942 | 0 |
| Data parsing and extraction | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | 7.76s | 180 | 10,343 |
| Domain specific | GPT-5.3 Chat | 3.5 | 4.4 | 33.3% | 2 | 13.01s | 8,264 | 0 |
| Domain specific | Grok Build 0.1 | 5.3 | 10.0 | 33.3% | 0 | 77.75s | 501 | 111,807 |
| General Intelligence | GPT-5.3 Chat | 4.6 | 10.0 | 0.0% | 0 | 1.99s | 319 | 0 |
| General Intelligence | Grok Build 0.1 | 3.8 | 2.5 | 33.3% | 1 | 10.14s | 78 | 5,386 |
| Instructions following | GPT-5.3 Chat | 9.8 | 10.0 | 100.0% | 0 | 3.29s | 1,455 | 0 |
| Instructions following | Grok Build 0.1 | 9.8 | 10.0 | 100.0% | 0 | 9.62s | 57 | 12,436 |
| Puzzle Solving | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | 2.93s | 1,726 | 0 |
| Puzzle Solving | Grok Build 0.1 | 6.2 | 7.5 | 55.6% | 1 | 8.67s | 161 | 15,476 |
| Tool Calling | GPT-5.3 Chat | 10.0 | 10.0 | 100.0% | 0 | 8.36s | 861 | 0 |
| Tool Calling | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | 9.40s | 180 | 5,319 |
| Trivia | GPT-5.3 Chat | 3.0 | 10.0 | 0.0% | 0 | 4.38s | 569 | 0 |
| Trivia | Grok Build 0.1 | 3.0 | 10.0 | 0.0% | 0 | 26.07s | 10 | 16,267 |
