AI BENCHY

Model Name

OpenAI: GPT-5.3-Codex

Reasoning (medium)

Last updated: Feb 24, 2026

Metric                 OpenAI: GPT-5.3-Codex
Rank                   #6
Company                OpenAI
Score                  7.77
Consistency            8.75
Cost per result        4.9342
Total Cost             $0.44408
Tests Correct
Attempt pass rate      76.9%
Flaky tests            2
Output Tokens          947
Reasoning Tokens       29,564
Response Time (avg)    17944ms
Response Time (total)  233267ms
Response Time (max)    100927ms
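
The three response-time figures above can be cross-checked with a little arithmetic. The sketch below assumes that the average is simply the total divided by the number of timed runs; the page does not state how the average is computed, so the "implied runs" value is an inference, not a reported metric. The variable names are illustrative.

```python
# Figures copied from the metric list above (milliseconds).
avg_ms = 17944
total_ms = 233267
max_ms = 100927

# Assumption: avg = total / number of timed runs (not stated on the page).
implied_runs = total_ms / avg_ms
print(f"implied timed runs: {implied_runs:.2f}")

# Fraction of the total response time taken by the single slowest response.
slowest_share = max_ms / total_ms
print(f"slowest response share of total time: {slowest_share:.1%}")
```

Under that assumption the figures are consistent with roughly 13 timed runs, and the slowest single response accounts for a large part of the total time.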

Category Breakdown

Category                     Fully passed tests  Score  Consistency  Attempt pass rate  Flaky tests  Reasoning score  Response Time (avg)  Cost
Anti-AI Tricks                                   10.00  10.00        100.0%             0            6.00             4687ms               $0.02371
Data parsing and extraction                      10.00  10.00        100.0%             0            1.25             3180ms               $0.02600
Domain specific                                  4.00   7.21         55.6%              1            1.00             64314ms              $0.35664
Instructions following                           9.00   10.00        50.0%              0            1.00             3037ms               $0.01216
Puzzle Solving                                   7.00   7.38         77.8%              1            6.00             4610ms               $0.02559
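
As a sanity check on the breakdown, the per-category costs can be summed and compared with the reported Total Cost, which also shows how the spend is distributed across categories. This is a minimal sketch using only the figures listed on this page; the dictionary and variable names are illustrative, and a small rounding difference between the sum and the reported total is expected.

```python
# Per-category cost in USD, copied from the Category Breakdown table.
category_cost = {
    "Anti-AI Tricks": 0.02371,
    "Data parsing and extraction": 0.02600,
    "Domain specific": 0.35664,
    "Instructions following": 0.01216,
    "Puzzle Solving": 0.02559,
}
reported_total = 0.44408  # "Total Cost" from the metric list

category_sum = sum(category_cost.values())
print(f"sum of categories: ${category_sum:.5f} (reported: ${reported_total:.5f})")

# Share of the summed cost per category, most expensive first.
for name, cost in sorted(category_cost.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<28} {cost / category_sum:6.1%}")
```

The sum agrees with the reported total to within rounding, and the Domain specific category, which also has by far the longest average response time, accounts for most of the cost.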

Compared models

Compare OpenAI: GPT-5.3-Codex against...

#5 · Google

Google: Gemini 3 Flash Preview

Reasoning (low)

Score: 8.23

Consistency: 8.71

Attempt pass rate: 82.0%

Flaky tests: 2

Cost per result: 0.6173

Tests Correct:

Total Cost: $0.06174

#7 · OpenAI

OpenAI: GPT-5.2

Reasoning (medium)

Score: 7.38

Consistency: 8.73

Attempt pass rate: 76.9%

Flaky tests: 2

Cost per result: 2.5637

Tests Correct:

Total Cost: $0.23074

#4 · Qwen

Qwen: Qwen3.5 Plus 2026-02-15

Reasoning (medium)

Score: 8.54

Consistency: 9.35

Attempt pass rate: 87.2%

Flaky tests: 1

Cost per result: 2.1621

Tests Correct:

Total Cost: $0.23784
