AI BENCHY Compare

OpenAI: GPT-5.3 Chat vs Qwen: Qwen3.5-122B-A10B

Last updated at: 2026-03-03

| Metric | OpenAI: GPT-5.3 Chat (none) | Qwen: Qwen3.5-122B-A10B (medium) |
|---|---|---|
| Release | 2026-03-03 | 2026-02-24 |
| Rank | #14 | #21 |
| Avg Score | 7.27 | 6.77 |
| Consistency | 8.26 | 8.22 |
| Cost per result | 2.835 | 5.137 |
| Total Cost | $0.256 | $0.463 |
| Tests Correct | | |
| Attempt pass rate | 73.8% | 76.2% |
| Flaky tests | 3 | 3 |
| Output Tokens | 16,339 | 16,751 |
| Reasoning Tokens | 0 | 125,394 |

[Chart: Top Models by Score]

[Chart: Score vs Total Cost]

Category Breakdown

Anti-AI Tricks

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 7.33 | 7.49 | 77.8% | 1 | | 3,091 | 0 |
| Qwen: Qwen3.5-122B-A10B | 10.00 | 10.00 | 100.0% | 0 | | 248 | 10,486 |

Data parsing and extraction

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 9.88 | 10.00 | 100.0% | 0 | | 942 | 0 |
| Qwen: Qwen3.5-122B-A10B | 9.88 | 10.00 | 100.0% | 0 | | 270 | 16,558 |

Domain specific

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 1.00 | 4.41 | 33.3% | 2 | | 8,264 | 0 |
| Qwen: Qwen3.5-122B-A10B | 1.00 | 7.21 | 11.1% | 1 | | 15,537 | 64,889 |

Instructions following

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 8.50 | 9.99 | 50.0% | 0 | | 1,455 | 0 |
| Qwen: Qwen3.5-122B-A10B | 5.50 | 5.92 | 83.3% | 1 | | 77 | 7,372 |

Puzzle Solving

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 10.00 | 10.00 | 100.0% | 0 | | 1,726 | 0 |
| Qwen: Qwen3.5-122B-A10B | 7.00 | 7.21 | 88.9% | 1 | | 297 | 24,863 |

Tool Calling

| Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|
| OpenAI: GPT-5.3 Chat | 10.00 | 10.00 | 100.0% | 0 | | 861 | 0 |
| Qwen: Qwen3.5-122B-A10B | 10.00 | 10.00 | 100.0% | 0 | | 322 | 1,226 |
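
As a quick sanity check on the figures above, the per-category Output Tokens and Reasoning Tokens can be summed and compared with the overall totals reported for each model. The sketch below is only illustrative: the dictionary names and the `sum_tokens` helper are assumptions introduced here, not part of AI Benchy, and the numbers are copied by hand from the category rows.

```python
# Per-category (output_tokens, reasoning_tokens) pairs, copied by hand from the
# category rows in this order: Anti-AI Tricks, Data parsing and extraction,
# Domain specific, Instructions following, Puzzle Solving, Tool Calling.
CATEGORY_TOKENS = {
    "OpenAI: GPT-5.3 Chat": [
        (3091, 0), (942, 0), (8264, 0), (1455, 0), (1726, 0), (861, 0),
    ],
    "Qwen: Qwen3.5-122B-A10B": [
        (248, 10486), (270, 16558), (15537, 64889),
        (77, 7372), (297, 24863), (322, 1226),
    ],
}

# Overall (output_tokens, reasoning_tokens) as reported in the summary metrics.
REPORTED_TOTALS = {
    "OpenAI: GPT-5.3 Chat": (16339, 0),
    "Qwen: Qwen3.5-122B-A10B": (16751, 125394),
}


def sum_tokens(rows):
    """Return (total_output_tokens, total_reasoning_tokens) for a list of pairs."""
    total_output = sum(output for output, _ in rows)
    total_reasoning = sum(reasoning for _, reasoning in rows)
    return total_output, total_reasoning


if __name__ == "__main__":
    for model, rows in CATEGORY_TOKENS.items():
        computed = sum_tokens(rows)
        matches = computed == REPORTED_TOTALS[model]
        print(f"{model}: computed={computed} matches_reported={matches}")
```

For both models the category sums agree with the reported totals, which suggests the six categories listed cover every test counted in the summary.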