❤️ Made by XCS

AI BENCHY Compare

OpenAI: GPT-5.4 vs Qwen: Qwen3.5-122B-A10B

Last updated at: 2026-03-05

| Metric | OpenAI: GPT-5.4 medium (Release: 2026-03-05) | Qwen: Qwen3.5-122B-A10B medium (Release: 2026-02-24) |
| --- | --- | --- |
| Avg Score | 8.2 | 8.2 |
| Tests Correct | | |
| Rank | #7 | #6 |
| Consistency | 8.9 | 9.4 |
| Cost per result | 6.533 | 3.962 |
| Total Cost | $0.784 | $0.476 |
| Attempt pass rate | 86.7% | 82.2% |
| Flaky tests | 2 | 1 |
| Total attempts | 45 (15 x 3) | 45 (15 x 3) |
| Output Tokens | 1,611 | 17,226 |
| Reasoning Tokens | 46,321 | 138,033 |
| Response Time (avg) | 21.06s | 29.45s |
| Response Time (max) | 100.41s | 119.29s |
| Response Time (total) | 315.95s | 441.71s |

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 5.02s | 216 | 1,466 |
| Anti-AI Tricks | Qwen: Qwen3.5-122B-A10B | 10.0 | 10.0 | 100.0% | 0 | | 6.99s | 248 | 10,486 |
| Combined | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 20.57s | 301 | 3,543 |
| Combined | Qwen: Qwen3.5-122B-A10B | 10.0 | 10.0 | 100.0% | 0 | | 107.79s | 483 | 11,337 |
| Data parsing and extraction | OpenAI: GPT-5.4 | 9.9 | 10.0 | 100.0% | 0 | | 5.32s | 234 | 804 |
| Data parsing and extraction | Qwen: Qwen3.5-122B-A10B | 9.9 | 10.0 | 100.0% | 0 | | 23.41s | 270 | 16,558 |
| Domain specific | OpenAI: GPT-5.4 | 4.0 | 7.2 | 44.4% | 1 | | 74.27s | 61 | 34,748 |
| Domain specific | Qwen: Qwen3.5-122B-A10B | 10.0 | 7.2 | 11.1% | 1 | | 63.40s | 15,537 | 64,889 |
| Instructions following | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 3.11s | 93 | 897 |
| Instructions following | Qwen: Qwen3.5-122B-A10B | 10.0 | 10.0 | 100.0% | 0 | | 9.88s | 77 | 7,372 |
| Puzzle Solving | OpenAI: GPT-5.4 | 7.0 | 7.2 | 88.9% | 1 | | 9.13s | 442 | 3,832 |
| Puzzle Solving | Qwen: Qwen3.5-122B-A10B | 10.0 | 10.0 | 100.0% | 0 | | 17.18s | 289 | 26,165 |
| Tool Calling | OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 13.28s | 264 | 1,031 |
| Tool Calling | Qwen: Qwen3.5-122B-A10B | 10.0 | 10.0 | 100.0% | 0 | | 4.60s | 322 | 1,226 |
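
The per-category token counts above appear to add up to the overall Output Tokens and Reasoning Tokens figures, and the average response time appears to be the total divided by the 15 tests. The sketch below checks that arithmetic. It is an illustration only: the aggregation rules and the names `TESTS` and `summarize` are assumptions, not something the page documents.

```python
# Per-category values copied from the Category Breakdown rows, in this order:
# Anti-AI Tricks, Combined, Data parsing and extraction, Domain specific,
# Instructions following, Puzzle Solving, Tool Calling.
OUTPUT_TOKENS = {
    "OpenAI: GPT-5.4": [216, 301, 234, 61, 93, 442, 264],
    "Qwen: Qwen3.5-122B-A10B": [248, 483, 270, 15537, 77, 289, 322],
}
REASONING_TOKENS = {
    "OpenAI: GPT-5.4": [1466, 3543, 804, 34748, 897, 3832, 1031],
    "Qwen: Qwen3.5-122B-A10B": [10486, 11337, 16558, 64889, 7372, 26165, 1226],
}
# "Response Time (total)" values from the summary table, in seconds.
TOTAL_RESPONSE_TIME_S = {
    "OpenAI: GPT-5.4": 315.95,
    "Qwen: Qwen3.5-122B-A10B": 441.71,
}
# Assumption: the average is taken over the 15 tests, not the 45 attempts.
TESTS = 15


def summarize(model: str) -> dict:
    """Recompute the overall figures for one model from the category rows."""
    return {
        "output_tokens": sum(OUTPUT_TOKENS[model]),
        "reasoning_tokens": sum(REASONING_TOKENS[model]),
        "avg_response_time_s": round(TOTAL_RESPONSE_TIME_S[model] / TESTS, 2),
    }


if __name__ == "__main__":
    for name in OUTPUT_TOKENS:
        print(name, summarize(name))
```

Under these assumptions the recomputed values match the summary table for both models, which suggests the overall token figures are plain sums over categories rather than averages.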
