AI BENCHY Compare

OpenAI: GPT-5.5 vs Qwen: Qwen3.6 Max Preview

Summary

GPT-5.5 vs Qwen3.6 Max Preview benchmark comparison: GPT-5.5 leads on average score, 8.8 vs 8.5. Qwen3.6 Max Preview has the lower total benchmark cost, $0.960 vs $3.679. GPT-5.5 is faster, with an average response time of 37.98s vs 59.63s, and it also has the higher attempt pass rate, 87.3% vs 81.0%.

Recommended model: Qwen3.6 Max Preview. Its score stays close to the best score here (8.5 vs 8.8), while its total benchmark cost is about 3.8x lower than GPT-5.5's ($0.960 vs $3.679).
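The recommendation trades a small score gap against a large cost difference. The page does not state its exact rule, so the sketch below uses a hypothetical one: pick the cheapest model whose score is within `max_score_gap` of the best score. The threshold of 0.5, the `Result` type, and the `recommend` function are illustrative assumptions, not AI Benchy's method.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    score: float
    total_cost: float  # USD for the whole benchmark run


def recommend(results: list[Result], max_score_gap: float = 0.5) -> Result:
    """Hypothetical rule: cheapest model scoring within max_score_gap of the best."""
    best = max(r.score for r in results)
    candidates = [r for r in results if best - r.score <= max_score_gap]
    return min(candidates, key=lambda r: r.total_cost)


# Scores and total costs copied from the summary above.
results = [
    Result("GPT-5.5", 8.8, 3.679),
    Result("Qwen3.6 Max Preview", 8.5, 0.960),
]
pick = recommend(results)
# How many times cheaper the pick is than the most expensive model.
ratio = max(r.total_cost for r in results) / pick.total_cost
print(pick.name, round(ratio, 1))
```

With these numbers it prints `Qwen3.6 Max Preview 3.8`. A tighter threshold (for example 0.2) would flip the pick to GPT-5.5, which is why the threshold matters more than the raw cost ratio.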

Last updated: 2026-06-10

Metric | GPT-5.5 (medium, released 2026-04-24) | Qwen3.6 Max Preview (medium, released 2026-04-20)
Score | 8.8 | 8.5
Rank | #9 | #15
Reliability | 10.0 | 10.0
Consistency | 8.9 | 9.3
Attempt pass rate | 87.3% | 81.0%
Flaky tests | 3 | 2
Total Runs | 63 | 63
Cost per result | 21.638 | 7.024
Total Cost | $3.679 | $0.960
Input Price | $5.000 / 1M tokens | $1.040 / 1M tokens
Output Price | $30.000 / 1M tokens | $6.240 / 1M tokens
Total Input Tokens | 34,212 | 42,362
Output Tokens | 1,985 | 2,273
Reasoning Tokens | 114,925 | 144,367
Response Time (avg) | 37.98s | 59.63s
Response Time (max) | 332.10s | 238.07s
Response Time (total) | 797.60s | 1252.17s
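The total costs above can be roughly reproduced from the token counts and per-million-token prices. The sketch below assumes, as an unconfirmed guess about how the benchmark bills, that reasoning tokens are charged at the output price; the `MODELS` dictionary and `estimated_cost` function are hypothetical names for illustration.

```python
# Prices (USD per 1M tokens) and token counts copied from the table above.
# Assumption: reasoning tokens are billed at the output-token price.
MODELS = {
    "GPT-5.5": {
        "input_price": 5.000, "output_price": 30.000,
        "input_tokens": 34_212, "output_tokens": 1_985, "reasoning_tokens": 114_925,
    },
    "Qwen3.6 Max Preview": {
        "input_price": 1.040, "output_price": 6.240,
        "input_tokens": 42_362, "output_tokens": 2_273, "reasoning_tokens": 144_367,
    },
}


def estimated_cost(m: dict) -> float:
    """Estimate total cost in USD from token counts and per-1M-token prices."""
    billed_output = m["output_tokens"] + m["reasoning_tokens"]
    return (m["input_tokens"] * m["input_price"]
            + billed_output * m["output_price"]) / 1_000_000


for name, m in MODELS.items():
    print(f"{name}: ${estimated_cost(m):.3f}")
```

Running it prints `$3.678` for GPT-5.5 and `$0.959` for Qwen3.6 Max Preview, within about $0.001 of the reported totals. Under this assumption, reasoning tokens account for over 90% of each model's cost.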

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#9 GPT-5.5 (medium): Cost $0.112, Time 71.9s, Tokens 3,807

#15 Qwen3.6 Max Preview (medium): Cost $0.024, Time 76.5s, Tokens 3,861


Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens
Anti-AI Tricks | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 4.66s | 606 | 250 | 1,335
Anti-AI Tricks | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 22.13s | 672 | 228 | 10,075
Coding | GPT-5.5 | 8.8 | 7.8 | 88.9% | 1 | 59.77s | 7,305 | 362 | 24,959
Coding | Qwen3.6 Max Preview | 8.8 | 7.8 | 88.9% | 1 | 146.48s | 7,895 | 427 | 52,957
Combined | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 19.29s | 11,019 | 312 | 2,841
Combined | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 121.49s | 14,934 | 390 | 14,575
Data parsing and extraction | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 4.18s | 7,140 | 234 | 593
Data parsing and extraction | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 41.15s | 7,782 | 270 | 10,106
Domain specific | GPT-5.5 | 5.3 | 7.2 | 44.4% | 1 | 164.14s | 723 | 67 | 79,625
Domain specific | Qwen3.6 Max Preview | 2.9 | 7.2 | 11.1% | 1 | 95.91s | 771 | 60 | 30,371
General Intelligence | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 4.16s | 477 | 138 | 223
General Intelligence | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 32.24s | 516 | 129 | 3,510
Instructions following | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 3.36s | 660 | 93 | 538
Instructions following | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 24.31s | 699 | 103 | 5,848
Puzzle Solving | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 6.76s | 642 | 241 | 2,225
Puzzle Solving | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 24.32s | 696 | 329 | 7,693
Tool Calling | GPT-5.5 | 10.0 | 10.0 | 100.0% | 0 | 10.57s | 5,445 | 258 | 832
Tool Calling | Qwen3.6 Max Preview | 10.0 | 10.0 | 100.0% | 0 | 18.32s | 8,193 | 309 | 1,571
Trivia | GPT-5.5 | 2.8 | 1.6 | 33.3% | 1 | 37.86s | 195 | 30 | 1,754
Trivia | Qwen3.6 Max Preview | 3.0 | 10.0 | 0.0% | 0 | 60.56s | 204 | 28 | 7,661
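The category rows are consistent with the overall table: for each model, the input, output, and reasoning token columns sum exactly to the totals reported above, and the two models' scores differ in only two categories. The short sketch below checks both; the `BREAKDOWN` data layout and the helper names are illustrative choices, not part of the benchmark.

```python
# (category, score, input_tokens, output_tokens, reasoning_tokens), copied from the breakdown.
BREAKDOWN = {
    "GPT-5.5": [
        ("Anti-AI Tricks", 10.0, 606, 250, 1_335),
        ("Coding", 8.8, 7_305, 362, 24_959),
        ("Combined", 10.0, 11_019, 312, 2_841),
        ("Data parsing and extraction", 10.0, 7_140, 234, 593),
        ("Domain specific", 5.3, 723, 67, 79_625),
        ("General Intelligence", 10.0, 477, 138, 223),
        ("Instructions following", 10.0, 660, 93, 538),
        ("Puzzle Solving", 10.0, 642, 241, 2_225),
        ("Tool Calling", 10.0, 5_445, 258, 832),
        ("Trivia", 2.8, 195, 30, 1_754),
    ],
    "Qwen3.6 Max Preview": [
        ("Anti-AI Tricks", 10.0, 672, 228, 10_075),
        ("Coding", 8.8, 7_895, 427, 52_957),
        ("Combined", 10.0, 14_934, 390, 14_575),
        ("Data parsing and extraction", 10.0, 7_782, 270, 10_106),
        ("Domain specific", 2.9, 771, 60, 30_371),
        ("General Intelligence", 10.0, 516, 129, 3_510),
        ("Instructions following", 10.0, 699, 103, 5_848),
        ("Puzzle Solving", 10.0, 696, 329, 7_693),
        ("Tool Calling", 10.0, 8_193, 309, 1_571),
        ("Trivia", 3.0, 204, 28, 7_661),
    ],
}


def token_totals(rows):
    """Sum the input, output, and reasoning token columns across categories."""
    return tuple(sum(row[i] for row in rows) for i in (2, 3, 4))


def score_gaps(a_rows, b_rows):
    """Return {category: a_score - b_score} for categories where the scores differ."""
    b_scores = {cat: score for cat, score, *_ in b_rows}
    return {cat: round(score - b_scores[cat], 1)
            for cat, score, *_ in a_rows if score != b_scores[cat]}


for model, rows in BREAKDOWN.items():
    print(model, token_totals(rows))
print(score_gaps(BREAKDOWN["GPT-5.5"], BREAKDOWN["Qwen3.6 Max Preview"]))
```

The totals come out to (34212, 1985, 114925) and (42362, 2273, 144367), and the only score gaps are Domain specific (+2.4 for GPT-5.5) and Trivia (-0.2), so the overall score difference comes almost entirely from the Domain specific category.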
