AI BENCHY

AI BENCHY Compare

OpenAI: GPT-5.2 vs Qwen: Qwen3.5-Flash

Summary

GPT-5.2 vs Qwen3.5-Flash benchmark comparison: GPT-5.2 leads on average score, 8.4 vs 6.8. Qwen3.5-Flash has the lower total benchmark cost, $0.080 vs $0.548. GPT-5.2 has the faster average response time, 16.88s vs 63.29s. Both models have the same attempt pass rate, 71.4%.

Recommended model: GPT-5.2. It has the higher score here (8.4) and responds about 3.7x faster than Qwen3.5-Flash on average.
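
The "3.7x faster" figure can be checked directly from the averages quoted in the summary. The short sketch below recomputes it, along with the cost multiple; the dictionary and function names are illustrative and not part of the site.

```python
# Figures copied from the summary above.
avg_time_s = {"GPT-5.2": 16.88, "Qwen3.5-Flash": 63.29}
total_cost_usd = {"GPT-5.2": 0.548, "Qwen3.5-Flash": 0.080}


def ratio(values, numerator, denominator):
    """Return values[numerator] / values[denominator]."""
    return values[numerator] / values[denominator]


# How many times longer Qwen3.5-Flash takes per response, on average.
speedup = ratio(avg_time_s, "Qwen3.5-Flash", "GPT-5.2")
# How many times more the full benchmark cost on GPT-5.2.
cost_multiple = ratio(total_cost_usd, "GPT-5.2", "Qwen3.5-Flash")

print(f"GPT-5.2 responds about {speedup:.1f}x faster on average")
print(f"GPT-5.2 costs about {cost_multiple:.2f}x more for the full benchmark")
```

The recommendation weighs speed and score; the cost multiple is shown here only to make the trade-off visible.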

Last updated at: 2026-06-18

Metric | GPT-5.2 (medium) | Qwen3.5-Flash (medium)
Release | 2025-12-11 | 2026-02-24
Score | 8.4 | 6.8
Rank | #22 | #70
Reliability | 10.0 | 10.0
Consistency | 8.4 | 8.1
Tests Correct | |
Attempt pass rate | 71.4% | 71.4%
Flaky tests | 4 | 5
Total Runs | 63 | 63
Cost per result | 4.209 | 0.871
Total Cost | $0.548 | $0.080
Input Price (per 1M tokens) | $1.750 | $0.065
Output Price (per 1M tokens) | $14.000 | $0.260
Total Input Tokens | 33,967 | 38,926
Output Tokens | 2,901 | 2,088
Reasoning Tokens | 31,932 | 294,598
Response Time (avg) | 16.88s | 63.29s
Response Time (max) | 77.80s | 234.29s
Response Time (total) | 236.34s | 1265.85s
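
The page does not state how Total Cost is derived. As a rough cross-check, the sketch below estimates it from the token counts and per-million prices in the table, under the assumption that reasoning tokens are billed at the output price. The function name `estimate_cost` is hypothetical.

```python
PER_MILLION = 1_000_000


def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price, output_price):
    """Estimate total cost in USD from token counts and per-1M-token prices.

    Assumption (not stated on the page): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price + billed_output * output_price) / PER_MILLION


# Token counts and prices copied from the table above.
gpt_cost = estimate_cost(33_967, 2_901, 31_932, 1.750, 14.000)
qwen_cost = estimate_cost(38_926, 2_088, 294_598, 0.065, 0.260)

print(f"GPT-5.2 estimate:       ${gpt_cost:.4f} (reported $0.548)")
print(f"Qwen3.5-Flash estimate: ${qwen_cost:.4f} (reported $0.080)")
```

Both estimates land within about a tenth of a cent of the reported totals, which suggests that reasoning tokens account for most of the spend for both models. The small gaps may come from rounding or from billing details the page does not show.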

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#22 GPT-5.2 (medium): Cost $0.047, Time 49.2s, Tokens 3,396

#70 Qwen3.5-Flash (medium): Cost $0.002, Time 25.8s, Tokens 4,294

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.
Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens
Anti-AI Tricks | GPT-5.2 | 6.5 | 8.0 | 58.3% | 1 | | 7.81s | 606 | 567 | 2,002
Anti-AI Tricks | Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 59.11s | 672 | 383 | 32,992
Coding | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 22.73s | 7,302 | 511 | 11,912
Coding | Qwen3.5-Flash | 3.7 | 7.2 | 22.2% | 1 | | 58.87s | 6,685 | 302 | 90,081
Combined | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 14.06s | 11,019 | 291 | 1,757
Combined | Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 17.78s | 14,934 | 483 | 8,270
Data parsing and extraction | GPT-5.2 | 10.0 | 10.0 | 100.0% | 0 | | 3.15s | 7,140 | 234 | 420
Data parsing and extraction | Qwen3.5-Flash | 7.3 | 5.9 | 83.3% | 1 | | 56.99s | 6,061 | 235 | 16,237
Domain specific | GPT-5.2 | 5.9 | 7.2 | 55.6% | 1 | | 77.80s | 473 | 42 | 10,342
Domain specific | Qwen3.5-Flash | 5.3 | 7.2 | 44.4% | 1 | | 146.50s | 581 | 58 | 43,615
General Intelligence | GPT-5.2 | 3.7 | 9.7 | 0.0% | 0 | | 4.32s | 477 | 162 | 269
General Intelligence | Qwen3.5-Flash | 6.1 | 3.1 | 66.7% | 1 | | 40.05s | 516 | 99 | 38,486
Instructions following | GPT-5.2 | 9.9 | 10.0 | 100.0% | 0 | | 3.12s | 660 | 94 | 614
Instructions following | Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 63.49s | 699 | 98 | 14,139
Puzzle Solving | GPT-5.2 | 7.5 | 7.3 | 77.8% | 1 | | 5.80s | 642 | 735 | 924
Puzzle Solving | Qwen3.5-Flash | 8.2 | 7.2 | 88.9% | 1 | | 27.61s | 381 | 89 | 12,457
Tool Calling | GPT-5.2 | 4.7 | 1.6 | 66.7% | 1 | | 10.30s | 5,453 | 239 | 469
Tool Calling | Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 10.33s | 8,193 | 309 | 1,284
Trivia | GPT-5.2 | 3.0 | 10.0 | 0.0% | 0 | | 28.18s | 195 | 26 | 3,223
Trivia | Qwen3.5-Flash | 3.0 | 10.0 | 0.0% | 0 | | 48.98s | 204 | 32 | 37,037
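
The overall score favors GPT-5.2, but the per-category picture is more mixed. As an illustration, the sketch below tallies which model has the higher Score in each category, using only the figures from the breakdown above. The names `CATEGORY_SCORES` and `tally_wins` are illustrative.

```python
# (GPT-5.2 score, Qwen3.5-Flash score) per category, copied from the breakdown.
CATEGORY_SCORES = {
    "Anti-AI Tricks": (6.5, 10.0),
    "Coding": (10.0, 3.7),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 7.3),
    "Domain specific": (5.9, 5.3),
    "General Intelligence": (3.7, 6.1),
    "Instructions following": (9.9, 10.0),
    "Puzzle Solving": (7.5, 8.2),
    "Tool Calling": (4.7, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally_wins(scores):
    """Group category names by which model has the higher score."""
    tally = {"GPT-5.2": [], "Qwen3.5-Flash": [], "tie": []}
    for category, (gpt, qwen) in scores.items():
        if gpt > qwen:
            tally["GPT-5.2"].append(category)
        elif qwen > gpt:
            tally["Qwen3.5-Flash"].append(category)
        else:
            tally["tie"].append(category)
    return tally


wins = tally_wins(CATEGORY_SCORES)
for model, categories in wins.items():
    print(f"{model}: {len(categories)} ({', '.join(categories)})")
```

A simple win count treats a 0.1-point gap (Instructions following) the same as a 6.3-point gap (Coding), so it complements the overall score and does not replace it.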
