❤️ Made by XCS

AI BENCHY Compare

OpenAI: GPT-5.2 Chat vs Qwen: Qwen3.5-Flash

Last updated at: 2026-03-06

| Metric | OpenAI: GPT-5.2 Chat (none) | Qwen: Qwen3.5-Flash (medium) |
| --- | --- | --- |
| Release | 2025-12-11 | 2026-02-24 |
| Avg Score | 7.8 | 7.2 |
| Rank | #11 | #22 |
| Tests Correct | | |
| Consistency | 9.5 | 7.9 |
| Cost per result | 2.203 | 0.552 |
| Total Cost | $0.265 | $0.061 |
| Attempt pass rate | 79.2% | 83.3% |
| Flaky tests | 1 | 4 |
| Total runs | 46 (16 x 2.88) | 46 (16 x 2.88) |
| Output Tokens | 15,600 | 1,736 |
| Reasoning Tokens | 0 | 141,900 |
| Response Time (avg) | 7.04s | 70.90s |
| Response Time (max) | 38.52s | 234.29s |
| Response Time (total) | 112.65s | 1134.43s |
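
The summary figures above appear to be linked by simple arithmetic, which the following minimal sketch checks. It assumes that the "16" in the total-runs entry "46 (16 x 2.88)" is the number of tests and that "Response Time (avg)" is an average per test; neither is stated on the page. The names `TESTS`, `MODELS`, `total_matches_average`, and `ratio` are hypothetical helpers introduced only for illustration.

```python
# Figures copied from the comparison table; nothing here is measured anew.
TESTS = 16                # assumption: "16" in "46 (16 x 2.88)" is the test count
AVG_RUNS_PER_TEST = 2.88  # assumption: "2.88" is the average runs per test

MODELS = {
    "OpenAI: GPT-5.2 Chat": {"avg_s": 7.04, "total_s": 112.65, "total_cost": 0.265},
    "Qwen: Qwen3.5-Flash": {"avg_s": 70.90, "total_s": 1134.43, "total_cost": 0.061},
}


def total_matches_average(avg_s, total_s, tests=TESTS, tolerance=0.01):
    """Return True if avg_s * tests is within a relative tolerance of total_s."""
    return abs(avg_s * tests - total_s) / total_s <= tolerance


def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b


if __name__ == "__main__":
    # 16 tests at 2.88 runs each should round to the reported 46 runs.
    print("total runs:", round(TESTS * AVG_RUNS_PER_TEST))

    for name, m in MODELS.items():
        ok = total_matches_average(m["avg_s"], m["total_s"])
        print(f"{name}: avg x tests matches total -> {ok}")

    gpt, qwen = MODELS["OpenAI: GPT-5.2 Chat"], MODELS["Qwen: Qwen3.5-Flash"]
    print("Qwen / GPT avg response time:", round(ratio(qwen["avg_s"], gpt["avg_s"]), 1))
    print("GPT / Qwen total cost:", round(ratio(gpt["total_cost"], qwen["total_cost"]), 1))
```

Under these assumptions, the average response time multiplied by 16 reproduces the reported totals for both models, and the ratios make the trade-off explicit: Qwen3.5-Flash is roughly ten times slower on average, while GPT-5.2 Chat costs roughly four times as much in total.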

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | OpenAI: GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 3.97s | 1,651 | 0 |
| Anti-AI Tricks | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 71.35s | 363 | 23,645 |
| Combined | OpenAI: GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 9.12s | 1,243 | 0 |
| Combined | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 17.78s | 483 | 8,270 |
| Data parsing and extraction | OpenAI: GPT-5.2 Chat | 9.9 | 10.0 | 100.0% | 0 | | 3.05s | 980 | 0 |
| Data parsing and extraction | Qwen: Qwen3.5-Flash | 5.5 | 5.9 | 83.3% | 1 | | 56.99s | 235 | 16,237 |
| Domain specific | OpenAI: GPT-5.2 Chat | 4.0 | 10.0 | 33.3% | 0 | | 17.78s | 7,810 | 0 |
| Domain specific | Qwen: Qwen3.5-Flash | 4.0 | 7.2 | 44.4% | 1 | | 146.50s | 58 | 43,615 |
| General Intelligence | OpenAI: GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 3.34s | 90 | 0 |
| General Intelligence | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 41.59s | 28 | 10,434 |
| Instructions following | OpenAI: GPT-5.2 Chat | 6.0 | 6.1 | 83.3% | 1 | | 5.46s | 1,528 | 0 |
| Instructions following | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 63.49s | 98 | 14,139 |
| Puzzle Solving | OpenAI: GPT-5.2 Chat | 7.0 | 10.0 | 66.7% | 0 | | 4.42s | 1,743 | 0 |
| Puzzle Solving | Qwen: Qwen3.5-Flash | 4.0 | 4.4 | 77.8% | 2 | | 56.74s | 162 | 24,276 |
| Tool Calling | OpenAI: GPT-5.2 Chat | 10.0 | 10.0 | 100.0% | 0 | | 4.68s | 555 | 0 |
| Tool Calling | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 10.33s | 309 | 1,284 |
