AI BENCHY Compare

OpenAI: GPT-5.2 Chat vs Qwen: Qwen3.5-122B-A10B

Last updated: 2026-06-02

Metric                    GPT-5.2 Chat          Qwen3.5-122B-A10B
Reasoning effort          none                  medium
Release date              2025-12-11            2026-02-24
Score                     7.9                   7.7
Rank                      #32                   #41
Reliability               10.0                  10.0
Consistency               8.9                   8.8
Attempt pass rate         73.3%                 71.7%
Flaky tests               3                     3
Total Runs                60                    60
Cost per result           2.703                 5.031
Total Cost                $0.352                $0.509
Input Price               $1.750 / 1M tokens    $0.260 / 1M tokens
Output Price              $14.000 / 1M tokens   $2.080 / 1M tokens
Total Input Tokens        31,593                38,997
Total Output Tokens       21,144                26,166
Reasoning Tokens          0                     213,524
Response Time (avg)       6.82s                 39.40s
Response Time (max)       38.52s                168.16s
Response Time (total)     136.34s               788.00s
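The Total Cost row can be roughly reproduced from the token counts and per-million prices in the same table. The sketch below is an informal check, not the site's billing code: it assumes reasoning tokens are billed at the output price and that no caching discounts or per-request fees apply, since the page does not state its formula. It also converts the displayed attempt pass rate back into a count of passing runs out of Total Runs.

```python
# Rough reconstruction of the "Total Cost" row from token counts and list prices.
# Assumption (not stated on the page): reasoning tokens are billed at the output
# price, and there are no caching discounts or per-request fees.

PER_MILLION = 1_000_000

models = {
    "GPT-5.2 Chat": {
        "input_price": 1.750, "output_price": 14.000,
        "input_tokens": 31_593, "output_tokens": 21_144, "reasoning_tokens": 0,
    },
    "Qwen3.5-122B-A10B": {
        "input_price": 0.260, "output_price": 2.080,
        "input_tokens": 38_997, "output_tokens": 26_166, "reasoning_tokens": 213_524,
    },
}


def estimated_cost(m: dict) -> float:
    """Return the estimated spend in USD for one model's benchmark run."""
    input_cost = m["input_tokens"] * m["input_price"] / PER_MILLION
    # Reasoning tokens are assumed to be charged like ordinary output tokens.
    billed_output = m["output_tokens"] + m["reasoning_tokens"]
    output_cost = billed_output * m["output_price"] / PER_MILLION
    return input_cost + output_cost


def passed_attempts(pass_rate_pct: float, total_runs: int) -> int:
    """Convert a displayed attempt pass rate back into a count of passing runs."""
    return round(pass_rate_pct / 100 * total_runs)


for name, m in models.items():
    print(f"{name}: ~${estimated_cost(m):.4f}")

print(passed_attempts(73.3, 60), passed_attempts(71.7, 60))
```

Under these assumptions the estimates are about $0.3513 and $0.5087, within a tenth of a cent of the reported $0.352 and $0.509, and the pass rates correspond to 44 and 43 passing runs out of 60. Almost all of Qwen3.5-122B-A10B's spend comes from its 213,524 reasoning tokens; the small gap for GPT-5.2 Chat means the site's exact accounting is not fully captured by this formula.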

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Anti-AI Tricks
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       8.7    7.9          91.7%              1            3.40s                606           1,807          0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            9.75s                672           269            16,835

Coding
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       8.2    6.7          83.3%              1            8.05s                4,686         4,131          0
Qwen3.5-122B-A10B  4.1    5.8          33.3%              1            119.57s              4,795         8,036          45,074

Combined
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       10.0   10.0         100.0%             0            9.12s                11,019        1,243          0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            107.79s              14,947        483            11,337

Data parsing and extraction
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       10.0   10.0         100.0%             0            3.05s                7,140         980            0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            23.41s               7,782         270            16,558

Domain specific
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       5.3    10.0         33.3%              0            17.78s               723           7,810          0
Qwen3.5-122B-A10B  2.9    7.2          11.1%              1            63.40s               771           15,537         64,889

General Intelligence
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       4.4    3.0          33.3%              1            3.20s                477           335            0
Qwen3.5-122B-A10B  3.4    2.2          33.3%              1            34.11s               344           66             7,592

Instructions following
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       9.8    10.0         100.0%             0            5.51s                660           1,441          0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            9.88s                593           77             7,372

Puzzle Solving
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       7.7    10.0         66.7%              0            4.10s                642           1,603          0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            17.89s               696           284            27,575

Tool Calling
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       10.0   10.0         100.0%             0            4.68s                5,445         555            0
Qwen3.5-122B-A10B  10.0   10.0         100.0%             0            4.60s                8,193         322            1,226

Trivia
Model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
GPT-5.2 Chat       3.0    10.0         0.0%               0            6.89s                195           1,239          0
Qwen3.5-122B-A10B  3.0    10.0         0.0%               0            52.87s               204           822            15,066
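The overall Scores (7.9 vs 7.7) are close, and the category rows show why the comparison is mixed. The sketch below tallies which model has the higher category Score, using only the Score column of the breakdown. It is an informal reading, not the site's method: the page does not say how the overall Score is aggregated from categories, and the names `category_scores` and `tally_wins` are hypothetical.

```python
# Informal per-category comparison using the Score column of the breakdown.
# Not the site's scoring method; the page does not say how the overall
# Score is aggregated from categories.

category_scores = {
    # category: (GPT-5.2 Chat, Qwen3.5-122B-A10B)
    "Anti-AI Tricks": (8.7, 10.0),
    "Coding": (8.2, 4.1),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.3, 2.9),
    "General Intelligence": (4.4, 3.4),
    "Instructions following": (9.8, 10.0),
    "Puzzle Solving": (7.7, 10.0),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally_wins(scores: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Group categories by which model scored higher, or by a tie."""
    result: dict[str, list[str]] = {"GPT-5.2 Chat": [], "Qwen3.5-122B-A10B": [], "tie": []}
    for category, (gpt, qwen) in scores.items():
        if gpt > qwen:
            result["GPT-5.2 Chat"].append(category)
        elif qwen > gpt:
            result["Qwen3.5-122B-A10B"].append(category)
        else:
            result["tie"].append(category)
    return result


for outcome, categories in tally_wins(category_scores).items():
    print(f"{outcome} ({len(categories)}): {', '.join(categories)}")
```

Each model leads in three categories (GPT-5.2 Chat in Coding, Domain specific, and General Intelligence; Qwen3.5-122B-A10B in Anti-AI Tricks, Instructions following, and Puzzle Solving), and the remaining four are ties.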