AI BENCHY

AI BENCHY Compare

OpenAI: GPT-5.3 Chat vs Qwen: Qwen3.5-35B-A3B

Summary

GPT-5.3 Chat vs Qwen3.5-35B-A3B benchmark comparison: GPT-5.3 Chat leads on average score, 7.5 vs 6.3. Qwen3.5-35B-A3B has the lower total benchmark cost, $0.401 vs $0.433, and a slightly higher attempt pass rate, 69.8% vs 66.7%. GPT-5.3 Chat is much faster, averaging 6.34s per response vs 72.57s.

Recommended model: GPT-5.3 Chat - It has the best score here (7.5), while responding about 11.4x faster than Qwen3.5-35B-A3B.
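
The "about 11.4x faster" figure can be reproduced from the two average response times. The snippet below is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of AI BENCHY.

```python
# Average response times in seconds, as reported in this comparison.
gpt_avg_s = 6.34
qwen_avg_s = 72.57


def speedup(slow_s: float, fast_s: float) -> float:
    """Return how many times faster the fast model responds."""
    return slow_s / fast_s


ratio = speedup(qwen_avg_s, gpt_avg_s)
print(f"GPT-5.3 Chat responds about {ratio:.1f}x faster")
```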

Last updated at: 2026-06-18

Metric                 GPT-5.3 Chat (none)   Qwen3.5-35B-A3B (medium)
Release                2026-03-03            2026-02-24
Score                  7.5                   6.3
Rank                   #45                   #89
Reliability            10.0                  10.0
Consistency            8.1                   7.5
Tests Correct
Attempt pass rate      66.7%                 69.8%
Flaky tests            5                     6
Total Runs             63                    63
Cost per result        3.605                 5.162
Total Cost             $0.433                $0.401
Input Price            $1.750 / 1M           $0.140 / 1M
Output Price           $14.000 / 1M          $1.000 / 1M
Total Input Tokens     34,209                42,196
Output Tokens          26,617                40,630
Reasoning Tokens       0                     353,577
Response Time (avg)    6.34s                 72.57s
Response Time (max)    18.33s                409.98s
Response Time (total)  133.13s               1524.04s
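
The Total Cost row can be roughly reconstructed from the token counts and per-million prices in the table. The sketch below does this under one assumption that the page does not state: reasoning tokens are billed at the output price. The `Usage` class and `estimated_cost` function are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Estimate total cost in USD from token counts and list prices."""
    # Assumption: reasoning tokens are billed at the output price.
    billed_output = u.output_tokens + u.reasoning_tokens
    total = (u.input_tokens * u.input_price_per_m
             + billed_output * u.output_price_per_m)
    return total / 1_000_000


# Token counts and prices copied from the table above.
gpt = Usage(34_209, 26_617, 0, 1.750, 14.000)
qwen = Usage(42_196, 40_630, 353_577, 0.140, 1.000)

print(f"GPT-5.3 Chat:    ${estimated_cost(gpt):.3f}")
print(f"Qwen3.5-35B-A3B: ${estimated_cost(qwen):.3f}")
```

Under that assumption the estimate comes to about $0.433 for GPT-5.3 Chat, matching the reported total, and about $0.400 for Qwen3.5-35B-A3B, within a tenth of a cent of the reported $0.401. It also shows why the cheaper per-token model ends up at nearly the same total: most of its spend goes to 353,577 reasoning tokens.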

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

Model                         Cost    Time   Tokens
#45 GPT-5.3 Chat (none)       $0.008  8.1s   634
#89 Qwen3.5-35B-A3B (medium)  $0.009  71.4s  8,631

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Model              Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens

Anti-AI Tricks
  GPT-5.3 Chat     6.7    8.1          58.3%              1                           3.86s                606           3,167          0
  Qwen3.5-35B-A3B  10.0   10.0         100.0%             0                           21.13s               672           798            42,652

Coding
  GPT-5.3 Chat     5.6    4.7          55.6%              2                           10.52s               7,302         6,632          0
  Qwen3.5-35B-A3B  5.9    9.3          33.3%              0                           206.65s              4,106         23,844         111,462

Combined
  GPT-5.3 Chat     10.0   10.0         100.0%             0                           11.96s               11,019        2,614          0
  Qwen3.5-35B-A3B  4.7    1.6          66.7%              1                           75.34s               20,992        775            12,485

Data parsing and extraction
  GPT-5.3 Chat     10.0   10.0         100.0%             0                           2.21s                7,140         942            0
  Qwen3.5-35B-A3B  7.3    5.9          83.3%              1                           59.33s               6,061         235            19,493

Domain specific
  GPT-5.3 Chat     3.5    4.4          33.3%              2                           13.01s               723           8,264          0
  Qwen3.5-35B-A3B  4.1    4.4          44.5%              2                           88.34s               500           41             46,368

General Intelligence
  GPT-5.3 Chat     4.6    10.0         0.0%               0                           1.99s                477           319            0
  Qwen3.5-35B-A3B  2.8    1.6          33.3%              1                           30.30s               172           20             3,753

Instructions following
  GPT-5.3 Chat     9.8    10.0         100.0%             0                           3.51s                660           1,491          0
  Qwen3.5-35B-A3B  10.0   10.0         100.0%             0                           24.45s               699           97             17,361

Puzzle Solving
  GPT-5.3 Chat     10.0   10.0         100.0%             0                           2.99s                642           1,758          0
  Qwen3.5-35B-A3B  8.2    7.2          88.9%              1                           33.13s               597           3,592          26,585

Tool Calling
  GPT-5.3 Chat     10.0   10.0         100.0%             0                           8.36s                5,445         861            0
  Qwen3.5-35B-A3B  10.0   10.0         100.0%             0                           4.65s                8,193         309            1,365

Trivia
  GPT-5.3 Chat     3.0    10.0         0.0%               0                           4.38s                195           569            0
  Qwen3.5-35B-A3B  3.0    10.0         0.0%               0                           177.35s              204           10,919         72,053
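
To see how the per-category scores split between the two models, the breakdown can be tallied in a few lines. This is a small illustrative sketch using only the Score column from the category rows above; the `tally` helper is hypothetical and simply compares the two scores in each category.

```python
# (GPT-5.3 Chat score, Qwen3.5-35B-A3B score) per category,
# copied from the category breakdown.
scores = {
    "Anti-AI Tricks": (6.7, 10.0),
    "Coding": (5.6, 5.9),
    "Combined": (10.0, 4.7),
    "Data parsing and extraction": (10.0, 7.3),
    "Domain specific": (3.5, 4.1),
    "General Intelligence": (4.6, 2.8),
    "Instructions following": (9.8, 10.0),
    "Puzzle Solving": (10.0, 8.2),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(category_scores):
    """Group category names by which model has the higher score."""
    result = {"GPT-5.3 Chat": [], "Qwen3.5-35B-A3B": [], "tie": []}
    for category, (gpt, qwen) in category_scores.items():
        if gpt > qwen:
            result["GPT-5.3 Chat"].append(category)
        elif qwen > gpt:
            result["Qwen3.5-35B-A3B"].append(category)
        else:
            result["tie"].append(category)
    return result


for winner, categories in tally(scores).items():
    print(f"{winner}: {len(categories)} ({', '.join(categories)})")
```

By this count each model leads in four categories and two are tied, so the overall score gap comes from the size of the leads (for example Combined, 10.0 vs 4.7) rather than from the number of categories won.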
