AI BENCHY Compare

Anthropic: Claude Sonnet 5 vs Qwen: Qwen3.5-Flash

Summary

Claude Sonnet 5 vs Qwen3.5-Flash benchmark comparison: Qwen3.5-Flash leads on average score, 6.1 vs 5.7. Qwen3.5-Flash also has the lower benchmark cost ($0.005 vs $0.287) and is faster (3.58s vs 4.74s average response time). Claude Sonnet 5 has the slightly higher attempt pass rate (42.9% vs 39.7%).

Recommended model: Qwen3.5-Flash. It has the higher score of the two (6.1) while costing about 69.4x less than Claude Sonnet 5.

Last updated: 2026-06-30

Metric                     Claude Sonnet 5    Qwen3.5-Flash
Release date               2026-06-30         2026-02-24
Score                      5.7                6.1
Rank                       #117               #99
Reliability                10.0               10.0
Consistency                8.6                9.7
Attempt pass rate          42.9%              39.7%
Flaky tests                4                  1
Total runs                 63                 63
Cost per result            4.098              0.075
Total cost                 $0.287             $0.005
Input price                $2.000 / 1M        $0.065 / 1M
Output price               $10.000 / 1M       $0.260 / 1M
Total input tokens         76,797             46,439
Total output tokens        13,325             4,276
Reasoning tokens           0                  0
Response time (avg)        4.74s              3.58s
Response time (max)        29.46s             27.18s
Response time (total)      99.46s             75.28s
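As a cross-check, the total-cost and pass-rate rows can be recomputed from the other rows of the table above. This is a minimal sketch that assumes cost is billed per token at the listed per-1M prices (the page does not state its formula). Under that assumption it gives about $0.2868 for Claude Sonnet 5, matching the displayed $0.287, and about $0.0041 for Qwen3.5-Flash; the ratio of those unrounded figures, about 69.4, matches the summary's "69.4x". The displayed Qwen3.5-Flash total of $0.005 is a little higher than this estimate, so the site may round up or include charges not shown here. The passing-attempt counts (27 and 25 of 63) are inferred from the percentages and are not reported by the page.

```python
# Minimal sketch. ASSUMPTION: total cost = input tokens x input price
# + output tokens x output price, with prices quoted per 1M tokens.
# Reasoning tokens are 0 for both models, so they are left out.

PER_MILLION = 1_000_000
TOTAL_RUNS = 63

# name: (input tokens, output tokens, input $/1M, output $/1M), copied from the table
usage = {
    "Claude Sonnet 5": (76_797, 13_325, 2.000, 10.000),
    "Qwen3.5-Flash": (46_439, 4_276, 0.065, 0.260),
}

# ASSUMPTION: passing-attempt counts inferred from the pass rates; not listed on the page.
passing_attempts = {"Claude Sonnet 5": 27, "Qwen3.5-Flash": 25}


def token_cost(input_tokens: int, output_tokens: int,
               input_price: float, output_price: float) -> float:
    """Dollar cost of a run given token counts and $/1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / PER_MILLION


costs = {name: token_cost(*row) for name, row in usage.items()}
ratio = costs["Claude Sonnet 5"] / costs["Qwen3.5-Flash"]

for name, cost in costs.items():
    rate = passing_attempts[name] / TOTAL_RUNS  # attempt pass rate = passed / total runs
    print(f"{name}: ${cost:.6f}, pass rate {rate:.1%}")
print(f"Cost ratio: {ratio:.1f}x")
```

Dividing the rounded displayed totals instead ($0.287 / $0.005) gives about 57.4x, which is why the summary's ratio looks larger than the two displayed costs suggest.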

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#117 Claude Sonnet 5
Cost: $0.061 | Time: 53.7s | Tokens: 6,172

#99 Qwen3.5-Flash
Cost: $0.003 | Time: 47.4s | Tokens: 7,799

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Category / model              Score  Consistency  Attempt pass rate  Flaky tests  Response time (avg)  Input tokens  Output tokens  Reasoning tokens
Anti-AI Tricks
  Claude Sonnet 5             5.3    10.0         25.0%              0            3.60s                834           1,813          0
  Qwen3.5-Flash               3.5    8.3          8.3%               1            1.32s                696           690            0
Coding
  Claude Sonnet 5             4.6    7.9          22.2%              1            3.67s                10,590        1,864          0
  Qwen3.5-Flash               5.5    10.0         33.3%              0            850ms                7,913         519            0
Combined
  Claude Sonnet 5             3.0    10.0         0.0%               0            29.46s               38,775        6,340          0
  Qwen3.5-Flash               3.0    10.0         0.0%               0            6.22s                18,879        1,794          0
Data parsing and extraction
  Claude Sonnet 5             10.0   10.0         100.0%             0            3.01s                10,503        309            0
  Qwen3.5-Flash               10.0   10.0         100.0%             0            1.57s                7,794         243            0
Domain specific
  Claude Sonnet 5             5.3    7.2          44.4%              1            3.28s                975           933            0
  Qwen3.5-Flash               7.7    10.0         66.7%              0            905ms                789           15             0
General Intelligence
  Claude Sonnet 5             4.7    3.1          33.3%              1            2.81s                708           272            0
  Qwen3.5-Flash               10.0   10.0         100.0%             0            803ms                522           100            0
Instructions following
  Claude Sonnet 5             6.4    10.0         50.0%              0            2.58s                909           103            0
  Qwen3.5-Flash               6.3    10.0         50.0%              0            8.81s                711           63             0
Puzzle Solving
  Claude Sonnet 5             6.0    7.4          55.6%              1            3.22s                894           778            0
  Qwen3.5-Flash               3.1    10.0         0.0%               0            10.89s               714           579            0
Tool Calling
  Claude Sonnet 5             10.0   10.0         100.0%             0            6.80s                12,351        522            0
  Qwen3.5-Flash               10.0   10.0         100.0%             0            3.67s                8,211         264            0
Trivia
  Claude Sonnet 5             3.0    10.0         0.0%               0            4.31s                258           391            0
  Qwen3.5-Flash               3.0    10.0         0.0%               0            588ms                210           9              0
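The category breakdown can be summarized by counting which model scores higher in each category. This is a minimal sketch with the scores copied from the category table above; the `tally` helper is hypothetical and not part of the site. It also computes the unweighted mean of the ten category scores (about 5.83 for Claude Sonnet 5 and 6.21 for Qwen3.5-Flash). Those means do not exactly match the overall scores of 5.7 and 6.1, so the site's aggregate presumably weights categories in some way (for example by number of tests); the page does not say how.

```python
# Category scores copied from the category table: (Claude Sonnet 5, Qwen3.5-Flash).
category_scores = {
    "Anti-AI Tricks": (5.3, 3.5),
    "Coding": (4.6, 5.5),
    "Combined": (3.0, 3.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (5.3, 7.7),
    "General Intelligence": (4.7, 10.0),
    "Instructions following": (6.4, 6.3),
    "Puzzle Solving": (6.0, 3.1),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(scores: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Hypothetical helper: group categories by which model scored higher, or a tie."""
    result: dict[str, list[str]] = {"Claude Sonnet 5": [], "Qwen3.5-Flash": [], "tie": []}
    for category, (claude, qwen) in scores.items():
        if claude > qwen:
            result["Claude Sonnet 5"].append(category)
        elif qwen > claude:
            result["Qwen3.5-Flash"].append(category)
        else:
            result["tie"].append(category)
    return result


wins = tally(category_scores)
for key, cats in wins.items():
    print(f"{key}: {len(cats)} -> {', '.join(cats)}")

# Unweighted mean of the category scores, to compare with the overall Score row.
n = len(category_scores)
claude_mean = sum(c for c, _ in category_scores.values()) / n
qwen_mean = sum(q for _, q in category_scores.values()) / n
print(f"Unweighted means: {claude_mean:.2f} vs {qwen_mean:.2f}")
```

With these numbers each model wins three categories (Claude Sonnet 5: Anti-AI Tricks, Instructions following, Puzzle Solving; Qwen3.5-Flash: Coding, Domain specific, General Intelligence), and the remaining four are ties.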