AI BENCHY
AI BENCHY Compare

Summary

Qwen3.5 Plus 2026-02-15 vs Qwen3.6 Plus Preview vs GLM 5 Turbo benchmark comparison

GLM 5 Turbo leads on Score with 8.4. Qwen3.5 Plus 2026-02-15 and GLM 5 Turbo tie on Reliability at 10.0. Qwen3.6 Plus Preview has the lowest Total Cost at $0.000. Qwen3.6 Plus Preview is fastest, with an average response time of 15.25s.

Recommended model: GLM 5 Turbo. It has the best score here (8.4) and responds about 1.9x faster than the average of the other models in this comparison.
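The 1.9x figure can be reproduced from the Response Time (avg) values in the comparison table if one assumes it compares GLM 5 Turbo against the mean of the other two models' averages. That reading is an assumption, not something the page states, and the helper name `speedup_vs_others` is hypothetical. A minimal Python sketch:

```python
# Average response times in seconds, copied from the "Response Time (avg)" row.
avg_response_s = {
    "Qwen3.5 Plus 2026-02-15": 73.79,
    "Qwen3.6 Plus Preview": 15.25,
    "GLM 5 Turbo": 23.00,
}


def speedup_vs_others(times, model):
    """Ratio of the other models' mean response time to `model`'s time.

    Assumption: "faster than the other models" means faster than their mean.
    """
    others = [t for name, t in times.items() if name != model]
    return (sum(others) / len(others)) / times[model]


ratio = speedup_vs_others(avg_response_s, "GLM 5 Turbo")
print(f"{ratio:.1f}x")  # 1.9x
```

Calling the same function with "Qwen3.6 Plus Preview" gives a larger ratio, since that model has the lowest average response time of the three.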

Last updated at: 2026-06-18

| Metric | Qwen3.5 Plus 2026-02-15 (medium) | Qwen3.6 Plus Preview (medium, Free Available) | GLM 5 Turbo (medium) |
| --- | --- | --- | --- |
| Release | 2026-02-15 | 2026-04-20 | 2026-03-15 |
| Score | 8.0 | 5.8 | 8.4 |
| Rank | #28 | #113 | #21 |
| Reliability | 10.0 | N/A | 10.0 |
| Consistency | 8.8 | 9.0 | 8.5 |
| Tests Correct | | | |
| Attempt pass rate | 73.0% | 42.9% | 74.6% |
| Flaky tests | 3 | 0 | 4 |
| Total Runs | 63 | 57 | 63 |
| Cost per result | 2.445 | 0.000 | 2.011 |
| Total Cost | $0.310 | $0.000 | $0.323 |
| Input Price | $0.260 / 1M | $0.000 / 1M | $1.200 / 1M |
| Output Price | $1.560 / 1M | $0.000 / 1M | $4.000 / 1M |
| Total Input Tokens | 40,918 | 32,639 | 35,593 |
| Output Tokens | 2,159 | 1,153 | 12,245 |
| Reasoning Tokens | 189,604 | 62,197 | 62,277 |
| Response Time (avg) | 73.79s | 15.25s | 23.00s |
| Response Time (max) | 266.69s | 43.55s | 194.23s |
| Response Time (total) | 1033.07s | 182.96s | 482.97s |
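As a rough cross-check of the Total Cost row, the token counts can be multiplied by the listed per-million prices. The sketch below assumes that reasoning tokens are billed at the output price; the page does not say how it computes cost, and the names `Usage` and `estimate_total_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int


def estimate_total_cost(u: Usage) -> float:
    """Estimate total cost in USD.

    Assumption: reasoning tokens are billed at the output price.
    """
    input_cost = u.input_tokens * u.input_price_per_m / 1_000_000
    billed_output = u.output_tokens + u.reasoning_tokens
    output_cost = billed_output * u.output_price_per_m / 1_000_000
    return input_cost + output_cost


# Values copied from the comparison table.
qwen35 = Usage(0.260, 1.560, 40_918, 2_159, 189_604)
glm5 = Usage(1.200, 4.000, 35_593, 12_245, 62_277)

print(f"Qwen3.5 Plus 2026-02-15: ${estimate_total_cost(qwen35):.3f}")  # $0.310
print(f"GLM 5 Turbo: ${estimate_total_cost(glm5):.3f}")  # $0.341
```

Under this assumption the estimate matches the reported $0.310 for Qwen3.5 Plus 2026-02-15 but comes out higher than the reported $0.323 for GLM 5 Turbo, so the page's accounting evidently differs from this simple formula for at least that model. The cause is not stated in the source.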

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#28 Qwen3.5 Plus 2026-02-15 (medium)
Cost: $0.011, Time: 125.5s, Tokens: 7,040 tok

#113 Qwen3.6 Plus Preview (medium)
No showcase result has been generated for this model yet.
Cost: $0.000, Time: -, Tokens: 0 tok

#21 GLM 5 Turbo (medium)
Cost: $0.074, Time: 206.0s, Tokens: 18,549 tok

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Qwen3.5 Plus 2026-02-15 | 8.2 | 7.9 | 83.3% | 1 | | 45.78s | 672 | 205 | 21,236 |
| Anti-AI Tricks | Qwen3.6 Plus Preview | 8.3 | 10.0 | 75.0% | 0 | | 11.69s | 501 | 61 | 5,812 |
| Anti-AI Tricks | GLM 5 Turbo | 10.0 | 10.0 | 100.0% | 0 | | 4.82s | 555 | 362 | 3,137 |
| Coding | Qwen3.5 Plus 2026-02-15 | 6.6 | 7.1 | 44.4% | 1 | | 180.70s | 6,950 | 420 | 80,595 |
| Coding | Qwen3.6 Plus Preview | 9.8 | 3.3 | 0.0% | 0 | | 0ms | 0 | 0 | 0 |
| Coding | GLM 5 Turbo | 8.2 | 9.3 | 66.7% | 0 | | 45.90s | 5,941 | 363 | 25,381 |
| Combined | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 46.85s | 14,934 | 421 | 7,906 |
| Combined | Qwen3.6 Plus Preview | 10.0 | 10.0 | 100.0% | 0 | | 34.95s | 14,934 | 452 | 13,073 |
| Combined | GLM 5 Turbo | 10.0 | 10.0 | 100.0% | 0 | | 13.88s | 12,714 | 390 | 2,037 |
| Data parsing and extraction | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 46.91s | 7,782 | 270 | 14,916 |
| Data parsing and extraction | Qwen3.6 Plus Preview | 10.0 | 10.0 | 100.0% | 0 | | 14.95s | 7,782 | 270 | 10,706 |
| Data parsing and extraction | GLM 5 Turbo | 10.0 | 10.0 | 100.0% | 0 | | 6.19s | 7,107 | 577 | 3,632 |
| Domain specific | Qwen3.5 Plus 2026-02-15 | 5.3 | 10.0 | 33.3% | 0 | | 17.50s | 444 | 35 | 16,680 |
| Domain specific | Qwen3.6 Plus Preview | 3.0 | 10.0 | 0.0% | 0 | | 22.08s | 665 | 49 | 26,895 |
| Domain specific | GLM 5 Turbo | 2.9 | 4.4 | 22.2% | 2 | | 71.07s | 489 | 9,665 | 19,279 |
| General Intelligence | Qwen3.5 Plus 2026-02-15 | 4.7 | 1.6 | 66.7% | 1 | | 79.86s | 344 | 73 | 8,675 |
| General Intelligence | Qwen3.6 Plus Preview | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 | 0 |
| General Intelligence | GLM 5 Turbo | 6.1 | 3.1 | 66.7% | 1 | | 10.05s | 477 | 60 | 2,216 |
| Instructions following | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 31.93s | 699 | 101 | 7,704 |
| Instructions following | Qwen3.6 Plus Preview | 6.5 | 10.0 | 50.0% | 0 | | 3.40s | 381 | 27 | 1,383 |
| Instructions following | GLM 5 Turbo | 10.0 | 10.0 | 100.0% | 0 | | 5.38s | 636 | 255 | 2,183 |
| Puzzle Solving | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 32.50s | 696 | 301 | 13,853 |
| Puzzle Solving | Qwen3.6 Plus Preview | 5.3 | 10.0 | 33.3% | 0 | | 7.52s | 183 | 27 | 2,998 |
| Puzzle Solving | GLM 5 Turbo | 8.7 | 7.9 | 77.8% | 1 | | 5.23s | 609 | 312 | 2,647 |
| Tool Calling | Qwen3.5 Plus 2026-02-15 | 10.0 | 10.0 | 100.0% | 0 | | 7.54s | 8,193 | 309 | 909 |
| Tool Calling | Qwen3.6 Plus Preview | 10.0 | 10.0 | 100.0% | 0 | | 5.87s | 8,193 | 267 | 1,330 |
| Tool Calling | GLM 5 Turbo | 10.0 | 10.0 | 100.0% | 0 | | 9.84s | 6,879 | 241 | 446 |
| Trivia | Qwen3.5 Plus 2026-02-15 | 3.0 | 10.0 | 0.0% | 0 | | 103.81s | 204 | 24 | 17,130 |
| Trivia | Qwen3.6 Plus Preview | 3.0 | 10.0 | 0.0% | 0 | | 0ms | 0 | 0 | 0 |
| Trivia | GLM 5 Turbo | 3.0 | 10.0 | 0.0% | 0 | | 40.17s | 186 | 20 | 1,319 |
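One way to sanity-check the category breakdown is to sum a per-category column and compare it with the overall figure in the main comparison table. The sketch below does this for Reasoning Tokens; the numbers are copied from the page, while the variable names and the check itself are illustrative rather than part of the site.

```python
# Reasoning Tokens per category, in the order the categories are listed:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction,
# Domain specific, General Intelligence, Instructions following,
# Puzzle Solving, Tool Calling, Trivia.
reasoning_by_category = {
    "Qwen3.5 Plus 2026-02-15": [21_236, 80_595, 7_906, 14_916, 16_680,
                                8_675, 7_704, 13_853, 909, 17_130],
    "Qwen3.6 Plus Preview": [5_812, 0, 13_073, 10_706, 26_895,
                             0, 1_383, 2_998, 1_330, 0],
    "GLM 5 Turbo": [3_137, 25_381, 2_037, 3_632, 19_279,
                    2_216, 2_183, 2_647, 446, 1_319],
}

# Overall Reasoning Tokens from the main comparison table.
reported_totals = {
    "Qwen3.5 Plus 2026-02-15": 189_604,
    "Qwen3.6 Plus Preview": 62_197,
    "GLM 5 Turbo": 62_277,
}

# Map each model to (summed total, whether it equals the reported total).
checks = {
    model: (sum(values), sum(values) == reported_totals[model])
    for model, values in reasoning_by_category.items()
}

for model, (total, ok) in checks.items():
    status = "matches" if ok else "differs from"
    print(f"{model}: {total:,} {status} reported {reported_totals[model]:,}")
```

For all three models the category sums equal the reported totals, which suggests the overall Reasoning Tokens figure is a plain sum over categories.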
