AI BENCHY

AI BENCHY Compare

Qwen: Qwen3.5-27B vs StepFun: Step 3.7 Flash

Last updated: 2026-06-02

| Metric | Qwen3.5-27B (medium), released 2026-02-24 | Step 3.7 Flash (low), released 2026-05-29 |
|---|---|---|
| Score | 7.9 | 7.4 |
| Rank | #27 | #59 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.9 | 8.7 |
| Tests Correct | | |
| Attempt pass rate | 73.3% | 68.3% |
| Flaky tests | 3 | 3 |
| Total Runs | 60 | 60 |
| Cost per result | 4.532 | 2.796 |
| Total Cost | $0.488 | $0.336 |
| Input Price | $0.195 / 1M tokens | $0.200 / 1M tokens |
| Output Price | $1.560 / 1M tokens | $1.150 / 1M tokens |
| Total Input Tokens | 39,329 | 37,458 |
| Output Tokens | 2,569 | 285,209 |
| Reasoning Tokens | 304,894 | 0 |
| Response Time (avg) | 60.09 s | 16.06 s |
| Response Time (max) | 177.36 s | 124.75 s |
| Response Time (total) | 1201.89 s | 321.11 s |
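As a rough consistency check on the cost figures, the sketch below recomputes each model's total cost from the token counts and per-1M-token prices in the table. It assumes, since the page does not say so, that reasoning tokens are billed at the output-token price. The `Usage` class and its `estimated_cost` method are illustrative names for this sketch, not part of any AI Benchy tool.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    name: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens
    reported_total: float      # USD, as listed in the table

    def estimated_cost(self) -> float:
        # Assumption: reasoning tokens are billed at the output-token rate.
        billed_output = self.output_tokens + self.reasoning_tokens
        return (
            self.input_tokens * self.input_price_per_m
            + billed_output * self.output_price_per_m
        ) / 1_000_000


# Figures copied from the summary table above.
MODELS = [
    Usage("Qwen3.5-27B", 39_329, 2_569, 304_894, 0.195, 1.560, 0.488),
    Usage("Step 3.7 Flash", 37_458, 285_209, 0, 0.200, 1.150, 0.336),
]

if __name__ == "__main__":
    for m in MODELS:
        print(f"{m.name}: estimated ${m.estimated_cost():.4f}, "
              f"reported ${m.reported_total:.3f}")
```

Under that assumption the estimates come to about $0.4873 for Qwen3.5-27B and $0.3355 for Step 3.7 Flash, each within a tenth of a cent of the listed totals; the page does not document how the remaining difference arises.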

Charts (not reproduced in this text): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Qwen3.5-27B | 8.7 | 7.9 | 91.7% | 1 | | 19.75 s | 672 | 569 | 31,505 |
| Anti-AI Tricks | Step 3.7 Flash | 8.7 | 7.9 | 91.7% | 1 | | 4.02 s | 756 | 10,896 | 0 |
| Coding | Qwen3.5-27B | 7.0 | 9.8 | 50.0% | 0 | | 123.86 s | 5,060 | 416 | 64,993 |
| Coding | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 9.43 s | 4,794 | 14,569 | 0 |
| Combined | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 163.96 s | 14,946 | 483 | 9,991 |
| Combined | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 7.98 s | 13,683 | 6,426 | 0 |
| Data parsing and extraction | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 30.26 s | 7,782 | 270 | 16,150 |
| Data parsing and extraction | Step 3.7 Flash | 7.3 | 5.8 | 83.3% | 1 | | 2.29 s | 7,398 | 2,667 | 0 |
| Domain specific | Qwen3.5-27B | 5.3 | 10.0 | 33.3% | 0 | | 79.53 s | 553 | 43 | 52,368 |
| Domain specific | Step 3.7 Flash | 5.3 | 7.2 | 44.4% | 1 | | 43.31 s | 828 | 104,487 | 0 |
| General Intelligence | Qwen3.5-27B | 6.1 | 3.1 | 66.7% | 1 | | 101.41 s | 524 | 70 | 23,147 |
| General Intelligence | Step 3.7 Flash | 3.4 | 9.3 | 0.0% | 0 | | 7.00 s | 525 | 4,604 | 0 |
| Instructions following | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 19.66 s | 699 | 97 | 11,638 |
| Instructions following | Step 3.7 Flash | 9.8 | 10.0 | 100.0% | 0 | | 1.58 s | 735 | 1,857 | 0 |
| Puzzle Solving | Qwen3.5-27B | 8.2 | 7.7 | 77.8% | 1 | | 59.60 s | 696 | 242 | 70,096 |
| Puzzle Solving | Step 3.7 Flash | 5.5 | 9.9 | 33.3% | 0 | | 1.84 s | 756 | 3,564 | 0 |
| Tool Calling | Qwen3.5-27B | 10.0 | 10.0 | 100.0% | 0 | | 7.45 s | 8,193 | 348 | 1,323 |
| Tool Calling | Step 3.7 Flash | 10.0 | 10.0 | 100.0% | 0 | | 3.25 s | 7,746 | 1,360 | 0 |
| Trivia | Qwen3.5-27B | 3.0 | 10.0 | 0.0% | 0 | | 85.11 s | 204 | 31 | 23,683 |
| Trivia | Step 3.7 Flash | 3.0 | 10.0 | 0.0% | 0 | | 124.75 s | 237 | 134,779 | 0 |
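To read the Category Breakdown at a glance, the sketch below tallies which model has the higher score in each category and lists the categories where Step 3.7 Flash is slower on average. The numbers are copied from the table; the helper names `tally` and `step_slower`, and the rule of treating scores that match to one decimal place as a tie, are assumptions of this sketch.

```python
# Per-category figures copied from the Category Breakdown table:
# (category, Qwen score, Step score, Qwen avg response s, Step avg response s)
ROWS = [
    ("Anti-AI Tricks", 8.7, 8.7, 19.75, 4.02),
    ("Coding", 7.0, 10.0, 123.86, 9.43),
    ("Combined", 10.0, 10.0, 163.96, 7.98),
    ("Data parsing and extraction", 10.0, 7.3, 30.26, 2.29),
    ("Domain specific", 5.3, 5.3, 79.53, 43.31),
    ("General Intelligence", 6.1, 3.4, 101.41, 7.00),
    ("Instructions following", 10.0, 9.8, 19.66, 1.58),
    ("Puzzle Solving", 8.2, 5.5, 59.60, 1.84),
    ("Tool Calling", 10.0, 10.0, 7.45, 3.25),
    ("Trivia", 3.0, 3.0, 85.11, 124.75),
]


def tally(rows):
    """Group categories by which model has the higher score; ties kept separately."""
    result = {"Qwen3.5-27B": [], "Step 3.7 Flash": [], "tie": []}
    for category, qwen, step, _, _ in rows:
        # Scores are reported to one decimal place, so compare at that precision.
        diff = round(qwen - step, 1)
        if diff > 0:
            result["Qwen3.5-27B"].append(category)
        elif diff < 0:
            result["Step 3.7 Flash"].append(category)
        else:
            result["tie"].append(category)
    return result


def step_slower(rows):
    """Categories where Step 3.7 Flash has the higher average response time."""
    return [category for category, _, _, qwen_s, step_s in rows if step_s > qwen_s]


if __name__ == "__main__":
    for winner, categories in tally(ROWS).items():
        print(f"{winner}: {', '.join(categories)}")
    for category, _, _, qwen_s, step_s in ROWS:
        print(f"{category}: Qwen/Step avg time ratio {qwen_s / step_s:.1f}x")
```

With these rows, Qwen3.5-27B scores higher in four categories, Step 3.7 Flash scores higher only in Coding, five categories are tied, and Step 3.7 Flash has the lower average response time in every category except Trivia.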
