AI BENCHY
#37

Qwen3.6 27B

Qwen
Release: 2026-04-20
Tested on: 2026-04-27 21:31
qwen/qwen3.6-27b::medium (medium) (none)

Summary

Qwen3.6 27B scores 7.9 on AI BENCHY and ranks #37. It has 10.0 reliability, a 77.8% pass rate, $0.043 total cost, and 25.56s average response time.

Score

7.9

Consistency

8.5

Total Output Tokens

21,553

Total Input Tokens

0

Input Price

$0.500 / 1M

Output Price

$2.000 / 1M
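
The token counts and per-million prices listed here can be combined into a cost estimate. The following is a minimal sketch, assuming the total cost is simply tokens multiplied by the listed price; the function name `estimate_cost` is hypothetical, and the page does not state how AI BENCHY actually computes cost.

```python
from decimal import Decimal


def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate cost in dollars from token counts and prices per 1M tokens.

    Assumption: cost = tokens * price / 1,000,000, with no other charges.
    """
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m)
    return (input_cost + output_cost) / million


# Figures reported for this run: 0 input tokens, 21,553 output tokens,
# $0.500 / 1M input, $2.000 / 1M output.
cost = estimate_cost(0, 21_553, "0.500", "2.000")
print(f"${cost:.3f}")  # prints $0.043
```

Under these assumptions the estimate matches the $0.043 total cost reported for this run. Other runs may include charges that this simple formula does not model.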

Tests Correct

4/6

Wrong Tests: 2

Attempt pass rate: 77.8%

Flaky tests

1

Flaky tests had mixed outcomes across runs (at least one pass and one fail).
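
The flaky-test definition above can be sketched in code. This is an illustration only: the function name `summarize`, the test names, and the per-attempt outcomes are hypothetical, and the attempt pass rate is assumed to be passed attempts divided by total attempts, which the page does not confirm.

```python
def summarize(attempts):
    """Return (flaky test names, attempt pass rate).

    `attempts` maps a test name to a list of booleans, one per run
    (True = pass). A test is flaky when it has at least one pass and
    at least one fail.
    """
    flaky = [
        name for name, runs in attempts.items()
        if any(runs) and not all(runs)
    ]
    total = sum(len(runs) for runs in attempts.values())
    passed = sum(sum(runs) for runs in attempts.values())
    rate = passed / total if total else 0.0
    return flaky, rate


# Hypothetical outcomes, not taken from the benchmark.
example = {
    "test_a": [True, True, True],     # always passes
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails
}
flaky, rate = summarize(example)
print(flaky, f"{rate:.1%}")  # prints ['test_b'] 55.6%
```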

Response Time (avg)

25.56s

Response Time (max): 47.48s

Response Time (total): 153.33s

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#37 Qwen3.6 27B (medium)
Cost: $0.009
Time: 39.6s
Tokens: 3,090

Run history

Tested on | Score | Reliability | Total Cost
2026-06-04 13:21 · New test added | 6.8 | 10.0 | $0.444
2026-05-21 23:59 · Suite changed | 6.6 | 9.9 | $0.272
2026-04-27 21:48 · New test added | 7.0 | 10.0 | $0.209
2026-04-27 21:31 · First recorded run (current run) | 7.9 | 10.0 | $0.043

Run comparison

Run | Score | Consistency | Reliability | Tests Correct | Flaky tests | Total Output Tokens | Total Input Tokens | Total Cost | Response Time (avg)
2026-04-27 21:31 · First recorded run | 7.9 | 8.5 | 10.0 | 4/6 | 1 | 21,553 | 0 | $0.043 | 25.56s
2026-05-21 23:59 · Suite changed | 6.6 | 8.1 | 9.9 | 9/20 | 5 | 118,704 | 0 | $0.272 | 57.65s
Difference | +1.3 | +0.4 | +0.1 | -5 | -4 | -97,151 | 0 | -$0.229 | -32096ms

These two runs used different benchmark suites, so the deltas reflect both model changes and suite changes.
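
The Difference row appears to be the first recorded run minus the later run, metric by metric. A minimal sketch of that subtraction follows, using the figures reported for the two runs; the dictionary keys are hypothetical names chosen for illustration.

```python
# Figures reported for the two runs being compared.
first_run = {
    "score": 7.9, "consistency": 8.5, "reliability": 10.0,
    "tests_correct": 4, "flaky_tests": 1,
    "output_tokens": 21_553, "total_cost": 0.043, "avg_response_s": 25.56,
}
later_run = {
    "score": 6.6, "consistency": 8.1, "reliability": 9.9,
    "tests_correct": 9, "flaky_tests": 5,
    "output_tokens": 118_704, "total_cost": 0.272, "avg_response_s": 57.65,
}

# Delta = first run minus later run; rounding hides float noise.
deltas = {key: round(first_run[key] - later_run[key], 3) for key in first_run}

for key, value in deltas.items():
    print(f"{key}: {value:+}")
```

The computed response-time delta is -32.09s, slightly different from the -32096ms shown on the page, which was presumably derived from unrounded timings. Because the suites differ (6 tests versus 20), the Tests Correct delta compares raw counts, not rates.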

Charts

[Charts not reproduced: Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Category | Score | Consistency
Anti-AI Tricks | 10.0 | 10.0
Data parsing and extraction | 4.3 | 1.2
Domain specific | 3.0 | 10.0
Instructions following | 10.0 | 10.0
Tool Calling | 10.0 | 10.0
