AI BENCHY

#37

Qwen3.6 27B

Provider: Qwen · Release: 2026-04-20 · Tested on: 2026-04-27 21:31
Model: qwen/qwen3.6-27b::medium (reasoning effort: medium)

Score

7.9

Consistency

8.5

Total Cost

$0.043

Total Output Tokens

21,553

Input Price

$0.500 / 1M

Output Price

$2.000 / 1M
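The prices above can be sanity-checked against the reported total cost. A minimal sketch, assuming the usual per-million-token billing (cost = tokens × price / 1,000,000); the input token count is not shown on this page, so only the output component is computed:

```python
# Figures reported on this page.
output_tokens = 21_553
output_price_per_m = 2.000  # USD per 1M output tokens
input_price_per_m = 0.500   # USD per 1M input tokens (input count not reported)

# Output-side cost under the assumed per-million billing formula.
output_cost = output_tokens * output_price_per_m / 1_000_000
print(f"${output_cost:.4f}")  # → $0.0431
```

The output component alone is about $0.0431, nearly all of the reported $0.043 total, so the input-side contribution for this run was negligible.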

Tests Correct

4 / 6

Wrong Tests: 2

Attempt pass rate: 77.8%
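The page does not state how many attempts each test gets, but the 77.8% figure is consistent with a simple passing-attempts-over-total-attempts ratio; for example, 14 passes out of 18 attempts (a hypothetical split, since the per-test attempt counts are not reported) reproduces it exactly:

```python
# Hypothetical attempt counts chosen to illustrate the ratio;
# the page reports only the final 77.8% figure.
passes, attempts = 14, 18
rate = passes / attempts
print(f"{rate:.1%}")  # → 77.8%
```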

Flaky tests

1

Flaky tests had mixed outcomes across runs (at least one pass and one fail).
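The flakiness rule above (at least one pass and one fail across runs) can be sketched as a simple predicate over per-test run outcomes; the test names and outcome lists below are hypothetical:

```python
def flaky_tests(results: dict[str, list[bool]]) -> list[str]:
    """Return tests whose per-run outcomes are mixed:
    at least one pass (True) and at least one fail (False)."""
    return [name for name, runs in results.items()
            if any(runs) and not all(runs)]

# Hypothetical per-test run outcomes for illustration.
outcomes = {
    "parse_csv": [True, True, True],     # consistent pass
    "date_math": [True, False, True],    # mixed outcomes -> flaky
    "tool_call": [False, False, False],  # consistent fail
}
print(flaky_tests(outcomes))  # → ['date_math']
```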

Response Time (avg)

25.56s

Response Time (max): 47.48s

Response Time (total): 153.33s
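The three response-time figures are mutually consistent: assuming the average is simply total time divided by the number of responses, the total and average imply six responses, matching the six tests in this run:

```python
total_s = 153.33  # Response Time (total)
avg_s = 25.56     # Response Time (avg)
max_s = 47.48     # Response Time (max)

# Assumed relationship: avg = total / count.
count = round(total_s / avg_s)
print(count)  # → 6

# The slowest single response cannot exceed the total.
assert max_s <= total_s
```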

Run history

Tested on | Score | Reliability | Tests Correct | Total Cost
2026-04-27 21:48 (New test added) | 7.0 | 10.0 | 9 / 18 | $0.209
2026-04-27 21:31 (First recorded run, current) | 7.9 | 10.0 | 4 / 6 | $0.043

Run comparison

Run | Score | Consistency | Reliability | Tests Correct | Flaky tests | Total Output Tokens | Total Cost | Response Time (avg)
2026-04-27 21:31 · First recorded run | 7.9 | 8.5 | 10.0 | 4 / 6 | 1 | 21,553 | $0.043 | 25.56s
2026-04-27 21:48 · New test added | 7.0 | 7.9 | 10.0 | 9 / 18 | 5 | 99,362 | $0.209 | 50.53s
Difference | +0.9 | +0.6 | 0.0 | -5 | -4 | -77,809 | -$0.166 | -24.97s

These two runs used different benchmark suites, so the deltas reflect both model changes and suite changes.

Charts

[Chart: Total Output Tokens]
[Chart: Score vs Total Output Tokens]

Category Breakdown

Category | Score | Consistency | Tests Correct
Anti-AI Tricks | 10.0 | 10.0 |
Data parsing and extraction | 4.3 | 1.2 |
Domain specific | 3.0 | 10.0 |
Instructions following | 10.0 | 10.0 |
Tool Calling | 10.0 | 10.0 |
