#162 Qwen3.5-9B
medium
- Cost: $0.001
- Time: 35.9s
- Tokens: 3,030 tok
Summary
Qwen3.5-9B scores 4.2 on AI BENCHY and ranks #162. It has 6.7 reliability, a 27.0% pass rate, $0.036 total cost, and 82.24s average response time.
What makes Qwen3.5-9B unique: Its total benchmark cost is unusually low for its score range. It also uses an unusually large number of reasoning tokens, which helps explain its slower or more expensive runs.
- Score: 4.2
- Consistency: 8.0
- Reliability: 6.7
- Total Output Tokens: 238,561
- Total Input Tokens: 17,070
- Input Price: $0.100 / 1M
- Output Price: $0.150 / 1M
- Flaky tests: 5
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
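The flaky-test definition above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual implementation: the function names and the `example` results are hypothetical, and outcomes are assumed to be recorded as one boolean per run.

```python
def is_flaky(outcomes):
    """Return True when a test has at least one pass and at least one fail."""
    return any(outcomes) and not all(outcomes)


def count_flaky(results):
    """Count flaky tests in a mapping of test name -> list of pass/fail booleans."""
    return sum(1 for outcomes in results.values() if is_flaky(outcomes))


# Hypothetical outcomes for three tests across three runs (not real benchmark data).
example = {
    "test_a": [True, True, True],     # always passes: not flaky
    "test_b": [True, False, True],    # mixed outcomes: flaky
    "test_c": [False, False, False],  # always fails: not flaky
}
flaky_count = count_flaky(example)
```

Under this definition a test that fails on every run counts as incorrect but not flaky, so the flaky count and the pass rate measure different things.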
Generation showcase
Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
Run history
| Tested on | Score | Reliability | Tests Correct | Total Cost | Compare |
|---|---|---|---|---|---|
| 2026-06-04 13:41 New test added | 4.2 | 5.6 | | $0.035 ↓ | Current run |
| 2026-05-22 00:18 Suite changed | 4.2 | 1.7 | | $0.035 | Compare |
| 2026-05-08 14:44 Suite changed | 4.3 | 3.3 | | $0.035 | Compare |
| 2026-05-08 14:44 Suite changed | 4.3 | 3.3 | | $0.035 | Compare |
| 2026-04-20 17:48 First recorded run | 4.4 | N/A | | $0.030 | Compare |
This run used a different benchmark suite. Keep suite changes in mind when reading historical movement.
Run comparison
| Run | Score | Consistency | Reliability | Tests Correct | Flaky tests | Total Output Tokens | Total Input Tokens | Total Cost | Response Time (avg) |
|---|---|---|---|---|---|---|---|---|---|
| 2026-06-04 13:41 · Current run | 4.2 | 8.0 | 6.7 | 3/21 | 5 | 238,561 | 17,070 | $0.036 | 82.24s |
| 2026-05-22 00:18 · Suite changed | 4.2 | 7.0 | 1.7 | 3/20 | 7 | 229,656 | 0 | $0.035 | 80.10s |
| Difference | 0.0 | +1.0 | +5.0 | 0 | -2 | +8,905 | +17,070 | +$0.002 | +2.142s |
These two runs used different benchmark suites, so the deltas reflect both model changes and suite changes.
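The Difference row is a per-metric subtraction of the earlier run from the current one. A minimal sketch that reproduces it from the table values follows; the dictionary keys and the `run_deltas` helper are illustrative names, not part of the benchmark, and only the numerators of Tests Correct (3/21 and 3/20) are compared.

```python
# Values copied from the Run comparison table; key names are illustrative.
current = {
    "score": 4.2, "consistency": 8.0, "reliability": 6.7,
    "tests_correct": 3, "flaky_tests": 5,
    "output_tokens": 238_561, "input_tokens": 17_070,
    "avg_response_s": 82.24,
}
previous = {
    "score": 4.2, "consistency": 7.0, "reliability": 1.7,
    "tests_correct": 3, "flaky_tests": 7,
    "output_tokens": 229_656, "input_tokens": 0,
    "avg_response_s": 80.10,
}


def run_deltas(cur, prev):
    """Subtract the previous run from the current run, metric by metric."""
    # Rounding to 2 decimals hides floating-point noise in the subtraction.
    return {key: round(cur[key] - prev[key], 2) for key in cur}


deltas = run_deltas(current, previous)
```

The rounded averages give a response-time delta of 2.14s, while the table reports 2142 ms, presumably computed from unrounded timings. Because the suites differ (21 tests versus 20), the deltas are not a like-for-like comparison.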
Price History
Historical pricing data for this model from OpenRouter.
| Date | Input Price | Output Price |
|---|---|---|
| 2026-06-04 15:40 | $0.040 / 1M | $0.150 / 1M |
| 2026-06-10 13:42 | $0.100 / 1M | $0.150 / 1M |
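The page does not state how Total Cost is derived. A rough sketch, assuming plain per-token pricing (tokens multiplied by the per-million price, with no caching discounts or extra fees), estimates the current run's cost at both listed input prices. The `estimate_cost` helper is a hypothetical name.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate cost in dollars from token counts and per-million-token prices.

    Assumes simple linear pricing; the benchmark's real formula is not documented here.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Token totals from the current run (17,070 input, 238,561 output).
cost_at_old_price = estimate_cost(17_070, 238_561, 0.040, 0.150)  # input at $0.040 / 1M
cost_at_new_price = estimate_cost(17_070, 238_561, 0.100, 0.150)  # input at $0.100 / 1M
```

Under this assumption the estimate is about $0.036 at the $0.040 input price and about $0.037 at $0.100. The first figure matches the reported total cost, which is consistent with the run having been billed at the earlier price, though the page does not confirm this. Either way, output tokens account for nearly all of the cost.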
| Category | Score | Consistency | Tests Correct |
|---|---|---|---|
| Anti-AI Tricks | 5.1 | 5.8 | |
| Coding | 2.9 | 10.0 | |
| Combined | 3.0 | 10.0 | |
| Data parsing and extraction | 3.6 | 5.6 | |
| Domain specific | 3.6 | 7.2 | |
| General Intelligence | 2.8 | 1.6 | |
| Instructions following | 6.5 | 10.0 | |
| Puzzle Solving | 3.0 | 10.0 | |
| Tool Calling | 10.0 | 10.0 | |
| Trivia | 3.0 | 10.0 | |