AI BENCHY Compare

Owl Alpha vs Qwen: Qwen3.5 Plus 2026-04-20

Last updated at: 2026-05-22

| Metric | Owl Alpha (none) | Qwen3.5 Plus 2026-04-20 (none) |
| --- | --- | --- |
| Release | 2026-04-30 | 2026-04-20 |
| Score | 5.7 | 5.8 |
| Rank | #106 | #101 |
| Reliability | 10.0 | 9.9 |
| Consistency | 9.2 | 8.5 |
| Tests Correct | | |
| Attempt pass rate | 41.7% | 43.3% |
| Flaky tests | 2 | 4 |
| Total Runs | 60 | 60 |
| Cost per result | 0.000 | 0.583 |
| Total Cost | $0.000 | $0.041 |
| Input Price | $0.000 / 1M | $0.300 / 1M |
| Output Price | $0.000 / 1M | $1.800 / 1M |
| Output Tokens | 4,864 | 11,174 |
| Reasoning Tokens | 0 | 0 |
| Response Time (avg) | 8.84s | 4.58s |
| Response Time (max) | 47.10s | 33.34s |
| Response Time (total) | 176.83s | 91.55s |
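
As a rough illustration of how the per-1M-token prices relate to the Total Cost row, the sketch below estimates the output-token share of the cost. It assumes the Output Tokens figure is the billed total across all runs, which the page does not state; the function name `output_cost_usd` is hypothetical. Input token counts are not listed, so the input share cannot be derived this way.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of a number of output tokens at a per-1M-token price (USD)."""
    return output_tokens * price_per_million_usd / 1_000_000


# Figures copied from the comparison table; treating Output Tokens as the
# billed total is an assumption, not something the page confirms.
owl_output_cost = output_cost_usd(4_864, 0.000)
qwen_output_cost = output_cost_usd(11_174, 1.800)

print(f"Owl Alpha output cost:    ${owl_output_cost:.4f}")
print(f"Qwen3.5 Plus output cost: ${qwen_output_cost:.4f}")
```

Under that assumption, the Qwen3.5 Plus output tokens would account for about $0.020 of the $0.041 total, with the remainder attributable to input tokens.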

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Owl Alpha | 3.4 | 7.9 | 16.7% | 1 | | 2.78s | 57 | 0 |
| Anti-AI Tricks | Qwen3.5 Plus 2026-04-20 | 4.8 | 10.0 | 25.0% | 0 | | 1.88s | 557 | 0 |
| Coding | Owl Alpha | 7.0 | 9.9 | 50.0% | 0 | | 39.68s | 3,629 | 0 |
| Coding | Qwen3.5 Plus 2026-04-20 | 4.4 | 6.7 | 16.7% | 1 | | 2.08s | 474 | 0 |
| Combined | Owl Alpha | 3.0 | 10.0 | 0.0% | 0 | | 21.74s | 315 | 0 |
| Combined | Qwen3.5 Plus 2026-04-20 | 2.8 | 1.6 | 33.3% | 1 | | 13.32s | 2,275 | 0 |
| Data parsing and extraction | Owl Alpha | 10.0 | 10.0 | 100.0% | 0 | | 3.60s | 246 | 0 |
| Data parsing and extraction | Qwen3.5 Plus 2026-04-20 | 10.0 | 10.0 | 100.0% | 0 | | 2.82s | 243 | 0 |
| Domain specific | Owl Alpha | 5.3 | 10.0 | 33.3% | 0 | | 3.00s | 27 | 0 |
| Domain specific | Qwen3.5 Plus 2026-04-20 | 5.3 | 10.0 | 33.3% | 0 | | 4.43s | 18 | 0 |
| General Intelligence | Owl Alpha | 4.3 | 10.0 | 0.0% | 0 | | 4.61s | 80 | 0 |
| General Intelligence | Qwen3.5 Plus 2026-04-20 | 4.8 | 10.0 | 0.0% | 0 | | 1.41s | 119 | 0 |
| Instructions following | Owl Alpha | 6.4 | 10.0 | 50.0% | 0 | | 2.63s | 63 | 0 |
| Instructions following | Qwen3.5 Plus 2026-04-20 | 6.2 | 5.8 | 66.7% | 1 | | 1.17s | 68 | 0 |
| Puzzle Solving | Owl Alpha | 5.9 | 7.2 | 55.6% | 1 | | 4.43s | 202 | 0 |
| Puzzle Solving | Qwen3.5 Plus 2026-04-20 | 6.7 | 7.9 | 55.6% | 1 | | 2.03s | 618 | 0 |
| Tool Calling | Owl Alpha | 10.0 | 10.0 | 100.0% | 0 | | 22.78s | 231 | 0 |
| Tool Calling | Qwen3.5 Plus 2026-04-20 | 10.0 | 10.0 | 100.0% | 0 | | 4.42s | 297 | 0 |
| Trivia | Owl Alpha | 3.0 | 10.0 | 0.0% | 0 | | 2.50s | 14 | 0 |
| Trivia | Qwen3.5 Plus 2026-04-20 | 3.0 | 10.0 | 0.0% | 0 | | 33.34s | 6,505 | 0 |
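
The per-category Output Tokens figures above can be cross-checked against the overall Output Tokens totals (4,864 and 11,174). The following is a minimal sketch of that arithmetic; the dictionary names are illustrative, and the values are copied from the category breakdown.

```python
# Output tokens per category, copied from the category breakdown.
owl_alpha_tokens = {
    "Anti-AI Tricks": 57,
    "Coding": 3_629,
    "Combined": 315,
    "Data parsing and extraction": 246,
    "Domain specific": 27,
    "General Intelligence": 80,
    "Instructions following": 63,
    "Puzzle Solving": 202,
    "Tool Calling": 231,
    "Trivia": 14,
}
qwen_tokens = {
    "Anti-AI Tricks": 557,
    "Coding": 474,
    "Combined": 2_275,
    "Data parsing and extraction": 243,
    "Domain specific": 18,
    "General Intelligence": 119,
    "Instructions following": 68,
    "Puzzle Solving": 618,
    "Tool Calling": 297,
    "Trivia": 6_505,
}

owl_total = sum(owl_alpha_tokens.values())
qwen_total = sum(qwen_tokens.values())

# Category with the largest share of each model's output tokens.
owl_heaviest = max(owl_alpha_tokens, key=owl_alpha_tokens.get)
qwen_heaviest = max(qwen_tokens, key=qwen_tokens.get)

print(owl_total, owl_heaviest)    # 4864 Coding
print(qwen_total, qwen_heaviest)  # 11174 Trivia
```

Both sums match the reported totals. Most of Owl Alpha's output comes from Coding, while more than half of Qwen3.5 Plus's output comes from Trivia.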
