AI BENCHY
AI BENCHY Compare

Qwen: Qwen3.5-9B vs xAI: Grok 4.1 Fast

Last updated at: 2026-03-12

Metric | Qwen3.5-9B (medium) | Grok 4.1 Fast (none)
Release | 2026-03-02 | 2025-11-19
Rank | #66 | #63
Avg Score | 2.6 | 2.9
Consistency | 7.4 | 8.9
Cost per result | 0.779 | 0.247
Total Cost | $0.024 | $0.008
Tests Correct | - | -
Attempt pass rate | 35.4% | 25.0%
Flaky tests | 5 | 2
Total Runs | 48 | 48
Output Tokens | 17,930 | 1,148
Reasoning Tokens | 139,706 | 0
Response Time (avg) | 71.44s | 1.90s
Response Time (max) | 226.38s | 5.51s
Response Time (total) | 928.77s | 17.14s
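
The headline gaps in the summary table above are easier to read as ratios. The following is a minimal sketch, assuming the Total Cost and Response Time (avg) figures are directly comparable between the two models; the `SUMMARY` dictionary, its key names, and the `ratio` helper are hypothetical names introduced here for illustration and are not part of the benchmark site.

```python
# Values copied from the summary table; the dict layout and key names are
# illustrative choices, not something the benchmark site provides.
SUMMARY = {
    "Qwen3.5-9B": {"total_cost_usd": 0.024, "avg_response_s": 71.44},
    "Grok 4.1 Fast": {"total_cost_usd": 0.008, "avg_response_s": 1.90},
}


def ratio(metric: str,
          numerator: str = "Qwen3.5-9B",
          denominator: str = "Grok 4.1 Fast") -> float:
    """Return how many times larger `metric` is for `numerator` than for `denominator`."""
    return SUMMARY[numerator][metric] / SUMMARY[denominator][metric]


if __name__ == "__main__":
    print(f"Total cost ratio: {ratio('total_cost_usd'):.1f}x")
    print(f"Avg response time ratio: {ratio('avg_response_s'):.1f}x")
```

On these figures, Qwen3.5-9B costs about 3 times as much in total and takes about 37.6 times as long per response on average as Grok 4.1 Fast.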

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg); Total Output Tokens; Avg Score vs Total Output Tokens]

Category Breakdown

Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
Anti-AI Tricks | Qwen3.5-9B | 4.0 | 7.2 | 55.6% | 1 | - | 31.54s | 2,410 | 10,913
Anti-AI Tricks | Grok 4.1 Fast | 1.3 | 10.0 | 0.0% | 0 | - | 1.73s | 229 | 0
Combined | Qwen3.5-9B | 10.0 | 10.0 | 0.0% | 0 | - | 0ms | 0 | 0
Combined | Grok 4.1 Fast | 10.0 | 10.0 | 0.0% | 0 | - | 3.33s | 105 | 0
Data parsing and extraction | Qwen3.5-9B | 5.0 | 5.6 | 33.3% | 1 | - | 87.31s | 1,383 | 32,113
Data parsing and extraction | Grok 4.1 Fast | 9.9 | 10.0 | 100.0% | 0 | - | 943ms | 180 | 0
Domain specific | Qwen3.5-9B | 10.0 | 7.2 | 22.2% | 1 | - | 137.75s | 11,549 | 48,475
Domain specific | Grok 4.1 Fast | 4.0 | 7.2 | 55.6% | 1 | - | 1.06s | 15 | 0
General Intelligence | Qwen3.5-9B | 10.0 | 1.6 | 33.3% | 1 | - | 226.38s | 0 | 30,695
General Intelligence | Grok 4.1 Fast | 3.0 | 9.9 | 0.0% | 0 | - | 1.08s | 112 | 0
Instructions following | Qwen3.5-9B | 5.5 | 5.8 | 66.7% | 1 | - | 17.15s | 599 | 4,517
Instructions following | Grok 4.1 Fast | 10.0 | 10.0 | 0.0% | 0 | - | 923ms | 56 | 0
Puzzle Solving | Qwen3.5-9B | 10.0 | 10.0 | 0.0% | 0 | - | 33.38s | 1,545 | 11,844
Puzzle Solving | Grok 4.1 Fast | 1.3 | 10.0 | 0.0% | 0 | - | 1.28s | 243 | 0
Tool Calling | Qwen3.5-9B | 10.0 | 10.0 | 100.0% | 0 | - | 4.31s | 444 | 1,149
Tool Calling | Grok 4.1 Fast | 10.0 | 1.6 | 33.3% | 1 | - | 5.51s | 208 | 0
