AI BENCHY Compare

Google: Gemini 3.1 Flash Lite Preview vs Qwen: Qwen3.5-Flash


Last updated at: 2026-03-06

| Metric | Google: Gemini 3.1 Flash Lite Preview (high), Release: 2026-03-03 | Qwen: Qwen3.5-Flash (medium), Release: 2026-02-24 |
|---|---|---|
| Rank | #8 | #24 |
| Avg Score | 8.2 | 6.9 |
| Consistency | 9.6 | 7.5 |
| Cost per result | 19.243 | 0.720 |
| Total Cost | $2.310 | $0.072 |
| Tests Correct | | |
| Attempt pass rate | 77.1% | 81.3% |
| Flaky tests | 1 | 5 |
| Total Runs | 48 (16 x 3) | 48 (16 x 3) |
| Output Tokens | 1,283 | 1,807 |
| Reasoning Tokens | 1,533,310 | 169,952 |
| Response Time (avg) | 68.83s | 70.81s |
| Response Time (max) | 280.52s | 234.29s |
| Response Time (total) | 1101.32s | 1132.90s |
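
As a rough reading aid for the figures above, the following Python sketch converts each attempt pass rate into an implied number of passing attempts and compares total cost. It assumes that the pass rate is simply passing attempts divided by total runs (the page does not define the metric), and the names `implied_passes` and `cost_ratio` are hypothetical helpers introduced for illustration only.

```python
# Figures copied from the comparison table; nothing here is measured independently.
TOTAL_RUNS = 48  # 16 tests x 3 attempts each, as listed under "Total Runs"

models = {
    "Google: Gemini 3.1 Flash Lite Preview": {"pass_rate": 0.771, "total_cost": 2.310},
    "Qwen: Qwen3.5-Flash": {"pass_rate": 0.813, "total_cost": 0.072},
}


def implied_passes(pass_rate: float, runs: int = TOTAL_RUNS) -> int:
    """Assumption: pass rate = passing attempts / total runs."""
    return round(pass_rate * runs)


for name, stats in models.items():
    passes = implied_passes(stats["pass_rate"])
    print(f"{name}: about {passes} of {TOTAL_RUNS} attempts passed")

# How many times more the first model cost in total than the second.
cost_ratio = (
    models["Google: Gemini 3.1 Flash Lite Preview"]["total_cost"]
    / models["Qwen: Qwen3.5-Flash"]["total_cost"]
)
print(f"Total cost ratio (Gemini / Qwen): {cost_ratio:.1f}x")
```

Under that assumption, the rates correspond to roughly 37 and 39 passing attempts out of 48, and the Gemini run cost about 32 times as much in total as the Qwen run.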

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 43.87s | 144 | 193,077 |
| Anti-AI Tricks | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 71.35s | 363 | 23,645 |
| Combined | Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 280.52s | 335 | 380,440 |
| Combined | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 17.78s | 483 | 8,270 |
| Data parsing and extraction | Google: Gemini 3.1 Flash Lite Preview | 9.9 | 10.0 | 100.0% | 0 | | 7.16s | 279 | 6,186 |
| Data parsing and extraction | Qwen: Qwen3.5-Flash | 5.5 | 5.9 | 83.3% | 1 | | 56.99s | 235 | 16,237 |
| Domain specific | Google: Gemini 3.1 Flash Lite Preview | 4.0 | 10.0 | 33.3% | 0 | | 127.58s | 18 | 566,202 |
| Domain specific | Qwen: Qwen3.5-Flash | 4.0 | 7.2 | 44.4% | 1 | | 146.50s | 58 | 43,615 |
| General Intelligence | Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 5.25s | 117 | 3,915 |
| General Intelligence | Qwen: Qwen3.5-Flash | 5.0 | 3.1 | 66.7% | 1 | | 40.05s | 99 | 38,486 |
| Instructions following | Google: Gemini 3.1 Flash Lite Preview | 9.0 | 6.9 | 66.7% | 1 | | 70.07s | 69 | 190,053 |
| Instructions following | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 63.49s | 98 | 14,139 |
| Puzzle Solving | Google: Gemini 3.1 Flash Lite Preview | 7.0 | 10.0 | 66.7% | 0 | | 46.33s | 87 | 190,953 |
| Puzzle Solving | Qwen: Qwen3.5-Flash | 4.0 | 4.4 | 77.8% | 2 | | 56.74s | 162 | 24,276 |
| Tool Calling | Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 7.73s | 234 | 2,484 |
| Tool Calling | Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 10.33s | 309 | 1,284 |
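
The per-category token counts appear to add up to the overall totals reported in the summary table. A minimal Python sketch of that cross-check follows; the values are copied from this page in category order (Anti-AI Tricks through Tool Calling), and the dictionary keys and variable names are hypothetical labels chosen for illustration.

```python
# Per-category token counts copied from the category breakdown, in page order:
# Anti-AI Tricks, Combined, Data parsing and extraction, Domain specific,
# General Intelligence, Instructions following, Puzzle Solving, Tool Calling.
reasoning_tokens = {
    "gemini": [193_077, 380_440, 6_186, 566_202, 3_915, 190_053, 190_953, 2_484],
    "qwen": [23_645, 8_270, 16_237, 43_615, 38_486, 14_139, 24_276, 1_284],
}
output_tokens = {
    "gemini": [144, 335, 279, 18, 117, 69, 87, 234],
    "qwen": [363, 483, 235, 58, 99, 98, 162, 309],
}

# Overall totals as reported in the summary table.
reported_reasoning = {"gemini": 1_533_310, "qwen": 169_952}
reported_output = {"gemini": 1_283, "qwen": 1_807}

for model in ("gemini", "qwen"):
    reasoning_ok = sum(reasoning_tokens[model]) == reported_reasoning[model]
    output_ok = sum(output_tokens[model]) == reported_output[model]
    print(f"{model}: reasoning totals match={reasoning_ok}, output totals match={output_ok}")

# Share of Gemini's reasoning tokens spent on its largest category.
gemini_total = sum(reasoning_tokens["gemini"])
largest_share = max(reasoning_tokens["gemini"]) / gemini_total
print(f"Largest single-category share for gemini: {largest_share:.1%}")
```

A check like this is mainly useful for catching transcription errors when the tables are copied by hand. It also shows that a single category (Domain specific) accounts for more than a third of Gemini's reasoning tokens.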
