AI BENCHY Compare
Google: Gemini 3.1 Flash Lite Preview vs Qwen: Qwen3.5-Flash
Last updated at: 2026-03-05
| Metric | Google: Gemini 3.1 Flash Lite Preview (low), released 2026-03-03 | Qwen: Qwen3.5-Flash (medium), released 2026-02-24 |
|---|---|---|
| Avg Score | 7.6 | 7.0 |
| Tests Correct | | |
| Rank | #12 | #24 |
| Consistency | 10.0 | 7.8 |
| Cost per result | 0.170 | 0.565 |
| Total Cost | $0.019 | $0.057 |
| Attempt pass rate | 73.3% | 82.2% |
| Flaky tests | 0 | 4 |
| Total attempts | 45 (15 x 3) | 45 (15 x 3) |
| Output Tokens | 1,542 | 1,708 |
| Reasoning Tokens | 6,888 | 131,466 |
| Response Time (avg) | 3.49s | 72.86s |
| Response Time (max) | 11.91s | 234.29s |
| Response Time (total) | 52.29s | 1092.84s |
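
The summary figures can be cross-checked against each other. The sketch below assumes, based only on the numbers in the table, that "45 (15 x 3)" means 15 tests with 3 attempts each, and that the average response time is the total divided by the 15 tests; the page itself does not state either definition. The names `summary`, `passed_attempts`, and `avg_per_test` are illustrative.

```python
# Figures copied from the summary table above.
TESTS = 15           # assumed meaning of "45 (15 x 3)": 15 tests
RUNS_PER_TEST = 3    # assumed: 3 attempts per test

summary = {
    "Gemini 3.1 Flash Lite Preview": {"pass_rate": 73.3, "avg_s": 3.49, "total_s": 52.29},
    "Qwen3.5-Flash": {"pass_rate": 82.2, "avg_s": 72.86, "total_s": 1092.84},
}


def passed_attempts(pass_rate_pct, attempts=TESTS * RUNS_PER_TEST):
    """Convert an attempt pass rate (percent) into a whole number of attempts."""
    return round(pass_rate_pct / 100 * attempts)


def avg_per_test(total_s, tests=TESTS):
    """Average response time under the assumption avg = total / number of tests."""
    return round(total_s / tests, 2)


for model, row in summary.items():
    print(
        model,
        "passed attempts:", passed_attempts(row["pass_rate"]),
        "| recomputed avg (s):", avg_per_test(row["total_s"]),
        "| reported avg (s):", row["avg_s"],
    )
```

Under these assumptions the recomputed averages match the reported ones, and the pass rates correspond to 33 and 37 passed attempts out of 45.
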
[Charts: Score vs Total Cost; Response Time (avg); Avg Score vs Response Time (avg)]
Category Breakdown
| Anti-AI Tricks | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 7.0 | 10.0 | 66.7% | 0 | | 2.18s | 456 | 1,224 |
| Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 71.35s | 363 | 23,645 |
| Combined | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 0.0% | 0 | | 11.91s | 225 | 762 |
| Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 17.78s | 483 | 8,270 |
| Data parsing and extraction | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 9.9 | 10.0 | 100.0% | 0 | | 3.00s | 291 | 696 |
| Qwen: Qwen3.5-Flash | 5.5 | 5.9 | 83.3% | 1 | | 56.99s | 235 | 16,237 |
| Domain specific | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 4.0 | 10.0 | 33.3% | 0 | | 2.36s | 18 | 1,212 |
| Qwen: Qwen3.5-Flash | 4.0 | 7.2 | 44.4% | 1 | | 146.50s | 58 | 43,615 |
| Instructions following | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 1.49s | 72 | 753 |
| Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 63.49s | 98 | 14,139 |
| Puzzle Solving | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 2.76s | 243 | 1,248 |
| Qwen: Qwen3.5-Flash | 4.0 | 4.4 | 77.8% | 2 | | 56.74s | 162 | 24,276 |
| Tool Calling | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 9.54s | 237 | 993 |
| Qwen: Qwen3.5-Flash | 10.0 | 10.0 | 100.0% | 0 | | 10.33s | 309 | 1,284 |
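
The per-category token and flaky-test counts can be summed and compared with the overall figures in the summary table. This is a minimal sketch with the values copied by hand from the tables above; the dictionary layout and the helper name `column_sums` are illustrative, and the assumption that the seven categories partition the full test set is inferred from the numbers rather than stated on the page.

```python
# (output tokens, reasoning tokens, flaky tests) per category, copied from the tables above.
categories = {
    "Anti-AI Tricks":              {"gemini": (456, 1224, 0), "qwen": (363, 23645, 0)},
    "Combined":                    {"gemini": (225, 762, 0),  "qwen": (483, 8270, 0)},
    "Data parsing and extraction": {"gemini": (291, 696, 0),  "qwen": (235, 16237, 1)},
    "Domain specific":             {"gemini": (18, 1212, 0),  "qwen": (58, 43615, 1)},
    "Instructions following":      {"gemini": (72, 753, 0),   "qwen": (98, 14139, 0)},
    "Puzzle Solving":              {"gemini": (243, 1248, 0), "qwen": (162, 24276, 2)},
    "Tool Calling":                {"gemini": (237, 993, 0),  "qwen": (309, 1284, 0)},
}

# Overall (output tokens, reasoning tokens, flaky tests) from the summary table.
summary_totals = {"gemini": (1542, 6888, 0), "qwen": (1708, 131466, 4)}


def column_sums(model):
    """Sum each column across all categories for one model."""
    rows = [per_model[model] for per_model in categories.values()]
    return tuple(sum(col) for col in zip(*rows))


for model, expected in summary_totals.items():
    print(model, "category sums:", column_sums(model), "| summary:", expected)
```

For both models the category sums agree with the summary row, which suggests the category tables and the summary were computed from the same set of attempts.
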