AI BENCHY Compare

DeepSeek: DeepSeek V3.2 vs Google: Gemini 3.1 Flash Lite

Last updated at: 2026-05-29

| Metric | DeepSeek V3.2 (none) | Gemini 3.1 Flash Lite (minimal) |
| --- | --- | --- |
| Release | 2025-12-01 | 2026-05-08 |
| Score | 6.2 | 6.7 |
| Rank | #97 | #84 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.3 | 8.8 |
| Tests Correct | | |
| Attempt pass rate | 48.3% | 56.7% |
| Flaky tests | 4 | 3 |
| Total Runs | 60 | 60 |
| Cost per result | 0.222 | 0.123 |
| Total Cost | $0.018 | $0.013 |
| Input Price | $0.252 / 1M | $0.250 / 1M |
| Output Price | $0.378 / 1M | $1.500 / 1M |
| Output Tokens | 11,159 | 2,481 |
| Reasoning Tokens | 0 | 0 |
| Response Time (avg) | 14.43s | 1.37s |
| Response Time (max) | 115.89s | 4.49s |
| Response Time (total) | 288.55s | 27.32s |
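
The figures in the table above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that Output Tokens is the total across all runs billed at the listed Output Price, and that Attempt pass rate multiplied by Total Runs gives the number of passing attempts. The dictionary keys and function names are hypothetical and not part of AI BENCHY.

```python
# Values copied from the comparison table; key names are illustrative.
MODELS = {
    "DeepSeek V3.2": {
        "output_price_per_m": 0.378,   # USD per 1M output tokens
        "output_tokens": 11_159,
        "total_runs": 60,
        "attempt_pass_rate": 0.483,
        "avg_response_s": 14.43,
    },
    "Gemini 3.1 Flash Lite": {
        "output_price_per_m": 1.500,
        "output_tokens": 2_481,
        "total_runs": 60,
        "attempt_pass_rate": 0.567,
        "avg_response_s": 1.37,
    },
}


def output_cost(model: dict) -> float:
    """Estimated spend on output tokens only (assumes list-price billing)."""
    return model["output_price_per_m"] * model["output_tokens"] / 1_000_000


def passing_attempts(model: dict) -> int:
    """Approximate count of passing attempts implied by the pass rate."""
    return round(model["attempt_pass_rate"] * model["total_runs"])


def speed_ratio(slow: dict, fast: dict) -> float:
    """How many times longer the slower model's average response takes."""
    return slow["avg_response_s"] / fast["avg_response_s"]


if __name__ == "__main__":
    for name, model in MODELS.items():
        print(f"{name}: output cost ~${output_cost(model):.4f}, "
              f"passing attempts ~{passing_attempts(model)}")
    ratio = speed_ratio(MODELS["DeepSeek V3.2"], MODELS["Gemini 3.1 Flash Lite"])
    print(f"Average response time ratio: {ratio:.1f}x")
```

Under these assumptions, output tokens account for only part of each Total Cost figure, with the remainder presumably coming from input tokens, which the table does not report.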

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V3.2 | 3.8 | 8.2 | 12.5% | 1 | | 9.35s | 1,073 | 0 |
| Anti-AI Tricks | Gemini 3.1 Flash Lite | 8.3 | 10.0 | 75.0% | 0 | | 1.10s | 639 | 0 |
| Coding | DeepSeek V3.2 | 3.1 | 5.4 | 16.7% | 1 | | 20.87s | 4,522 | 0 |
| Coding | Gemini 3.1 Flash Lite | 6.8 | 10.0 | 50.0% | 0 | | 951ms | 660 | 0 |
| Combined | DeepSeek V3.2 | 6.5 | 10.0 | 0.0% | 0 | | 115.89s | 2,887 | 0 |
| Combined | Gemini 3.1 Flash Lite | 3.0 | 10.0 | 0.0% | 0 | | 2.53s | 357 | 0 |
| Data parsing and extraction | DeepSeek V3.2 | 6.3 | 5.8 | 66.7% | 1 | | 9.42s | 1,710 | 0 |
| Data parsing and extraction | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | | 1.04s | 279 | 0 |
| Domain specific | DeepSeek V3.2 | 3.2 | 6.9 | 16.7% | 1 | | 4.17s | 21 | 0 |
| Domain specific | Gemini 3.1 Flash Lite | 2.9 | 7.2 | 11.1% | 1 | | 1.02s | 15 | 0 |
| General Intelligence | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 9.32s | 43 | 0 |
| General Intelligence | Gemini 3.1 Flash Lite | 4.0 | 10.0 | 0.0% | 0 | | 791ms | 63 | 0 |
| Instructions following | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 1.52s | 66 | 0 |
| Instructions following | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | | 932ms | 72 | 0 |
| Puzzle Solving | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 6.91s | 298 | 0 |
| Puzzle Solving | Gemini 3.1 Flash Lite | 6.0 | 4.6 | 66.7% | 2 | | 2.15s | 153 | 0 |
| Tool Calling | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 11.85s | 522 | 0 |
| Tool Calling | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | | 3.51s | 234 | 0 |
| Trivia | DeepSeek V3.2 | 3.0 | 10.0 | 0.0% | 0 | | 17.23s | 17 | 0 |
| Trivia | Gemini 3.1 Flash Lite | 3.0 | 10.0 | 0.0% | 0 | | 724ms | 9 | 0 |
