AI BENCHY

AI BENCHY Compare

DeepSeek: DeepSeek V4 Pro vs Google: Gemini 3.1 Flash Lite

Last updated at: 2026-05-22

Metric                | DeepSeek V4 Pro (none) | Gemini 3.1 Flash Lite (none)
Release               | 2026-04-24             | 2026-05-08
Score                 | 6.0                    | 6.6
Rank                  | #95                    | #85
Reliability           | 8.1                    | 10.0
Consistency           | 8.9                    | 8.5
Tests Correct         | -                      | -
Attempt pass rate     | 48.3%                  | 55.0%
Flaky tests           | 3                      | 4
Total Runs            | 60                     | 60
Cost per result       | 0.564                  | 0.135
Total Cost            | $0.046                 | $0.013
Input Price           | $0.435 / 1M            | $0.250 / 1M
Output Price          | $0.870 / 1M            | $1.500 / 1M
Output Tokens         | 5,347                  | 2,478
Reasoning Tokens      | 0                      | 0
Response Time (avg)   | 13.48s                 | 1.09s
Response Time (max)   | 58.65s                 | 2.97s
Response Time (total) | 269.56s                | 21.79s
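
The headline figures above can be compared directly with a little arithmetic. The following is a minimal sketch, not part of the benchmark itself: the dictionary keys and the helper names `ratio` and `implied_sample_count` are hypothetical, and every number is copied from the table rather than measured anew.

```python
# Values copied from the comparison table; key names are illustrative assumptions.
metrics = {
    "DeepSeek V4 Pro": {
        "score": 6.0,
        "total_cost_usd": 0.046,
        "avg_response_s": 13.48,
        "total_response_s": 269.56,
    },
    "Gemini 3.1 Flash Lite": {
        "score": 6.6,
        "total_cost_usd": 0.013,
        "avg_response_s": 1.09,
        "total_response_s": 21.79,
    },
}


def ratio(metric, numerator="DeepSeek V4 Pro", denominator="Gemini 3.1 Flash Lite"):
    """Return how many times larger the metric is for `numerator` than for `denominator`."""
    return metrics[numerator][metric] / metrics[denominator][metric]


def implied_sample_count(model):
    """Total response time divided by the average.

    Assumption: the average and the total are computed over the same set of samples.
    """
    m = metrics[model]
    return m["total_response_s"] / m["avg_response_s"]


if __name__ == "__main__":
    print(f"Avg response time ratio: {ratio('avg_response_s'):.1f}x")
    print(f"Total cost ratio:        {ratio('total_cost_usd'):.2f}x")
    for name in metrics:
        print(f"{name}: about {implied_sample_count(name):.0f} timed samples")
```

On these figures DeepSeek V4 Pro is roughly 12 times slower on average and about 3.5 times more expensive in total, while scoring 0.6 points lower. The implied sample count comes out near 20 for both models, which would be consistent with 20 tests run three times each (60 total runs), though the page does not state this.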

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

Anti-AI Tricks              | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 3.5   | 8.0         | 16.7%             | 1           | -             | 14.02s              | 704           | 0
Gemini 3.1 Flash Lite       | 7.5   | 8.4         | 66.7%             | 1           | -             | 1.07s               | 639           | 0

Coding                      | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 5.4   | 6.8         | 33.3%             | 1           | -             | 8.27s               | 527           | 0
Gemini 3.1 Flash Lite       | 6.8   | 10.0        | 50.0%             | 0           | -             | 1.13s               | 660           | 0

Combined                    | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 9.5   | 10.0        | 100.0%            | 0           | -             | 25.49s              | 1,911         | 0
Gemini 3.1 Flash Lite       | 3.0   | 10.0        | 0.0%              | 0           | -             | 2.73s               | 357           | 0

Data parsing and extraction | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 8.8   | 10.0        | 100.0%            | 0           | -             | 30.54s              | 170           | 0
Gemini 3.1 Flash Lite       | 10.0  | 10.0        | 100.0%            | 0           | -             | 843ms               | 279           | 0

Domain specific             | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 5.3   | 10.0        | 33.3%             | 0           | -             | 3.17s               | 18            | 0
Gemini 3.1 Flash Lite       | 2.9   | 7.2         | 11.1%             | 1           | -             | 762ms               | 15            | 0

General Intelligence        | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 4.3   | 9.9         | 0.0%              | 0           | -             | 3.75s               | 132           | 0
Gemini 3.1 Flash Lite       | 4.0   | 10.0        | 0.0%              | 0           | -             | 992ms               | 63            | 0

Instructions following      | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 6.3   | 10.0        | 50.0%             | 0           | -             | 8.23s               | 64            | 0
Gemini 3.1 Flash Lite       | 10.0  | 10.0        | 100.0%            | 0           | -             | 859ms               | 72            | 0

Puzzle Solving              | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 7.6   | 7.2         | 77.8%             | 1           | -             | 19.72s              | 175           | 0
Gemini 3.1 Flash Lite       | 6.3   | 4.8         | 66.7%             | 2           | -             | 720ms               | 150           | 0

Tool Calling                | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 10.0  | 10.0        | 100.0%            | 0           | -             | 5.92s               | 219           | 0
Gemini 3.1 Flash Lite       | 10.0  | 10.0        | 100.0%            | 0           | -             | 2.97s               | 234           | 0

Trivia                      | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens
DeepSeek V4 Pro             | 3.0   | 10.0        | 0.0%              | 0           | -             | 15.59s              | 1,427         | 0
Gemini 3.1 Flash Lite       | 3.0   | 10.0        | 0.0%              | 0           | -             | 733ms               | 9             | 0
