AI BENCHY Compare

DeepSeek: DeepSeek V3.2 vs xAI: Grok Build 0.1

Last updated at: 2026-05-22

| Metric | DeepSeek V3.2 (medium) | Grok Build 0.1 (none) |
| --- | --- | --- |
| Release | 2025-12-01 | 2026-05-21 |
| Score | 7.0 | 6.6 |
| Rank | #71 | #82 |
| Reliability | 9.1 | 10.0 |
| Consistency | 7.6 | 8.0 |
| Tests Correct | | |
| Attempt pass rate | 69.2% | 60.4% |
| Flaky tests | 6 | 4 |
| Total Runs | 60 | 57 |
| Cost per result | 0.334 | 7.805 |
| Total Cost | $0.037 | $0.547 |
| Input Price | $0.252 / 1M | $1.000 / 1M |
| Output Price | $0.378 / 1M | $2.000 / 1M |
| Output Tokens | 7,049 | 267,275 |
| Reasoning Tokens | 68,203 | 0 |
| Response Time (avg) | 53.21s | 28.69s |
| Response Time (max) | 189.03s | 138.35s |
| Response Time (total) | 1064.26s | 459.00s |
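
As a rough sanity check on the cost figures, the sketch below estimates only the output-side spend from the listed output prices and token counts. It assumes that reasoning tokens are billed at the output rate and are counted separately from "Output Tokens"; neither is stated on this page, and the function name `output_cost_usd` is hypothetical. Input-token counts are not listed, so the result is a lower bound, not a reconstruction of Total Cost.

```python
def output_cost_usd(output_tokens, reasoning_tokens, price_per_million):
    """Estimate output-side spend in USD.

    Assumption: reasoning tokens are billed at the same rate as output tokens.
    """
    return (output_tokens + reasoning_tokens) * price_per_million / 1_000_000


# Token counts and output prices copied from the summary table.
deepseek = output_cost_usd(7_049, 68_203, 0.378)
grok = output_cost_usd(267_275, 0, 2.000)

print(f"DeepSeek V3.2 output-side estimate:  ${deepseek:.3f}")
print(f"Grok Build 0.1 output-side estimate: ${grok:.3f}")
```

Under these assumptions both estimates come out below the reported Total Cost values ($0.037 and $0.547), which is what one would expect if the remainder is input-token spend.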

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V3.2 | 9.2 | 10.0 | 100.0% | 0 | | 24.23s | 3,247 | 6,953 |
| Anti-AI Tricks | Grok Build 0.1 | 8.7 | 7.9 | 91.7% | 1 | | 6.30s | 11,162 | 0 |
| Coding | DeepSeek V3.2 | 3.9 | 5.8 | 33.3% | 1 | | 184.97s | 640 | 21,230 |
| Coding | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | | 21.41s | 16,568 | 0 |
| Combined | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 93.11s | 571 | 6,296 |
| Combined | Grok Build 0.1 | 0.0 | 0.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Data parsing and extraction | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 36.09s | 207 | 7,693 |
| Data parsing and extraction | Grok Build 0.1 | 4.7 | 1.6 | 66.7% | 1 | | 9.33s | 6,359 | 0 |
| Domain specific | DeepSeek V3.2 | 2.9 | 4.4 | 22.2% | 2 | | 24.27s | 21 | 6,838 |
| Domain specific | Grok Build 0.1 | 3.6 | 7.2 | 22.2% | 1 | | 103.71s | 179,469 | 0 |
| General Intelligence | DeepSeek V3.2 | 3.8 | 2.5 | 50.0% | 1 | | 58.29s | 49 | 2,189 |
| General Intelligence | Grok Build 0.1 | 4.3 | 10.0 | 0.0% | 0 | | 12.47s | 6,647 | 0 |
| Instructions following | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 35.78s | 1,397 | 2,845 |
| Instructions following | Grok Build 0.1 | 9.8 | 10.0 | 100.0% | 0 | | 7.36s | 8,970 | 0 |
| Puzzle Solving | DeepSeek V3.2 | 6.7 | 5.0 | 66.7% | 2 | | 36.87s | 390 | 6,281 |
| Puzzle Solving | Grok Build 0.1 | 6.4 | 7.7 | 55.6% | 1 | | 9.55s | 14,982 | 0 |
| Tool Calling | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 34.81s | 507 | 859 |
| Tool Calling | Grok Build 0.1 | 0.0 | 0.0 | 0.0% | 0 | | 0ms | 0 | 0 |
| Trivia | DeepSeek V3.2 | 3.0 | 10.0 | 0.0% | 0 | | 83.99s | 20 | 7,019 |
| Trivia | Grok Build 0.1 | 3.0 | 10.0 | 0.0% | 0 | | 36.09s | 23,118 | 0 |
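
The per-category token counts can be cross-checked against the headline totals. The sketch below is a minimal consistency check, with the numbers copied by hand from this page; the variable names are illustrative and not part of any AI Benchy API.

```python
# Per-category token counts, in the order the categories appear on the page:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction, Domain specific,
# General Intelligence, Instructions following, Puzzle Solving, Tool Calling, Trivia.
deepseek_output = [3_247, 640, 571, 207, 21, 49, 1_397, 390, 507, 20]
deepseek_reasoning = [6_953, 21_230, 6_296, 7_693, 6_838, 2_189, 2_845, 6_281, 859, 7_019]
grok_output = [11_162, 16_568, 0, 6_359, 179_469, 6_647, 8_970, 14_982, 0, 23_118]

# Headline totals from the summary table.
reported = {
    "DeepSeek V3.2 output": 7_049,
    "DeepSeek V3.2 reasoning": 68_203,
    "Grok Build 0.1 output": 267_275,
}
computed = {
    "DeepSeek V3.2 output": sum(deepseek_output),
    "DeepSeek V3.2 reasoning": sum(deepseek_reasoning),
    "Grok Build 0.1 output": sum(grok_output),
}

for name, total in computed.items():
    status = "matches" if total == reported[name] else "differs from"
    print(f"{name}: {total:,} {status} reported {reported[name]:,}")

# Share of Grok's output tokens spent in the "Domain specific" category.
domain_share = grok_output[4] / sum(grok_output)
print(f"Grok Build 0.1 Domain specific share of output tokens: {domain_share:.1%}")
```

All three sums agree with the headline totals, and roughly two thirds of Grok Build 0.1's output tokens come from the Domain specific category alone.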
