AI BENCHY Compare

DeepSeek: DeepSeek V4 Flash vs xAI: Grok Build 0.1

Last updated at: 2026-05-21

| Metric | DeepSeek V4 Flash (high) | Grok Build 0.1 (medium) |
| --- | --- | --- |
| Release | 2026-04-24 | 2026-05-21 |
| Availability | Free available | - |
| Score | 7.6 | 7.8 |
| Rank | #54 | #41 |
| Reliability | 10.0 | 10.0 |
| Consistency | 7.9 | 8.9 |
| Tests Correct | - | - |
| Attempt pass rate | 75.4% | 71.9% |
| Flaky tests | 5 | 3 |
| Total Runs | 57 | 57 |
| Cost per result | 0.299 | 4.064 |
| Total Cost | $0.033 | $0.488 |
| Input Price (per 1M tokens) | $0.112 | $1.000 |
| Output Price (per 1M tokens) | $0.224 | $2.000 |
| Output Tokens | 10,281 | 1,947 |
| Reasoning Tokens | 98,830 | 223,372 |
| Response Time (avg) | 45.88s | 22.28s |
| Response Time (max) | 218.13s | 88.28s |
| Response Time (total) | 871.76s | 423.30s |
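
The reported totals can be cross-checked against the listed prices and token counts. The sketch below is a rough illustration, not the site's own cost formula: it assumes that reasoning tokens are billed at the output price (the page does not say so), and the `ModelRun` class and `generated_token_cost` helper are hypothetical names introduced only for this example. Input-token counts are not listed on the page, so only the generated-token share of the cost can be estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    """Figures copied from the comparison table (hypothetical container)."""
    name: str
    output_price_per_m: float  # USD per 1M output tokens
    output_tokens: int
    reasoning_tokens: int
    total_cost: float          # USD, as reported on the page


def generated_token_cost(run: ModelRun) -> float:
    """Estimate USD spent on generated tokens.

    ASSUMPTION: reasoning tokens are billed at the output price.
    """
    generated = run.output_tokens + run.reasoning_tokens
    return generated * run.output_price_per_m / 1_000_000


deepseek = ModelRun("DeepSeek V4 Flash", 0.224, 10_281, 98_830, 0.033)
grok = ModelRun("Grok Build 0.1", 2.000, 1_947, 223_372, 0.488)

for run in (deepseek, grok):
    estimate = generated_token_cost(run)
    share = estimate / run.total_cost
    print(f"{run.name}: ~${estimate:.3f} of ${run.total_cost:.3f} "
          f"({share:.0%}) from generated tokens")

# Ratio of reported total costs between the two models.
print(f"Cost ratio (Grok / DeepSeek): {grok.total_cost / deepseek.total_cost:.1f}x")
```

Under that assumption, generated tokens account for roughly $0.024 of DeepSeek V4 Flash's $0.033 and roughly $0.451 of Grok Build 0.1's $0.488; the remainder would be input tokens.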

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V4 Flash | 8.3 | 10.0 | 75.0% | 0 | - | 28.51s | 140 | 7,770 |
| Anti-AI Tricks | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | - | 5.46s | 195 | 9,825 |
| Coding | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | - | 62.48s | 369 | 9,361 |
| Coding | Grok Build 0.1 | 7.3 | 3.7 | 66.7% | 1 | - | 30.98s | 354 | 17,734 |
| Combined | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | - | 76.57s | 465 | 7,347 |
| Combined | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | - | 30.81s | 231 | 18,779 |
| Data parsing and extraction | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | - | 28.03s | 201 | 1,179 |
| Data parsing and extraction | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | - | 7.76s | 180 | 10,343 |
| Domain specific | DeepSeek V4 Flash | 4.1 | 4.4 | 44.5% | 2 | - | 100.31s | 27 | 59,249 |
| Domain specific | Grok Build 0.1 | 5.3 | 10.0 | 33.3% | 0 | - | 77.75s | 501 | 111,807 |
| General Intelligence | DeepSeek V4 Flash | 6.1 | 3.1 | 66.7% | 1 | - | 25.15s | 79 | 632 |
| General Intelligence | Grok Build 0.1 | 3.8 | 2.5 | 33.3% | 1 | - | 10.14s | 78 | 5,386 |
| Instructions following | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | - | 15.36s | 63 | 1,622 |
| Instructions following | Grok Build 0.1 | 9.8 | 10.0 | 100.0% | 0 | - | 9.62s | 57 | 12,436 |
| Puzzle Solving | DeepSeek V4 Flash | 6.4 | 4.4 | 77.8% | 2 | - | 25.53s | 193 | 2,597 |
| Puzzle Solving | Grok Build 0.1 | 6.2 | 7.5 | 55.6% | 1 | - | 8.67s | 161 | 15,476 |
| Tool Calling | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | - | 74.73s | 228 | 542 |
| Tool Calling | Grok Build 0.1 | 10.0 | 10.0 | 100.0% | 0 | - | 9.40s | 180 | 5,319 |
| Trivia | DeepSeek V4 Flash | 3.0 | 10.0 | 0.0% | 0 | - | 54.46s | 8,516 | 8,531 |
| Trivia | Grok Build 0.1 | 3.0 | 10.0 | 0.0% | 0 | - | 26.07s | 10 | 16,267 |
