AI BENCHY

DeepSeek: DeepSeek V4 Pro vs OpenAI: GPT-5.4

Summary

DeepSeek V4 Pro vs GPT-5.4 benchmark comparison: GPT-5.4 leads on average score, 8.5 vs 7.6. DeepSeek V4 Pro has the lower total benchmark cost, $0.157 vs $1.210. GPT-5.4 is also faster, averaging 22.35s per response vs 77.20s, and has the higher attempt pass rate, 76.2% vs 66.7%.

Recommended model: DeepSeek V4 Pro. It offers the best overall trade-off: a competitive score (7.6) at a much lower cost than GPT-5.4 ($0.157 vs $1.210), though its average response time is slower (77.20s vs 22.35s).
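
The trade-off behind this recommendation can be expressed as a few ratios. The following is a minimal Python sketch that uses only the figures quoted in the summary; the variable names are illustrative and are not part of AI Benchy.

```python
# Figures copied from the summary; dictionary keys are illustrative names.
deepseek = {"score": 7.6, "cost_usd": 0.157, "avg_time_s": 77.20}
gpt = {"score": 8.5, "cost_usd": 1.210, "avg_time_s": 22.35}

# How many times more the GPT-5.4 benchmark run cost.
cost_ratio = gpt["cost_usd"] / deepseek["cost_usd"]
# How many times longer DeepSeek V4 Pro took per response on average.
time_ratio = deepseek["avg_time_s"] / gpt["avg_time_s"]
# Absolute score difference in favor of GPT-5.4.
score_gap = gpt["score"] - deepseek["score"]

print(f"GPT-5.4 cost ratio: {cost_ratio:.1f}x")
print(f"DeepSeek V4 Pro time ratio: {time_ratio:.2f}x")
print(f"Score gap: {score_gap:.1f}")
```

With these inputs, GPT-5.4 costs roughly 7.7 times as much, DeepSeek V4 Pro is roughly 3.45 times slower on average, and the score gap is 0.9 points.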

Last updated at: 2026-06-17

| Metric | DeepSeek V4 Pro (high) | GPT-5.4 (medium) |
| --- | --- | --- |
| Release | 2026-04-24 | 2026-03-05 |
| Score | 7.6 | 8.5 |
| Rank | #41 | #17 |
| Reliability | 9.3 | 10.0 |
| Consistency | 7.0 | 8.6 |
| Tests Correct | | |
| Attempt pass rate | 66.7% | 76.2% |
| Flaky tests | 8 | 4 |
| Total Runs | 63 | 63 |
| Cost per result | 1.742 | 8.640 |
| Total Cost | $0.157 | $1.210 |
| Input Price | $0.435 / 1M | $2.500 / 1M |
| Output Price | $0.870 / 1M | $15.000 / 1M |
| Total Input Tokens | 38,726 | 34,108 |
| Output Tokens | 6,334 | 2,242 |
| Reasoning Tokens | 159,151 | 72,707 |
| Response Time (avg) | 77.20s | 22.35s |
| Response Time (max) | 416.76s | 100.41s |
| Response Time (total) | 1621.17s | 469.29s |
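
The Total Cost figures can be roughly cross-checked against the token counts and per-million prices listed above. The sketch below is a Python illustration under one explicit assumption that the page does not state: reasoning tokens are billed at the output price. The function name `estimate_cost_usd` is hypothetical.

```python
def estimate_cost_usd(input_tokens, output_tokens, reasoning_tokens,
                      input_price_per_m, output_price_per_m):
    """Estimate total cost in USD.

    ASSUMPTION (not confirmed by the source): reasoning tokens are billed
    at the same rate as output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = input_tokens * input_price_per_m + billed_output * output_price_per_m
    return total / 1_000_000  # prices are quoted per 1M tokens


# Token counts and prices copied from the metrics above.
deepseek_est = estimate_cost_usd(38_726, 6_334, 159_151, 0.435, 0.870)
gpt_est = estimate_cost_usd(34_108, 2_242, 72_707, 2.500, 15.000)

print(f"DeepSeek V4 Pro: ${deepseek_est:.3f} (reported $0.157)")
print(f"GPT-5.4: ${gpt_est:.3f} (reported $1.210)")
```

Under that assumption the GPT-5.4 estimate comes out at $1.210, matching the reported total, while the DeepSeek V4 Pro estimate comes out at about $0.161, slightly above the reported $0.157. The source does not explain the difference, so the estimate should be read as an approximation only.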

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#41 DeepSeek V4 Pro (high): Cost $0.023, Time 257.6s, Tokens 14,870

#17 GPT-5.4 (medium): Cost $0.214, Time 199.6s, Tokens 14,349

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V4 Pro | 5.7 | 5.9 | 58.3% | 2 | | 25.70s | 536 | 149 | 3,214 |
| Anti-AI Tricks | GPT-5.4 | 8.3 | 10.0 | 75.0% | 0 | | 4.11s | 606 | 240 | 1,511 |
| Coding | DeepSeek V4 Pro | 6.1 | 4.6 | 66.7% | 2 | | 243.00s | 5,090 | 383 | 84,580 |
| Coding | GPT-5.4 | 8.8 | 7.8 | 88.9% | 1 | | 44.36s | 7,305 | 433 | 24,216 |
| Combined | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 38.17s | 14,060 | 454 | 5,836 |
| Combined | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 20.57s | 11,019 | 301 | 3,543 |
| Data parsing and extraction | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 25.03s | 7,690 | 274 | 2,166 |
| Data parsing and extraction | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 5.32s | 7,140 | 234 | 804 |
| Domain specific | DeepSeek V4 Pro | 3.6 | 7.2 | 22.2% | 1 | | 151.46s | 569 | 4,404 | 50,391 |
| Domain specific | GPT-5.4 | 5.3 | 7.2 | 44.4% | 1 | | 74.27s | 619 | 61 | 34,748 |
| General Intelligence | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 8.83s | 471 | 115 | 1,013 |
| General Intelligence | GPT-5.4 | 4.7 | 3.1 | 33.3% | 1 | | 4.92s | 477 | 145 | 321 |
| Instructions following | DeepSeek V4 Pro | 7.8 | 6.6 | 83.3% | 1 | | 8.73s | 627 | 66 | 2,726 |
| Instructions following | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 3.11s | 660 | 93 | 897 |
| Puzzle Solving | DeepSeek V4 Pro | 6.9 | 4.9 | 77.8% | 2 | | 56.85s | 591 | 178 | 2,563 |
| Puzzle Solving | GPT-5.4 | 8.2 | 7.2 | 88.9% | 1 | | 9.14s | 642 | 441 | 3,815 |
| Tool Calling | DeepSeek V4 Pro | 9.8 | 10.0 | 100.0% | 0 | | 15.92s | 8,909 | 295 | 701 |
| Tool Calling | GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 13.28s | 5,445 | 264 | 1,031 |
| Trivia | DeepSeek V4 Pro | 3.0 | 10.0 | 0.0% | 0 | | 34.01s | 183 | 16 | 5,961 |
| Trivia | GPT-5.4 | 3.0 | 10.0 | 0.0% | 0 | | 13.95s | 195 | 30 | 1,821 |
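
To see at a glance which model leads in each category, the per-category scores above can be tallied. The following Python sketch does that using only the Score values from the breakdown; the dictionary layout and names are illustrative, not an AI Benchy data format.

```python
# Category scores copied from the breakdown: (DeepSeek V4 Pro, GPT-5.4).
scores = {
    "Anti-AI Tricks": (5.7, 8.3),
    "Coding": (6.1, 8.8),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (3.6, 5.3),
    "General Intelligence": (10.0, 4.7),
    "Instructions following": (7.8, 10.0),
    "Puzzle Solving": (6.9, 8.2),
    "Tool Calling": (9.8, 10.0),
    "Trivia": (3.0, 3.0),
}

# Count how many categories each model leads, and how many are tied.
leaders = {"DeepSeek V4 Pro": 0, "GPT-5.4": 0, "tie": 0}
for category, (deepseek_score, gpt_score) in scores.items():
    if deepseek_score > gpt_score:
        leaders["DeepSeek V4 Pro"] += 1
    elif gpt_score > deepseek_score:
        leaders["GPT-5.4"] += 1
    else:
        leaders["tie"] += 1

print(leaders)
```

On these scores GPT-5.4 leads in six categories, DeepSeek V4 Pro leads in one (General Intelligence), and three are tied. This counts categories equally and ignores cost and response time, so it complements rather than replaces the overall score.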
