AI BENCHY Compare

Anthropic: Claude Opus 4.7 vs DeepSeek: DeepSeek V4 Flash

Summary

Claude Opus 4.7 vs DeepSeek V4 Flash benchmark comparison: Claude Opus 4.7 leads on average score, 8.7 vs 8.3. DeepSeek V4 Flash has the lower total benchmark cost, $0.027 vs $0.679. Claude Opus 4.7 is also faster, with an average response time of 4.73s vs 45.85s, and has the higher attempt pass rate, 82.5% vs 74.6%.

Recommended model: DeepSeek V4 Flash - Its score stays close to the best score here (8.3 vs 8.7), while costing about 26.0x less than Claude Opus 4.7.

Last updated at: 2026-06-17

| Metric | Claude Opus 4.7 (medium) | DeepSeek V4 Flash (high) |
| --- | --- | --- |
| Release | 2026-04-16 | 2026-04-24 |
| Score | 8.7 | 8.3 |
| Rank | #13 | #23 |
| Reliability | 10.0 | 10.0 |
| Consistency | 9.6 | 8.5 |
| Tests Correct | | |
| Attempt pass rate | 82.5% | 74.6% |
| Flaky tests | 1 | 4 |
| Total Runs | 63 | 63 |
| Cost per result | 3.991 | 0.299 |
| Total Cost | $0.679 | $0.027 |
| Input Price | $5.000 / 1M | $0.090 / 1M |
| Output Price | $25.000 / 1M | $0.180 / 1M |
| Total Input Tokens | 65,406 | 39,745 |
| Output Tokens | 11,858 | 10,310 |
| Reasoning Tokens | 2,198 | 123,501 |
| Response Time (avg) | 4.73s | 45.85s |
| Response Time (max) | 23.18s | 218.13s |
| Response Time (total) | 94.51s | 962.79s |
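
The Total Cost row can be roughly reconstructed from the token counts and per-token prices in the table above. The sketch below is only an estimate under one stated assumption: that reasoning tokens are billed at the output price. The page does not say how it computes cost, and the names `Usage` and `estimated_cost` are hypothetical helpers, not part of any AI Benchy API. Under that assumption the estimates land close to the reported $0.679 and $0.027, and the cost ratio comes out in the same ballpark as the 26.0x quoted in the summary, though not exactly equal to it.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def estimated_cost(u: Usage) -> float:
    """Estimate total cost in USD.

    Assumption (not confirmed by the source): reasoning tokens are
    billed at the same rate as output tokens.
    """
    billed_output = u.output_tokens + u.reasoning_tokens
    total = (u.input_tokens * u.input_price_per_m
             + billed_output * u.output_price_per_m)
    return total / 1_000_000


# Token counts and prices copied from the comparison table.
claude = Usage(65_406, 11_858, 2_198, 5.00, 25.00)
deepseek = Usage(39_745, 10_310, 123_501, 0.09, 0.18)

claude_cost = estimated_cost(claude)
deepseek_cost = estimated_cost(deepseek)
cost_ratio = claude_cost / deepseek_cost

print(f"Claude Opus 4.7:   ${claude_cost:.3f}")
print(f"DeepSeek V4 Flash: ${deepseek_cost:.3f}")
print(f"Cost ratio:        {cost_ratio:.1f}x")
```

Small differences from the reported figures are expected, since the page may round, apply cached-input pricing, or count tokens differently.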

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#13 Claude Opus 4.7 (medium): Cost $0.059, Time 26.8s, Tokens 2,475 tok

#23 DeepSeek V4 Flash (high): Cost $0.003, Time 93.1s, Tokens 7,926 tok

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | Claude Opus 4.7 | 8.3 | 10.0 | 75.0% | 0 | | 1.85s | 894 | 348 | 0 |
| Anti-AI Tricks | DeepSeek V4 Flash | 8.3 | 10.0 | 75.0% | 0 | | 28.51s | 540 | 140 | 7,770 |
| Coding | Claude Opus 4.7 | 7.6 | 7.2 | 77.8% | 1 | | 12.96s | 10,635 | 7,629 | 1,114 |
| Coding | DeepSeek V4 Flash | 7.8 | 10.0 | 66.7% | 0 | | 50.60s | 7,279 | 395 | 34,862 |
| Combined | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 21.45s | 24,501 | 2,369 | 1,084 |
| Combined | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | | 76.57s | 14,016 | 465 | 7,347 |
| Data parsing and extraction | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.37s | 10,533 | 324 | 0 |
| Data parsing and extraction | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | | 28.03s | 7,290 | 201 | 1,179 |
| Domain specific | Claude Opus 4.7 | 7.7 | 10.0 | 66.7% | 0 | | 1.17s | 630 | 51 | 0 |
| Domain specific | DeepSeek V4 Flash | 4.1 | 4.4 | 44.5% | 2 | | 100.31s | 666 | 27 | 59,249 |
| General Intelligence | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.87s | 723 | 256 | 0 |
| General Intelligence | DeepSeek V4 Flash | 6.1 | 3.1 | 66.7% | 1 | | 25.15s | 471 | 79 | 632 |
| Instructions following | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 1.57s | 939 | 114 | 0 |
| Instructions following | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | | 15.36s | 627 | 63 | 1,622 |
| Puzzle Solving | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.43s | 939 | 370 | 0 |
| Puzzle Solving | DeepSeek V4 Flash | 8.2 | 7.2 | 88.9% | 1 | | 26.11s | 594 | 196 | 1,767 |
| Tool Calling | Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 4.17s | 15,339 | 373 | 0 |
| Tool Calling | DeepSeek V4 Flash | 10.0 | 10.0 | 100.0% | 0 | | 74.73s | 8,079 | 228 | 542 |
| Trivia | Claude Opus 4.7 | 3.0 | 10.0 | 0.0% | 0 | | 2.25s | 273 | 24 | 0 |
| Trivia | DeepSeek V4 Flash | 3.0 | 10.0 | 0.0% | 0 | | 54.46s | 183 | 8,516 | 8,531 |
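
As a sanity check on the category breakdown, the per-category token counts can be summed and compared with the overall Total Input Tokens, Output Tokens, and Reasoning Tokens reported earlier. The sketch below copies the figures from this page; the variable and function names are illustrative only. It also shows where DeepSeek V4 Flash spends most of its reasoning tokens.

```python
# (input, output, reasoning) tokens per category, copied from the breakdown.
CATEGORIES = [
    "Anti-AI Tricks", "Coding", "Combined", "Data parsing and extraction",
    "Domain specific", "General Intelligence", "Instructions following",
    "Puzzle Solving", "Tool Calling", "Trivia",
]

claude_rows = [
    (894, 348, 0), (10_635, 7_629, 1_114), (24_501, 2_369, 1_084),
    (10_533, 324, 0), (630, 51, 0), (723, 256, 0), (939, 114, 0),
    (939, 370, 0), (15_339, 373, 0), (273, 24, 0),
]
deepseek_rows = [
    (540, 140, 7_770), (7_279, 395, 34_862), (14_016, 465, 7_347),
    (7_290, 201, 1_179), (666, 27, 59_249), (471, 79, 632),
    (627, 63, 1_622), (594, 196, 1_767), (8_079, 228, 542),
    (183, 8_516, 8_531),
]


def column_totals(rows):
    """Sum each of the three token columns across all categories."""
    return tuple(sum(col) for col in zip(*rows))


claude_totals = column_totals(claude_rows)
deepseek_totals = column_totals(deepseek_rows)

# Category with the largest reasoning-token count for DeepSeek V4 Flash.
top_idx = max(range(len(deepseek_rows)), key=lambda i: deepseek_rows[i][2])
top_reasoning_category = CATEGORIES[top_idx]

print("Claude Opus 4.7 totals:  ", claude_totals)
print("DeepSeek V4 Flash totals:", deepseek_totals)
print("DeepSeek reasoning is concentrated in:", top_reasoning_category)
```

The column sums match the overall token totals in the main comparison table, which suggests the breakdown covers every category in the benchmark.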
