AI BENCHY

AI BENCHY Compare

Anthropic: Claude Opus 4.7 vs DeepSeek: DeepSeek V4 Flash

Last updated at: 2026-05-22

Metric                  Claude Opus 4.7 (medium)  DeepSeek V4 Flash (high)
Release                 2026-04-16                2026-04-24 (Free Available)
Score                   8.9                       7.4
Rank                    #7                        #55
Reliability             10.0                      10.0
Consistency             10.0                      8.0
Tests Correct           -                         -
Attempt pass rate       85.0%                     71.7%
Flaky tests             0                         5
Total Runs              60                        60
Cost per result         3.674                     0.339
Total Cost              $0.625                    $0.038
Input Price             $5.000 / 1M tokens        $0.112 / 1M tokens
Output Price            $25.000 / 1M tokens       $0.224 / 1M tokens
Output Tokens           10,468                    10,299
Reasoning Tokens        2,198                     116,570
Response Time (avg)     4.50s                     46.28s
Response Time (max)     23.18s                    218.13s
Response Time (total)   85.46s                    925.55s
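
The headline trade-off in the summary figures (higher score versus lower cost and slower responses) can be made concrete with a little arithmetic. The following is a minimal sketch using only values copied from the summary above; the `score_per_dollar` figure is a hypothetical derived quantity for illustration and is not a metric the site reports.

```python
# Values copied from the summary table; nothing here is measured independently.
summary = {
    "Claude Opus 4.7":   {"score": 8.9, "total_cost_usd": 0.625, "avg_time_s": 4.50},
    "DeepSeek V4 Flash": {"score": 7.4, "total_cost_usd": 0.038, "avg_time_s": 46.28},
}

claude = summary["Claude Opus 4.7"]
deepseek = summary["DeepSeek V4 Flash"]

# How many times more the Claude run cost in total.
cost_ratio = claude["total_cost_usd"] / deepseek["total_cost_usd"]

# How many times longer DeepSeek took per response on average.
time_ratio = deepseek["avg_time_s"] / claude["avg_time_s"]

# Hypothetical "score per dollar" figure; the site does not report this metric.
score_per_dollar = {
    name: m["score"] / m["total_cost_usd"] for name, m in summary.items()
}

print(f"Cost ratio (Claude / DeepSeek): {cost_ratio:.1f}x")
print(f"Avg response time ratio (DeepSeek / Claude): {time_ratio:.1f}x")
for name, value in score_per_dollar.items():
    print(f"{name}: {value:.1f} score points per dollar")
```

On these figures the Claude run cost roughly 16 times more in total, while DeepSeek's average response took roughly 10 times longer.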

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]
Category Breakdown

Category / Model       Score  Consistency  Attempt pass rate  Flaky tests  Tests Correct  Response Time (avg)  Output Tokens  Reasoning Tokens
Anti-AI Tricks
  Claude Opus 4.7      8.3    10.0         75.0%              0            -              1.85s                348            0
  DeepSeek V4 Flash    8.3    10.0         75.0%              0            -              28.51s               140            7,770
Coding
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              14.79s               6,210          1,114
  DeepSeek V4 Flash    6.8    10.0         50.0%              0            -              58.13s               387            27,101
Combined
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              21.45s               2,369          1,084
  DeepSeek V4 Flash    10.0   10.0         100.0%             0            -              76.57s               465            7,347
Data parsing and extraction
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              2.37s                324            0
  DeepSeek V4 Flash    10.0   10.0         100.0%             0            -              28.03s               201            1,179
Domain specific
  Claude Opus 4.7      7.7    10.0         66.7%              0            -              1.17s                51             0
  DeepSeek V4 Flash    4.1    4.4          44.5%              2            -              100.31s              27             59,249
General Intelligence
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              2.87s                256            0
  DeepSeek V4 Flash    6.1    3.1          66.7%              1            -              25.15s               79             632
Instructions following
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              1.57s                114            0
  DeepSeek V4 Flash    10.0   10.0         100.0%             0            -              15.36s               63             1,622
Puzzle Solving
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              2.51s                399            0
  DeepSeek V4 Flash    6.4    4.4          77.8%              2            -              25.53s               193            2,597
Tool Calling
  Claude Opus 4.7      10.0   10.0         100.0%             0            -              4.17s                373            0
  DeepSeek V4 Flash    10.0   10.0         100.0%             0            -              74.73s               228            542
Trivia
  Claude Opus 4.7      3.0    10.0         0.0%               0            -              2.25s                24             0
  DeepSeek V4 Flash    3.0    10.0         0.0%               0            -              54.46s               8,516          8,531
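
As a sanity check on the breakdown, the per-category token counts can be summed and compared with the Output Tokens and Reasoning Tokens totals in the summary. This is a minimal sketch with the numbers copied by hand from this page, in the order the categories appear; the variable names are illustrative only.

```python
# Per-category token counts copied from the category breakdown, in page order:
# Anti-AI Tricks, Coding, Combined, Data parsing and extraction, Domain specific,
# General Intelligence, Instructions following, Puzzle Solving, Tool Calling, Trivia.
output_tokens = {
    "Claude Opus 4.7":   [348, 6210, 2369, 324, 51, 256, 114, 399, 373, 24],
    "DeepSeek V4 Flash": [140, 387, 465, 201, 27, 79, 63, 193, 228, 8516],
}
reasoning_tokens = {
    "Claude Opus 4.7":   [0, 1114, 1084, 0, 0, 0, 0, 0, 0, 0],
    "DeepSeek V4 Flash": [7770, 27101, 7347, 1179, 59249, 632, 1622, 2597, 542, 8531],
}

# Totals as reported in the summary table.
reported = {
    "Claude Opus 4.7":   {"output": 10468, "reasoning": 2198},
    "DeepSeek V4 Flash": {"output": 10299, "reasoning": 116570},
}

for model, totals in reported.items():
    out_sum = sum(output_tokens[model])
    reasoning_sum = sum(reasoning_tokens[model])
    print(
        f"{model}: output {out_sum} (matches: {out_sum == totals['output']}), "
        f"reasoning {reasoning_sum} (matches: {reasoning_sum == totals['reasoning']})"
    )
```

Both models' category figures add up to the reported totals, which suggests the breakdown covers every test in the run.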
