AI BENCHY

AI BENCHY Compare

Anthropic: Claude Opus 4.7 vs DeepSeek: DeepSeek V3.2

Last updated at: 2026-04-16

| Metric | Claude Opus 4.7 (none), Release: 2026-04-16 | DeepSeek V3.2 (medium), Release: 2025-12-01 |
| --- | --- | --- |
| Score | 9.2 | 8.0 |
| Rank | #4 | #27 |
| Consistency | 10.0 | 8.2 |
| Tests Correct | | |
| Attempt pass rate | 88.9% | 79.6% |
| Flaky tests | 0 | 4 |
| Total Runs | 54 | 54 |
| Cost per result | 3.155 | 0.240 |
| Total Cost | $0.505 | $0.029 |
| Input Price | $5.000 / 1M | $0.260 / 1M |
| Output Price | $25.000 / 1M | $0.380 / 1M |
| Output Tokens | 6,326 | 10,620 |
| Reasoning Tokens | 0 | 48,511 |
| Response Time (avg) | 3.13s | 46.41s |
| Response Time (max) | 18.27s | 180.92s |
| Response Time (total) | 56.33s | 835.33s |
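
The relative cost and speed gap between the two models can be worked out directly from the figures in the comparison table above. The following is a minimal Python sketch, not part of the benchmark itself. The dictionary layout and the helper names `ratio` and `output_cost` are hypothetical. The output-side cost estimate assumes that reasoning tokens are billed at the listed output price and are not already included in the Output Tokens count, which the page does not state.

```python
# Figures copied from the comparison table above.
models = {
    "Claude Opus 4.7": {
        "score": 9.2,
        "total_cost": 0.505,          # USD
        "avg_time_s": 3.13,
        "output_price_per_m": 25.000,  # USD per 1M output tokens
        "output_tokens": 6326,
        "reasoning_tokens": 0,
    },
    "DeepSeek V3.2": {
        "score": 8.0,
        "total_cost": 0.029,
        "avg_time_s": 46.41,
        "output_price_per_m": 0.380,
        "output_tokens": 10620,
        "reasoning_tokens": 48511,
    },
}


def ratio(a, b, key):
    """Return models[a][key] divided by models[b][key]."""
    return models[a][key] / models[b][key]


def output_cost(name):
    """Estimated output-side cost in USD.

    ASSUMPTION: reasoning tokens are billed at the output price and are
    counted separately from output tokens. The page does not confirm this.
    """
    m = models[name]
    tokens = m["output_tokens"] + m["reasoning_tokens"]
    return tokens * m["output_price_per_m"] / 1_000_000


# How many times more expensive Claude Opus 4.7 was over the whole run.
cost_ratio = ratio("Claude Opus 4.7", "DeepSeek V3.2", "total_cost")
# How many times slower DeepSeek V3.2 was on average response time.
time_ratio = ratio("DeepSeek V3.2", "Claude Opus 4.7", "avg_time_s")

print(f"Total cost ratio (Claude / DeepSeek): {cost_ratio:.1f}x")
print(f"Avg response time ratio (DeepSeek / Claude): {time_ratio:.1f}x")
for name in models:
    print(f"{name}: estimated output-side cost ${output_cost(name):.4f}")
```

Under these assumptions the estimated output-side cost stays below the reported Total Cost for both models, with the remainder presumably attributable to input tokens, whose counts the page does not list.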

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

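| Anti-AI Tricks | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 8.3 | 10.0 | 75.0% | 0 | | 2.12s | 522 | 0 |
| DeepSeek V3.2 | 8.4 | 9.9 | 75.0% | 0 | | 30.72s | 3,773 | 7,523 |

| Coding | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.84s | 494 | 0 |
| DeepSeek V3.2 | 4.7 | 1.6 | 66.7% | 1 | | 180.92s | 626 | 6,792 |

| Combined | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 9.5 | 10.0 | 100.0% | 0 | | 18.27s | 3,504 | 0 |
| DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 93.11s | 571 | 6,296 |

| Data parsing and extraction | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.15s | 324 | 0 |
| DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 36.09s | 207 | 7,693 |

| Domain specific | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 7.7 | 10.0 | 66.7% | 0 | | 1.19s | 78 | 0 |
| DeepSeek V3.2 | 5.3 | 7.2 | 44.4% | 1 | | 39.32s | 3,081 | 7,856 |

| General Intelligence | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 3.47s | 257 | 0 |
| DeepSeek V3.2 | 5.4 | 2.5 | 66.7% | 1 | | 31.30s | 68 | 2,366 |

| Instructions following | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 1.46s | 114 | 0 |
| DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 35.78s | 1,397 | 2,845 |

| Puzzle Solving | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 2.58s | 661 | 0 |
| DeepSeek V3.2 | 8.2 | 7.2 | 88.9% | 1 | | 36.87s | 390 | 6,281 |

| Tool Calling | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | 10.0 | 10.0 | 100.0% | 0 | | 4.74s | 372 | 0 |
| DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 34.81s | 507 | 859 |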
