AI BENCHY Compare

DeepSeek: DeepSeek V3.2 vs Inception: Mercury 2

Last updated: 2026-06-03

Metric                  DeepSeek V3.2         Mercury 2
Release date            2025-12-01            2026-02-24
Score                   5.4                   4.6
Rank                    #130                  #153
Reliability             10.0                  10.0
Consistency             7.5                   9.1
Attempt pass rate       41.7%                 25.0%
Flaky tests             6                     2
Total Runs              60                    60
Cost per result         0.296                 0.216
Total Cost              $0.017                $0.009
Input Price             $0.229 / 1M tokens    $0.250 / 1M tokens
Output Price            $0.344 / 1M tokens    $0.750 / 1M tokens
Total Input Tokens      53,408                25,515
Total Output Tokens     11,159                3,001
Reasoning Tokens        0                     0
Response Time (avg)     14.43s                614ms
Response Time (max)     115.89s               1.27s
Response Time (total)   288.55s               12.28s
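Because prices are quoted per million tokens, a rough total cost can be estimated from the token counts. The sketch below assumes cost is simply input tokens × input price plus output tokens × output price, with no caching discounts or per-request fees; the page does not state how it computes Total Cost, so treat this as an approximation.

```python
# Token counts and per-1M-token prices copied from the comparison table.
# ASSUMPTION: cost = input_tokens * input_price + output_tokens * output_price,
# with no cache discounts or per-request fees (the page does not state its formula).
MODELS = {
    "DeepSeek V3.2": {"input_tokens": 53_408, "output_tokens": 11_159,
                      "input_price": 0.229, "output_price": 0.344},
    "Mercury 2": {"input_tokens": 25_515, "output_tokens": 3_001,
                  "input_price": 0.250, "output_price": 0.750},
}


def estimated_cost(input_tokens, output_tokens, input_price, output_price):
    """Return an estimated cost in USD; prices are given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for name, m in MODELS.items():
    print(f"{name}: ${estimated_cost(**m):.4f}")
```

Under this assumption Mercury 2 comes to about $0.0086, in line with the listed $0.009, while DeepSeek V3.2 comes to about $0.0161 rather than the listed $0.017, so the page's cost figure may include pricing details not shown here.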

Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens

Category Breakdown

Category                     Model          Score  Consistency  Attempt pass rate  Flaky tests  Response Time (avg)  Input Tokens  Output Tokens  Reasoning Tokens
Anti-AI Tricks               DeepSeek V3.2  3.2    8.0          8.3%               1            9.35s                494           1,073          0
Anti-AI Tricks               Mercury 2      3.0    10.0         0.0%               0            483ms                631           286            0
Coding                       DeepSeek V3.2  3.1    5.4          16.7%              1            20.87s               4,690         4,522          0
Coding                       Mercury 2      3.5    9.4          0.0%               0            831ms                4,631         1,650          0
Combined                     DeepSeek V3.2  6.5    10.0         0.0%               0            115.89s              29,843        2,887          0
Combined                     Mercury 2      3.0    10.0         0.0%               0            606ms                4,821         131            0
Data parsing and extraction  DeepSeek V3.2  6.3    5.8          66.7%              1            9.42s                7,890         1,710          0
Data parsing and extraction  Mercury 2      7.3    5.9          83.3%              1            667ms                6,362         180            0
Domain specific              DeepSeek V3.2  2.9    7.2          11.1%              1            4.17s                624           21             0
Domain specific              Mercury 2      5.3    7.2          44.4%              1            534ms                784           46             0
General Intelligence         DeepSeek V3.2  4.7    1.6          66.7%              1            9.32s                314           43             0
General Intelligence         Mercury 2      4.8    10.0         0.0%               0            628ms                495           159            0
Instructions following       DeepSeek V3.2  10.0   10.0         100.0%             0            1.52s                627           66             0
Instructions following       Mercury 2      6.5    10.0         50.0%              0            551ms                691           82             0
Puzzle Solving               DeepSeek V3.2  7.6    7.2          77.8%              1            6.91s                424           298            0
Puzzle Solving               Mercury 2      3.1    10.0         0.0%               0            535ms                694           251            0
Tool Calling                 DeepSeek V3.2  10.0   10.0         100.0%             0            11.85s               8,319         522            0
Tool Calling                 Mercury 2      10.0   10.0         100.0%             0            1.27s                6,193         197            0
Trivia                       DeepSeek V3.2  3.0    10.0         0.0%               0            17.23s               183           17             0
Trivia                       Mercury 2      3.0    10.0         0.0%               0            548ms                213           19             0
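The headline figures appear to be test-weighted roll-ups of these category rows. The page does not say how many tests each category contains, so the sketch below uses a hypothetical weighting inferred from the pass-rate denominators (for example, 83.3% = 5/6 suggests 2 tests × 3 attempts), assuming 20 tests with 3 attempts each to account for the 60 total runs. With those assumed weights, the DeepSeek V3.2 rows reproduce the headline Score (5.4), attempt pass rate (41.7%), total and average response time (288.55s, 14.43s), and token totals.

```python
# Hypothetical roll-up of the category rows into the headline metrics.
# ASSUMPTION: tests per category are inferred from pass-rate denominators
# (3 attempts per test, 20 tests, 60 total runs); the page does not list them.
TESTS_PER_CATEGORY = {
    "Anti-AI Tricks": 4,
    "Coding": 2,
    "Combined": 1,
    "Data parsing and extraction": 2,
    "Domain specific": 3,
    "General Intelligence": 1,
    "Instructions following": 2,
    "Puzzle Solving": 3,
    "Tool Calling": 1,
    "Trivia": 1,
}
ATTEMPTS_PER_TEST = 3  # assumption: 60 runs / 20 tests

# DeepSeek V3.2 rows from the breakdown:
# (category, score, attempt pass rate %, avg response time s, input tokens, output tokens)
DEEPSEEK_V32 = [
    ("Anti-AI Tricks", 3.2, 8.3, 9.35, 494, 1_073),
    ("Coding", 3.1, 16.7, 20.87, 4_690, 4_522),
    ("Combined", 6.5, 0.0, 115.89, 29_843, 2_887),
    ("Data parsing and extraction", 6.3, 66.7, 9.42, 7_890, 1_710),
    ("Domain specific", 2.9, 11.1, 4.17, 624, 21),
    ("General Intelligence", 4.7, 66.7, 9.32, 314, 43),
    ("Instructions following", 10.0, 100.0, 1.52, 627, 66),
    ("Puzzle Solving", 7.6, 77.8, 6.91, 424, 298),
    ("Tool Calling", 10.0, 100.0, 11.85, 8_319, 522),
    ("Trivia", 3.0, 0.0, 17.23, 183, 17),
]


def roll_up(rows):
    """Combine category rows into overall metrics, weighting by (assumed) test count."""
    tests = passes = input_tokens = output_tokens = 0
    weighted_score = total_time = 0.0
    for category, score, pass_rate, avg_time, inp, out in rows:
        n = TESTS_PER_CATEGORY[category]
        tests += n
        weighted_score += n * score
        # Convert the rounded percentage back to a whole number of passing attempts.
        passes += round(n * ATTEMPTS_PER_TEST * pass_rate / 100)
        # Assumption: each test contributes its category's average response time once.
        total_time += n * avg_time
        input_tokens += inp
        output_tokens += out
    runs = tests * ATTEMPTS_PER_TEST
    return {
        "score": round(weighted_score / tests, 1),
        "attempt_pass_rate": round(100 * passes / runs, 1),
        "response_time_total_s": round(total_time, 2),
        "response_time_avg_s": round(total_time / tests, 2),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }


print(roll_up(DEEPSEEK_V32))
```

Weighting by test count matters here: an unweighted mean of the ten DeepSeek V3.2 category scores would give 5.7 rather than the listed 5.4. The same `roll_up` function can be applied to the Mercury 2 rows.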