DeepSeek: DeepSeek V3.2 vs Inception: Mercury 2

The average score is tied at 7.0. DeepSeek V3.2 (medium) has the lower benchmark cost ($0.078 vs $0.093) and the higher attempt pass rate (65.2% vs 51.5%), while Mercury 2 (medium) is much faster on average (2.72s vs 68.62s).

Recommended model: Mercury 2 (medium). It matches DeepSeek V3.2 (medium) on score (7.0) while responding about 25.2x faster on average.
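The headline ratios follow directly from the summary table. A minimal Python sketch that recomputes them from the reported averages and totals (the dictionary layout and variable names are illustrative, not part of the benchmark):

```python
# Figures copied from the summary table on this page (medium settings).
deepseek = {"avg_time_s": 68.62, "total_cost_usd": 0.078, "pass_rate": 0.652}
mercury = {"avg_time_s": 2.72, "total_cost_usd": 0.093, "pass_rate": 0.515}

# Speedup: how many times faster Mercury 2 responds on average.
speedup = deepseek["avg_time_s"] / mercury["avg_time_s"]

# Cost premium: how much more the full Mercury 2 run cost than DeepSeek V3.2's.
cost_premium = mercury["total_cost_usd"] / deepseek["total_cost_usd"] - 1

# Attempt pass-rate gap in percentage points (DeepSeek V3.2 leads).
pass_gap_pp = (deepseek["pass_rate"] - mercury["pass_rate"]) * 100

print(f"speedup: {speedup:.1f}x")
print(f"cost premium: {cost_premium:.0%}")
print(f"pass-rate gap: {pass_gap_pp:.1f} pp")
```

Run as-is, this prints a 25.2x speedup, a 19% cost premium for Mercury 2, and a 13.7-point pass-rate gap in DeepSeek V3.2's favor.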

Last updated: 2026-07-18

Metric	DeepSeek V3.2 (medium; released 2025-12-01)	Mercury 2 (medium; released 2026-02-24)
Score	7.0	7.0
Rank	#75	#77
Reliability	10.0	10.0
Consistency	7.4	8.8
Tests Correct
Attempt pass rate	65.2%	51.5%
Flaky tests	7	3
Total Runs	66	66
Cost per result	0.671	0.928
Total Cost	$0.078	$0.093
Input Price	$0.269 / 1M	$0.250 / 1M
Output Price	$0.400 / 1M	$0.750 / 1M
Total Input Tokens	101,047	109,572
Output Tokens	11,834	10,313
Reasoning Tokens	117,014	76,806
Response Time (avg)	68.62s	2.72s
Response Time (max)	376.10s	14.63s
Response Time (total)	1509.53s	57.12s
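The Total Cost row can be roughly reproduced from the token counts and per-million-token prices above. The sketch below assumes that reasoning tokens are billed at the output-token price; the page does not state this, but the totals come close only under that assumption. `estimate_cost` is a hypothetical helper, not part of any benchmark tooling.

```python
def estimate_cost(input_tokens, output_tokens, reasoning_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate a run's cost in USD.

    Assumption: reasoning tokens are billed at the output-token price.
    Prices are in USD per 1M tokens, as quoted in the table above.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Token counts and prices copied from the summary table.
deepseek_est = estimate_cost(101_047, 11_834, 117_014, 0.269, 0.400)
mercury_est = estimate_cost(109_572, 10_313, 76_806, 0.250, 0.750)

print(f"DeepSeek V3.2: ${deepseek_est:.4f} (reported $0.078)")
print(f"Mercury 2:     ${mercury_est:.4f} (reported $0.093)")
```

The estimates come to about $0.0787 and $0.0927, within a tenth of a cent of the reported totals. The small gap for DeepSeek V3.2 may reflect rounding in the listed prices, but the page does not say.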

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#75 DeepSeek V3.2 (medium): cost $0.001, time 53.6s, 1,932 tokens

#77 Mercury 2 (medium): cost $0.002, time 2.1s, 1,702 tokens

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown


Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	8.2	7.9	83.3%	1		24.23s	448	3,247	6,953
Mercury 2	6.9	9.9	50.0%	0		1.12s	554	2,546	2,609

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	6.0	7.2	55.6%	1		248.68s	5,717	649	52,014
Mercury 2	8.2	7.7	77.8%	1		2.04s	7,065	296	11,328

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	7.3	5.8	83.3%	1		79.92s	76,997	5,219	24,229
Mercury 2	6.7	9.1	50.0%	0		7.84s	87,365	6,533	20,474

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	10.0	10.0	100.0%	0		36.09s	7,388	207	7,693
Mercury 2	7.3	5.9	83.3%	1		1.11s	6,234	183	1,656

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	2.9	4.4	22.2%	2		24.27s	472	21	6,838
Mercury 2	2.9	7.2	11.1%	1		6.48s	695	41	30,754

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	3.4	2.5	33.3%	1		58.29s	314	49	2,189
Mercury 2	4.8	10.0	0.0%	0		0.821s	456	137	542

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	10.0	10.0	100.0%	0		35.78s	627	1,397	2,845
Mercury 2	10.0	10.0	100.0%	0		1.07s	340	14	958

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	7.0	7.2	55.6%	1		37.69s	594	518	6,375
Mercury 2	5.4	10.0	33.3%	0		0.949s	601	361	2,781

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	10.0	10.0	100.0%	0		34.81s	8,307	507	859
Mercury 2	10.0	10.0	100.0%	0		1.89s	6,080	180	1,956

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V3.2	3.0	10.0	0.0%	0		83.99s	183	20	7,019
Mercury 2	3.0	10.0	0.0%	0		2.58s	182	22	3,748
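Because the overall scores are tied, the per-category tables carry most of the signal. A small Python sketch that tallies which model scores higher in each category and checks that the per-category flaky-test counts add up to the summary row (the dictionary layout is illustrative; the numbers are copied from the tables above):

```python
# category: ((DeepSeek V3.2 score, flaky tests), (Mercury 2 score, flaky tests))
categories = {
    "Anti-AI Tricks": ((8.2, 1), (6.9, 0)),
    "Coding": ((6.0, 1), (8.2, 1)),
    "Combined": ((7.3, 1), (6.7, 0)),
    "Data parsing and extraction": ((10.0, 0), (7.3, 1)),
    "Domain specific": ((2.9, 2), (2.9, 1)),
    "General Intelligence": ((3.4, 1), (4.8, 0)),
    "Instructions following": ((10.0, 0), (10.0, 0)),
    "Puzzle Solving": ((7.0, 1), (5.4, 0)),
    "Tool Calling": ((10.0, 0), (10.0, 0)),
    "Trivia": ((3.0, 0), (3.0, 0)),
}

# Compare category scores head to head.
deepseek_wins = [c for c, (d, m) in categories.items() if d[0] > m[0]]
mercury_wins = [c for c, (d, m) in categories.items() if m[0] > d[0]]
ties = [c for c, (d, m) in categories.items() if d[0] == m[0]]

# Per-category flaky counts should sum to the overall "Flaky tests" row.
deepseek_flaky = sum(d[1] for d, _ in categories.values())
mercury_flaky = sum(m[1] for _, m in categories.values())

print("DeepSeek V3.2 ahead:", deepseek_wins)
print("Mercury 2 ahead:", mercury_wins)
print("Tied:", ties)
print("Flaky totals:", deepseek_flaky, mercury_flaky)
```

DeepSeek V3.2 leads in four categories, Mercury 2 in two (Coding and General Intelligence), and four are tied. The flaky totals come out to 7 and 3, matching the summary table.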
