AI BENCHY Compare

DeepSeek: DeepSeek V4 Pro vs Qwen: Qwen3.5 Plus 2026-02-15

Summary

DeepSeek V4 Pro vs Qwen3.5 Plus 2026-02-15 benchmark comparison: DeepSeek V4 Pro narrowly leads on average score, 8.1 vs 8.0. It also has the lower total benchmark cost ($0.098 vs $0.310) and a slightly faster average response time (72.22s vs 73.79s). Qwen3.5 Plus 2026-02-15 has the higher attempt pass rate, 73.0% vs 66.7%.

Recommended model: DeepSeek V4 Pro. It has the higher score here (8.1) and its total benchmark cost is about 3.2x lower than that of Qwen3.5 Plus 2026-02-15 ($0.098 vs $0.310).
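The "3.2x" figure can be reproduced from the two total costs quoted in the summary. A minimal Python sketch; the variable names are chosen here for illustration and do not come from the page:

```python
# Total benchmark cost per model in USD, as reported in the summary.
deepseek_total_cost = 0.098
qwen_total_cost = 0.310

# How many times more the Qwen3.5 Plus run cost than the DeepSeek V4 Pro run.
cost_ratio = qwen_total_cost / deepseek_total_cost

print(f"Qwen3.5 Plus 2026-02-15 cost {cost_ratio:.1f}x the DeepSeek V4 Pro total")
```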

Last updated at: 2026-06-12

Metric	DeepSeek V4 Pro (high, release: 2026-04-24)	Qwen3.5 Plus 2026-02-15 (medium, release: 2026-02-15)
Score	8.1	8.0
Rank	#30	#32
Reliability	9.6	10.0
Consistency	7.8	8.8
Tests Correct
Attempt pass rate	66.7%	73.0%
Flaky tests	6	3
Total Runs	57	63
Cost per result	0.978	2.445
Total Cost	$0.098	$0.310
Input Price	$0.435 / 1M	$0.260 / 1M
Output Price	$0.870 / 1M	$1.560 / 1M
Total Input Tokens	35,122	40,918
Output Tokens	6,315	2,159
Reasoning Tokens	93,205	189,604
Response Time (avg)	72.22s	73.79s
Response Time (max)	437.44s	266.69s
Response Time (total)	1444.45s	1033.07s
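
The Total Cost row can be roughly cross-checked against the price and token rows. The sketch below assumes that reasoning tokens are billed at the output price; the page does not say how Total Cost is derived, so this is an illustration and not the site's formula. The dictionary layout and the `estimate_cost` helper are hypothetical names introduced here.

```python
# Prices (USD per 1M tokens) and token counts copied from the table above.
# ASSUMPTION: reasoning tokens are billed at the output price. The page does
# not state how Total Cost is computed.
models = {
    "DeepSeek V4 Pro": {
        "in_price": 0.435, "out_price": 0.870,
        "in_tok": 35_122, "out_tok": 6_315, "reason_tok": 93_205,
        "reported": 0.098,
    },
    "Qwen3.5 Plus 2026-02-15": {
        "in_price": 0.260, "out_price": 1.560,
        "in_tok": 40_918, "out_tok": 2_159, "reason_tok": 189_604,
        "reported": 0.310,
    },
}


def estimate_cost(m):
    """Estimate total cost in USD from token counts and per-1M prices."""
    input_cost = m["in_tok"] * m["in_price"] / 1_000_000
    output_cost = (m["out_tok"] + m["reason_tok"]) * m["out_price"] / 1_000_000
    return input_cost + output_cost


estimates = {name: estimate_cost(m) for name, m in models.items()}
for name, est in estimates.items():
    reported = models[name]["reported"]
    print(f"{name}: estimated ${est:.3f}, reported ${reported:.3f}")
```

Under this assumption the Qwen3.5 Plus estimate rounds to the reported $0.310, while the DeepSeek V4 Pro estimate comes out near $0.102, slightly above the reported $0.098. The page gives no explanation for that gap.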

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#30 DeepSeek V4 Pro (high)

Cost: $0.023
Time: 257.6s
Tokens: 14,870

#32 Qwen3.5 Plus 2026-02-15 (medium)

Cost: $0.011
Time: 125.5s
Tokens: 7,040

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

Anti-AI Tricks	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	5.7	5.9	58.3%	2		25.70s	536	149	3,214
Qwen3.5 Plus 2026-02-15	8.2	7.9	83.3%	1		45.78s	672	205	21,236

Coding	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	7.7	10.0	66.7%	0		308.19s	1,583	368	42,658
Qwen3.5 Plus 2026-02-15	6.6	7.1	44.4%	1		180.70s	6,950	420	80,595

Combined	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	10.0	10.0	100.0%	0		38.17s	14,060	454	5,836
Qwen3.5 Plus 2026-02-15	10.0	10.0	100.0%	0		46.85s	14,934	421	7,906

Data parsing and extraction	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	10.0	10.0	100.0%	0		25.03s	7,690	274	2,166
Qwen3.5 Plus 2026-02-15	10.0	10.0	100.0%	0		46.91s	7,782	270	14,916

Domain specific	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	3.6	7.2	22.2%	1		130.09s	472	4,400	26,367
Qwen3.5 Plus 2026-02-15	5.3	10.0	33.3%	0		17.50s	444	35	16,680

General Intelligence	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	10.0	10.0	100.0%	0		8.83s	471	115	1,013
Qwen3.5 Plus 2026-02-15	4.7	1.6	66.7%	1		79.86s	344	73	8,675

Instructions following	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	7.8	6.6	83.3%	1		8.73s	627	66	2,726
Qwen3.5 Plus 2026-02-15	10.0	10.0	100.0%	0		31.93s	699	101	7,704

Puzzle Solving	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	6.9	4.9	77.8%	2		56.85s	591	178	2,563
Qwen3.5 Plus 2026-02-15	10.0	10.0	100.0%	0		32.50s	696	301	13,853

Tool Calling	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	9.8	10.0	100.0%	0		15.92s	8,909	295	701
Qwen3.5 Plus 2026-02-15	10.0	10.0	100.0%	0		7.54s	8,193	309	909

Trivia	Score	Consistency	Attempt pass rate	Flaky tests	Tests Correct	Response Time (avg)	Input Tokens	Output Tokens	Reasoning Tokens
DeepSeek V4 Pro	3.0	10.0	0.0%	0		34.01s	183	16	5,961
Qwen3.5 Plus 2026-02-15	3.0	10.0	0.0%	0		103.81s	204	24	17,130
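
The overall scores are close (8.1 vs 8.0), so a per-category tally helps show where each model is ahead. The sketch below copies the Score column from each category table and counts wins and ties; the `category_scores` and `tally` names are illustrative and not part of the page.

```python
# (DeepSeek V4 Pro score, Qwen3.5 Plus 2026-02-15 score) per category,
# copied from the Category Breakdown tables.
category_scores = {
    "Anti-AI Tricks": (5.7, 8.2),
    "Coding": (7.7, 6.6),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (3.6, 5.3),
    "General Intelligence": (10.0, 4.7),
    "Instructions following": (7.8, 10.0),
    "Puzzle Solving": (6.9, 10.0),
    "Tool Calling": (9.8, 10.0),
    "Trivia": (3.0, 3.0),
}

# Count which model has the higher score in each category.
tally = {"DeepSeek V4 Pro": 0, "Qwen3.5 Plus 2026-02-15": 0, "tie": 0}
for deepseek_score, qwen_score in category_scores.values():
    if deepseek_score > qwen_score:
        tally["DeepSeek V4 Pro"] += 1
    elif qwen_score > deepseek_score:
        tally["Qwen3.5 Plus 2026-02-15"] += 1
    else:
        tally["tie"] += 1

print(tally)
```

This is a plain per-category comparison. It is not how the overall Score is computed, and the page does not describe that weighting.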
