
AI BENCHY Compare

DeepSeek: DeepSeek V4 Pro vs MoonshotAI: Kimi K2.6

Summary

DeepSeek V4 Pro vs Kimi K2.6 benchmark comparison: DeepSeek V4 Pro leads on average score, 8.1 vs 7.8, and has the lower total benchmark cost, $0.098 vs $0.889. Kimi K2.6 is marginally faster on average response time, 71.67s vs 72.22s. Attempt pass rates are 66.7% for DeepSeek V4 Pro and 65.1% for Kimi K2.6.

Recommended model: DeepSeek V4 Pro. It has the higher score of the two (8.1 vs 7.8), and Kimi K2.6 costs about 9.1x as much for the full benchmark ($0.889 vs $0.098).
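The cost multiple quoted in the recommendation can be re-derived from the two Total Cost figures. The snippet below is a minimal sketch of that arithmetic; the `score_per_dollar` helper is a hypothetical value metric added for illustration and is not something AI BENCHY reports.

```python
# Figures copied from the summary on this page.
models = {
    "DeepSeek V4 Pro": {"score": 8.1, "total_cost_usd": 0.098},
    "Kimi K2.6": {"score": 7.8, "total_cost_usd": 0.889},
}


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more `expensive` cost than `cheap` for the full benchmark."""
    return models[expensive]["total_cost_usd"] / models[cheap]["total_cost_usd"]


def score_per_dollar(name: str) -> float:
    """Hypothetical value metric (not an AI BENCHY metric): score / total cost."""
    entry = models[name]
    return entry["score"] / entry["total_cost_usd"]


ratio = cost_ratio("Kimi K2.6", "DeepSeek V4 Pro")
print(f"Kimi K2.6 cost {ratio:.1f}x as much as DeepSeek V4 Pro")
for name in models:
    print(f"{name}: {score_per_dollar(name):.1f} score points per dollar")
```

The ratio rounds to the 9.1x stated above. Score per dollar is only a rough lens, since the two scores differ by 0.3 while the costs differ by almost an order of magnitude.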

Last updated at: 2026-06-12

| Metric | DeepSeek V4 Pro (high) | Kimi K2.6 (medium, Free Available) |
| --- | --- | --- |
| Release | 2026-04-24 | 2026-04-20 |
| Score | 8.1 | 7.8 |
| Rank | #30 | #39 |
| Reliability | 9.6 | 10.0 |
| Consistency | 7.8 | 8.6 |
| Tests Correct | | |
| Attempt pass rate | 66.7% | 65.1% |
| Flaky tests | 6 | 3 |
| Total Runs | 57 | 63 |
| Cost per result | 0.978 | 8.358 |
| Total Cost | $0.098 | $0.889 |
| Input Price | $0.435 / 1M tokens | $0.680 / 1M tokens |
| Output Price | $0.870 / 1M tokens | $3.410 / 1M tokens |
| Total Input Tokens | 35,122 | 29,450 |
| Output Tokens | 6,315 | 102,923 |
| Reasoning Tokens | 93,205 | 254,094 |
| Response Time (avg) | 72.22s | 71.67s |
| Response Time (max) | 437.44s | 406.78s |
| Response Time (total) | 1444.45s | 1433.36s |

Generation showcase

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

#30 DeepSeek V4 Pro (high)
Cost: $0.023, Time: 257.6s, Tokens: 14,870

#39 MoonshotAI: Kimi K2.6 (medium)
Cost: $0.013, Time: 103.4s, Tokens: 3,620

[Charts: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens]

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Input Tokens | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V4 Pro | 5.7 | 5.9 | 58.3% | 2 | | 25.70s | 536 | 149 | 3,214 |
| Anti-AI Tricks | Kimi K2.6 | 7.0 | 8.0 | 66.7% | 1 | | 11.59s | 618 | 7,115 | 8,934 |
| Coding | DeepSeek V4 Pro | 7.7 | 10.0 | 66.7% | 0 | | 308.19s | 1,583 | 368 | 42,658 |
| Coding | Kimi K2.6 | 5.7 | 8.6 | 33.3% | 0 | | 214.42s | 2,925 | 9,970 | 77,189 |
| Combined | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 38.17s | 14,060 | 454 | 5,836 |
| Combined | Kimi K2.6 | 10.0 | 10.0 | 100.0% | 0 | | 40.96s | 11,271 | 711 | 13,876 |
| Data parsing and extraction | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 25.03s | 7,690 | 274 | 2,166 |
| Data parsing and extraction | Kimi K2.6 | 10.0 | 10.0 | 100.0% | 0 | | 20.38s | 7,014 | 316 | 11,305 |
| Domain specific | DeepSeek V4 Pro | 3.6 | 7.2 | 22.2% | 1 | | 130.09s | 472 | 4,400 | 26,367 |
| Domain specific | Kimi K2.6 | 5.3 | 7.2 | 44.4% | 1 | | 202.38s | 326 | 47,035 | 98,262 |
| General Intelligence | DeepSeek V4 Pro | 10.0 | 10.0 | 100.0% | 0 | | 8.83s | 471 | 115 | 1,013 |
| General Intelligence | Kimi K2.6 | 10.0 | 10.0 | 100.0% | 0 | | 17.83s | 477 | 3,981 | 4,472 |
| Instructions following | DeepSeek V4 Pro | 7.8 | 6.6 | 83.3% | 1 | | 8.73s | 627 | 66 | 2,726 |
| Instructions following | Kimi K2.6 | 10.0 | 10.0 | 100.0% | 0 | | 12.53s | 669 | 3,977 | 5,269 |
| Puzzle Solving | DeepSeek V4 Pro | 6.9 | 4.9 | 77.8% | 2 | | 56.85s | 591 | 178 | 2,563 |
| Puzzle Solving | Kimi K2.6 | 6.0 | 7.4 | 55.6% | 1 | | 25.06s | 651 | 13,860 | 17,599 |
| Tool Calling | DeepSeek V4 Pro | 9.8 | 10.0 | 100.0% | 0 | | 15.92s | 8,909 | 295 | 701 |
| Tool Calling | Kimi K2.6 | 10.0 | 10.0 | 100.0% | 0 | | 8.92s | 5,286 | 248 | 1,011 |
| Trivia | DeepSeek V4 Pro | 3.0 | 10.0 | 0.0% | 0 | | 34.01s | 183 | 16 | 5,961 |
| Trivia | Kimi K2.6 | 3.0 | 10.0 | 0.0% | 0 | | 130.27s | 213 | 15,710 | 16,177 |
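
The per-category scores can be tallied to see where each model leads. The following is a minimal sketch that counts category leaders from the Score column above; the `tally_leaders` helper and the equal treatment of every category are assumptions for illustration, not how AI BENCHY computes its overall score.

```python
# Category scores copied from the Category Breakdown data above,
# as (DeepSeek V4 Pro, Kimi K2.6).
category_scores = {
    "Anti-AI Tricks": (5.7, 7.0),
    "Coding": (7.7, 5.7),
    "Combined": (10.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (3.6, 5.3),
    "General Intelligence": (10.0, 10.0),
    "Instructions following": (7.8, 10.0),
    "Puzzle Solving": (6.9, 6.0),
    "Tool Calling": (9.8, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally_leaders(scores):
    """Return (win counts per model, list of tied categories)."""
    wins = {"DeepSeek V4 Pro": 0, "Kimi K2.6": 0}
    ties = []
    for category, (deepseek, kimi) in scores.items():
        if deepseek > kimi:
            wins["DeepSeek V4 Pro"] += 1
        elif kimi > deepseek:
            wins["Kimi K2.6"] += 1
        else:
            ties.append(category)
    return wins, ties


wins, ties = tally_leaders(category_scores)
print("Category leads:", wins)
print("Tied categories:", ", ".join(ties))
```

By this count Kimi K2.6 leads more categories than DeepSeek V4 Pro even though DeepSeek V4 Pro has the higher overall score, so the overall score is evidently not a simple count of category leads.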
