AI BENCHY

AI BENCHY Compare

DeepSeek: DeepSeek V3.2 vs MoonshotAI: Kimi K2.5

Last updated at: 2026-04-16

| Metric | DeepSeek V3.2 (none, Release: 2025-12-01) | Kimi K2.5 (medium, Release: 2026-01-27) |
| --- | --- | --- |
| Score | 6.1 | 7.0 |
| Rank | #63 | #45 |
| Consistency | 8.1 | 6.8 |
| Tests Correct | | |
| Attempt pass rate | 50.0% | 72.2% |
| Flaky tests | 4 | 7 |
| Total Runs | 54 | 54 |
| Cost per result | 0.226 | 2.444 |
| Total Cost | $0.016 | $0.220 |
| Input Price | $0.260 / 1M | $0.383 / 1M |
| Output Price | $0.380 / 1M | $1.720 / 1M |
| Output Tokens | 8,384 | 42,176 |
| Reasoning Tokens | 0 | 84,870 |
| Response Time (avg) | 12.09s | 72.43s |
| Response Time (max) | 115.89s | 150.77s |
| Response Time (total) | 217.56s | 796.70s |
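
To put the headline numbers side by side, the following is a minimal sketch that computes Kimi K2.5's figures relative to DeepSeek V3.2. The values are copied from the metrics listed above; the dictionary keys and the `ratio` helper are illustrative names, not part of AI Benchy.

```python
# Values copied from the comparison metrics above.
# Key names and the ratio() helper are hypothetical, for illustration only.
metrics = {
    "DeepSeek V3.2": {
        "score": 6.1,
        "total_cost_usd": 0.016,
        "avg_response_s": 12.09,
        "output_tokens": 8384,
    },
    "Kimi K2.5": {
        "score": 7.0,
        "total_cost_usd": 0.220,
        "avg_response_s": 72.43,
        "output_tokens": 42176,
    },
}


def ratio(metric, numerator="Kimi K2.5", denominator="DeepSeek V3.2"):
    """Return how many times larger the numerator's metric is."""
    return metrics[numerator][metric] / metrics[denominator][metric]


for name in ("score", "total_cost_usd", "avg_response_s", "output_tokens"):
    print(f"{name}: Kimi K2.5 is {ratio(name):.2f}x DeepSeek V3.2")
```

On these figures, Kimi K2.5 scores roughly 1.15 times higher while costing about 13.75 times more in total and taking about 6 times longer per response on average.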

Charts (not reproduced here): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Tests Correct | Response Time (avg) | Output Tokens | Reasoning Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Anti-AI Tricks | DeepSeek V3.2 | 3.2 | 9.8 | 0.0% | 0 | | 7.63s | 1,419 | 0 |
| Anti-AI Tricks | Kimi K2.5 | 7.3 | 5.8 | 83.3% | 2 | | 51.38s | 2,789 | 8,880 |
| Coding | DeepSeek V3.2 | 2.4 | 1.3 | 33.3% | 1 | | 7.63s | 553 | 0 |
| Coding | Kimi K2.5 | 4.7 | 1.6 | 66.7% | 1 | | 150.77s | 1,269 | 9,749 |
| Combined | DeepSeek V3.2 | 6.5 | 10.0 | 0.0% | 0 | | 115.89s | 2,887 | 0 |
| Combined | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | | 71.37s | 703 | 3,713 |
| Data parsing and extraction | DeepSeek V3.2 | 6.3 | 5.8 | 66.7% | 1 | | 9.42s | 1,710 | 0 |
| Data parsing and extraction | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | | 49.78s | 563 | 7,940 |
| Domain specific | DeepSeek V3.2 | 3.6 | 7.2 | 22.2% | 1 | | 1.61s | 24 | 0 |
| Domain specific | Kimi K2.5 | 3.5 | 4.4 | 33.3% | 2 | | 137.29s | 20,753 | 30,564 |
| General Intelligence | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 2.86s | 67 | 0 |
| General Intelligence | Kimi K2.5 | 6.5 | 3.4 | 66.7% | 1 | | 69.73s | 3,815 | 4,262 |
| Instructions following | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 1.52s | 66 | 0 |
| Instructions following | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | | 92.47s | 5,371 | 6,547 |
| Puzzle Solving | DeepSeek V3.2 | 8.5 | 7.5 | 88.9% | 1 | | 7.37s | 1,136 | 0 |
| Puzzle Solving | Kimi K2.5 | 5.3 | 7.3 | 44.4% | 1 | | 45.40s | 6,671 | 12,403 |
| Tool Calling | DeepSeek V3.2 | 10.0 | 10.0 | 100.0% | 0 | | 11.85s | 522 | 0 |
| Tool Calling | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | | 31.74s | 242 | 812 |
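
As a rough way to summarize the category breakdown, the sketch below tallies which model has the higher Score in each category. The scores are copied from the figures above; the `tally_wins` helper and the data layout are assumptions made for illustration, and a simple win count ignores the size of each gap.

```python
# Per-category Score values copied from the breakdown above,
# stored as (DeepSeek V3.2, Kimi K2.5). Layout and names are illustrative.
category_scores = {
    "Anti-AI Tricks": (3.2, 7.3),
    "Coding": (2.4, 4.7),
    "Combined": (6.5, 10.0),
    "Data parsing and extraction": (6.3, 10.0),
    "Domain specific": (3.6, 3.5),
    "General Intelligence": (10.0, 6.5),
    "Instructions following": (10.0, 10.0),
    "Puzzle Solving": (8.5, 5.3),
    "Tool Calling": (10.0, 10.0),
}


def tally_wins(scores):
    """Count categories won by each model, plus ties, by comparing Score."""
    wins = {"DeepSeek V3.2": 0, "Kimi K2.5": 0, "tie": 0}
    for deepseek, kimi in scores.values():
        if deepseek > kimi:
            wins["DeepSeek V3.2"] += 1
        elif kimi > deepseek:
            wins["Kimi K2.5"] += 1
        else:
            wins["tie"] += 1
    return wins


print(tally_wins(category_scores))
```

By this count Kimi K2.5 leads in four categories, DeepSeek V3.2 in three, and two are tied.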