
AI BENCHY Compare

Google: Gemini 3.1 Flash Lite vs MoonshotAI: Kimi K2.5

Last updated: 2026-05-08

| Metric | Gemini 3.1 Flash Lite (minimal) | Kimi K2.5 (medium) |
|---|---|---|
| Release date | 2026-05-08 | 2026-01-27 |
| Score | 6.8 | 6.8 |
| Rank | #68 | #69 |
| Reliability | 10.0 | 10.0 |
| Consistency | 8.7 | 7.0 |
| Attempt pass rate | 59.7% | 68.4% |
| Flaky tests | 3 | 7 |
| Total runs | 57 | 57 |
| Cost per result | 0.111 | 2.616 |
| Total cost | $0.012 | $0.236 |
| Input price | $0.250 / 1M tokens | $0.440 / 1M tokens |
| Output price | $1.500 / 1M tokens | $2.000 / 1M tokens |
| Output tokens | 2,457 | 42,188 |
| Reasoning tokens | 0 | 92,514 |
| Response time (avg) | 1.41s | 73.39s |
| Response time (max) | 4.49s | 150.77s |
| Response time (total) | 26.72s | 880.65s |
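The Input Price and Output Price rows are quoted in USD per 1M tokens, so a token count converts to dollars as tokens × price ÷ 1,000,000. Below is a minimal Python sketch of that arithmetic applied to the Output Tokens figures. `token_cost` is a hypothetical helper, and the sketch assumes billing is linear in token count. This page lists neither input token counts nor how reasoning tokens are billed, so the sketch reproduces only part of each Total Cost.

```python
# Minimal sketch: convert token counts to USD when prices are quoted per 1M tokens.
# Assumption: billing is linear in token count. Input token counts and the billing
# treatment of reasoning tokens are not reported on this page, so only output
# tokens are priced here.

PER_MILLION = 1_000_000


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Return the USD cost of `tokens` tokens at a price quoted per 1M tokens."""
    return tokens * usd_per_million / PER_MILLION


# (output tokens, output price in USD per 1M tokens), copied from the table.
OUTPUT_USAGE = {
    "Gemini 3.1 Flash Lite": (2_457, 1.500),
    "Kimi K2.5": (42_188, 2.000),
}

for name, (output_tokens, price) in OUTPUT_USAGE.items():
    cost = token_cost(output_tokens, price)
    print(f"{name}: ${cost:.4f} for output tokens")
```

At these rates, the listed output tokens come to about $0.0037 for Gemini 3.1 Flash Lite and about $0.0844 for Kimi K2.5. The rest of each Total Cost ($0.012 and $0.236) is not itemized on this page.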

Charts (not reproduced): Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.

Category Breakdown

| Category | Model | Score | Consistency | Attempt pass rate | Flaky tests | Response time (avg) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Anti-AI Tricks | Gemini 3.1 Flash Lite | 8.3 | 10.0 | 75.0% | 0 | 1.10s | 639 | 0 |
| Anti-AI Tricks | Kimi K2.5 | 7.3 | 5.8 | 83.3% | 2 | 51.38s | 2,789 | 8,880 |
| Coding | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | 1.31s | 636 | 0 |
| Coding | Kimi K2.5 | 4.7 | 1.6 | 66.7% | 1 | 150.77s | 1,269 | 9,749 |
| Combined | Gemini 3.1 Flash Lite | 3.0 | 10.0 | 0.0% | 0 | 2.53s | 357 | 0 |
| Combined | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 71.37s | 703 | 3,713 |
| Data parsing and extraction | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | 1.04s | 279 | 0 |
| Data parsing and extraction | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 49.78s | 563 | 7,940 |
| Domain specific | Gemini 3.1 Flash Lite | 2.9 | 7.2 | 11.1% | 1 | 1.02s | 15 | 0 |
| Domain specific | Kimi K2.5 | 3.5 | 4.4 | 33.3% | 2 | 137.29s | 20,753 | 30,564 |
| General Intelligence | Gemini 3.1 Flash Lite | 4.0 | 10.0 | 0.0% | 0 | 791ms | 63 | 0 |
| General Intelligence | Kimi K2.5 | 6.5 | 3.4 | 66.7% | 1 | 69.73s | 3,815 | 4,262 |
| Instructions following | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | 932ms | 72 | 0 |
| Instructions following | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 92.47s | 5,371 | 6,547 |
| Puzzle Solving | Gemini 3.1 Flash Lite | 6.0 | 4.6 | 66.7% | 2 | 2.15s | 153 | 0 |
| Puzzle Solving | Kimi K2.5 | 5.3 | 7.3 | 44.4% | 1 | 45.40s | 6,671 | 12,403 |
| Tool Calling | Gemini 3.1 Flash Lite | 10.0 | 10.0 | 100.0% | 0 | 3.51s | 234 | 0 |
| Tool Calling | Kimi K2.5 | 10.0 | 10.0 | 100.0% | 0 | 31.74s | 242 | 812 |
| Trivia | Gemini 3.1 Flash Lite | 3.0 | 10.0 | 0.0% | 0 | 724ms | 9 | 0 |
| Trivia | Kimi K2.5 | 3.0 | 10.0 | 0.0% | 0 | 83.95s | 12 | 7,644 |
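The per-category scores can also be compared head to head. Below is a minimal Python sketch that counts, for each model, the categories where it has the strictly higher category score, and counts equal scores as ties. The unweighted count is an illustrative assumption. The page does not say how the overall Score is computed from the categories, and `CATEGORY_SCORES` and `tally` are hypothetical names.

```python
# Category scores copied from the Category Breakdown table:
# (Gemini 3.1 Flash Lite score, Kimi K2.5 score).
CATEGORY_SCORES = {
    "Anti-AI Tricks": (8.3, 7.3),
    "Coding": (10.0, 4.7),
    "Combined": (3.0, 10.0),
    "Data parsing and extraction": (10.0, 10.0),
    "Domain specific": (2.9, 3.5),
    "General Intelligence": (4.0, 6.5),
    "Instructions following": (10.0, 10.0),
    "Puzzle Solving": (6.0, 5.3),
    "Tool Calling": (10.0, 10.0),
    "Trivia": (3.0, 3.0),
}


def tally(scores: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count categories each model leads (strictly higher score) and ties.

    This is an unweighted head-to-head count, not the site's scoring formula.
    """
    result = {"Gemini 3.1 Flash Lite": 0, "Kimi K2.5": 0, "tie": 0}
    for gemini, kimi in scores.values():
        if gemini > kimi:
            result["Gemini 3.1 Flash Lite"] += 1
        elif kimi > gemini:
            result["Kimi K2.5"] += 1
        else:
            result["tie"] += 1
    return result


print(tally(CATEGORY_SCORES))
```

By this count, each model leads in three categories and the remaining four are tied.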
