AI BENCHY

Combined Ranking

See which AI models perform best on the Combined ranking, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Combined Score, ascending.

Models Shown: 15

Average Combined Score: 6.3

| Rank | Model | Company | Combined Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------|-------|---------------|---------------------|
| #160 | LFM2-24B-A2B none | Liquid | 3.0 | 4.2 | 0/1 | 0ms |
| #161 | Qwen3.5-9B medium | Qwen | 3.0 | 4.2 | 0/1 | 0ms |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 3.0 | 4.1 | 0/1 | 0ms |
| #163 | Granite 4.1 8B none | IBM Granite | 3.0 | 4.0 | 0/1 | 1.88s |
| #129 | MiniMax M2.5 medium | Minimax | 4.5 | 5.3 | 0/1 | 60.4s |
| #139 | DeepSeek V4 Flash none | DeepSeek | 4.5 | 5.0 | 0/1 | 112.0s |
| #48 | Gemini 3 Flash Preview none | Google | 4.7 | 7.4 | 0/1 | 3.56s |
| #51 | Mimo V2 PRO medium | Xiaomi | 4.7 | 7.4 | 0/1 | 64.7s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 4.7 | 7.1 | 0/1 | 75.3s |
| #79 | Hunter Alpha medium | OpenRouter | 4.7 | 6.7 | 0/1 | 30.5s |
| #130 | MiniMax M2.7 medium | Minimax | 4.7 | 5.3 | 0/1 | 41.0s |
| #133 | DeepSeek V3.2 none | DeepSeek | 6.5 | 5.2 | 0/1 | 115.9s |
| #59 | GLM 5V Turbo medium | Z.ai | 6.9 | 7.2 | 0/1 | 15.1s |
| #78 | Qwen3.6 27B medium | Qwen | 7.0 | 6.8 | 0/1 | 83.1s |
| #4 | Gemini 3.1 Pro Preview medium | Google | 9.5 | 9.4 | 1/1 | 40.6s |

Chart: Top Models by Combined Score

Chart: Combined Score vs Total Cost

Chart: Top Models by Response Time (avg)