AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Puzzle Solving Score: 6.7

Rank  Model                            Company      Puzzle Solving Score  Score  Tests Correct  Response Time (avg)
#120  Mimo V2 PRO none                 Xiaomi       6.0                   5.6    1/3            1.61s
#87   Gemini 3.1 Flash Lite minimal    Google       6.0                   6.4    1/3            2.15s
#38   Grok 4.3 medium                  X AI         5.9                   7.6    1/3            22.5s
#75   Ring-2.6-1T medium               Inclusionai  5.9                   6.9    1/3            20.7s
#80   Mimo V2 Omni medium              Xiaomi       5.9                   6.7    1/3            2.38s
#103  DeepSeek V4 Pro high             DeepSeek     5.9                   6.0    1/3            34.8s
#104  Nemotron 3 Ultra 550b A55b none  NVIDIA       5.9                   6.0    1/3            1.06s
#130  MiniMax M2.7 medium              Minimax      5.9                   5.3    1/3            24.9s
#116  Hunter Alpha none                OpenRouter   5.8                   5.7    1/3            3.71s
#53   Gemini 3.1 Flash Lite high       Google       5.7                   7.3    1/3            50.8s
#22   Step 3.7 Flash medium            Stepfun      5.7                   8.0    1/3            6.19s
#54   GPT-5 Mini medium                OpenAI       5.6                   7.3    1/3            15.2s
#125  GPT-5.4 none                     OpenAI       5.6                   5.5    1/3            1.44s
#57   Step 3.7 Flash low               Stepfun      5.5                   7.3    1/3            1.84s
#141  Nemotron 3 Super none            NVIDIA       5.5                   4.9    1/3            2.36s

Charts (not reproduced here): Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg)
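
A response-time ordering can be approximated from the Response Time (avg) column of the ranking table. The following is a minimal Python sketch, assuming the rows are transcribed by hand from that table; the names `ROWS` and `fastest` are illustrative and are not part of AI Benchy.

```python
# Rows transcribed from the ranking table:
# (model, puzzle solving score, average response time in seconds).
ROWS = [
    ("Mimo V2 PRO none", 6.0, 1.61),
    ("Gemini 3.1 Flash Lite minimal", 6.0, 2.15),
    ("Grok 4.3 medium", 5.9, 22.5),
    ("Ring-2.6-1T medium", 5.9, 20.7),
    ("Mimo V2 Omni medium", 5.9, 2.38),
    ("DeepSeek V4 Pro high", 5.9, 34.8),
    ("Nemotron 3 Ultra 550b A55b none", 5.9, 1.06),
    ("MiniMax M2.7 medium", 5.9, 24.9),
    ("Hunter Alpha none", 5.8, 3.71),
    ("Gemini 3.1 Flash Lite high", 5.7, 50.8),
    ("Step 3.7 Flash medium", 5.7, 6.19),
    ("GPT-5 Mini medium", 5.6, 15.2),
    ("GPT-5.4 none", 5.6, 1.44),
    ("Step 3.7 Flash low", 5.5, 1.84),
    ("Nemotron 3 Super none", 5.5, 2.36),
]


def fastest(rows, n=3):
    """Return the n rows with the lowest average response time."""
    return sorted(rows, key=lambda row: row[2])[:n]


for model, score, seconds in fastest(ROWS):
    print(f"{model}: {seconds:.2f}s (puzzle solving score {score})")
```

Sorting on the third tuple field keeps the sketch independent of how the site itself orders or filters models, which is not documented here.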