AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Puzzle Solving Score: 6.4
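A quick arithmetic check, assuming the headline figure is a plain mean: the 15 Puzzle Solving scores listed in the table below average about 7.39, well above 6.4. The page does not say which models the 6.4 figure covers. One plausible reading, which is an inference and not something the page states, is that it averages every model tested in this category rather than only the 15 shown. The Python sketch below simply recomputes the mean of the listed rows, with the scores copied by hand from the table. `shown_scores` and `mean` are illustrative names.

```python
# Puzzle Solving scores of the 15 models listed in the table (copied by hand):
# nine models at 7.7, then the remaining six in table order.
shown_scores = [7.7] * 9 + [7.3, 7.2, 7.0, 6.8, 6.7, 6.5]


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


shown_mean = round(mean(shown_scores), 2)
print(len(shown_scores), shown_mean)
```

Because the listed rows alone average noticeably more than 6.4, the headline number is best read as a category-wide figure unless the site documents otherwise.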

Overall Rank  Model                                   Company    Puzzle Solving Score  Overall Score  Tests Correct  Response Time (avg)
#15           Gemini 2.5 Flash (medium)               Google     7.7                   8.2            2/3            3.94s
#17           Gemini 3.1 Flash Lite Preview (medium)  Google     7.7                   8.2            2/3            3.58s
#21           Gemini 3 Flash Preview (none)           Google     7.7                   8.1            2/3            1.06s
#28           GPT-5.2 Chat (none)                     OpenAI     7.7                   7.9            2/3            4.42s
#37           Claude Opus 4.6 (medium)                Anthropic  7.7                   7.6            2/3            4.60s
#41           MiMo-V2-Flash (medium)                  Xiaomi     7.7                   7.5            2/3            3.77s
#42           Claude Sonnet 4.6 (none)                Anthropic  7.7                   7.4            2/3            2.92s
#49           Qwen3.5 Plus 2026-02-15 (none)          Qwen       7.7                   6.8            2/3            2.82s
#53           GLM 5 (none)                            Z.ai       7.7                   6.6            2/3            2.05s
#18           GLM 5 Turbo (medium)                    Z.ai       7.3                   8.1            1/3            5.44s
#56           Grok 4.20 Multi Agent Beta (medium)     xAI        7.2                   6.4            1/3            5.01s
#23           MiMo-V2-Pro (medium)                    Xiaomi     7.0                   8.1            1/3            4.71s
#44           GPT-5.4 Mini (medium)                   OpenAI     6.8                   7.3            1/3            4.33s
#67           Qwen3.5-27B (none)                      Qwen       6.7                   5.9            1/3            1.37s
#35           MiMo-V2-Omni (medium)                   Xiaomi     6.5                   7.7            1/3            3.88s
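The table is not ordered by its first column. Judging from the rows themselves, it appears to be sorted by Puzzle Solving Score (highest first), with ties ordered by the overall rank (the "#" number in the first column). That rank in turn tracks the overall score (the second of the two score columns), which suggests it is each model's position on the site's overall leaderboard. This is an inference from the data, not a rule the page states. The Python sketch below reproduces the displayed order from rows listed in overall-rank order; `Row` and `puzzle_order` are hypothetical names for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    overall_rank: int     # the "#NN" value in the first column
    model: str
    puzzle_score: float   # Puzzle Solving Score
    overall_score: float  # the second score column


# The 15 rows from the table, listed here in overall-rank order.
ROWS = [
    Row(15, "Gemini 2.5 Flash (medium)", 7.7, 8.2),
    Row(17, "Gemini 3.1 Flash Lite Preview (medium)", 7.7, 8.2),
    Row(18, "GLM 5 Turbo (medium)", 7.3, 8.1),
    Row(21, "Gemini 3 Flash Preview (none)", 7.7, 8.1),
    Row(23, "MiMo-V2-Pro (medium)", 7.0, 8.1),
    Row(28, "GPT-5.2 Chat (none)", 7.7, 7.9),
    Row(35, "MiMo-V2-Omni (medium)", 6.5, 7.7),
    Row(37, "Claude Opus 4.6 (medium)", 7.7, 7.6),
    Row(41, "MiMo-V2-Flash (medium)", 7.7, 7.5),
    Row(42, "Claude Sonnet 4.6 (none)", 7.7, 7.4),
    Row(44, "GPT-5.4 Mini (medium)", 6.8, 7.3),
    Row(49, "Qwen3.5 Plus 2026-02-15 (none)", 7.7, 6.8),
    Row(53, "GLM 5 (none)", 7.7, 6.6),
    Row(56, "Grok 4.20 Multi Agent Beta (medium)", 7.2, 6.4),
    Row(67, "Qwen3.5-27B (none)", 6.7, 5.9),
]


def puzzle_order(rows):
    """Sort by Puzzle Solving Score (high to low), breaking ties by overall rank.

    The tie-break is inferred from the published table, not a documented rule.
    Since overall rank follows overall score here, this also orders tied rows
    by overall score, highest first.
    """
    return sorted(rows, key=lambda r: (-r.puzzle_score, r.overall_rank))


ordered = puzzle_order(ROWS)
for r in ordered:
    print(f"#{r.overall_rank:<3} {r.model:<40} {r.puzzle_score:.1f}")
```

The printed order matches the table top to bottom, which supports (but does not prove) the inferred sort key.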

[Charts: Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg)]