AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear. Sorted by Tests Correct, ascending.

Models Shown: 15

Average Puzzle Solving Score: 6.7

| Rank | Model | Company | Puzzle Solving Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------------|-------|---------------|---------------------|
| #82 | Hy3 preview high | Tencent | 7.7 | 6.6 | 2/3 | 27.9s |
| #88 | Qwen3.7 Plus none | Qwen | 7.7 | 6.4 | 2/3 | 1.71s |
| #91 | GPT-5.5 none | OpenAI | 7.7 | 6.4 | 2/3 | 1.29s |
| #95 | Qwen3.5 Plus 2026-02-15 none | Qwen | 7.7 | 6.3 | 2/3 | 2.71s |
| #96 | Ring-2.6-1T none | Inclusionai | 7.7 | 6.2 | 2/3 | 31.5s |
| #97 | Gemini 2.5 Flash none | Google | 7.7 | 6.2 | 2/3 | 604ms |
| #98 | GLM 5 none | Z.ai | 7.7 | 6.1 | 2/3 | 1.91s |
| #106 | Grok 4.20 Beta none | X AI | 7.7 | 5.8 | 2/3 | 586ms |
| #112 | GLM 5.1 none | Z.ai | 7.7 | 5.7 | 2/3 | 1.45s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 7.6 | 5.7 | 2/3 | 16.0s |
| #133 | DeepSeek V3.2 none | DeepSeek | 7.6 | 5.2 | 2/3 | 6.91s |
| #1 | Gemini 3 Flash Preview medium | Google | 10.0 | 9.8 | 3/3 | 4.05s |
| #2 | Gemini 3.5 Flash high | Google | 10.0 | 9.6 | 3/3 | 3.23s |
| #3 | Gemini 3.5 Flash low | Google | 10.0 | 9.4 | 3/3 | 2.35s |
| #4 | Gemini 3.1 Pro Preview medium | Google | 10.0 | 9.4 | 3/3 | 6.90s |

[Chart: Top Models by Puzzle Solving Score]

[Chart: Puzzle Solving Score vs Total Cost]

[Chart: Top Models by Response Time (avg)]