AI BENCHY


Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear.

Models Shown: 15

Average Puzzle Solving Score: 6.7

| Rank | Model | Company | Puzzle Solving Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------------|-------|---------------|---------------------|
| #67 | MiniMax M3 medium | Minimax | 7.9 | 7.1 | 2/3 | 49.9s |
| #45 | GPT-5.4 Mini medium | OpenAI | 7.8 | 7.5 | 2/3 | 4.37s |
| #7 | Gemini 3.5 Flash medium | Google | 7.7 | 9.0 | 2/3 | 2.38s |
| #12 | Gemini 3.1 Flash Lite Preview high | Google | 7.7 | 8.6 | 2/3 | 46.7s |
| #24 | GPT-5.2 Chat none | OpenAI | 7.7 | 7.9 | 2/3 | 4.10s |
| #28 | Gemini 2.5 Flash medium | Google | 7.7 | 7.8 | 2/3 | 3.18s |
| #33 | Hy3 preview medium | Tencent | 7.7 | 7.7 | 2/3 | 11.1s |
| #40 | Gemini 3.1 Flash Lite Preview medium | Google | 7.7 | 7.5 | 2/3 | 5.30s |
| #47 | Grok Build 0.1 medium | X AI | 7.7 | 7.4 | 2/3 | 18.3s |
| #48 | Gemini 3 Flash Preview none | Google | 7.7 | 7.4 | 2/3 | 1.05s |
| #59 | GLM 5V Turbo medium | Z.ai | 7.7 | 7.2 | 2/3 | 10.2s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 7.7 | 7.2 | 2/3 | 3.87s |
| #65 | Grok 4.20 medium | X AI | 7.7 | 7.1 | 2/3 | 6.22s |
| #68 | Claude Opus 4.8 none | Anthropic | 7.7 | 7.0 | 2/3 | 2.74s |
| #69 | Claude Opus 4.6 medium | Anthropic | 7.7 | 7.0 | 2/3 | 4.71s |

[Chart: Top Models by Puzzle Solving Score]

[Chart: Puzzle Solving Score vs Total Cost]

[Chart: Top Models by Response Time (avg)]