AI BENCHY

Puzzle Solving Ranking

See which AI models perform best on Puzzle Solving, which ones stay reliable, and where the biggest gaps appear. Rows are sorted by Puzzle Solving Score in ascending order.

Models Shown: 15

Average Puzzle Solving Score: 6.4

| Rank | Model | Company | Puzzle Solving Score | Score | Tests Correct | Response Time (avg) |
|------|-------|---------|----------------------|-------|---------------|---------------------|
| #52 | Grok 4.1 Fast medium | X AI | 5.3 | 6.7 | 1/3 | 8.08s |
| #57 | GPT-5 Nano medium | OpenAI | 5.3 | 6.3 | 1/3 | 19.8s |
| #30 | Step 3.5 Flash medium | Stepfun | 5.3 | 7.9 | 1/3 | 7.72s |
| #58 | GLM 5V Turbo none | Z.ai | 5.3 | 6.2 | 1/3 | 2.22s |
| #46 | Kimi K2.5 medium | Moonshot AI | 5.3 | 7.0 | 1/3 | 45.4s |
| #78 | Trinity Large Preview none | Arcee AI | 5.4 | 5.3 | 1/3 | 3.30s |
| #70 | Qwen3.5-122B-A10B none | Qwen | 5.4 | 5.7 | 1/3 | 982ms |
| #86 | GPT-5.4 Mini none | OpenAI | 5.4 | 5.1 | 1/3 | 860ms |
| #77 | GLM 5 Turbo none | Z.ai | 5.5 | 5.5 | 1/3 | 2.43s |
| #48 | Gemma 4 31B none | Google | 5.5 | 6.9 | 1/3 | 2.95s |
| #66 | GPT-5.4 none | OpenAI | 5.6 | 5.9 | 1/3 | 1.52s |
| #45 | GPT-5 Mini medium | OpenAI | 5.6 | 7.0 | 1/3 | 14.1s |
| #60 | Gemma 4 26B A4B none | Google | 5.7 | 6.2 | 1/3 | 739ms |
| #62 | Gemini 2.5 Flash none | Google | 5.7 | 6.2 | 1/3 | 576ms |
| #75 | GLM 5.1 none | Z.ai | 5.7 | 5.6 | 1/3 | 1.48s |
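
The Response Time (avg) column mixes milliseconds and seconds, so the values have to be converted to one unit before they can be compared. The following is a minimal sketch of one way to do that in Python. The names `ROWS`, `to_seconds`, and `fastest` are hypothetical and not part of AI BENCHY, and the data is copied by hand from the rows listed here.

```python
# Illustrative sketch only: ROWS, to_seconds, and fastest are hypothetical
# names, and the values are copied by hand from the ranking rows.
ROWS = [
    ("Grok 4.1 Fast medium", "8.08s"),
    ("GPT-5 Nano medium", "19.8s"),
    ("Step 3.5 Flash medium", "7.72s"),
    ("GLM 5V Turbo none", "2.22s"),
    ("Kimi K2.5 medium", "45.4s"),
    ("Trinity Large Preview none", "3.30s"),
    ("Qwen3.5-122B-A10B none", "982ms"),
    ("GPT-5.4 Mini none", "860ms"),
    ("GLM 5 Turbo none", "2.43s"),
    ("Gemma 4 31B none", "2.95s"),
    ("GPT-5.4 none", "1.52s"),
    ("GPT-5 Mini medium", "14.1s"),
    ("Gemma 4 26B A4B none", "739ms"),
    ("Gemini 2.5 Flash none", "576ms"),
    ("GLM 5.1 none", "1.48s"),
]


def to_seconds(text: str) -> float:
    """Convert a response-time string such as '982ms' or '8.08s' to seconds."""
    text = text.strip()
    # Check the 'ms' suffix first, because 'ms' also ends with 's'.
    if text.endswith("ms"):
        return float(text[:-2]) / 1000.0
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unrecognized time format: {text!r}")


def fastest(rows, n=3):
    """Return the n rows with the lowest average response time."""
    return sorted(rows, key=lambda row: to_seconds(row[1]))[:n]


if __name__ == "__main__":
    for model, time_text in fastest(ROWS):
        print(f"{model}: {to_seconds(time_text):.3f}s")
```

Checking the `ms` suffix before the `s` suffix matters here, since a string like `982ms` would otherwise be read as seconds and fail to parse.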

[Charts: Top Models by Puzzle Solving Score; Puzzle Solving Score vs Total Cost; Top Models by Response Time (avg)]