AI BENCHY Category
General Intelligence Ranking
See which AI models perform best on General Intelligence, which ones stay reliable, and where the biggest gaps appear.
| Rank | Model | Company | General Intelligence Score | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #130 | MiniMax M2.7 medium | MiniMax | 3.9 | 5.3 | 0/1 | 38.7s |
| #131 | Qwen3.5-122B-A10B none | Qwen | 5.0 | 5.3 | 0/1 | 1.12s |
| #132 | Mistral Small 4 medium | Mistral | 4.8 | 5.3 | 0/1 | 2.05s |
| #133 | DeepSeek V3.2 none | DeepSeek | 4.7 | 5.2 | 0/1 | 9.32s |
| #134 | GLM 5 Turbo none | Z.ai | 4.2 | 5.2 | 0/1 | 2.18s |
| #136 | Elephant Alpha medium | OpenRouter | 4.3 | 5.1 | 0/1 | 920ms |
| #137 | Elephant Alpha none | OpenRouter | 4.0 | 5.1 | 0/1 | 854ms |
| #138 | Ling-2.6-flash none | inclusionAI | 4.0 | 5.0 | 0/1 | 1.45s |
| #139 | DeepSeek V4 Flash none | DeepSeek | 4.2 | 5.0 | 0/1 | 23.7s |
| #141 | Nemotron 3 Super none | NVIDIA | 4.6 | 4.9 | 0/1 | 950ms |
| #142 | Mistral Small 4 none | Mistral | 4.0 | 4.9 | 0/1 | 729ms |
| #143 | MiMo-V2.5 none | Xiaomi | 4.4 | 4.9 | 0/1 | 6.86s |
| #144 | GPT-5.4 Mini none | OpenAI | 4.8 | 4.9 | 0/1 | 1.82s |
| #145 | Laguna M.1 none | Poolside | 3.0 | 4.8 | 0/1 | 0ms |
| #146 | Laguna Xs.2 none | Poolside | 3.0 | 4.8 | 0/1 | 0ms |