AI BENCHY category
Anti-AI Techniques leaderboard
See which AI models perform best on anti-AI techniques, which stay consistent, and where the largest gaps are. Sorted by correct tests, ascending.
| Rank | Model | Company | Anti-AI Techniques score | Overall score | Correct tests | Avg. response time |
|---|---|---|---|---|---|---|
| #40 | GPT-5.2 medium | OpenAI | 6.5 | 7.5 | 2/4 | 7.81s |
| #45 | GPT-5 Mini medium | OpenAI | 7.1 | 7.0 | 2/4 | 13.9s |
| #46 | Kimi K2.5 medium | Moonshot AI | 7.3 | 7.0 | 2/4 | 51.4s |
| #48 | Gemma 4 31B none | | 6.5 | 6.9 | 2/4 | 1.85s |
| #50 | Hunter Alpha medium | OpenRouter | 7.3 | 6.7 | 2/4 | 4.75s |
| #54 | Mercury 2 medium | Inception | 6.9 | 6.5 | 2/4 | 1.12s |
| #56 | Grok 4.20 Multi Agent Beta medium | X AI | 6.9 | 6.4 | 2/4 | 3.46s |
| #57 | GPT-5 Nano medium | OpenAI | 6.5 | 6.3 | 2/4 | 25.5s |
| #68 | gpt-oss-120b medium | OpenAI | 6.7 | 5.8 | 2/4 | 10.2s |
| #71 | MiniMax M2.5 medium | Minimax | 7.9 | 5.7 | 2/4 | 20.8s |
| #80 | MiniMax M2.7 medium | Minimax | 7.9 | 5.3 | 2/4 | 40.3s |
| #81 | Elephant medium | OpenRouter | 6.6 | 5.2 | 2/4 | 1.19s |
| #84 | gpt-oss-120b none | OpenAI | 6.6 | 5.2 | 2/4 | 6.03s |
| #85 | Elephant none | OpenRouter | 6.6 | 5.2 | 2/4 | 0.963s |
| #3 | Claude Opus 4.7 medium | Anthropic | 8.3 | 9.2 | 3/4 | 1.85s |