AI BENCHY Category
Anti-AI Tricks Leaderboard
See which AI models perform best on Anti-AI Tricks, which stay consistent, and where the biggest gaps are. Sorted by: Correct tests ↑.
| Rank | Model | Company | Anti-AI Tricks score | Score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #70 | Qwen3.5-122B-A10B none | Qwen | 4.8 | 5.7 | 1/4 | 1.59s |
| #73 | Mistral Small 4 medium | Mistral | 5.6 | 5.7 | 1/4 | 2.67s |
| #74 | GLM 4.7 Flash none | Z.ai | 5.2 | 5.6 | 1/4 | 5.51s |
| #82 | Grok 4.20 none | X AI | 4.8 | 5.2 | 1/4 | 501ms |
| #88 | Nemotron 3 Super none | NVIDIA | 4.8 | 5.1 | 1/4 | 7.43s |
| #89 | GPT-4o-mini none | OpenAI | 4.8 | 4.9 | 1/4 | 1.34s |
| #93 | GLM 4.7 Flash medium | Z.ai | 4.7 | 4.6 | 1/4 | 15.0s |
| #97 | Qwen3.5-9B medium | Qwen | 5.1 | 4.4 | 1/4 | 34.4s |
| #26 | Claude Sonnet 4.6 medium | Anthropic | 6.5 | 8.0 | 2/4 | 2.98s |
| #29 | Gemini 3.1 Flash Lite Preview none | Google | 7.5 | 7.9 | 2/4 | 1.04s |
| #31 | GLM 5V Turbo medium | Z.ai | 7.2 | 7.8 | 2/4 | 10.8s |
| #34 | Kimi K2.6 medium | Moonshot AI | 7.0 | 7.7 | 2/4 | 11.6s |
| #36 | GPT-5.3 Chat none | OpenAI | 6.7 | 7.7 | 2/4 | 3.86s |
| #37 | Claude Opus 4.6 medium | Anthropic | 6.4 | 7.6 | 2/4 | 7.45s |
| #39 | Seed-2.0-Mini medium | Bytedance Seed | 6.6 | 7.5 | 2/4 | 74.7s |