AI BENCHY category failures
Mixed: Incorrect tool call
See which AI models are most likely to hit an Incorrect tool call failure in the Mixed category, so you can spot weaknesses quickly. Sorted by: Response time (average) ↓.
| Rank | Model | Company | Incorrect tool call count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #64 | DeepSeek V3.2 none | DeepSeek | 1 | 6.5 | 0/1 | 115.9s |
| #93 | GLM 4.7 Flash medium | Z.ai | 1 | 2.8 | 0/1 | 65.6s |
| #71 | MiniMax M2.5 medium | Minimax | 1 | 4.5 | 0/1 | 60.4s |
| #80 | MiniMax M2.7 medium | Minimax | 1 | 4.7 | 0/1 | 41.0s |
| #75 | GLM 5.1 none | Z.ai | 1 | 2.8 | 0/1 | 32.6s |
| #31 | GLM 5V Turbo medium | Z.ai | 1 | 6.9 | 0/1 | 15.1s |
| #79 | Grok 4.20 Beta none | X AI | 1 | 3.0 | 0/1 | 6.48s |
| #82 | Grok 4.20 none | X AI | 1 | 3.0 | 0/1 | 6.04s |
| #90 | Qwen3.5-9B none | Qwen | 1 | 3.0 | 0/1 | 5.91s |
| #74 | GLM 4.7 Flash none | Z.ai | 1 | 3.0 | 0/1 | 3.22s |