AI BENCHY category failures
General knowledge: No answer
See which AI models are most likely to return No answer in General knowledge, so you can spot weaknesses quickly. Sorted by: Response time (average), ascending.
Failure reasons
| Rank | Model | Company | No answer count | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #68 | Claude Opus 4.8 none | Anthropic | 1 | 3.0 | 0/1 | 3.41s |
| #10 | Claude Opus 4.8 medium | Anthropic | 1 | 3.0 | 0/1 | 6.14s |
| #67 | MiniMax M3 medium | Minimax | 1 | 3.0 | 0/1 | 100.8s |
| #22 | Step 3.7 Flash medium | Stepfun | 1 | 3.0 | 0/1 | 114.0s |
| #57 | Step 3.7 Flash low | Stepfun | 1 | 3.0 | 0/1 | 124.8s |
| #71 | Step 3.7 Flash high | Stepfun | 1 | 3.0 | 0/1 | 149.3s |