AI BENCHY category failures
General knowledge: No answer
See which AI models are most likely to return No answer in General knowledge, so you can spot weaknesses quickly.
Failure reasons
| Rank | Model | Company | No answer count | Category score | Correct attempts | Response time (average) |
|---|---|---|---|---|---|---|
| #10 | Claude Opus 4.8 medium | Anthropic | 1 | 3.0 | 0/1 | 6.14s |
| #22 | Step 3.7 Flash medium | Stepfun | 1 | 3.0 | 0/1 | 114.0s |
| #57 | Step 3.7 Flash low | Stepfun | 1 | 3.0 | 0/1 | 124.8s |
| #67 | MiniMax M3 medium | Minimax | 1 | 3.0 | 0/1 | 100.8s |
| #68 | Claude Opus 4.8 none | Anthropic | 1 | 3.0 | 0/1 | 3.41s |
| #71 | Step 3.7 Flash high | Stepfun | 1 | 3.0 | 0/1 | 149.3s |