AI BENCHY category failures
Anti-AI Tricks: API Error
See which AI models are most likely to hit an API Error in Anti-AI Tricks, so you can spot weaknesses quickly. Sorted by: Response time (average), ascending.
Failure reasons
| Rank | Model | Company | API Error count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #146 | Laguna Xs.2 none | Poolside | 1 | 3.0 | 0/4 | 534ms |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 1 | 4.8 | 1/4 | 584ms |
| #145 | Laguna M.1 none | Poolside | 1 | 3.4 | 0/4 | 705ms |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 6.4 | 2/4 | 1.20s |
| #107 | Laguna Xs.2 medium | Poolside | 1 | 6.9 | 2/4 | 2.68s |
| #92 | Laguna M.1 medium | Poolside | 1 | 6.5 | 2/4 | 4.87s |
| #89 | Hy3 preview low | Tencent | 1 | 8.3 | 3/4 | 9.32s |
| #133 | DeepSeek V3.2 none | DeepSeek | 1 | 3.2 | 0/4 | 9.35s |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 1 | 8.3 | 3/4 | 11.7s |
| #82 | Hy3 preview high | Tencent | 2 | 6.4 | 2/4 | 15.1s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 6.4 | 2/4 | 16.5s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 1 | 8.2 | 3/4 | 24.2s |