AI BENCHY category failures
Anti-AI Tricks: API Error
See which AI models are most likely to hit an API Error in the Anti-AI Tricks category, so you can spot weaknesses quickly. Sorted by: Response time (average), descending.
Failure reasons
| Rank | Model | Company | API Error count | Category score | Correct tests | Response time (average) |
|---|---|---|---|---|---|---|
| #72 | DeepSeek V3.2 medium | DeepSeek | 1 | 8.2 | 3/4 | 24.2s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 6.4 | 2/4 | 16.5s |
| #82 | Hy3 preview high | Tencent | 2 | 6.4 | 2/4 | 15.1s |
| #93 | Qwen3.6 Plus Preview medium | Qwen | 1 | 8.3 | 3/4 | 11.7s |
| #133 | DeepSeek V3.2 none | DeepSeek | 1 | 3.2 | 0/4 | 9.35s |
| #89 | Hy3 preview low | Tencent | 1 | 8.3 | 3/4 | 9.32s |
| #92 | Laguna M.1 medium | Poolside | 1 | 6.5 | 2/4 | 4.87s |
| #107 | Laguna Xs.2 medium | Poolside | 1 | 6.9 | 2/4 | 2.68s |
| #149 | Nemotron 3 Nano Omni 30b A3b Reasoning medium | NVIDIA | 1 | 6.4 | 2/4 | 1.20s |
| #145 | Laguna M.1 none | Poolside | 1 | 3.4 | 0/4 | 705ms |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 1 | 4.8 | 1/4 | 584ms |
| #146 | Laguna Xs.2 none | Poolside | 1 | 3.0 | 0/4 | 534ms |
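
The response-time column mixes seconds (`24.2s`) and milliseconds (`705ms`), so reproducing the table's order requires normalizing the units first. The snippet below is a minimal sketch of that step, assuming the values always take the form of a number followed by `s` or `ms`; the helper name `to_seconds` and the `rows` list are illustrative and not part of AI BENCHY.

```python
import re


def to_seconds(text: str) -> float:
    """Convert a duration such as '705ms' or '24.2s' to seconds."""
    match = re.fullmatch(r"([\d.]+)\s*(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value


# A few (model, response time) pairs copied from the table above.
rows = [
    ("Laguna M.1 none", "705ms"),
    ("DeepSeek V3.2 medium", "24.2s"),
    ("Laguna Xs.2 medium", "2.68s"),
    ("Laguna Xs.2 none", "534ms"),
]

# Sort slowest first, matching the descending order used by the table.
slowest_first = sorted(rows, key=lambda row: to_seconds(row[1]), reverse=True)

for model, duration in slowest_first:
    print(f"{model}: {to_seconds(duration):.3f} s")
```

Sorting the raw strings instead would misplace the millisecond rows, which is why the conversion comes before the sort.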