AI BENCHY category failures
Data parsing and extraction: API Error
See which AI models are most likely to hit an API Error in Data parsing and extraction, so you can spot weaknesses quickly. The table is sorted by average response time, ascending.
Failure reasons
| Rank | Model | Company | API Error count | Category score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #64 | MiMo-V2-Flash medium | Xiaomi | 1 | 6.5 | 1/2 | 0ms |
| #83 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #132 | Mistral Small 4 medium | Mistral | 1 | 7.3 | 1/2 | 1.23s |
| #156 | Hy3 preview none | Tencent | 1 | 6.5 | 1/2 | 2.85s |
| #33 | Hy3 preview medium | Tencent | 1 | 6.5 | 1/2 | 5.25s |
| #89 | Hy3 preview low | Tencent | 1 | 6.5 | 1/2 | 5.85s |
| #126 | gpt-oss-120b none | OpenAI | 1 | 6.5 | 1/2 | 7.12s |
| #20 | Gemini 3.5 Flash none | Google | 1 | 6.5 | 1/2 | 8.10s |
| #100 | Grok Build 0.1 none | X AI | 1 | 3.8 | 0/2 | 9.33s |
| #82 | Hy3 preview high | Tencent | 1 | 6.5 | 1/2 | 12.1s |
| #152 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 0/2 | 19.7s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 7.3 | 1/2 | 23.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 6.9 | 1/2 | 30.5s |
| #96 | Ring-2.6-1T none | Inclusionai | 1 | 3.0 | 0/2 | 45.9s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 7.3 | 1/2 | 57.0s |
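
The response-time column mixes units (`0ms`, `1.23s`) and the test column is a ratio (`1/2`), so the rows need light parsing before they can be sorted or compared outside the page. The following is a minimal sketch of one way to do that, using a few rows copied from the table above. The helper names `parse_duration` and `parse_ratio` are hypothetical and not part of AI BENCHY, and the sketch assumes durations only ever appear in `ms` or `s`.

```python
import re


def parse_duration(text: str) -> float:
    """Convert a duration string such as '0ms' or '1.23s' to seconds.

    Assumption: only the 'ms' and 's' suffixes occur, as in the table.
    """
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000.0 if unit == "ms" else value


def parse_ratio(text: str) -> float:
    """Convert a 'passed/total' string such as '1/2' to a fraction."""
    passed, total = (int(part) for part in text.split("/"))
    return passed / total


# (model, correct tests, average response time), copied from the table.
rows = [
    ("Qwen3.5-Flash medium", "1/2", "57.0s"),
    ("MiMo-V2-Flash medium", "1/2", "0ms"),
    ("Mistral Small 4 medium", "1/2", "1.23s"),
    ("Grok Build 0.1 none", "0/2", "9.33s"),
]

# Reproduce the page's ordering: ascending average response time.
by_latency = sorted(rows, key=lambda row: parse_duration(row[2]))

for model, tests, latency in by_latency:
    print(f"{model}: {parse_ratio(tests):.0%} correct, {parse_duration(latency):.2f}s")
```

Converting everything to seconds before sorting avoids the mistake of comparing the raw strings, where `12.1s` would sort ahead of `2.85s`.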