Kushindwa kwa kategoria za AI BENCHY
Uchanganuzi na uchimbaji wa data: Hitilafu ya API
Uchanganuzi na uchimbaji wa data
Hitilafu ya API
Ona ni modeli gani za AI zina uwezekano mkubwa wa kupata Hitilafu ya API katika Uchanganuzi na uchimbaji wa data, ili uone udhaifu haraka. Panga kwa: Majaribio sahihi ↓.
Sababu za kushindwa
| Nafasi | Modeli | Kampuni | Idadi ya Hitilafu ya API | Alama ya kategoria | Majaribio sahihi | Muda wa majibu (wastani) |
|---|---|---|---|---|---|---|
| #20 | Gemini 3.5 Flash none | 1 | 6.5 | 1/2 | 8.10s | |
| #33 | Hy3 preview medium | Tencent | 1 | 6.5 | 1/2 | 5.25s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 7.3 | 1/2 | 57.0s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 1 | 6.5 | 1/2 | 0ms |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.3 | 1/2 | 59.3s |
| #82 | Hy3 preview high | Tencent | 1 | 6.5 | 1/2 | 12.1s |
| #89 | Hy3 preview low | Tencent | 1 | 6.5 | 1/2 | 5.85s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 7.3 | 1/2 | 23.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 6.9 | 1/2 | 30.5s |
| #126 | gpt-oss-120b none | OpenAI | 1 | 6.5 | 1/2 | 7.12s |
| #132 | Mistral Small 4 medium | Mistral | 1 | 7.3 | 1/2 | 1.23s |
| #156 | Hy3 preview none | Tencent | 1 | 6.5 | 1/2 | 2.85s |
| #83 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #96 | Ring-2.6-1T none | Inclusionai | 1 | 3.0 | 0/2 | 45.9s |
| #100 | Grok Build 0.1 none | X AI | 1 | 3.8 | 0/2 | 9.33s |