AI BENCHY category failures
Data parsing and extraction: API errors
See which AI models are most likely to hit API errors in Data parsing and extraction, so you can find their weak points faster. Sorted by: Response time (average), ascending.
Failure reasons
| Rank | Model | Company | API error count | Category score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #64 | MiMo-V2-Flash medium | Xiaomi | 1 | 6.5 | 1/2 | 0ms |
| #83 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #132 | Mistral Small 4 medium | Mistral | 1 | 7.3 | 1/2 | 1.23s |
| #156 | Hy3 preview none | Tencent | 1 | 6.5 | 1/2 | 2.85s |
| #33 | Hy3 preview medium | Tencent | 1 | 6.5 | 1/2 | 5.25s |
| #89 | Hy3 preview low | Tencent | 1 | 6.5 | 1/2 | 5.85s |
| #126 | gpt-oss-120b none | OpenAI | 1 | 6.5 | 1/2 | 7.12s |
| #20 | Gemini 3.5 Flash none | | 1 | 6.5 | 1/2 | 8.10s |
| #100 | Grok Build 0.1 none | X AI | 1 | 3.8 | 0/2 | 9.33s |
| #82 | Hy3 preview high | Tencent | 1 | 6.5 | 1/2 | 12.1s |
| #152 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 0/2 | 19.7s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 7.3 | 1/2 | 23.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 6.9 | 1/2 | 30.5s |
| #96 | Ring-2.6-1T none | Inclusionai | 1 | 3.0 | 0/2 | 45.9s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 7.3 | 1/2 | 57.0s |