AI BENCHY Category Failures
Data parsing and extraction: API error
See which AI models are most likely to hit an API error on Data parsing and extraction, so you can spot weak points faster. Rows are sorted by average response time, slowest first.
Failure Reasons
| Rank | Model | Company | API error Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #66 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.3 | 1/2 | 59.3s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 7.3 | 1/2 | 57.0s |
| #96 | Ring-2.6-1T none | Inclusionai | 1 | 3.0 | 0/2 | 45.9s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 6.9 | 1/2 | 30.5s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 7.3 | 1/2 | 23.6s |
| #152 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 0/2 | 19.7s |
| #82 | Hy3 preview high | Tencent | 1 | 6.5 | 1/2 | 12.1s |
| #100 | Grok Build 0.1 none | X AI | 1 | 3.8 | 0/2 | 9.33s |
| #20 | Gemini 3.5 Flash none | | 1 | 6.5 | 1/2 | 8.10s |
| #126 | gpt-oss-120b none | OpenAI | 1 | 6.5 | 1/2 | 7.12s |
| #89 | Hy3 preview low | Tencent | 1 | 6.5 | 1/2 | 5.85s |
| #33 | Hy3 preview medium | Tencent | 1 | 6.5 | 1/2 | 5.25s |
| #156 | Hy3 preview none | Tencent | 1 | 6.5 | 1/2 | 2.85s |
| #132 | Mistral Small 4 medium | Mistral | 1 | 7.3 | 1/2 | 1.23s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 1 | 6.5 | 1/2 | 0ms |