AI BENCHY Category Failures
Data parsing and extraction: API error
Data parsing and extraction
API error
See which AI models are most likely to hit API error on Data parsing and extraction, so you can spot weak points faster.
Failure Reasons
| Rank | Model | Company | API error Count | Category Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #20 | Gemini 3.5 Flash none | 1 | 6.5 | 1/2 | 8.10s | |
| #33 | Hy3 preview medium | Tencent | 1 | 6.5 | 1/2 | 5.25s |
| #49 | Qwen3.5-Flash medium | Qwen | 1 | 7.3 | 1/2 | 57.0s |
| #64 | MiMo-V2-Flash medium | Xiaomi | 1 | 6.5 | 1/2 | 0ms |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 1 | 7.3 | 1/2 | 59.3s |
| #82 | Hy3 preview high | Tencent | 1 | 6.5 | 1/2 | 12.1s |
| #83 | Step 3.5 Flash none | Stepfun | 1 | 3.0 | 0/1 | 0ms |
| #89 | Hy3 preview low | Tencent | 1 | 6.5 | 1/2 | 5.85s |
| #96 | Ring-2.6-1T none | Inclusionai | 1 | 3.0 | 0/2 | 45.9s |
| #100 | Grok Build 0.1 none | X AI | 1 | 3.8 | 0/2 | 9.33s |
| #103 | DeepSeek V4 Pro high | DeepSeek | 1 | 7.3 | 1/2 | 23.6s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 1 | 6.9 | 1/2 | 30.5s |
| #126 | gpt-oss-120b none | OpenAI | 1 | 6.5 | 1/2 | 7.12s |
| #132 | Mistral Small 4 medium | Mistral | 1 | 7.3 | 1/2 | 1.23s |
| #152 | MiMo-V2-Flash none | Xiaomi | 1 | 2.9 | 0/2 | 19.7s |