AI BENCHY
Data parsing and extraction: API error

See which AI models are most likely to hit an API error on Data parsing and extraction tests, so you can spot weak points faster. The table is sorted by average response time, slowest first.

Models Shown: 15
Total Failures: 16
Most Affected Model: Qwen3.5-35B-A3B (API error count: 1)

Rank  Model                   Company      API error Count  Category Score  Tests Correct  Response Time (avg)
#66   Qwen3.5-35B-A3B medium  Qwen         1                7.3             1/2            59.3s
#49   Qwen3.5-Flash medium    Qwen         1                7.3             1/2            57.0s
#96   Ring-2.6-1T none        Inclusionai  1                3.0             0/2            45.9s
#113  DeepSeek V4 Pro none    DeepSeek     1                6.9             1/2            30.5s
#103  DeepSeek V4 Pro high    DeepSeek     1                7.3             1/2            23.6s
#152  MiMo-V2-Flash none      Xiaomi       1                2.9             0/2            19.7s
#82   Hy3 preview high        Tencent      1                6.5             1/2            12.1s
#100  Grok Build 0.1 none     X AI         1                3.8             0/2            9.33s
#20   Gemini 3.5 Flash none   Google       1                6.5             1/2            8.10s
#126  gpt-oss-120b none       OpenAI       1                6.5             1/2            7.12s
#89   Hy3 preview low         Tencent      1                6.5             1/2            5.85s
#33   Hy3 preview medium      Tencent      1                6.5             1/2            5.25s
#156  Hy3 preview none        Tencent      1                6.5             1/2            2.85s
#132  Mistral Small 4 medium  Mistral      1                7.3             1/2            1.23s
#64   MiMo-V2-Flash medium    Xiaomi       1                6.5             1/2            0ms
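
The Response Time (avg) column mixes units (most rows are in seconds, one is in milliseconds), so sorting the raw strings would not reproduce the order shown. A minimal sketch of one way to normalize the values before sorting follows. The helper names `to_seconds` and `sort_by_response_time` and the `UNIT_SECONDS` mapping are hypothetical and not part of AI BENCHY. The sample rows are copied from the table, and the `0ms` entry is taken literally as zero.

```python
import re

# A few (model, avg response time) pairs copied from the table above.
ROWS = [
    ("Hy3 preview none", "2.85s"),
    ("MiMo-V2-Flash medium", "0ms"),
    ("Qwen3.5-35B-A3B medium", "59.3s"),
    ("Mistral Small 4 medium", "1.23s"),
]

# Hypothetical mapping: seconds per unit suffix that appears in the table.
UNIT_SECONDS = {"ms": 0.001, "s": 1.0}


def to_seconds(text: str) -> float:
    """Convert a duration string such as '59.3s' or '0ms' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


def sort_by_response_time(rows):
    """Return rows ordered slowest first (descending average response time)."""
    return sorted(rows, key=lambda row: to_seconds(row[1]), reverse=True)


if __name__ == "__main__":
    for model, duration in sort_by_response_time(ROWS):
        print(f"{model}: {to_seconds(duration):.2f}s")
```

Converting to a single unit first keeps the comparison numeric, so a value like `0ms` sorts below `1.23s` instead of being compared character by character.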

Charts: Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.