AI BENCHY

AI BENCHY Category Failures

Data parsing and extraction: API error

See which AI models are most likely to hit an API error on Data parsing and extraction tasks, so you can spot weak points faster.

Models Shown: 15
Total Failures: 16
Most Affected Model: Gemini 3.5 Flash (1 API error)

Rank  Model             Effort  Company      API Errors  Category Score  Tests Correct  Response Time (avg)
#20   Gemini 3.5 Flash  none    Google       1           6.5             1/2            8.10s
#33   Hy3 preview       medium  Tencent      1           6.5             1/2            5.25s
#49   Qwen3.5-Flash     medium  Qwen         1           7.3             1/2            57.0s
#64   MiMo-V2-Flash     medium  Xiaomi       1           6.5             1/2            0ms
#66   Qwen3.5-35B-A3B   medium  Qwen         1           7.3             1/2            59.3s
#82   Hy3 preview       high    Tencent      1           6.5             1/2            12.1s
#83   Step 3.5 Flash    none    Stepfun      1           3.0             0/1            0ms
#89   Hy3 preview       low     Tencent      1           6.5             1/2            5.85s
#96   Ring-2.6-1T       none    Inclusionai  1           3.0             0/2            45.9s
#100  Grok Build 0.1    none    X AI         1           3.8             0/2            9.33s
#103  DeepSeek V4 Pro   high    DeepSeek     1           7.3             1/2            23.6s
#113  DeepSeek V4 Pro   none    DeepSeek     1           6.9             1/2            30.5s
#126  gpt-oss-120b      none    OpenAI       1           6.5             1/2            7.12s
#132  Mistral Small 4   medium  Mistral      1           7.3             1/2            1.23s
#152  MiMo-V2-Flash     none    Xiaomi       1           2.9             0/2            19.7s

Charts (not reproduced): Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.
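The Response Time (avg) column mixes units ("0ms" next to values in seconds), so the values have to be normalized before they can be ranked. Below is a minimal Python sketch that does this. It assumes the rows are transcribed by hand from the table above and sorts slowest first, which may not be the order the site's own chart uses. `ROWS`, `to_seconds`, and `slowest` are hypothetical names for this illustration and are not part of AI Benchy.

```python
# Hypothetical transcription of the table above:
# (model, effort, company, avg response time exactly as displayed)
ROWS = [
    ("Gemini 3.5 Flash", "none", "Google", "8.10s"),
    ("Hy3 preview", "medium", "Tencent", "5.25s"),
    ("Qwen3.5-Flash", "medium", "Qwen", "57.0s"),
    ("MiMo-V2-Flash", "medium", "Xiaomi", "0ms"),
    ("Qwen3.5-35B-A3B", "medium", "Qwen", "59.3s"),
    ("Hy3 preview", "high", "Tencent", "12.1s"),
    ("Step 3.5 Flash", "none", "Stepfun", "0ms"),
    ("Hy3 preview", "low", "Tencent", "5.85s"),
    ("Ring-2.6-1T", "none", "Inclusionai", "45.9s"),
    ("Grok Build 0.1", "none", "X AI", "9.33s"),
    ("DeepSeek V4 Pro", "high", "DeepSeek", "23.6s"),
    ("DeepSeek V4 Pro", "none", "DeepSeek", "30.5s"),
    ("gpt-oss-120b", "none", "OpenAI", "7.12s"),
    ("Mistral Small 4", "medium", "Mistral", "1.23s"),
    ("MiMo-V2-Flash", "none", "Xiaomi", "19.7s"),
]


def to_seconds(display: str) -> float:
    """Convert a displayed duration such as '8.10s' or '0ms' to seconds."""
    # Check "ms" first: "0ms" also ends with "s".
    if display.endswith("ms"):
        return float(display[:-2]) / 1000.0
    if display.endswith("s"):
        return float(display[:-1])
    raise ValueError(f"unrecognized duration: {display!r}")


def slowest(rows, n=5):
    """Return the n rows with the highest average response time, slowest first."""
    return sorted(rows, key=lambda r: to_seconds(r[3]), reverse=True)[:n]


if __name__ == "__main__":
    for model, effort, company, shown in slowest(ROWS):
        print(f"{model} ({effort}, {company}): {to_seconds(shown):.2f}s")
```

The two "0ms" rows are taken literally as zero seconds, so they end up at the bottom of any slowest-first ranking.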