AI BENCHY

AI BENCHY Category Failures

Combined: API error

See which AI models are most likely to hit an API error in the Combined category, so you can spot weak points faster. Results are sorted by average response time, descending.

Models Shown: 5

Total Failures: 5

Most Affected Model: Gemma 4 31B (1)

Rank | Model | Company | API error Count | Category Score | Tests Correct | Response Time (avg)
#14 | Gemma 4 31B medium | Google | 1 | 3.0 | 0/1 | 0ms
#48 | Gemma 4 31B none | Google | 1 | 3.0 | 0/1 | 0ms
#56 | Grok 4.20 Multi Agent Beta medium | X AI | 1 | 3.0 | 0/1 | 0ms
#84 | gpt-oss-120b none | OpenAI | 1 | 3.0 | 0/1 | 0ms
#98 | LFM2-24B-A2B none | Liquid | 1 | 3.0 | 0/1 | 0ms

Charts: Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg); Top Models by Estimated Wasted Cost.