AI BENCHY

AI BENCHY Failures

API error Failures

See which AI models run into API errors most often, so you can spot reliability risks before choosing one. Models are sorted by failure count, ascending.

Models Shown: 15

Total Failures: 144

Most Affected Model: Qwen3.5 Plus 2026-02-15 1

| Rank | Model | Company | API error Count | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #119 | Cobuddy medium | Baidu | 1 | 5.6 | 7/21 | 39.9s |
| #120 | Mimo V2 PRO none | Xiaomi | 1 | 5.6 | 7/21 | 2.27s |
| #130 | MiniMax M2.7 medium | Minimax | 1 | 5.3 | 5/21 | 38.2s |
| #152 | MiMo-V2-Flash none | Xiaomi | 1 | 4.6 | 4/21 | 2.76s |
| #161 | Qwen3.5-9B medium | Qwen | 1 | 4.2 | 3/21 | 82.2s |
| #27 | Gemma 4 31B medium | Google | 2 | 7.8 | 14/21 | 56.5s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 2 | 7.4 | 13/21 | 18.1s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 2 | 7.0 | 11/21 | 68.7s |
| #75 | Ring-2.6-1T medium | Inclusionai | 2 | 6.9 | 11/21 | 61.3s |
| #84 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.6 | 8/18 | 9.69s |
| #85 | Gemma 4 31B none | Google | 2 | 6.5 | 10/21 | 4.05s |
| #132 | Mistral Small 4 medium | Mistral | 2 | 5.3 | 5/21 | 9.40s |
| #138 | Ling-2.6-flash none | Inclusionai | 2 | 5.0 | 6/21 | 9.34s |
| #151 | Trinity Large Preview none | Arcee AI | 2 | 4.6 | 4/21 | 2.98s |
| #153 | Qwen3.6 35B A3B none | Qwen | 2 | 4.6 | 4/21 | 3.73s |
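
The ordering in the table above appears to be by API error count ascending, with ties falling in leaderboard rank order. A minimal Python sketch of that sort follows, using a few rows copied from the table. The `Row` type and `sort_by_failures` helper are hypothetical names for illustration, and the rank tie-break is an assumption inferred from the rows shown, not something the page documents.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical record type; fields mirror a subset of the table columns.
    rank: int
    model: str
    api_errors: int
    score: float


# A handful of rows copied from the table, deliberately out of order.
rows = [
    Row(27, "Gemma 4 31B medium", 2, 7.8),
    Row(161, "Qwen3.5-9B medium", 1, 4.2),
    Row(119, "Cobuddy medium", 1, 5.6),
    Row(46, "Qwen3.6 35B A3B medium", 2, 7.4),
]


def sort_by_failures(rows: list[Row]) -> list[Row]:
    """Order by API error count ascending; break ties by rank (assumed)."""
    return sorted(rows, key=lambda r: (r.api_errors, r.rank))


ordered = sort_by_failures(rows)
for r in ordered:
    print(f"#{r.rank:<4} {r.model:<24} {r.api_errors}")
```

With these four rows, the sort puts the two models with one API error first (Cobuddy, then Qwen3.5-9B) and then the two with two errors (Gemma 4 31B, then Qwen3.6 35B A3B), matching their relative order in the table.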

Charts: Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg)