AI BENCHY

API error Failures

See which AI models run into API errors most often, so you can spot reliability risks before choosing one. Models are sorted by score, ascending.

Models Shown: 15

Total Failures: 144

| Rank | Model                       | Company     | API error Count | Score | Tests Correct | Response Time (avg) |
|------|-----------------------------|-------------|-----------------|-------|---------------|---------------------|
| #132 | Mistral Small 4 medium      | Mistral     | 2               | 5.3   | 5/21          | 9.40s               |
| #130 | MiniMax M2.7 medium         | MiniMax     | 1               | 5.3   | 5/21          | 38.2s               |
| #126 | gpt-oss-120b none           | OpenAI      | 3               | 5.4   | 6/19          | 21.6s               |
| #120 | Mimo V2 PRO none            | Xiaomi      | 1               | 5.6   | 7/21          | 2.27s               |
| #119 | Cobuddy medium              | Baidu       | 1               | 5.6   | 7/21          | 39.9s               |
| #116 | Hunter Alpha none           | OpenRouter  | 1               | 5.7   | 6/18          | 4.70s               |
| #113 | DeepSeek V4 Pro none        | DeepSeek    | 1               | 5.7   | 7/21          | 12.4s               |
| #111 | Owl Alpha medium            | OpenRouter  | 1               | 5.7   | 8/21          | 11.9s               |
| #107 | Laguna Xs.2 medium          | Poolside    | 4               | 5.8   | 6/19          | 6.73s               |
| #105 | Nemotron 3 Super medium     | NVIDIA      | 3               | 5.8   | 8/21          | 32.0s               |
| #103 | DeepSeek V4 Pro high        | DeepSeek    | 5               | 6.0   | 8/21          | 65.2s               |
| #101 | Mimo V2 Omni none           | Xiaomi      | 1               | 6.0   | 8/21          | 2.44s               |
| #100 | Grok Build 0.1 none         | xAI         | 3               | 6.0   | 7/19          | 28.7s               |
| #96  | Ring-2.6-1T none            | Inclusionai | 5               | 6.2   | 9/21          | 55.1s               |
| #93  | Qwen3.6 Plus Preview medium | Qwen        | 8               | 6.3   | 9/19          | 15.2s               |
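
To re-rank the listed models by API error count rather than by score, a minimal Python sketch along these lines may help. The tuple layout and the function name `rank_by_api_errors` are illustrative assumptions, not something the site provides; the values are copied from the rows above. Note that the 15 rows shown here sum to fewer errors than the page-level Total Failures figure, which presumably covers more than the listed rows.

```python
# Rows copied from the listing above: (model, api_error_count, score).
# The tuple layout is an assumption made for this sketch.
rows = [
    ("Mistral Small 4 medium", 2, 5.3),
    ("MiniMax M2.7 medium", 1, 5.3),
    ("gpt-oss-120b none", 3, 5.4),
    ("Mimo V2 PRO none", 1, 5.6),
    ("Cobuddy medium", 1, 5.6),
    ("Hunter Alpha none", 1, 5.7),
    ("DeepSeek V4 Pro none", 1, 5.7),
    ("Owl Alpha medium", 1, 5.7),
    ("Laguna Xs.2 medium", 4, 5.8),
    ("Nemotron 3 Super medium", 3, 5.8),
    ("DeepSeek V4 Pro high", 5, 6.0),
    ("Mimo V2 Omni none", 1, 6.0),
    ("Grok Build 0.1 none", 3, 6.0),
    ("Ring-2.6-1T none", 5, 6.2),
    ("Qwen3.6 Plus Preview medium", 8, 6.3),
]


def rank_by_api_errors(rows):
    """Sort by API error count (highest first); break ties by score (lowest first)."""
    return sorted(rows, key=lambda r: (-r[1], r[2]))


ranked = rank_by_api_errors(rows)
shown_total = sum(errors for _, errors, _ in rows)

# Print the five listed models with the most API errors.
for model, errors, score in ranked[:5]:
    print(f"{errors:>2}  {score:.1f}  {model}")
print("API errors across the listed rows:", shown_total)
```

Breaking ties by score is an arbitrary choice for this sketch; response time or tests correct would work equally well as a secondary key.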

Charts: Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg).