AI BENCHY

API error Failures

See which AI models run into API errors most often, so you can spot reliability risks before choosing one.

Models Shown: 15
Total Failures: 144
Most Affected Model: Qwen3.6 Plus Preview (8)

| Rank | Model | Company | API error Count | Score | Tests Correct | Response Time (avg) |
|---|---|---|---|---|---|---|
| #160 | LFM2-24B-A2B none | Liquid | 4 | 4.2 | 2/16 | 782ms |
| #20 | Gemini 3.5 Flash none | Google | 3 | 8.1 | 15/21 | 9.93s |
| #33 | Hy3 preview medium | Tencent | 3 | 7.7 | 14/21 | 16.3s |
| #100 | Grok Build 0.1 none | X AI | 3 | 6.0 | 7/19 | 28.7s |
| #105 | Nemotron 3 Super medium | NVIDIA | 3 | 5.8 | 8/21 | 32.0s |
| #126 | gpt-oss-120b none | OpenAI | 3 | 5.4 | 6/19 | 21.6s |
| #136 | Elephant Alpha medium | Openrouter | 3 | 5.1 | 6/21 | 1.27s |
| #137 | Elephant Alpha none | Openrouter | 3 | 5.1 | 5/21 | 1.22s |
| #159 | Ling-2.6-1T none | Inclusionai | 3 | 4.3 | 3/21 | 7.72s |
| #27 | Gemma 4 31B medium | Google | 2 | 7.8 | 14/21 | 56.5s |
| #46 | Qwen3.6 35B A3B medium | Qwen | 2 | 7.4 | 13/21 | 18.1s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 2 | 7.0 | 11/21 | 68.7s |
| #75 | Ring-2.6-1T medium | Inclusionai | 2 | 6.9 | 11/21 | 61.3s |
| #84 | Grok 4.20 Multi Agent Beta medium | X AI | 2 | 6.6 | 8/18 | 9.69s |
| #85 | Gemma 4 31B none | Google | 2 | 6.5 | 10/21 | 4.05s |

Charts: Top Models by API error Count; API error Count vs Score; Top Models by Response Time (avg)