AI BENCHY

AI BENCHY failures

No answer failures

See which AI models hit No answer most often, so you can spot reliability risks before choosing one.

Models shown: 15

Total failures: 43

Most affected model: Step 3.7 Flash (4)
Rank | Model | Provider | No answer count | Score | Correct attempts | Mean response time
#71 | Step 3.7 Flash high | Stepfun | 4 | 7.0 | 11/21 | 64.5s
#78 | Qwen3.6 27B medium | Qwen | 3 | 6.8 | 10/21 | 59.7s
#158 | GLM 4.7 Flash medium | Z.ai | 3 | 4.4 | 4/21 | 35.1s
#37 | Gemma 4 26B A4B medium | Google | 2 | 7.6 | 14/21 | 63.4s
#66 | Qwen3.5-35B-A3B medium | Qwen | 2 | 7.1 | 11/21 | 72.6s
#76 | Kimi K2.5 medium | Moonshot AI | 2 | 6.8 | 10/21 | 98.4s
#80 | Mimo V2 Omni medium | Xiaomi | 2 | 6.7 | 10/21 | 41.2s
#107 | Laguna Xs.2 medium | Poolside | 2 | 5.8 | 6/19 | 6.73s
#161 | Qwen3.5-9B medium | Qwen | 2 | 4.2 | 3/21 | 82.2s
#10 | Claude Opus 4.8 medium | Anthropic | 1 | 8.7 | 17/21 | 9.66s
#17 | GLM 5 medium | Z.ai | 1 | 8.3 | 15/21 | 33.5s
#22 | Step 3.7 Flash medium | Stepfun | 1 | 8.0 | 14/21 | 20.4s
#23 | GLM 5 Turbo medium | Z.ai | 1 | 8.0 | 14/21 | 23.0s
#27 | Gemma 4 31B medium | Google | 1 | 7.8 | 14/21 | 56.5s
#42 | GPT-5.2 medium | OpenAI | 1 | 7.5 | 13/21 | 16.9s
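The rows above appear to be ordered by No answer count (highest first), with ties broken by overall rank. The minimal Python sketch below reproduces that ordering. It assumes this is the rule: the rule is inferred from the 15 rows shown, not documented on the page. The `Row` type and the `order_by_no_answer` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    rank: int       # overall leaderboard position (the "#" column)
    model: str      # model name as listed, including its high/medium suffix
    no_answer: int  # number of "No answer" failures


# The 15 rows from the table, re-listed here in overall-rank order.
rows = [
    Row(10, "Claude Opus 4.8 medium", 1),
    Row(17, "GLM 5 medium", 1),
    Row(22, "Step 3.7 Flash medium", 1),
    Row(23, "GLM 5 Turbo medium", 1),
    Row(27, "Gemma 4 31B medium", 1),
    Row(37, "Gemma 4 26B A4B medium", 2),
    Row(42, "GPT-5.2 medium", 1),
    Row(66, "Qwen3.5-35B-A3B medium", 2),
    Row(71, "Step 3.7 Flash high", 4),
    Row(76, "Kimi K2.5 medium", 2),
    Row(78, "Qwen3.6 27B medium", 3),
    Row(80, "Mimo V2 Omni medium", 2),
    Row(107, "Laguna Xs.2 medium", 2),
    Row(158, "GLM 4.7 Flash medium", 3),
    Row(161, "Qwen3.5-9B medium", 2),
]


def order_by_no_answer(rows: list[Row]) -> list[Row]:
    """Assumed ordering: most No answer failures first, ties by lower rank."""
    return sorted(rows, key=lambda r: (-r.no_answer, r.rank))


ordered = order_by_no_answer(rows)
most_affected = ordered[0]
print(f"Most affected: {most_affected.model} ({most_affected.no_answer})")
for r in ordered:
    print(f"#{r.rank:<4} {r.model:<24} {r.no_answer}")
```

Sorting on the key `(-no_answer, rank)` reproduces the row order shown in the table. It also puts Step 3.7 Flash high (4 failures) first, which matches the "Most affected model" figure.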

Charts: Top models by No answer count · No answer count vs. score · Top models by mean response time