AI BENCHY

AI BENCHY failures

No response failures

See which AI models run into No response most often, so you can spot reliability risks before choosing one. Sorted by: Response time (average) ↑.

Models shown: 15

Total failures: 43

Most affected model: Claude Opus 4.8 1

| Rank | Model | Company | No response count | Score | Correct tests | Response time (average) |
|------|-------|---------|-------------------|-------|---------------|-------------------------|
| #158 | GLM 4.7 Flash medium | Z.ai | 3 | 4.4 | 4/21 | 35.1s |
| #130 | MiniMax M2.7 medium | Minimax | 1 | 5.3 | 5/21 | 38.2s |
| #80 | Mimo V2 Omni medium | Xiaomi | 2 | 6.7 | 10/21 | 41.2s |
| #27 | Gemma 4 31B medium | Google | 1 | 7.8 | 14/21 | 56.5s |
| #78 | Qwen3.6 27B medium | Qwen | 3 | 6.8 | 10/21 | 59.7s |
| #53 | Gemini 3.1 Flash Lite high | Google | 1 | 7.3 | 10/18 | 62.0s |
| #37 | Gemma 4 26B A4B medium | Google | 2 | 7.6 | 14/21 | 63.4s |
| #71 | Step 3.7 Flash high | Stepfun | 4 | 7.0 | 11/21 | 64.5s |
| #129 | MiniMax M2.5 medium | Minimax | 1 | 5.3 | 5/21 | 65.4s |
| #67 | MiniMax M3 medium | Minimax | 1 | 7.1 | 11/21 | 68.2s |
| #60 | Kimi K2.6 medium | Moonshot AI | 1 | 7.2 | 12/21 | 71.7s |
| #62 | Step 3.5 Flash medium | Stepfun | 1 | 7.2 | 11/20 | 72.5s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 2 | 7.1 | 11/21 | 72.6s |
| #161 | Qwen3.5-9B medium | Qwen | 2 | 4.2 | 3/21 | 82.2s |
| #76 | Kimi K2.5 medium | Moonshot AI | 2 | 6.8 | 10/21 | 98.4s |
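
Because the models did not all run the same number of tests (18, 20 or 21), raw No response counts are not directly comparable. The following is a minimal sketch, not part of AI BENCHY itself, that re-sorts a few of the rows above by average response time and by a per-test No response rate. The `Row` class and the rate definition (No response count divided by the denominator of the correct-tests column) are assumptions made for illustration; the page does not state how the site normalizes these counts, if at all.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row (hypothetical structure, for illustration only)."""
    model: str
    no_response: int    # "No response" count from the table
    correct: int        # numerator of the correct-tests column
    total: int          # denominator of the correct-tests column
    avg_seconds: float  # average response time in seconds

    @property
    def no_response_rate(self) -> float:
        # Assumption: each No response event corresponds to one of the `total` tests.
        return self.no_response / self.total


# A handful of rows copied from the table above.
rows = [
    Row("Kimi K2.5 medium", 2, 10, 21, 98.4),
    Row("Step 3.7 Flash high", 4, 11, 21, 64.5),
    Row("GLM 4.7 Flash medium", 3, 4, 21, 35.1),
    Row("Gemini 3.1 Flash Lite high", 1, 10, 18, 62.0),
]

# Ascending by average response time, matching the page's sort order.
by_time = sorted(rows, key=lambda r: r.avg_seconds)

# Descending by No response rate, to surface the least reliable models first.
by_rate = sorted(rows, key=lambda r: r.no_response_rate, reverse=True)

for r in by_rate:
    print(f"{r.model}: {r.no_response}/{r.total} = {r.no_response_rate:.1%}")
```

Sorting by rate rather than by raw count only changes the order when the denominators differ, which here affects the rows with 18 or 20 tests.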

Top models by No response count

No response count vs. Score

Top models by Response time (average)