AI BENCHY

AI BENCHY Failures

Wrong Answer Failures

See which AI models produce a Wrong Answer most often, so you can spot reliability risks before choosing one. Sort by: Failure count ↑.

Models shown: 15

Total failures: 1204

Most affected model: Gemini 3 Flash Preview 1
| Rank | Model | Company | Wrong Answer count | Score | Correct tests | Response time (avg) |
|---|---|---|---|---|---|---|
| #36 | Qwen3.5 Plus 2026-04-20 medium | Qwen | 8 | 7.6 | 13/21 | 46.4s |
| #39 | Qwen3.6 Flash medium | Qwen | 8 | 7.5 | 12/21 | 19.2s |
| #48 | Gemini 3 Flash Preview none | Google | 8 | 7.4 | 13/21 | 1.65s |
| #57 | Step 3.7 Flash low | Stepfun | 8 | 7.3 | 12/21 | 15.7s |
| #70 | GPT-5.4 Nano medium | OpenAI | 8 | 7.0 | 11/21 | 12.0s |
| #81 | Mercury 2 medium | Inception | 8 | 6.6 | 10/21 | 2.24s |
| #85 | Gemma 4 31B none | Google | 8 | 6.5 | 10/21 | 4.05s |
| #87 | Gemini 3.1 Flash Lite minimal | Google | 8 | 6.4 | 10/21 | 1.33s |
| #126 | gpt-oss-120b none | OpenAI | 8 | 5.4 | 6/19 | 21.6s |
| #146 | Laguna Xs.2 none | Poolside | 8 | 4.8 | 5/19 | 806ms |
| #156 | Hy3 preview none | Tencent | 8 | 4.4 | 4/21 | 12.9s |
| #61 | Gemini 3.1 Flash Lite low | Google | 9 | 7.2 | 12/21 | 1.89s |
| #94 | GPT-5 Nano medium | OpenAI | 9 | 6.3 | 9/21 | 42.5s |
| #99 | gpt-oss-120b medium | OpenAI | 9 | 6.1 | 9/21 | 22.3s |
| #116 | Hunter Alpha none | OpenRouter | 9 | 5.7 | 6/18 | 4.70s |
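
The ordering in the table above can be reproduced with a short sketch. It assumes that rows are sorted by wrong-answer count ascending and that ties are broken by score descending, which matches the rows shown but is not stated on the page. The `Row` class, the `pass_rate` property, and `sort_by_failures` are hypothetical names introduced here for illustration; only four rows from the table are included.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One leaderboard row (hypothetical structure, values copied from the table)."""
    rank: int
    model: str
    wrong_answers: int
    score: float
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # Fraction of tests answered correctly, e.g. 13/21.
        return self.passed / self.total


ROWS = [
    Row(116, "Hunter Alpha none", 9, 5.7, 6, 18),
    Row(36, "Qwen3.5 Plus 2026-04-20 medium", 8, 7.6, 13, 21),
    Row(61, "Gemini 3.1 Flash Lite low", 9, 7.2, 12, 21),
    Row(146, "Laguna Xs.2 none", 8, 4.8, 5, 19),
]


def sort_by_failures(rows):
    # Assumed ordering: fewest wrong answers first, then highest score first.
    return sorted(rows, key=lambda r: (r.wrong_answers, -r.score))


for r in sort_by_failures(ROWS):
    print(f"#{r.rank:<4} {r.model:<32} wrong={r.wrong_answers} pass_rate={r.pass_rate:.0%}")
```

Sorting on a tuple key keeps the tie-break explicit, so changing the secondary criterion (for example, to response time) only requires changing the second tuple element.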

Top models by Wrong Answer count

Wrong Answer count vs. Score

Top models by Response time (avg)