AI BENCHY

AI BENCHY category failures

General knowledge: Incorrect answer

See which AI models are most likely to give an incorrect answer in General knowledge, so you can spot weaknesses quickly.

Models shown

15

Total failures

117

Most affected model

Claude Opus 4.7 (1)

Failure reasons

Rank Model Company Incorrect answers Category score Correct attempts Response time (average)
#106 MiniMax M2.5 medium Minimax 1 3.0 0/1 80.8s
#107 Mistral Small 4 medium Mistral 1 3.0 0/1 5.92s
#110 Qwen3.5-122B-A10B none Qwen 1 3.0 0/1 295ms
#112 Kimi K2.5 none Moonshot AI 1 3.0 0/1 3.90s
#114 GLM 5 Turbo none Z.ai 1 3.0 0/1 2.37s
#118 Ling-2.6-flash none Inclusionai 1 3.0 0/1 1.06s
#119 gpt-oss-120b none OpenAI 1 3.0 0/1 47.3s
#120 DeepSeek V4 Flash none DeepSeek 1 3.0 0/1 3.07s
#121 Qwen3 Coder Next none Qwen 1 3.0 0/1 601ms
#122 Nemotron 3 Super none NVIDIA 1 3.0 0/1 8.94s
#123 MiniMax M2.7 medium Minimax 1 3.0 0/1 22.8s
#124 Mistral Small 4 none Mistral 1 3.0 0/1 397ms
#125 GPT-5.4 Mini none OpenAI 1 3.0 0/1 1.33s
#126 Qwen3.6 35B A3B none Qwen 1 3.0 0/1 414ms
#127 GPT-4o-mini none OpenAI 1 3.0 0/1 794ms

[Charts: Best models by number of incorrect answers; Number of incorrect answers vs. score; Best models by response time (average); Best models by estimated wasted cost]