AI BENCHY
❤️ Made by XCS

AI BENCHY category failures

Category: Combined
Failure type: Wrong answer

See which AI models are most likely to give a wrong answer in the Combined category, so you can find their weak spots faster. Sorted by: average response time, descending (↓).

Models shown: 21

Total failures: 21

Most affected model: Qwen3.5-35B-A3B (1 wrong answer)

Rank | Model | Reasoning effort | Company | Wrong answers | Category score | Tests correct | Avg. response time
--- | --- | --- | --- | --- | --- | --- | ---
#42 | Qwen3.5-35B-A3B | none | Qwen | 1 | 10.0 | 0/1 | 47.4s
#40 | Qwen3.5-122B-A10B | none | Qwen | 1 | 10.0 | 0/1 | 46.0s
#48 | Qwen3 Coder Next | none | Qwen | 1 | 10.0 | 0/1 | 45.1s
#46 | Kimi K2.5 | none | Moonshot AI | 1 | 10.0 | 0/1 | 19.2s
#17 | Gemini 3.1 Flash Lite Preview | low | Google | 1 | 10.0 | 0/1 | 11.9s
#6 | Gemini 3 Pro Preview | medium | Google | 1 | 10.0 | 0/1 | 10.4s
#41 | Qwen3.5-27B | none | Qwen | 1 | 10.0 | 0/1 | 9.39s
#45 | Trinity Large Preview | none | Arcee AI | 1 | 10.0 | 0/1 | 8.91s
#47 | GPT-4o-mini | none | OpenAI | 1 | 10.0 | 0/1 | 7.58s
#29 | Qwen3.5 Plus 2026-02-15 | none | Qwen | 1 | 10.0 | 0/1 | 6.65s
#37 | Qwen3.5-Flash | none | Qwen | 1 | 10.0 | 0/1 | 6.22s
#31 | GLM 5 | none | Z.ai | 1 | 10.0 | 0/1 | 4.98s
#38 | Gemini 2.5 Flash | none | Google | 1 | 10.0 | 0/1 | 4.39s
#50 | Qwen3 Coder Next | medium | Qwen | 1 | 10.0 | 0/1 | 4.28s
#20 | Gemini 3 Flash Preview | none | Google | 1 | 10.0 | 0/1 | 3.56s
#53 | Grok 4.1 Fast | none | X AI | 1 | 10.0 | 0/1 | 3.33s
#5 | Gemini 3 Flash Preview | low | Google | 1 | 10.0 | 0/1 | 3.27s
#22 | Gemini 3.1 Flash Lite Preview | none | Google | 1 | 10.0 | 0/1 | 3.20s
#44 | GPT-5.4 | none | OpenAI | 1 | 10.0 | 0/1 | 2.89s
#54 | MiMo-V2-Flash | none | Xiaomi | 1 | 10.0 | 0/1 | 2.87s
#51 | Mercury 2 | none | Inception | 1 | 10.0 | 0/1 | 606ms
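The summary figures (models shown, total failures, most affected model) can be derived from the table rows. The Python sketch below is an illustration, not AI BENCHY's own code. It assumes that each row is one model/reasoning-effort configuration, so "Models shown" counts rows rather than unique model names (the table has 21 rows but only 18 distinct names), and that ties in wrong-answer count are broken by the current sort order. Neither rule is documented on the page. `Row` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    rank: int
    model: str
    effort: str            # assumed: the reasoning-effort setting (none/low/medium)
    wrong_answers: int
    avg_response_s: float  # average response time in seconds


def summarize(rows: list[Row]) -> dict:
    """Derive the summary figures from table rows (hypothetical sketch)."""
    # Sort slowest first, matching "Sorted by: average response time, descending".
    ordered = sorted(rows, key=lambda r: r.avg_response_s, reverse=True)
    # max() returns the first maximal item it sees, so ties in wrong-answer
    # count resolve to whichever row comes first in the sorted order (assumed rule).
    most_affected = max(ordered, key=lambda r: r.wrong_answers)
    return {
        "models_shown": len(rows),  # counts model/effort rows, not unique names
        "total_failures": sum(r.wrong_answers for r in rows),
        "most_affected": most_affected.model,
        "order": [f"{r.model} ({r.effort})" for r in ordered],
    }


# A subset of rows from the table, deliberately listed out of order.
sample = [
    Row(51, "Mercury 2", "none", 1, 0.606),
    Row(50, "Qwen3 Coder Next", "medium", 1, 4.28),
    Row(42, "Qwen3.5-35B-A3B", "none", 1, 47.4),
    Row(48, "Qwen3 Coder Next", "none", 1, 45.1),
]

summary = summarize(sample)
print(summary["models_shown"], summary["total_failures"], summary["most_affected"])
# prints: 4 4 Qwen3.5-35B-A3B
```

With every row at one wrong answer, the tie-break alone decides the "most affected" model, which is consistent with the page naming the slowest row, Qwen3.5-35B-A3B.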

Charts (not reproduced in this text): top models by wrong-answer count; wrong-answer count vs. average score; top models by average response time; top models by estimated wasted cost.