AI BENCHY
❤️ Made by XCS

AI BENCHY Failures

Timeout Failures

See which AI models time out most often, so you can gauge reliability risk before choosing one.

Models shown: 15

Total failures: 25

Most affected model: Seed-2.0-Mini (4 timeouts)
| Rank | Model | Company | Timeouts | Avg. score | Tests passed | Avg. response time |
|------|-------|---------|----------|------------|--------------|--------------------|
| #23 | Seed-2.0-Mini medium | Bytedance Seed | 4 | 6.9 | 10/16 | 65.1s |
| #35 | Qwen3.5-35B-A3B medium | Qwen | 4 | 5.5 | 8/16 | 43.9s |
| #24 | Qwen3.5-Flash medium | Qwen | 3 | 6.9 | 10/16 | 70.8s |
| #4 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 2 | 8.3 | 13/16 | 34.5s |
| #43 | MiniMax M2.5 medium | Minimax | 2 | 4.7 | 5/16 | 43.0s |
| #7 | Qwen3.5-27B medium | Qwen | 1 | 8.2 | 12/16 | 52.1s |
| #10 | Qwen3.5-122B-A10B medium | Qwen | 1 | 7.7 | 12/16 | 29.7s |
| #11 | Claude Sonnet 4.6 medium | Anthropic | 1 | 7.7 | 12/16 | 11.2s |
| #14 | GLM 5 medium | Z.ai | 1 | 7.4 | 11/16 | 16.2s |
| #18 | DeepSeek V3.2 medium | DeepSeek | 1 | 7.3 | 11/16 | 39.5s |
| #27 | GPT-5.2 medium | OpenAI | 1 | 6.5 | 10/16 | 15.3s |
| #28 | Kimi K2.5 medium | Moonshot AI | 1 | 6.4 | 9/16 | 69.8s |
| #30 | Grok 4.1 Fast medium | X AI | 1 | 6.2 | 9/16 | 26.3s |
| #32 | GPT-5 Mini medium | OpenAI | 1 | 6.0 | 8/16 | 25.1s |
| #34 | GPT-5 Nano medium | OpenAI | 1 | 5.5 | 7/16 | 47.9s |
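
The summary figures (15 models, 25 failures) can be cross-checked against the table with a short Python sketch. The dictionary below is transcribed by hand from the table; the `summarize` helper is hypothetical and is not part of AI BENCHY. Note that two models share the highest count of 4; the page names only Seed-2.0-Mini as most affected, and how it breaks the tie is not stated.

```python
# Timeout counts per model, transcribed from the table above
# (the "medium" suffix is dropped from the model names for brevity).
timeouts = {
    "Seed-2.0-Mini": 4,
    "Qwen3.5-35B-A3B": 4,
    "Qwen3.5-Flash": 3,
    "Qwen3.5 Plus 2026-02-15": 2,
    "MiniMax M2.5": 2,
    "Qwen3.5-27B": 1,
    "Qwen3.5-122B-A10B": 1,
    "Claude Sonnet 4.6": 1,
    "GLM 5": 1,
    "DeepSeek V3.2": 1,
    "GPT-5.2": 1,
    "Kimi K2.5": 1,
    "Grok 4.1 Fast": 1,
    "GPT-5 Mini": 1,
    "GPT-5 Nano": 1,
}


def summarize(counts):
    """Return (models shown, total failures, models tied for the highest count)."""
    total = sum(counts.values())
    worst = max(counts.values())
    # Keep every model that reaches the maximum, in table order.
    most_affected = [model for model, n in counts.items() if n == worst]
    return len(counts), total, most_affected


models_shown, total_failures, most_affected = summarize(timeouts)
print(models_shown, total_failures, most_affected)
```

Returning all tied models rather than a single name makes the tie visible instead of hiding it behind an arbitrary ordering.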

[Chart: Top models by timeout count]

[Chart: Timeout count vs. average score]

[Chart: Top models by average response time]