AI BENCHY category failures
Anti-AI tricks: Wrong answers
See which AI models most often give a wrong answer on Anti-AI trick tests, so you can spot their weak points faster.
Failure reason
| Rank | Model | Company | Wrong answers | Category score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #92 | Qwen3 Coder Next medium | Qwen | 3 | 3.5 | 0/4 | 8.64s |
| #95 | Grok 4.1 Fast none | xAI | 3 | 3.2 | 0/4 | 1.07s |
| #98 | LFM2-24B-A2B none | Liquid | 3 | 3.3 | 0/3 | 471ms |
| #48 | Gemma 4 31B none | Google | 2 | 6.5 | 2/4 | 1.85s |
| #50 | Hunter Alpha medium | OpenRouter | 2 | 7.3 | 2/4 | 4.75s |
| #57 | GPT-5 Nano medium | OpenAI | 2 | 6.5 | 2/4 | 25.5s |
| #64 | DeepSeek V3.2 none | DeepSeek | 2 | 3.2 | 0/4 | 7.63s |
| #81 | Elephant medium | OpenRouter | 2 | 6.6 | 2/4 | 1.19s |
| #87 | Qwen3 Coder Next none | Qwen | 2 | 3.6 | 0/4 | 3.31s |
| #93 | GLM 4.7 Flash medium | Z.ai | 2 | 4.7 | 1/4 | 15.0s |
| #3 | Claude Opus 4.7 medium | Anthropic | 1 | 8.3 | 3/4 | 1.85s |
| #4 | Claude Opus 4.7 none | Anthropic | 1 | 8.3 | 3/4 | 2.12s |
| #6 | Seed-2.0-Lite medium | Bytedance Seed | 1 | 8.3 | 3/4 | 18.0s |
| #7 | GPT-5.3-Codex medium | OpenAI | 1 | 8.7 | 3/4 | 4.16s |
| #8 | Qwen3.5 Plus 2026-02-15 medium | Qwen | 1 | 8.2 | 3/4 | 45.8s |