AI BENCHY Category
Parsing and data extraction rankings
See which AI models perform best at parsing and data extraction, which remain reliable, and where the largest gaps appear. Sorted by: Tests passed ↑.
| Rank | Model | Company | Parsing and data extraction score | Overall score | Tests passed | Response time (avg) |
|---|---|---|---|---|---|---|
| #56 | MiMo-V2.5 medium | Xiaomi | 2.7 | 7.3 | 0/2 | 6.33s |
| #78 | Qwen3.6 27B medium | Qwen | 3.5 | 6.8 | 0/2 | 37.3s |
| #83 | Step 3.5 Flash none | Stepfun | 3.0 | 6.6 | 0/1 | 0ms |
| #94 | GPT-5 Nano medium | OpenAI | 3.7 | 6.3 | 0/2 | 21.4s |
| #96 | Ring-2.6-1T none | Inclusionai | 3.0 | 6.2 | 0/2 | 45.9s |
| #100 | Grok Build 0.1 none | X AI | 3.8 | 6.0 | 0/2 | 9.33s |
| #129 | MiniMax M2.5 medium | Minimax | 4.6 | 5.3 | 0/2 | 7.48s |
| #152 | MiMo-V2-Flash none | Xiaomi | 2.9 | 4.6 | 0/2 | 19.7s |
| #160 | LFM2-24B-A2B none | Liquid | 3.0 | 4.2 | 0/2 | 714ms |
| #161 | Qwen3.5-9B medium | Qwen | 3.6 | 4.2 | 0/2 | 87.3s |
| #162 | Nemotron 3 Nano Omni 30b A3b Reasoning none | NVIDIA | 3.8 | 4.1 | 0/2 | 1.42s |
| #163 | Granite 4.1 8B none | IBM Granite | 3.0 | 4.0 | 0/2 | 575ms |
| #10 | Claude Opus 4.8 medium | Anthropic | 7.1 | 8.7 | 1/2 | 12.3s |
| #17 | GLM 5 medium | Z.ai | 7.1 | 8.3 | 1/2 | 8.90s |
| #20 | Gemini 3.5 Flash none | Google | 6.5 | 8.1 | 1/2 | 8.10s |