AI BENCHY Category
Parsing and data extraction rankings
See which AI models perform best at parsing and data extraction, which remain reliable, and where the biggest gaps appear.
| Rank | Model | Company | Parsing and data extraction score | Score | Tests passed | Response time (average) |
|---|---|---|---|---|---|---|
| #70 | GPT-5.4 Nano medium | OpenAI | 10.0 | 7.0 | 2/2 | 2.54s |
| #71 | Step 3.7 Flash high | Stepfun | 10.0 | 7.0 | 2/2 | 14.7s |
| #72 | DeepSeek V3.2 medium | DeepSeek | 10.0 | 7.0 | 2/2 | 36.1s |
| #73 | Seed-2.0-Mini medium | Bytedance Seed | 10.0 | 6.9 | 2/2 | 24.3s |
| #74 | Qwen3.6 Max Preview none | Qwen | 10.0 | 6.9 | 2/2 | 2.87s |
| #76 | Kimi K2.5 medium | Moonshot AI | 10.0 | 6.8 | 2/2 | 49.8s |
| #77 | Claude Sonnet 4.6 none | Anthropic | 10.0 | 6.8 | 2/2 | 3.43s |
| #79 | Hunter Alpha medium | OpenRouter | 10.0 | 6.7 | 2/2 | 23.2s |
| #80 | Mimo V2 Omni medium | Xiaomi | 10.0 | 6.7 | 2/2 | 3.04s |
| #84 | Grok 4.20 Multi Agent Beta medium | X AI | 10.0 | 6.6 | 2/2 | 5.54s |
| #85 | Gemma 4 31B none | | 10.0 | 6.5 | 2/2 | 2.25s |
| #86 | Grok 4.1 Fast medium | X AI | 10.0 | 6.5 | 2/2 | 6.63s |
| #87 | Gemini 3.1 Flash Lite minimal | | 10.0 | 6.4 | 2/2 | 1.04s |
| #88 | Qwen3.7 Plus none | Qwen | 10.0 | 6.4 | 2/2 | 1.43s |
| #90 | Gemini 3.1 Flash Lite none | | 10.0 | 6.4 | 2/2 | 843ms |