AI BENCHY category
Parsing and data extraction rankings
See which AI models perform best at parsing and data extraction, which ones stay reliable, and where the largest gaps appear. Sorted by: metric ↑ (ascending).
| Rank | Model | Company | Parsing and data extraction score | Overall score | Tests passed | Avg. response time |
|---|---|---|---|---|---|---|
| #143 | MiMo-V2.5 none | Xiaomi | 6.5 | 4.9 | 1/2 | 1.01s |
| #156 | Hy3 preview none | Tencent | 6.5 | 4.4 | 1/2 | 2.85s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 6.9 | 5.7 | 1/2 | 30.5s |
| #10 | Claude Opus 4.8 medium | Anthropic | 7.1 | 8.7 | 1/2 | 12.3s |
| #17 | GLM 5 medium | Z.ai | 7.1 | 8.3 | 1/2 | 8.90s |
| #107 | Laguna Xs.2 medium | Poolside | 7.1 | 5.8 | 1/2 | 9.34s |
| #43 | MiMo-V2.5-Pro medium | Xiaomi | 7.3 | 7.5 | 1/2 | 18.8s |
| #51 | Mimo V2 PRO medium | Xiaomi | 7.3 | 7.4 | 1/2 | 17.2s |
| #57 | Step 3.7 Flash low | Stepfun | 7.3 | 7.3 | 1/2 | 2.29s |
| #68 | Claude Opus 4.8 none | Anthropic | 7.3 | 7.0 | 1/2 | 1.77s |
| #118 | Qwen3.6 27B none | Qwen | 7.3 | 5.6 | 1/2 | 2.06s |
| #122 | GLM 4.7 Flash none | Z.ai | 7.3 | 5.5 | 1/2 | 4.82s |
| #135 | Kimi K2.5 none | Moonshot AI | 7.3 | 5.2 | 1/2 | 42.1s |
| #49 | Qwen3.5-Flash medium | Qwen | 7.3 | 7.4 | 1/2 | 57.0s |
| #66 | Qwen3.5-35B-A3B medium | Qwen | 7.3 | 7.1 | 1/2 | 59.3s |
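Because this category measures parsing and data extraction, one useful exercise is pulling structured records back out of the table itself. The following is a minimal Python sketch. It assumes each row keeps the seven-column layout shown above. The `Row` class, the `parse_row` function and the four-row `TABLE` sample (copied from the table) are hypothetical names introduced only for illustration and are not part of AI BENCHY. Breaking ties by response time is our own choice and does not reproduce the site's ordering within equal scores.

```python
from dataclasses import dataclass

# Hypothetical sample: four rows copied verbatim from the leaderboard table.
TABLE = """\
| #143 | MiMo-V2.5 none | Xiaomi | 6.5 | 4.9 | 1/2 | 1.01s |
| #113 | DeepSeek V4 Pro none | DeepSeek | 6.9 | 5.7 | 1/2 | 30.5s |
| #10 | Claude Opus 4.8 medium | Anthropic | 7.1 | 8.7 | 1/2 | 12.3s |
| #57 | Step 3.7 Flash low | Stepfun | 7.3 | 7.3 | 1/2 | 2.29s |
"""


@dataclass
class Row:
    rank: int
    model: str
    company: str
    category_score: float
    overall_score: float
    passed: int
    total: int
    avg_seconds: float


def parse_row(line: str) -> Row:
    # Drop the outer pipes, then split into the seven cells and trim whitespace.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    rank, model, company, cat, overall, tests, latency = cells
    passed, total = (int(x) for x in tests.split("/"))  # "1/2" -> (1, 2)
    return Row(
        rank=int(rank.lstrip("#")),               # "#143" -> 143
        model=model,
        company=company,
        category_score=float(cat),
        overall_score=float(overall),
        passed=passed,
        total=total,
        avg_seconds=float(latency.rstrip("s")),   # "30.5s" -> 30.5
    )


rows = [parse_row(line) for line in TABLE.splitlines() if line.strip()]

# Ascending by category score, like the table; ties broken by latency (our assumption).
by_metric = sorted(rows, key=lambda r: (r.category_score, r.avg_seconds))
for r in by_metric:
    print(f"{r.model:<25} {r.category_score:.1f} {r.avg_seconds:>6.2f}s")
```

Splitting on `|` is enough here because no cell in this table contains a literal pipe; a general Markdown table parser would also need to handle escaped pipes.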