AI BENCHY Compare
Google: Gemini 3.1 Flash Lite Preview vs OpenAI: GPT-5.4
Benchmarks generated from the AI BENCHY test suites on 2026-03-05.
| Metric | Google: Gemini 3.1 Flash Lite Preview (reasoning effort: none), released 2026-03-03 | OpenAI: GPT-5.4 (reasoning effort: medium), released 2026-03-05 |
|---|---|---|
| Avg. score | 7.4 | 8.2 |
| Correct tests | | |
| Rank | #20 | #7 |
| Consistency | 9.6 | 8.9 |
| Cost per result | 0.142 | 6.533 |
| Total cost | $0.015 | $0.784 |
| Pass rate per attempt | 71.1% | 86.7% |
| Flaky tests | 1 | 2 |
| Total attempts | 45 (15 × 3) | 45 (15 × 3) |
| Output tokens | 4,646 | 1,611 |
| Reasoning tokens | 0 | 46,321 |
| Response time (avg.) | 1.37s | 21.06s |
| Response time (max) | 3.39s | 100.41s |
| Response time (total) | 20.53s | 315.95s |
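The summary rows are consistent with a simple reading of the metrics: 15 tests run 3 times each give 45 attempts, the pass rate per attempt is passing attempts divided by 45, and the average response time is the total response time divided by the 15 tests. The page does not publish its formulas, so the sketch below is a hypothetical reconstruction under those assumptions; the passing-attempt counts (32 and 39) are inferred from the reported percentages, not reported directly.

```python
# Hypothetical reconstruction: AI BENCHY does not publish these formulas.
# Assumes 15 tests x 3 attempts = 45 attempts per model.
TESTS = 15
ATTEMPTS_PER_TEST = 3
TOTAL_ATTEMPTS = TESTS * ATTEMPTS_PER_TEST  # 45


def pass_rate(passing_attempts: int) -> float:
    """Pass rate per attempt, in percent (assumed: passes / all attempts)."""
    return 100 * passing_attempts / TOTAL_ATTEMPTS


def avg_response_time(total_seconds: float) -> float:
    """Average response time per test, in seconds (assumed: total / tests)."""
    return total_seconds / TESTS


# Passing-attempt counts are inferred from the percentages (assumption);
# total response times are taken from the table above.
for model, passes, total_s in [
    ("Gemini 3.1 Flash Lite Preview", 32, 20.53),
    ("GPT-5.4", 39, 315.95),
]:
    print(f"{model}: {pass_rate(passes):.1f}% per attempt, "
          f"{avg_response_time(total_s):.2f}s avg")
```

Under these assumptions the script reproduces the table's 71.1% / 1.37s for Gemini and 86.7% / 21.06s for GPT-5.4.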
Charts (not reproduced): Score vs. total cost; Average response time; Average score vs. average response time.
Category breakdown
| Anti-AI tricks | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 6.0 | 7.8 | 55.6% | 1 | | 1.16s | 1,086 | 0 |
| OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 5.02s | 216 | 1,466 |
| Combined | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 0.0% | 0 | | 3.20s | 339 | 0 |
| OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 20.57s | 301 | 3,543 |
| Data parsing and extraction | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 9.9 | 10.0 | 100.0% | 0 | | 1.22s | 399 | 0 |
| OpenAI: GPT-5.4 | 9.9 | 10.0 | 100.0% | 0 | | 5.32s | 234 | 804 |
| Domain-specific | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 4.0 | 10.0 | 33.3% | 0 | | 942ms | 568 | 0 |
| OpenAI: GPT-5.4 | 4.0 | 7.2 | 44.4% | 1 | | 74.27s | 61 | 34,748 |
| Instruction following | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 1.13s | 574 | 0 |
| OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 3.11s | 93 | 897 |
| Puzzle solving | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 972ms | 898 | 0 |
| OpenAI: GPT-5.4 | 7.0 | 7.2 | 88.9% | 1 | | 9.13s | 442 | 3,832 |
| Tool calling | Score | Consistency | Pass rate per attempt | Flaky tests | Correct tests | Response time (avg.) | Output tokens | Reasoning tokens |
|---|---|---|---|---|---|---|---|---|
| Google: Gemini 3.1 Flash Lite Preview | 10.0 | 10.0 | 100.0% | 0 | | 3.39s | 782 | 0 |
| OpenAI: GPT-5.4 | 10.0 | 10.0 | 100.0% | 0 | | 13.28s | 264 | 1,031 |
Quick comparison
- Gemini 3 Flash Preview (low) vs GPT-5.4 (medium)
- Gemini 3.1 Flash Lite Preview (none) vs GLM 5 (medium)
- Gemini 3.1 Flash Lite Preview (high) vs GPT-5.4 (medium)
- Gemini 3.1 Flash Lite Preview (none) vs MiMo-V2-Flash (medium)
- Gemini 3.1 Flash Lite Preview (none) vs Step 3.5 Flash (medium), free version available
- Claude Sonnet 4.6 (medium) vs Gemini 3.1 Flash Lite Preview (none)
- DeepSeek V3.2 (medium) vs Gemini 3.1 Flash Lite Preview (none)
- Seed-2.0-Mini (medium) vs Gemini 3.1 Flash Lite Preview (none)
- Gemini 3.1 Flash Lite Preview (none) vs Qwen3.5-Flash (medium)
- Gemini 3.1 Flash Lite Preview (none) vs GPT-5.2 (medium)
- Gemini 3.1 Flash Lite Preview (low) vs GPT-5.4 (medium)
- Gemini 3.1 Flash Lite Preview (none) vs Qwen3.5-122B-A10B (medium)