Gemini 2.5 Flash vs GLM 5V Turbo benchmark comparison: Gemini 2.5 Flash leads on average score (6.2 vs 5.9) and has the lower benchmark cost ($0.016 vs $0.052). It is also faster (875ms vs 2.99s average response time) and has a higher attempt pass rate (46.0% vs 38.1%).
Recommended model: Gemini 2.5 Flash. It has the higher score here (6.2) and costs about 3.4x less than GLM 5V Turbo.
GLM 5V Turbo. Archived model: this model is no longer updated or tested on new tests. Release: 2026-04-01.
Metric | Gemini 2.5 Flash | GLM 5V Turbo
--- | --- | ---
Score | 6.2 | 5.9
Rank | #93 | #105
Reliability | 10.0 | 10.0
Consistency | 9.6 | 10.0
Tests Correct (failure breakdown) | Wrong answer: 12 | Wrong answer: 11; did not follow instructions: 2
Attempt pass rate | 46.0% | 38.1%
Flaky tests | 1 | 0
Total Runs | 63 | 63

Score summarizes broad quality across our full private benchmark suite, so ranking reflects consistent performance.
Reliability is a first-attempt success score: 10.0 means no retryable target API or rate-limit failures occurred before successful calls; tracked failures lower the score.
Consistency reflects run-to-run stability (10 = very consistent, even if consistently wrong).
A test counts as fully passed only if every run passed for that test.
Attempt pass rate = passed attempts / total attempts across runs.
Flaky tests had mixed outcomes across runs (at least one pass and one fail).
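As a rough illustration of the run-level definitions above, the following Python sketch computes attempt pass rate, fully passed tests, and flaky tests from per-run outcomes. The page does not publish run-level data, so the input format (a dict mapping a hypothetical test name to a list of pass/fail booleans) and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class OutcomeSummary:
    attempt_pass_rate: float  # passed attempts / total attempts across runs
    fully_passed_tests: int   # tests where every run passed
    flaky_tests: int          # tests with at least one pass and one fail


def summarize(results: dict[str, list[bool]]) -> OutcomeSummary:
    """Summarize per-run outcomes (True = passed) keyed by a hypothetical test name."""
    total = sum(len(runs) for runs in results.values())
    passed = sum(sum(runs) for runs in results.values())
    fully_passed = sum(1 for runs in results.values() if runs and all(runs))
    flaky = sum(1 for runs in results.values() if any(runs) and not all(runs))
    rate = passed / total if total else 0.0
    return OutcomeSummary(rate, fully_passed, flaky)


# Made-up example data, not taken from this benchmark.
example = {
    "test_a": [True, True, True],     # fully passed
    "test_b": [True, False, True],    # flaky
    "test_c": [False, False, False],  # consistently wrong
}
summary = summarize(example)
print(f"{summary.attempt_pass_rate:.1%}", summary.fully_passed_tests, summary.flaky_tests)
# prints: 55.6% 1 1
```

A consistently wrong test (test_c) is neither flaky nor fully passed, which matches the note that consistency can be high even when answers are consistently wrong.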
Cost and tokens | Gemini 2.5 Flash | GLM 5V Turbo
--- | --- | ---
Cost per result (cents) | 0.169 | 0.645
Total Cost (current price) | $0.016 | $0.052
Input Price (per 1M tokens) | $0.300 | $1.200
Output Price (per 1M tokens) | $2.500 | $4.000
Total Input Tokens | 35,926 | 37,100
Output Tokens | 1,770 | 1,766
Reasoning Tokens | 0 | 0

Cost per result is the average cost per correct benchmark answer, in cents (lower is better).
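The totals above can be approximately reproduced from the listed token counts and prices. This Python sketch assumes cost is simply input and output tokens multiplied by their per-1M prices (both models report 0 reasoning tokens, so those are left out). Under that assumption the totals come to about $0.0152 and $0.0516, a ratio of roughly 3.4x, in line with the recommendation; the page shows $0.016 for Gemini 2.5 Flash, which may reflect rounding up or charges not broken out here. The correct-answer counts used for cost per result (9 and 8) are back-calculated from the listed 0.169 and 0.645 cents, not figures the page states.

```python
# Per-1M-token prices (USD) and token totals as listed in the comparison table.
PRICES = {
    "Gemini 2.5 Flash": {"input": 0.300, "output": 2.500},
    "GLM 5V Turbo": {"input": 1.200, "output": 4.000},
}
TOKENS = {
    "Gemini 2.5 Flash": {"input": 35_926, "output": 1_770},
    "GLM 5V Turbo": {"input": 37_100, "output": 1_766},
}


def total_cost_usd(model: str) -> float:
    """Assumed cost model: tokens * price-per-1M / 1M, summed over input and output."""
    price, tokens = PRICES[model], TOKENS[model]
    return (tokens["input"] * price["input"] + tokens["output"] * price["output"]) / 1_000_000


def cost_per_result_cents(model: str, correct_answers: int) -> float:
    """Average cost per correct answer in cents; correct_answers is supplied by the caller."""
    return total_cost_usd(model) * 100 / correct_answers


gemini = total_cost_usd("Gemini 2.5 Flash")
glm = total_cost_usd("GLM 5V Turbo")
print(f"${gemini:.4f} vs ${glm:.4f} ({glm / gemini:.1f}x)")
# prints: $0.0152 vs $0.0516 (3.4x)

# Back-calculated correct-answer counts (an inference, not stated on the page).
print(round(cost_per_result_cents("Gemini 2.5 Flash", 9), 3))  # 0.169
print(round(cost_per_result_cents("GLM 5V Turbo", 8), 3))      # 0.645
```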
Response time | Gemini 2.5 Flash | GLM 5V Turbo
--- | --- | ---
Average | 875ms | 2.99s
Maximum | 4.39s | 6.51s
Total | 18.37s | 62.74s
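The aggregate response times are consistent with the per-test figures listed later on this page. The Python sketch below sums the per-test totals and divides by the number of calls; because the page does not list call counts, it infers each test's count as round(total / average), which is an assumption. Small differences (for example 18.385s summed vs 18.37s shown for Gemini 2.5 Flash) are expected because the per-test values are themselves rounded.

```python
# (average seconds, total seconds) per test, in page order, from the per-test results.
GEMINI_LATENCY = [
    (0.582, 2.33), (0.736, 2.21), (4.39, 4.39), (0.652, 1.30), (0.495, 1.49),
    (0.615, 0.615), (0.590, 1.18), (0.604, 1.81), (1.91, 1.91), (1.15, 1.15),
]
GLM_LATENCY = [
    (3.13, 12.50), (3.13, 9.40), (6.51, 6.51), (3.81, 7.62), (2.09, 6.26),
    (2.22, 2.22), (1.97, 3.93), (2.40, 7.21), (4.86, 4.86), (2.23, 2.23),
]


def overall_latency(per_test: list[tuple[float, float]]) -> tuple[int, float, float]:
    """Return (inferred call count, total seconds, average seconds) across tests."""
    # Assumption: each test's call count is its total time divided by its average time.
    calls = sum(round(total / avg) for avg, total in per_test)
    total = sum(total for _, total in per_test)
    return calls, total, total / calls


for name, data in (("Gemini 2.5 Flash", GEMINI_LATENCY), ("GLM 5V Turbo", GLM_LATENCY)):
    calls, total, avg = overall_latency(data)
    print(f"{name}: {calls} calls, total {total:.2f}s, avg {avg:.3f}s")
```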
Generation showcase
Hamster playing table tennis
Prompt: Create a detailed SVG illustration of a hamster playing table tennis.
Test 1
Gemini 2.5 Flash: score 3.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 4. Response time: avg 582ms, max 844ms, total 2.33s. Tokens: 492 input, 102 output, 0 reasoning.
GLM 5V Turbo: score 4.8, consistency 10.0, attempt pass rate 25.0%, flaky tests 0. Failures: wrong answer 3. Response time: avg 3.13s, max 5.90s, total 12.50s.
Test 2
Gemini 2.5 Flash: score 5.5, consistency 10.0, attempt pass rate 33.3%, flaky tests 0. Failures: wrong answer 2. Response time: avg 736ms, max 1.16s, total 2.21s. Tokens: 8,122 input, 483 output, 0 reasoning.
GLM 5V Turbo: score 5.5, consistency 10.0, attempt pass rate 33.3%, flaky tests 0. Failures: wrong answer 2. Response time: avg 3.13s, max 5.30s, total 9.40s.
Test 3
Gemini 2.5 Flash: score 3.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 4.39s, max 4.39s, total 4.39s. Tokens: 12,519 input, 366 output, 0 reasoning.
GLM 5V Turbo: score 3.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 6.51s, max 6.51s, total 6.51s.
Test 4
Gemini 2.5 Flash: score 10.0, consistency 10.0, attempt pass rate 100.0%, flaky tests 0. Failures: none. Response time: avg 652ms, max 660ms, total 1.30s. Tokens: 7,257 input, 279 output, 0 reasoning.
GLM 5V Turbo: score 10.0, consistency 10.0, attempt pass rate 100.0%, flaky tests 0. Failures: none. Response time: avg 3.81s, max 5.69s, total 7.62s.
Test 5
Gemini 2.5 Flash: score 5.9, consistency 7.2, attempt pass rate 55.6%, flaky tests 1. Failures: wrong answer 2. Response time: avg 495ms, max 642ms, total 1.49s. Tokens: 633 input, 12 output, 0 reasoning.
GLM 5V Turbo: score 5.3, consistency 10.0, attempt pass rate 33.3%, flaky tests 0. Failures: wrong answer 2. Response time: avg 2.09s, max 2.39s, total 6.26s.
Test 6
Gemini 2.5 Flash: score 5.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 615ms, max 615ms, total 615ms. Tokens: 486 input, 78 output, 0 reasoning.
GLM 5V Turbo: score 4.6, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: did not follow instructions 1. Response time: avg 2.22s, max 2.22s, total 2.22s.
Test 7
Gemini 2.5 Flash: score 10.0, consistency 10.0, attempt pass rate 100.0%, flaky tests 0. Failures: none. Response time: avg 590ms, max 622ms, total 1.18s. Tokens: 615 input, 72 output, 0 reasoning.
GLM 5V Turbo: score 6.5, consistency 10.0, attempt pass rate 50.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 1.97s, max 2.43s, total 3.93s.
Test 8
Gemini 2.5 Flash: score 7.7, consistency 10.0, attempt pass rate 66.7%, flaky tests 0. Failures: wrong answer 1. Response time: avg 604ms, max 700ms, total 1.81s. Tokens: 558 input, 132 output, 0 reasoning.
GLM 5V Turbo: score 5.3, consistency 10.0, attempt pass rate 33.3%, flaky tests 0. Failures: did not follow instructions 1, wrong answer 1. Response time: avg 2.40s, max 3.81s, total 7.21s.
Test 9
Gemini 2.5 Flash: score 10.0, consistency 10.0, attempt pass rate 100.0%, flaky tests 0. Failures: none. Response time: avg 1.91s, max 1.91s, total 1.91s. Tokens: 5,088 input, 234 output, 0 reasoning.
GLM 5V Turbo: score 10.0, consistency 10.0, attempt pass rate 100.0%, flaky tests 0. Failures: none. Response time: avg 4.86s, max 4.86s, total 4.86s.
Test 10
Gemini 2.5 Flash: score 3.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 1.15s, max 1.15s, total 1.15s. Tokens: 156 input, 12 output, 0 reasoning.
GLM 5V Turbo: score 3.0, consistency 10.0, attempt pass rate 0.0%, flaky tests 0. Failures: wrong answer 1. Response time: avg 2.23s, max 2.23s, total 2.23s.
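The per-test results above add up to the page-level figures. This Python sketch, with values transcribed from the per-test lines, sums wrong answers, instruction-following failures, and flaky tests, and counts tests with a 100% attempt pass rate as fully passed. The sums match the page (12 wrong answers and 1 flaky test for Gemini 2.5 Flash; 11 wrong answers, 2 instruction-following failures, and 0 flaky tests for GLM 5V Turbo). The fully passed counts it yields, 3 and 2 of 10 tests, are an inference, since the page's Tests Correct values are not given in this text.

```python
from typing import NamedTuple


class PerTest(NamedTuple):
    pass_rate: float  # attempt pass rate, percent
    wrong: int        # "Wrong answer" failures
    ignored: int      # "Did not follow instructions" failures
    flaky: int        # 1 if the test had mixed outcomes across runs


GEMINI_OUTCOMES = [
    PerTest(0.0, 4, 0, 0), PerTest(33.3, 2, 0, 0), PerTest(0.0, 1, 0, 0),
    PerTest(100.0, 0, 0, 0), PerTest(55.6, 2, 0, 1), PerTest(0.0, 1, 0, 0),
    PerTest(100.0, 0, 0, 0), PerTest(66.7, 1, 0, 0), PerTest(100.0, 0, 0, 0),
    PerTest(0.0, 1, 0, 0),
]
GLM_OUTCOMES = [
    PerTest(25.0, 3, 0, 0), PerTest(33.3, 2, 0, 0), PerTest(0.0, 1, 0, 0),
    PerTest(100.0, 0, 0, 0), PerTest(33.3, 2, 0, 0), PerTest(0.0, 0, 1, 0),
    PerTest(50.0, 1, 0, 0), PerTest(33.3, 1, 1, 0), PerTest(100.0, 0, 0, 0),
    PerTest(0.0, 1, 0, 0),
]


def totals(results: list[PerTest]) -> dict[str, int]:
    """Aggregate per-test failure counts; a test is fully passed if every attempt passed."""
    return {
        "wrong": sum(r.wrong for r in results),
        "ignored": sum(r.ignored for r in results),
        "flaky": sum(r.flaky for r in results),
        "fully_passed": sum(1 for r in results if r.pass_rate == 100.0),
    }


print(totals(GEMINI_OUTCOMES))  # {'wrong': 12, 'ignored': 0, 'flaky': 1, 'fully_passed': 3}
print(totals(GLM_OUTCOMES))     # {'wrong': 11, 'ignored': 2, 'flaky': 0, 'fully_passed': 2}
```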