A test is fully passed only if every run passed for that test.
Wrong answer: 13, Did not follow instructions: 3 | Response Time: avg 1.62s, max 5.51s, total 19.48s …
Coding: 5.3 | Wrong answer: 1 | Response Time: avg 1.79s, max 1.79s, total 1.79s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 3.33s, max 3.33s, total 3.33s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 943ms, max 943ms, total 943ms
Domain specific: 5.9 | Wrong answer: 2 | Response Time: avg 1.06s, max 1.06s, total 1.06s
Puzzle Solving: 3.0 | Wrong answer: 3 | Response Time: avg 1.10s, max 1.36s, total 2.21s
Tool Calling: 2.8 | Wrong answer: 1 | Response Time: avg 5.51s, max 5.51s, total 5.51s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 731ms, max 731ms, total 731ms
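The pass rule stated with these results ("a test is fully passed only if every run passed for that test") and the avg/max/total response-time figures can be illustrated with a minimal sketch. The `Run` record and both function names are hypothetical and are not taken from the benchmark's own tooling. The sketch assumes the average is the total divided by the number of timed runs, which is consistent with the listed figures (for example, 3 × 877ms ≈ 2.63s), and it treats a test with no runs as not passed, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """One execution of a test (hypothetical record, for illustration only)."""
    passed: bool
    response_time_s: float


def fully_passed(runs: List[Run]) -> bool:
    # A test counts as fully passed only if every run passed.
    # Assumption: a test with no runs at all is not considered passed.
    return bool(runs) and all(r.passed for r in runs)


def response_time_stats(runs: List[Run]) -> Dict[str, float]:
    # Aggregate response times into the avg / max / total triple.
    times = [r.response_time_s for r in runs]
    total = sum(times)
    return {
        "avg": total / len(times) if times else 0.0,
        "max": max(times, default=0.0),
        "total": total,
    }


if __name__ == "__main__":
    runs = [Run(passed=True, response_time_s=1.0),
            Run(passed=False, response_time_s=3.0)]
    print(fully_passed(runs))
    print(response_time_stats(runs))
```

Under this rule a single failed run is enough to mark the whole test as failed, no matter how many other runs succeeded.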
Anti-AI Tricks: 3.1 | Wrong answer: 4 | Response Time: avg 2.07s, max 4.40s, total 8.30s
Coding: 4.0 | API error: 1, Wrong answer: 1 | Response Time: avg 14.34s, max 14.34s, total 14.34s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 8.91s, max 8.91s, total 8.91s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 3.26s, max 4.66s, total 6.52s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 877ms, max 894ms, total 2.63s
Tool Calling: 10.0 | No failed answers | Response Time: avg 6.67s, max 6.67s, total 6.67s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 777ms, max 777ms, total 777ms
Coding: 5.4 | Wrong answer: 2 | Response Time: avg 2.01s, max 3.14s, total 4.03s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 45.14s, max 45.14s, total 45.14s
Data parsing and extraction: 6.5 | Wrong answer: 1 | Response Time: avg 1.32s, max 1.32s, total 1.32s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 962ms, max 962ms, total 962ms
General Intelligence: 10.0 | No failed answers | Response Time: avg 1.34s, max 1.34s, total 1.34s
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 7.78s, max 14.65s, total 15.56s
Puzzle Solving: 3.0 | Wrong answer: 3 | Response Time: avg 24.34s, max 42.58s, total 48.69s
Tool Calling: 10.0 | No failed answers | Response Time: avg 2.47s, max 2.47s, total 2.47s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 601ms, max 601ms, total 601ms
Wrong answer: 15, Did not follow instructions: 1 | Response Time: avg 614ms, max 1.27s, total 12.28s …
Anti-AI Tricks: 3.0 | Wrong answer: 4 | Response Time: avg 483ms, max 716ms, total 1.93s
Coding: 3.5 | Wrong answer: 2 | Response Time: avg 831ms, max 969ms, total 1.66s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 606ms, max 606ms, total 606ms
Data parsing and extraction: 7.3 | Wrong answer: 1 | Response Time: avg 667ms, max 819ms, total 1.33s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 534ms, max 733ms, total 1.60s
Instructions following: 6.5 | Wrong answer: 1 | Response Time: avg 551ms, max 622ms, total 1.10s
Puzzle Solving: 3.1 | Wrong answer: 3 | Response Time: avg 535ms, max 642ms, total 1.60s
Tool Calling: 10.0 | No failed answers | Response Time: avg 1.27s, max 1.27s, total 1.27s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 548ms, max 548ms, total 548ms
Coding: 4.3 | Wrong answer: 1 | Response Time: avg 9.57s, max 9.57s, total 9.57s
Combined: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 6.5 | API error: 1 | Response Time: avg 7.12s, max 7.12s, total 7.12s
Domain specific: 3.0 | Wrong answer: 3 | Response Time: avg 34.98s, max 68.97s, total 104.94s
General Intelligence: 4.8 | Wrong answer: 1 | Response Time: avg 10.79s, max 10.79s, total 10.79s
Instructions following: 9.8 | No failed answers | Response Time: avg 5.06s, max 5.85s, total 10.12s
Tool Calling: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 47.29s, max 47.29s, total 47.29s
Wrong answer: 14, Did not follow instructions: 2 | Response Time: avg 1.33s, max 3.84s, total 26.54s …
Anti-AI Tricks: 3.5 | Wrong answer: 4 | Response Time: avg 1.18s, max 1.81s, total 4.70s
Coding: 5.4 | Wrong answer: 2 | Response Time: avg 1.09s, max 1.43s, total 2.18s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 3.84s, max 3.84s, total 3.84s
Data parsing and extraction: 6.5 | Wrong answer: 1 | Response Time: avg 1.11s, max 1.25s, total 2.23s
Domain specific: 2.9 | Wrong answer: 3 | Response Time: avg 926ms, max 959ms, total 2.78s
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 784ms, max 859ms, total 1.57s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.40s, max 3.40s, total 3.40s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 773ms, max 773ms, total 773ms
Wrong answer: 11, Did not follow instructions: 2 | Response Time: avg 3.50s, max 47.43s, total 70.00s …
Anti-AI Tricks: 3.4 | Wrong answer: 4 | Response Time: avg 1.43s, max 4.39s, total 5.71s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 1.72s, max 2.67s, total 3.43s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 47.43s, max 47.43s, total 47.43s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.16s, max 1.42s, total 2.33s
Domain specific: 7.7 | Wrong answer: 1 | Response Time: avg 485ms, max 549ms, total 1.45s
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 809ms, max 983ms, total 1.62s
Tool Calling: 10.0 | No failed answers | Response Time: avg 2.30s, max 2.30s, total 2.30s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 493ms, max 493ms, total 493ms
Wrong answer: 9, Did not follow instructions: 3 | Response Time: avg 22.41s, max 68.16s, total 291.35s …
Coding: 3.9 | Wrong answer: 2 | Response Time: avg 47.24s, max 68.16s, total 94.49s
Combined: 10.0 | No failed answers | Response Time: avg 31.18s, max 31.18s, total 31.18s
Data parsing and extraction: 6.4 | Wrong answer: 1 | Response Time: avg 1.98s, max 1.98s, total 1.98s
Domain specific: 2.9 | Wrong answer: 3 | Response Time: avg 50.92s, max 50.92s, total 50.92s
Instructions following: 9.9 | No failed answers | Response Time: avg 7.63s, max 7.63s, total 7.63s
Tool Calling: 9.8 | No failed answers | Response Time: avg 6.91s, max 6.91s, total 6.91s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 26.51s, max 26.51s, total 26.51s
Wrong answer: 10, Did not follow instructions: 1 | Response Time: avg 1.09s, max 2.97s, total 21.79s …
Anti-AI Tricks: 7.5 | Wrong answer: 2 | Response Time: avg 1.07s, max 1.91s, total 4.27s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 1.13s, max 1.59s, total 2.26s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 2.73s, max 2.73s, total 2.73s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 843ms, max 907ms, total 1.69s
Domain specific: 2.9 | Wrong answer: 3 | Response Time: avg 762ms, max 814ms, total 2.29s
General Intelligence: 4.0 | Wrong answer: 1 | Response Time: avg 992ms, max 992ms, total 992ms
Instructions following: 10.0 | No failed answers | Response Time: avg 859ms, max 975ms, total 1.72s
Tool Calling: 10.0 | No failed answers | Response Time: avg 2.97s, max 2.97s, total 2.97s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 733ms, max 733ms, total 733ms
Wrong answer: 7, Did not follow instructions: 3 | Response Time: avg 1.37s, max 4.49s, total 27.32s …
Anti-AI Tricks: 8.3 | Wrong answer: 1 | Response Time: avg 1.10s, max 1.65s, total 4.42s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 951ms, max 1.31s, total 1.90s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 2.53s, max 2.53s, total 2.53s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.04s, max 1.32s, total 2.07s
Domain specific: 2.9 | Wrong answer: 3 | Response Time: avg 1.02s, max 1.16s, total 3.06s
Instructions following: 10.0 | No failed answers | Response Time: avg 932ms, max 1.00s, total 1.86s
Puzzle Solving: 6.0 | Did not follow instructions: 2 | Response Time: avg 2.15s, max 4.49s, total 6.45s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.51s, max 3.51s, total 3.51s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 724ms, max 724ms, total 724ms
Anti-AI Tricks: 3.1 | Wrong answer: 4 | Response Time: avg 1.63s, max 4.60s, total 6.51s
Coding: 6.6 | Wrong answer: 1 | Response Time: avg 2.34s, max 2.46s, total 4.68s
Combined: 3.0 | Invalid tool call: 1 | Response Time: avg 4.22s, max 4.22s, total 4.22s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 2.13s, max 3.35s, total 4.26s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 1.11s, max 1.89s, total 3.32s
General Intelligence: 10.0 | No failed answers | Response Time: avg 947ms, max 947ms, total 947ms
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 1.10s, max 1.36s, total 2.19s
Tool Calling: 10.0 | No failed answers | Response Time: avg 2.49s, max 2.49s, total 2.49s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 649ms, max 649ms, total 649ms
Wrong answer: 11, Did not follow instructions: 2 | Response Time: avg 1.69s, max 9.39s, total 33.82s …
Anti-AI Tricks: 4.8 | Wrong answer: 3 | Response Time: avg 788ms, max 1.34s, total 3.15s
Coding: 7.3 | Wrong answer: 1 | Response Time: avg 1.98s, max 2.51s, total 3.97s
Combined: 2.8 | Wrong answer: 1 | Response Time: avg 9.39s, max 9.39s, total 9.39s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.43s, max 1.45s, total 2.86s
Domain specific: 3.0 | Wrong answer: 3 | Response Time: avg 540ms, max 649ms, total 1.62s
Instructions following: 6.3 | Wrong answer: 1 | Response Time: avg 1.03s, max 1.40s, total 2.06s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.54s, max 3.54s, total 3.54s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 599ms, max 599ms, total 599ms
Wrong answer: 11 | Response Time: avg 889ms, max 4.39s, total 17.79s …
Anti-AI Tricks: 3.0 | Wrong answer: 4 | Response Time: avg 582ms, max 844ms, total 2.33s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 810ms, max 1.16s, total 1.62s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 4.39s, max 4.39s, total 4.39s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 652ms, max 660ms, total 1.30s
Domain specific: 5.9 | Wrong answer: 2 | Response Time: avg 495ms, max 642ms, total 1.49s
General Intelligence: 5.0 | Wrong answer: 1 | Response Time: avg 615ms, max 615ms, total 615ms
Instructions following: 10.0 | No failed answers | Response Time: avg 590ms, max 622ms, total 1.18s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 604ms, max 700ms, total 1.81s
Tool Calling: 10.0 | No failed answers | Response Time: avg 1.91s, max 1.91s, total 1.91s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.15s, max 1.15s, total 1.15s
API error: 6, Wrong answer: 4 | Response Time: avg 24.56s, max 78.74s, total 368.35s …
Anti-AI Tricks: 8.3 | API error: 1 | Response Time: avg 9.32s, max 12.36s, total 27.96s
Coding: 6.5 | API error: 1 | Response Time: avg 27.94s, max 27.94s, total 27.94s
Combined: 10.0 | No failed answers | Response Time: avg 78.74s, max 78.74s, total 78.74s
Data parsing and extraction: 6.5 | API error: 1 | Response Time: avg 5.85s, max 5.85s, total 5.85s
Domain specific: 5.9 | Wrong answer: 2 | Response Time: avg 40.44s, max 46.32s, total 121.31s
General Intelligence: 3.0 | API error: 1 | Response Time: avg 0ms, max 0ms, total 0ms
Instructions following: 10.0 | No failed answers | Response Time: avg 15.98s, max 22.24s, total 31.97s
Puzzle Solving: 5.3 | API error: 1, Wrong answer: 1 | Response Time: avg 7.51s, max 7.86s, total 15.02s
Tool Calling: 2.8 | API error: 1 | Response Time: avg 17.84s, max 17.84s, total 17.84s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 41.74s, max 41.74s, total 41.74s
Wrong answer: 11 | Response Time: avg 2.40s, max 6.65s, total 33.56s …
Anti-AI Tricks: 4.8 | Wrong answer: 3 | Response Time: avg 1.91s, max 2.74s, total 3.82s
Coding: 4.9 | Wrong answer: 2 | Response Time: avg 2.54s, max 3.63s, total 5.09s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 6.65s, max 6.65s, total 6.65s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.89s, max 1.89s, total 1.89s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 1.17s, max 1.44s, total 2.33s
General Intelligence: 4.4 | Wrong answer: 1 | Response Time: avg 2.26s, max 2.26s, total 2.26s
Instructions following: 10.0 | No failed answers | Response Time: avg 1.67s, max 1.67s, total 1.67s
Puzzle Solving: 7.7 | Wrong answer: 1 | Response Time: avg 2.71s, max 3.29s, total 5.41s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.33s, max 3.33s, total 3.33s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.11s, max 1.11s, total 1.11s
Wrong answer: 10, Did not follow instructions: 4 | Response Time: avg 1.84s, max 8.32s, total 36.79s …
Combined: 3.0 | Did not follow instructions: 1 | Response Time: avg 3.54s, max 3.54s, total 3.54s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.32s, max 1.42s, total 2.64s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 877ms, max 904ms, total 2.63s
General Intelligence: 4.0 | Wrong answer: 1 | Response Time: avg 2.58s, max 2.58s, total 2.58s
Instructions following: 6.4 | Wrong answer: 1 | Response Time: avg 1.03s, max 1.10s, total 2.06s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.30s, max 3.30s, total 3.30s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.89s, max 1.89s, total 1.89s
Combined: 6.5 | Invalid tool call: 1 | Response Time: avg 115.89s, max 115.89s, total 115.89s
Data parsing and extraction: 6.3 | Wrong answer: 1 | Response Time: avg 9.42s, max 16.20s, total 18.84s
Domain specific: 2.9 | Wrong answer: 2, API error: 1 | Response Time: avg 4.17s, max 9.09s, total 12.51s
General Intelligence: 4.7 | API error: 1 | Response Time: avg 9.32s, max 9.32s, total 9.32s
Instructions following: 10.0 | No failed answers | Response Time: avg 1.52s, max 1.99s, total 3.04s
Puzzle Solving: 7.6 | API error: 1 | Response Time: avg 6.91s, max 10.09s, total 20.74s
Tool Calling: 10.0 | No failed answers | Response Time: avg 11.85s, max 11.85s, total 11.85s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 17.23s, max 17.23s, total 17.23s
Wrong answer: 6, Did not follow instructions: 2 | Response Time: avg 1.23s, max 3.39s, total 24.68s …
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 1.06s, max 1.47s, total 2.13s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 3.20s, max 3.20s, total 3.20s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.22s, max 1.33s, total 2.44s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 942ms, max 1.12s, total 2.83s
Instructions following: 10.0 | No failed answers | Response Time: avg 1.13s, max 1.14s, total 2.27s
Puzzle Solving: 10.0 | No failed answers | Response Time: avg 900ms, max 962ms, total 2.70s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.39s, max 3.39s, total 3.39s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 814ms, max 814ms, total 814ms
Anti-AI Tricks: 10.0 | No failed answers | Response Time: avg 6.59s, max 10.20s, total 26.37s
Coding: 6.5 | API error: 1 | Response Time: avg 31.37s, max 31.37s, total 31.37s
Combined: 10.0 | No failed answers | Response Time: avg 46.04s, max 46.04s, total 46.04s
Data parsing and extraction: 6.5 | API error: 1 | Response Time: avg 5.25s, max 5.25s, total 5.25s
Domain specific: 5.3 | Wrong answer: 2 | Response Time: avg 22.30s, max 30.51s, total 66.90s
General Intelligence: 10.0 | No failed answers | Response Time: avg 16.84s, max 16.84s, total 16.84s
Instructions following: 10.0 | No failed answers | Response Time: avg 6.16s, max 7.72s, total 12.31s
Puzzle Solving: 7.7 | Did not follow instructions: 1 | Response Time: avg 11.06s, max 14.35s, total 33.17s
Tool Calling: 10.0 | No failed answers | Response Time: avg 15.02s, max 15.02s, total 15.02s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 39.86s, max 39.86s, total 39.86s
Wrong answer: 12 | Response Time: avg 2.48s, max 6.70s, total 49.67s …
Anti-AI Tricks: 3.0 | Wrong answer: 4 | Response Time: avg 2.43s, max 6.70s, total 9.73s
Coding: 6.8 | Wrong answer: 1 | Response Time: avg 2.95s, max 4.61s, total 5.89s
Combined: 3.0 | Wrong answer: 1 | Response Time: avg 6.59s, max 6.59s, total 6.59s
Data parsing and extraction: 10.0 | No failed answers | Response Time: avg 1.82s, max 1.97s, total 3.63s
Domain specific: 3.6 | Wrong answer: 3 | Response Time: avg 1.33s, max 1.53s, total 4.00s
General Intelligence: 10.0 | No failed answers | Response Time: avg 3.45s, max 3.45s, total 3.45s
Instructions following: 10.0 | No failed answers | Response Time: avg 1.06s, max 1.09s, total 2.12s
Puzzle Solving: 5.3 | Wrong answer: 2 | Response Time: avg 2.78s, max 5.20s, total 8.34s
Tool Calling: 10.0 | No failed answers | Response Time: avg 3.94s, max 3.94s, total 3.94s
Trivia: 3.0 | Wrong answer: 1 | Response Time: avg 1.96s, max 1.96s, total 1.96s
Anti-AI Tricks
: 8.3; No answer: 1; Response Time: avg 7.85s, max 22.30s, total 31.40s
Coding
: 3.1; API error: 2; Response Time: avg 62.38s, max 62.38s, total 62.38s
Combined
: 10.0; No failed answers; Response Time: avg 87.80s, max 87.80s, total 87.80s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 18.16s, max 20.65s, total 36.33s
Domain specific
: 2.9; Wrong answer: 2; Timed out: 1; Response Time: avg 16.19s, max 21.56s, total 32.39s
Tool Calling
: 10.0; No failed answers; Response Time: avg 39.75s, max 39.75s, total 39.75s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 55.32s, max 55.32s, total 55.32s
Wrong answer: 12; Did not follow instructions: 2; Response Time: avg 3.38s, max 46.00s, total 67.55s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 1.59s, max 3.60s, total 6.38s
Coding
: 4.0; Wrong answer: 2; Response Time: avg 2.14s, max 3.44s, total 4.29s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 46.00s, max 46.00s, total 46.00s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.01s, max 1.06s, total 2.02s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 465ms, max 492ms, total 1.39s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 513ms, max 570ms, total 1.03s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.04s, max 2.04s, total 2.04s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 295ms, max 295ms, total 295ms
Anti-AI Tricks
: 10.0; No failed answers; Response Time: avg 34.99s, max 109.60s, total 139.95s
Coding
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Combined
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Domain specific
: 10.0; No failed answers; Response Time: avg 34.54s, max 34.54s, total 34.54s
Instructions following
: 10.0; No failed answers; Response Time: avg 9.30s, max 9.30s, total 9.30s
Tool Calling
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 114.12s, max 114.12s, total 114.12s
Coding
: 5.1; Extra formatting: 1; Wrong answer: 1; Response Time: avg 2.75s, max 3.79s, total 5.50s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 5.96s, max 5.96s, total 5.96s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.76s, max 2.60s, total 3.51s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 2.10s, max 3.58s, total 6.30s
General Intelligence
: 4.1; Wrong answer: 1; Response Time: avg 2.33s, max 2.33s, total 2.33s
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 4.26s, max 6.81s, total 8.51s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 1.16s, max 1.55s, total 3.48s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.40s, max 5.40s, total 5.40s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.30s, max 1.30s, total 1.30s
Wrong answer: 7; Response Time: avg 1.70s, max 3.56s, total 22.05s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time: avg 1.25s, max 1.59s, total 2.49s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 2.19s, max 2.79s, total 4.38s
Combined
: 4.7; Wrong answer: 1; Response Time: avg 3.56s, max 3.56s, total 3.56s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.41s, max 1.41s, total 1.41s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 963ms, max 963ms, total 963ms
General Intelligence
: 10.0; No failed answers; Response Time: avg 1.13s, max 1.13s, total 1.13s
Instructions following
: 6.4; Wrong answer: 1; Response Time: avg 1.58s, max 1.58s, total 1.58s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 1.05s, max 1.06s, total 2.11s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.35s, max 3.35s, total 3.35s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.07s, max 1.07s, total 1.07s
Coding
: 5.4; Wrong answer: 2; Response Time: avg 8.27s, max 14.69s, total 16.54s
Combined
: 9.5; No failed answers; Response Time: avg 25.49s, max 25.49s, total 25.49s
Data parsing and extraction
: 6.9; API error: 1; Response Time: avg 30.54s, max 58.65s, total 61.08s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 3.17s, max 6.59s, total 9.52s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 8.23s, max 13.43s, total 16.45s
Puzzle Solving
: 7.6; Extra formatting: 1; Response Time: avg 15.95s, max 27.12s, total 47.86s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.92s, max 5.92s, total 5.92s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 15.59s, max 15.59s, total 15.59s
Anti-AI Tricks
: 3.2; Wrong answer: 4; Response Time: avg 1.19s, max 2.73s, total 4.76s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 2.87s, max 2.87s, total 2.87s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 564ms, max 564ms, total 564ms
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 857ms, max 955ms, total 1.71s
Puzzle Solving
: 5.3; Wrong answer: 2; Response Time: avg 1.86s, max 2.70s, total 3.71s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.28s, max 2.28s, total 2.28s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.82s, max 1.82s, total 1.82s
Wrong answer: 6; Did not follow instructions: 1; Response Time: avg 2.85s, max 11.91s, total 57.08s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time: avg 2.12s, max 3.18s, total 8.50s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 1.56s, max 2.20s, total 3.13s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 11.91s, max 11.91s, total 11.91s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 3.00s, max 3.74s, total 5.99s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 2.36s, max 3.51s, total 7.07s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.49s, max 1.66s, total 2.99s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 1.69s, max 1.89s, total 5.08s
Tool Calling
: 10.0; No failed answers; Response Time: avg 9.54s, max 9.54s, total 9.54s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.35s, max 1.35s, total 1.35s
Wrong answer: 11; Response Time: avg 3.95s, max 11.07s, total 51.38s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 2.37s, max 3.39s, total 4.75s
Coding
: 4.6; Wrong answer: 2; Response Time: avg 5.18s, max 8.84s, total 10.37s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 4.98s, max 4.98s, total 4.98s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 5.78s, max 5.78s, total 5.78s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 2.24s, max 2.24s, total 2.24s
General Intelligence
: 10.0; No failed answers; Response Time: avg 3.27s, max 3.27s, total 3.27s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.48s, max 1.48s, total 1.48s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 1.91s, max 2.08s, total 3.82s
Tool Calling
: 10.0; No failed answers; Response Time: avg 11.07s, max 11.07s, total 11.07s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 3.62s, max 3.62s, total 3.62s
Anti-AI Tricks
: 9.2; Did not follow instructions: 1; Response Time: avg 43.33s, max 71.76s, total 173.31s
Coding
: 6.5; API error: 1; Response Time: avg 143.82s, max 143.82s, total 143.82s
Combined
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 73.40s, max 90.09s, total 220.20s
General Intelligence
: 4.3; Wrong answer: 1; Response Time: avg 15.63s, max 15.63s, total 15.63s
Instructions following
: 9.8; No failed answers; Response Time: avg 27.36s, max 40.24s, total 54.72s
Puzzle Solving
: 7.7; Did not follow instructions: 1; Response Time: avg 31.47s, max 46.84s, total 94.41s
Tool Calling
: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 133.60s, max 133.60s, total 133.60s