A test is fully passed only if every run passed for that test.
Wrong answer: 15, Did not follow instructions: 1; response time avg 614ms, max 1.27s, total 12.28s…
Anti-AI Tricks: 3.0 (Wrong answer: 4; response time avg 483ms, max 716ms, total 1.93s)
Coding: 3.5 (Wrong answer: 2; response time avg 831ms, max 969ms, total 1.66s)
Combined: 3.0 (Wrong answer: 1; response time avg 606ms, max 606ms, total 606ms)
Data parsing and extraction: 7.3 (Wrong answer: 1; response time avg 667ms, max 819ms, total 1.33s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 534ms, max 733ms, total 1.60s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 551ms, max 622ms, total 1.10s)
Puzzle Solving: 3.1 (Wrong answer: 3; response time avg 535ms, max 642ms, total 1.60s)
Tool Calling: 10.0 (No failed answers; response time avg 1.27s, max 1.27s, total 1.27s)
Trivia: 3.0 (Wrong answer: 1; response time avg 548ms, max 548ms, total 548ms)
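The pass rule stated above, together with the avg/max/total response-time figures, can be sketched in a few lines. This is a minimal illustration under assumptions: the `Run` record, its fields, and the helper names are hypothetical and not taken from the benchmark. The summary helper also assumes the reported average is the total divided by the number of runs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One run of a test (hypothetical record, not the benchmark's own schema)."""
    passed: bool
    response_ms: float  # response time of this run, in milliseconds


def fully_passed(runs: list[Run]) -> bool:
    """A test is fully passed only if every one of its runs passed."""
    # Guard against an empty list, where all() would vacuously return True.
    return bool(runs) and all(run.passed for run in runs)


def response_time_summary(runs: list[Run]) -> dict[str, float]:
    """Average, maximum and total response time over a non-empty list of runs."""
    times = [run.response_ms for run in runs]
    total = sum(times)
    return {"avg": total / len(times), "max": max(times), "total": total}


if __name__ == "__main__":
    # Two hypothetical tests: one with all runs passing, one with a single failed run.
    stable = [Run(True, 400.0), Run(True, 500.0)]
    flaky = [Run(True, 300.0), Run(False, 716.0)]
    print(fully_passed(stable), fully_passed(flaky))  # True False
    print(response_time_summary(stable + flaky))  # {'avg': 479.0, 'max': 716.0, 'total': 1916.0}
```

Treating an empty run list as not passed is a choice made for this sketch. The source does not say how a test with no recorded runs is scored.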

Wrong answer: 14, Did not follow instructions: 1; response time avg 629ms, max 1.72s, total 12.59s…
Anti-AI Tricks: 3.4 (Wrong answer: 4; response time avg 395ms, max 769ms, total 1.58s)
Coding: 4.0 (Wrong answer: 2; response time avg 1.03s, max 1.28s, total 2.07s)
Combined: 3.0 (Wrong answer: 1; response time avg 1.72s, max 1.72s, total 1.72s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 822ms, max 1.08s, total 1.64s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 367ms, max 388ms, total 1.10s)
General Intelligence: 4.0 (Wrong answer: 1; response time avg 729ms, max 729ms, total 729ms)
Instructions following: 6.5 (Wrong answer: 1; response time avg 380ms, max 380ms, total 759ms)
Tool Calling: 10.0 (No failed answers; response time avg 1.40s, max 1.40s, total 1.40s)
Trivia: 3.0 (Wrong answer: 1; response time avg 397ms, max 397ms, total 397ms)

Combined: 3.0 (Invalid tool call: 1; response time avg 1.88s, max 1.88s, total 1.88s)
Data parsing and extraction: 3.0 (Wrong answer: 2; response time avg 575ms, max 583ms, total 1.15s)
Domain specific: 3.0 (Wrong answer: 3; response time avg 357ms, max 463ms, total 1.07s)
General Intelligence: 4.0 (Wrong answer: 1; response time avg 499ms, max 499ms, total 499ms)
Tool Calling: 10.0 (No failed answers; response time avg 2.17s, max 2.17s, total 2.17s)
Trivia: 3.0 (Wrong answer: 1; response time avg 306ms, max 306ms, total 306ms)

Anti-AI Tricks: 4.8 (Wrong answer: 2, API error: 1; response time avg 584ms, max 772ms, total 1.75s)
Coding: 10.0 (No failed answers; response time avg 1.27s, max 1.27s, total 1.27s)
Combined: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Data parsing and extraction: 3.8 (Wrong answer: 2; response time avg 1.42s, max 2.21s, total 2.84s)
Domain specific: 3.6 (Wrong answer: 3; response time avg 489ms, max 513ms, total 1.47s)
General Intelligence: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Tool Calling: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Trivia: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Anti-AI Tricks: 3.3 (Wrong answer: 3; response time avg 471ms, max 872ms, total 1.41s)
Combined: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Data parsing and extraction: 3.0 (Wrong answer: 2; response time avg 714ms, max 987ms, total 1.43s)
Domain specific: 5.9 (API error: 1, Wrong answer: 1; response time avg 287ms, max 334ms, total 860ms)
Instructions following: 6.3 (Wrong answer: 1; response time avg 752ms, max 1.22s, total 1.50s)
Puzzle Solving: 3.8 (Wrong answer: 2, API error: 1; response time avg 1.78s, max 3.15s, total 5.34s)
Tool Calling: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Coding: 2.5 (Wrong answer: 1; response time avg 1.96s, max 1.96s, total 1.96s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.01s, max 2.01s, total 2.01s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 646ms, max 658ms, total 1.29s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 371ms, max 419ms, total 1.11s)
General Intelligence: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)
Instructions following: 6.5 (Wrong answer: 1; response time avg 439ms, max 448ms, total 878ms)
Puzzle Solving: 5.3 (API error: 1, Wrong answer: 1; response time avg 650ms, max 843ms, total 1.30s)
Tool Calling: 3.0 (Invalid tool call: 1; response time avg 1.93s, max 1.93s, total 1.93s)
Trivia: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Wrong answer: 11; response time avg 889ms, max 4.39s, total 17.79s…
Anti-AI Tricks: 3.0 (Wrong answer: 4; response time avg 582ms, max 844ms, total 2.33s)
Coding: 6.8 (Wrong answer: 1; response time avg 810ms, max 1.16s, total 1.62s)
Combined: 3.0 (Wrong answer: 1; response time avg 4.39s, max 4.39s, total 4.39s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 652ms, max 660ms, total 1.30s)
Domain specific: 5.9 (Wrong answer: 2; response time avg 495ms, max 642ms, total 1.49s)
General Intelligence: 5.0 (Wrong answer: 1; response time avg 615ms, max 615ms, total 615ms)
Instructions following: 10.0 (No failed answers; response time avg 590ms, max 622ms, total 1.18s)
Puzzle Solving: 7.7 (Wrong answer: 1; response time avg 604ms, max 700ms, total 1.81s)
Tool Calling: 10.0 (No failed answers; response time avg 1.91s, max 1.91s, total 1.91s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.15s, max 1.15s, total 1.15s)

Wrong answer: 10, Did not follow instructions: 1; response time avg 1.09s, max 2.97s, total 21.79s…
Anti-AI Tricks: 7.5 (Wrong answer: 2; response time avg 1.07s, max 1.91s, total 4.27s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.13s, max 1.59s, total 2.26s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.73s, max 2.73s, total 2.73s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 843ms, max 907ms, total 1.69s)
Domain specific: 2.9 (Wrong answer: 3; response time avg 762ms, max 814ms, total 2.29s)
General Intelligence: 4.0 (Wrong answer: 1; response time avg 992ms, max 992ms, total 992ms)
Instructions following: 10.0 (No failed answers; response time avg 859ms, max 975ms, total 1.72s)
Tool Calling: 10.0 (No failed answers; response time avg 2.97s, max 2.97s, total 2.97s)
Trivia: 3.0 (Wrong answer: 1; response time avg 733ms, max 733ms, total 733ms)

Anti-AI Tricks: 4.8 (Wrong answer: 3; response time avg 501ms, max 839ms, total 2.01s)
Coding: 3.4 (Wrong answer: 1; response time avg 1.22s, max 1.22s, total 1.22s)
Combined: 3.0 (Invalid tool call: 1; response time avg 6.04s, max 6.04s, total 6.04s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 522ms, max 537ms, total 1.04s)
General Intelligence: 4.8 (Wrong answer: 1; response time avg 659ms, max 659ms, total 659ms)
Instructions following: 6.3 (Wrong answer: 1; response time avg 445ms, max 505ms, total 889ms)
Puzzle Solving: 5.3 (Wrong answer: 2; response time avg 473ms, max 502ms, total 1.42s)
Tool Calling: 10.0 (No failed answers; response time avg 4.63s, max 4.63s, total 4.63s)

Wrong answer: 12, Did not follow instructions: 3; response time avg 1.15s, max 2.52s, total 23.09s…
Anti-AI Tricks: 3.1 (Wrong answer: 4; response time avg 929ms, max 1.55s, total 3.72s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.01s, max 1.19s, total 2.02s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.52s, max 2.52s, total 2.52s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.30s, max 1.58s, total 2.61s)
Domain specific: 3.5 (Wrong answer: 3; response time avg 937ms, max 1.25s, total 2.81s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 728ms, max 731ms, total 1.46s)
Tool Calling: 3.0 (Did not follow instructions: 1; response time avg 2.32s, max 2.32s, total 2.32s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.33s, max 1.33s, total 1.33s)

Anti-AI Tricks: 4.0 (Wrong answer: 4; response time avg 597ms, max 866ms, total 2.39s)
Coding: 5.5 (Wrong answer: 1; response time avg 1.14s, max 1.14s, total 1.14s)
Combined: 3.0 (Invalid tool call: 1; response time avg 6.48s, max 6.48s, total 6.48s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 601ms, max 634ms, total 1.20s)
Domain specific: 3.0 (Wrong answer: 3; response time avg 611ms, max 616ms, total 1.83s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 649ms, max 952ms, total 1.30s)
Puzzle Solving: 7.7 (Wrong answer: 1; response time avg 586ms, max 813ms, total 1.76s)
Tool Calling: 10.0 (No failed answers; response time avg 4.79s, max 4.79s, total 4.79s)

Coding: 4.7 (API error: 1, Wrong answer: 1; response time avg 1.39s, max 1.39s, total 1.39s)
Combined: 3.0 (Wrong answer: 1; response time avg 3.81s, max 3.81s, total 3.81s)
Data parsing and extraction: 6.5 (Wrong answer: 1; response time avg 1.04s, max 1.05s, total 2.08s)
Domain specific: 3.0 (Wrong answer: 3; response time avg 927ms, max 1.17s, total 2.78s)
Instructions following: 9.8 (No failed answers; response time avg 1.03s, max 1.17s, total 2.07s)
Tool Calling: 3.0 (Invalid tool call: 1; response time avg 2.79s, max 2.79s, total 2.79s)
Trivia: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Wrong answer: 6, Did not follow instructions: 2; response time avg 1.23s, max 3.39s, total 24.68s…
Coding: 6.8 (Wrong answer: 1; response time avg 1.06s, max 1.47s, total 2.13s)
Combined: 3.0 (Wrong answer: 1; response time avg 3.20s, max 3.20s, total 3.20s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.22s, max 1.33s, total 2.44s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 942ms, max 1.12s, total 2.83s)
Instructions following: 10.0 (No failed answers; response time avg 1.13s, max 1.14s, total 2.27s)
Puzzle Solving: 10.0 (No failed answers; response time avg 900ms, max 962ms, total 2.70s)
Tool Calling: 10.0 (No failed answers; response time avg 3.39s, max 3.39s, total 3.39s)
Trivia: 3.0 (Wrong answer: 1; response time avg 814ms, max 814ms, total 814ms)

Anti-AI Tricks: 6.6 (Wrong answer: 2; response time avg 1.19s, max 2.04s, total 4.75s)
Coding: 4.0 (API error: 1, Wrong answer: 1; response time avg 1.30s, max 1.30s, total 1.30s)
Combined: 3.0 (Wrong answer: 1; response time avg 3.70s, max 3.70s, total 3.70s)
Data parsing and extraction: 6.5 (Wrong answer: 1; response time avg 979ms, max 1.02s, total 1.96s)
Domain specific: 3.0 (Wrong answer: 3; response time avg 925ms, max 1.16s, total 2.77s)
Instructions following: 9.8 (No failed answers; response time avg 987ms, max 1.13s, total 1.97s)
Tool Calling: 3.0 (Invalid tool call: 1; response time avg 2.83s, max 2.83s, total 2.83s)
Trivia: 3.0 (API error: 1; response time avg 0ms, max 0ms, total 0ms)

Wrong answer: 6; response time avg 1.30s, max 3.92s, total 25.95s…
Anti-AI Tricks: 6.5 (Wrong answer: 2; response time avg 1.08s, max 1.39s, total 4.30s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.39s, max 1.63s, total 2.78s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.17s, max 2.17s, total 2.17s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.35s, max 1.43s, total 2.69s)
Domain specific: 7.7 (Wrong answer: 1; response time avg 975ms, max 1.08s, total 2.92s)
General Intelligence: 10.0 (No failed answers; response time avg 1.04s, max 1.04s, total 1.04s)
Instructions following: 10.0 (No failed answers; response time avg 943ms, max 974ms, total 1.89s)
Puzzle Solving: 10.0 (No failed answers; response time avg 1.13s, max 1.29s, total 3.40s)
Tool Calling: 10.0 (No failed answers; response time avg 3.92s, max 3.92s, total 3.92s)
Trivia: 3.0 (Wrong answer: 1; response time avg 856ms, max 856ms, total 856ms)

Wrong answer: 14, Did not follow instructions: 2; response time avg 1.33s, max 3.84s, total 26.54s…
Anti-AI Tricks: 3.5 (Wrong answer: 4; response time avg 1.18s, max 1.81s, total 4.70s)
Coding: 5.4 (Wrong answer: 2; response time avg 1.09s, max 1.43s, total 2.18s)
Combined: 3.0 (Wrong answer: 1; response time avg 3.84s, max 3.84s, total 3.84s)
Data parsing and extraction: 6.5 (Wrong answer: 1; response time avg 1.11s, max 1.25s, total 2.23s)
Domain specific: 2.9 (Wrong answer: 3; response time avg 926ms, max 959ms, total 2.78s)
Instructions following: 6.3 (Wrong answer: 1; response time avg 784ms, max 859ms, total 1.57s)
Tool Calling: 10.0 (No failed answers; response time avg 3.40s, max 3.40s, total 3.40s)
Trivia: 3.0 (Wrong answer: 1; response time avg 773ms, max 773ms, total 773ms)

Wrong answer: 7, Did not follow instructions: 3; response time avg 1.37s, max 4.49s, total 27.32s…
Anti-AI Tricks: 8.3 (Wrong answer: 1; response time avg 1.10s, max 1.65s, total 4.42s)
Coding: 6.8 (Wrong answer: 1; response time avg 951ms, max 1.31s, total 1.90s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.53s, max 2.53s, total 2.53s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.04s, max 1.32s, total 2.07s)
Domain specific: 2.9 (Wrong answer: 3; response time avg 1.02s, max 1.16s, total 3.06s)
Instructions following: 10.0 (No failed answers; response time avg 932ms, max 1.00s, total 1.86s)
Puzzle Solving: 6.0 (Did not follow instructions: 2; response time avg 2.15s, max 4.49s, total 6.45s)
Tool Calling: 10.0 (No failed answers; response time avg 3.51s, max 3.51s, total 3.51s)
Trivia: 3.0 (Wrong answer: 1; response time avg 724ms, max 724ms, total 724ms)

Wrong answer: 12, Did not follow instructions: 1; response time avg 1.45s, max 2.95s, total 29.00s…
Anti-AI Tricks: 3.2 (Wrong answer: 4; response time avg 1.21s, max 2.58s, total 4.85s)
Coding: 6.8 (Wrong answer: 1; response time avg 1.99s, max 2.95s, total 3.97s)
Combined: 3.0 (Wrong answer: 1; response time avg 2.89s, max 2.89s, total 2.89s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.04s, max 1.06s, total 2.08s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.07s, max 1.54s, total 3.22s)
General Intelligence: 4.4 (Wrong answer: 1; response time avg 1.78s, max 1.78s, total 1.78s)
Instructions following: 6.5 (Wrong answer: 1; response time avg 1.07s, max 1.17s, total 2.15s)
Tool Calling: 10.0 (No failed answers; response time avg 2.75s, max 2.75s, total 2.75s)
Trivia: 3.0 (Wrong answer: 1; response time avg 990ms, max 990ms, total 990ms)

Anti-AI Tricks: 6.5 (Wrong answer: 2; response time avg 892ms, max 1.38s, total 3.57s)
Coding: 7.0 (Did not follow instructions: 1; response time avg 3.39s, max 5.51s, total 6.79s)
Combined: 3.0 (Invalid tool call: 1; response time avg 3.56s, max 3.56s, total 3.56s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 1.66s, max 2.11s, total 3.32s)
Domain specific: 10.0 (No failed answers; response time avg 899ms, max 1.04s, total 2.70s)
General Intelligence: 10.0 (No failed answers; response time avg 922ms, max 922ms, total 922ms)
Instructions following: 6.4 (Wrong answer: 1; response time avg 893ms, max 964ms, total 1.79s)
Puzzle Solving: 10.0 (No failed answers; response time avg 1.45s, max 2.30s, total 4.36s)
Tool Calling: 10.0 (No failed answers; response time avg 2.79s, max 2.79s, total 2.79s)
Trivia: 3.0 (Wrong answer: 1; response time avg 1.76s, max 1.76s, total 1.76s)

Wrong answer: 13, Did not follow instructions: 3; response time avg 1.62s, max 5.51s, total 19.48s…
Coding: 5.3 (Wrong answer: 1; response time avg 1.79s, max 1.79s, total 1.79s)
Combined: 3.0 (Wrong answer: 1; response time avg 3.33s, max 3.33s, total 3.33s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 943ms, max 943ms, total 943ms)
Domain specific: 5.9 (Wrong answer: 2; response time avg 1.06s, max 1.06s, total 1.06s)
Puzzle Solving: 3.0 (Wrong answer: 3; response time avg 1.10s, max 1.36s, total 2.21s)
Tool Calling: 2.8 (Wrong answer: 1; response time avg 5.51s, max 5.51s, total 5.51s)
Trivia: 3.0 (Wrong answer: 1; response time avg 731ms, max 731ms, total 731ms)

Anti-AI Tricks: 3.1 (Wrong answer: 4; response time avg 1.63s, max 4.60s, total 6.51s)
Coding: 6.6 (Wrong answer: 1; response time avg 2.34s, max 2.46s, total 4.68s)
Combined: 3.0 (Invalid tool call: 1; response time avg 4.22s, max 4.22s, total 4.22s)
Data parsing and extraction: 10.0 (No failed answers; response time avg 2.13s, max 3.35s, total 4.26s)
Domain specific: 5.3 (Wrong answer: 2; response time avg 1.11s, max 1.89s, total 3.32s)
General Intelligence: 10.0 (No failed answers; response time avg 947ms, max 947ms, total 947ms)
Instructions following: 6.3 (Wrong answer: 1; response time avg 1.10s, max 1.36s, total 2.19s)
Tool Calling: 10.0 (No failed answers; response time avg 2.49s, max 2.49s, total 2.49s)
Trivia: 3.0 (Wrong answer: 1; response time avg 649ms, max 649ms, total 649ms)
Anti-AI Tricks
: 3.1; Wrong answer: 4; Response Time: avg 1.71s, max 3.79s, total 6.84s
Coding
: 4.4; Wrong answer: 2; Response Time: avg 5.39s, max 5.69s, total 10.78s
Combined
: 3.0; Invalid tool call: 1; Response Time: avg 5.91s, max 5.91s, total 5.91s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 847ms, max 1.09s, total 1.69s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 464ms, max 622ms, total 1.39s
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 514ms, max 582ms, total 1.03s
Tool Calling
: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 2.32s, max 2.32s, total 2.32s
Wrong answer: 11; Did not follow instructions: 2; Response Time: avg 1.69s, max 9.39s, total 33.82s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 788ms, max 1.34s, total 3.15s
Coding
: 7.3; Wrong answer: 1; Response Time: avg 1.98s, max 2.51s, total 3.97s
Combined
: 2.8; Wrong answer: 1; Response Time: avg 9.39s, max 9.39s, total 9.39s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.43s, max 1.45s, total 2.86s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 540ms, max 649ms, total 1.62s
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 1.03s, max 1.40s, total 2.06s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.54s, max 3.54s, total 3.54s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 599ms, max 599ms, total 599ms
Wrong answer: 7; Response Time: avg 1.70s, max 3.56s, total 22.05s…
Anti-AI Tricks
: 8.3; Wrong answer: 1; Response Time: avg 1.25s, max 1.59s, total 2.49s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 2.19s, max 2.79s, total 4.38s
Combined
: 4.7; Wrong answer: 1; Response Time: avg 3.56s, max 3.56s, total 3.56s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.41s, max 1.41s, total 1.41s
Domain specific
: 7.7; Wrong answer: 1; Response Time: avg 963ms, max 963ms, total 963ms
General Intelligence
: 10.0; No failed answers; Response Time: avg 1.13s, max 1.13s, total 1.13s
Instructions following
: 6.4; Wrong answer: 1; Response Time: avg 1.58s, max 1.58s, total 1.58s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 1.05s, max 1.06s, total 2.11s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.35s, max 3.35s, total 3.35s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.07s, max 1.07s, total 1.07s
Wrong answer: 10; Did not follow instructions: 4; Response Time: avg 1.84s, max 8.32s, total 36.79s…
Combined
: 3.0; Did not follow instructions: 1; Response Time: avg 3.54s, max 3.54s, total 3.54s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.32s, max 1.42s, total 2.64s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 877ms, max 904ms, total 2.63s
General Intelligence
: 4.0; Wrong answer: 1; Response Time: avg 2.58s, max 2.58s, total 2.58s
Instructions following
: 6.4; Wrong answer: 1; Response Time: avg 1.03s, max 1.10s, total 2.06s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.30s, max 3.30s, total 3.30s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.89s, max 1.89s, total 1.89s
Wrong answer: 14; Did not follow instructions: 1; Response Time: avg 1.85s, max 7.58s, total 24.00s…
Anti-AI Tricks
: 4.8; Wrong answer: 3; Response Time: avg 1.34s, max 1.83s, total 2.67s
Coding
: 3.2; Wrong answer: 2; Response Time: avg 2.05s, max 2.55s, total 4.10s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 7.58s, max 7.58s, total 7.58s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.27s, max 1.27s, total 1.27s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 637ms, max 637ms, total 637ms
General Intelligence
: 4.0; Wrong answer: 1; Response Time: avg 909ms, max 909ms, total 909ms
Instructions following
: 6.3; Wrong answer: 1; Response Time: avg 1.11s, max 1.11s, total 1.11s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.51s, max 2.51s, total 2.51s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 794ms, max 794ms, total 794ms
Wrong answer: 8; Response Time: avg 1.92s, max 5.66s, total 38.45s…
Anti-AI Tricks
: 7.3; Wrong answer: 2; Response Time: avg 1.84s, max 3.08s, total 7.37s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 1.71s, max 1.97s, total 3.42s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 4.48s, max 4.48s, total 4.48s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.44s, max 1.51s, total 2.89s
Domain specific
: 5.3; Wrong answer: 2; Response Time: avg 1.52s, max 1.63s, total 4.57s
General Intelligence
: 4.0; Wrong answer: 1; Response Time: avg 1.37s, max 1.37s, total 1.37s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.52s, max 1.68s, total 3.04s
Puzzle Solving
: 10.0; No failed answers; Response Time: avg 1.40s, max 1.41s, total 4.20s
Tool Calling
: 10.0; No failed answers; Response Time: avg 5.66s, max 5.66s, total 5.66s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 1.46s, max 1.46s, total 1.46s
Wrong answer: 10; Response Time: avg 1.93s, max 5.56s, total 38.64s…
Anti-AI Tricks
: 6.9; Wrong answer: 2; Response Time: avg 1.31s, max 2.08s, total 5.25s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 1.52s, max 2.05s, total 3.04s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 5.56s, max 5.56s, total 5.56s
Data parsing and extraction
: 10.0; No failed answers; Response Time: avg 1.18s, max 1.24s, total 2.37s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 1.31s, max 1.39s, total 3.92s
General Intelligence
: 10.0; No failed answers; Response Time: avg 3.41s, max 3.41s, total 3.41s
Instructions following
: 6.2; Wrong answer: 1; Response Time: avg 1.15s, max 1.19s, total 2.31s
Puzzle Solving
: 7.7; Wrong answer: 1; Response Time: avg 1.29s, max 1.56s, total 3.87s
Tool Calling
: 10.0; No failed answers; Response Time: avg 3.90s, max 3.90s, total 3.90s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 5.01s, max 5.01s, total 5.01s
Anti-AI Tricks
: 3.5; Wrong answer: 4; Response Time: avg 2.19s, max 6.85s, total 8.74s
Coding
: 6.8; Wrong answer: 1; Response Time: avg 3.74s, max 5.52s, total 7.47s
Combined
: 3.0; Wrong answer: 1; Response Time: avg 2.36s, max 2.36s, total 2.36s
Data parsing and extraction
: 6.5; Extra formatting: 1; Response Time: avg 1.01s, max 1.18s, total 2.03s
Domain specific
: 3.0; Wrong answer: 3; Response Time: avg 756ms, max 877ms, total 2.27s
General Intelligence
: 4.4; Wrong answer: 1; Response Time: avg 6.86s, max 6.86s, total 6.86s
Instructions following
: 6.5; Wrong answer: 1; Response Time: avg 751ms, max 821ms, total 1.50s
Tool Calling
: 10.0; No failed answers; Response Time: avg 2.43s, max 2.43s, total 2.43s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 3.89s, max 3.89s, total 3.89s
Wrong answer: 8; Did not follow instructions: 3; Response Time: avg 2.27s, max 14.63s, total 43.20s…
Coding
: 7.2; Wrong answer: 1; Response Time: avg 2.29s, max 3.06s, total 4.58s
Combined
: 10.0; No failed answers; Response Time: avg 3.28s, max 3.28s, total 3.28s
Data parsing and extraction
: 7.3; Wrong answer: 1; Response Time: avg 1.11s, max 1.47s, total 2.21s
Domain specific
: 2.9; Wrong answer: 3; Response Time: avg 6.48s, max 14.63s, total 19.43s
Instructions following
: 10.0; No failed answers; Response Time: avg 1.07s, max 1.07s, total 1.07s
Tool Calling
: 10.0; No failed answers; Response Time: avg 1.89s, max 1.89s, total 1.89s
Trivia
: 3.0; Wrong answer: 1; Response Time: avg 2.58s, max 2.58s, total 2.58s