A test is fully passed only if every run passed for that test.
Anti-AI Tricks: 9.2; No failed answers; Response Time: avg 24.23s, max 29.86s, total 96.93s
Coding: 4.7; Timed out: 1; Response Time: avg 180.92s, max 180.92s, total 180.92s
Combined: 10.0; No failed answers; Response Time: avg 93.11s, max 93.11s, total 93.11s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 36.09s, max 39.12s, total 72.18s
Domain specific: 2.9; Wrong answer: 2, Timed out: 1; Response Time: avg 24.27s, max 33.91s, total 72.82s
Instructions following: 10.0; No failed answers; Response Time: avg 35.78s, max 47.30s, total 71.56s
Tool Calling: 10.0; No failed answers; Response Time: avg 34.81s, max 34.81s, total 34.81s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 83.99s, max 83.99s, total 83.99s
Wrong answer: 5, Did not follow instructions: 4; Response Time: avg 16.00s, max 102.91s, total 303.99s…
Anti-AI Tricks: 8.6; Wrong answer: 1; Response Time: avg 4.05s, max 6.69s, total 16.20s
Coding: 10.0; No failed answers; Response Time: avg 7.76s, max 7.76s, total 7.76s
Combined: 10.0; No failed answers; Response Time: avg 17.81s, max 17.81s, total 17.81s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 2.43s, max 3.39s, total 4.87s
Domain specific: 4.1; Wrong answer: 3; Response Time: avg 65.31s, max 102.91s, total 195.92s
Puzzle Solving: 7.8; Did not follow instructions: 1; Response Time: avg 4.33s, max 7.27s, total 13.00s
Tool Calling: 4.7; Did not follow instructions: 1; Response Time: avg 9.62s, max 9.62s, total 9.62s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 30.10s, max 30.10s, total 30.10s
Coding: 10.0; No failed answers; Response Time: avg 3.67s, max 3.67s, total 3.67s
Combined: 9.5; No failed answers; Response Time: avg 23.84s, max 23.84s, total 23.84s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 3.43s, max 3.43s, total 3.43s
Domain specific: 7.7; Wrong answer: 1; Response Time: avg 3.54s, max 3.54s, total 3.54s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 1.96s, max 1.96s, total 1.96s
Puzzle Solving: 7.7; Extra formatting: 1; Response Time: avg 2.92s, max 3.33s, total 5.84s
Tool Calling: 10.0; No failed answers; Response Time: avg 4.11s, max 4.11s, total 4.11s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 4.67s, max 4.67s, total 4.67s
Anti-AI Tricks: 10.0; No failed answers; Response Time: avg 2.75s, max 4.59s, total 10.98s
Coding: 4.0; Wrong answer: 1; Response Time: avg 68.55s, max 68.55s, total 68.55s
Combined: 10.0; No failed answers; Response Time: avg 25.87s, max 25.87s, total 25.87s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 3.04s, max 4.12s, total 6.07s
General Intelligence: 5.4; Wrong answer: 1; Response Time: avg 3.61s, max 3.61s, total 3.61s
Tool Calling: 10.0; No failed answers; Response Time: avg 13.98s, max 13.98s, total 13.98s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 234.19s, max 234.19s, total 234.19s
Anti-AI Tricks: 7.7; Wrong answer: 1; Response Time: avg 4.87s, max 6.30s, total 14.62s
Coding: 4.3; Did not follow instructions: 1; Response Time: avg 35.61s, max 35.61s, total 35.61s
Combined: 3.0; No answer: 1; Response Time: avg 53.14s, max 53.14s, total 53.14s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 4.93s, max 5.03s, total 9.86s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 24.14s, max 45.83s, total 72.43s
General Intelligence: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Instructions following: 10.0; No failed answers; Response Time: avg 4.30s, max 6.00s, total 8.59s
Puzzle Solving: 3.8; No answer: 1, Wrong answer: 1; Response Time: avg 7.57s, max 9.69s, total 15.14s
Tool Calling: 10.0; No failed answers; Response Time: avg 6.31s, max 6.31s, total 6.31s
Trivia: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Anti-AI Tricks: 6.5; Wrong answer: 2; Response Time: avg 1.85s, max 4.45s, total 7.40s
Coding: 10.0; No failed answers; Response Time: avg 26.13s, max 26.13s, total 26.13s
Combined: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 2.25s, max 3.02s, total 4.51s
Domain specific: 7.7; Wrong answer: 1; Response Time: avg 3.22s, max 4.68s, total 9.67s
General Intelligence: 10.0; No failed answers; Response Time: avg 2.09s, max 2.09s, total 2.09s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 2.84s, max 4.45s, total 5.68s
Tool Calling: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Trivia: 3.0; Wrong answer: 1; Response Time: avg 1.25s, max 1.25s, total 1.25s
Wrong answer: 6, Did not follow instructions: 2; Response Time: avg 58.93s, max 358.35s, total 1119.75s…
Anti-AI Tricks: 7.4; Wrong answer: 1; Response Time: avg 16.53s, max 39.91s, total 66.11s
Coding: 2.6; Did not follow instructions: 1; Response Time: avg 51.77s, max 51.77s, total 51.77s
Combined: 10.0; No failed answers; Response Time: avg 65.02s, max 65.02s, total 65.02s
Data parsing and extraction: 8.8; No failed answers; Response Time: avg 23.62s, max 36.44s, total 47.24s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 205.66s, max 358.35s, total 616.97s
Instructions following: 10.0; No failed answers; Response Time: avg 41.16s, max 43.56s, total 82.32s
Puzzle Solving: 7.4; Wrong answer: 1; Response Time: avg 34.92s, max 76.46s, total 104.76s
Tool Calling: 10.0; No failed answers; Response Time: avg 21.33s, max 21.33s, total 21.33s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 39.14s, max 39.14s, total 39.14s
Anti-AI Tricks: 8.2; Wrong answer: 1; Response Time: avg 3.95s, max 5.68s, total 15.80s
Coding: 4.3; Wrong answer: 1; Response Time: avg 24.33s, max 24.33s, total 24.33s
Combined: 10.0; No failed answers; Response Time: avg 17.40s, max 17.40s, total 17.40s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 4.17s, max 5.02s, total 8.34s
Instructions following: 7.3; No answer: 1; Response Time: avg 4.42s, max 4.46s, total 8.84s
Puzzle Solving: 7.7; Wrong answer: 1; Response Time: avg 6.20s, max 11.63s, total 18.61s
Tool Calling: 3.0; Did not follow instructions: 1; Response Time: avg 13.68s, max 13.68s, total 13.68s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 63.48s, max 63.48s, total 63.48s
Coding: 10.0; No failed answers; Response Time: avg 23.18s, max 23.18s, total 23.18s
Combined: 10.0; No failed answers; Response Time: avg 88.15s, max 88.15s, total 88.15s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 12.58s, max 13.87s, total 25.16s
Domain specific: 3.6; Wrong answer: 2, Timed out: 1; Response Time: avg 44.63s, max 82.55s, total 133.89s
Tool Calling: 10.0; No failed answers; Response Time: avg 18.64s, max 18.64s, total 18.64s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 9.99s, max 9.99s, total 9.99s
Wrong answer: 6, Did not follow instructions: 3; Response Time: avg 1.41s, max 4.49s, total 26.72s…
Anti-AI Tricks: 8.3; Wrong answer: 1; Response Time: avg 1.10s, max 1.65s, total 4.42s
Coding: 10.0; No failed answers; Response Time: avg 1.31s, max 1.31s, total 1.31s
Combined: 3.0; Wrong answer: 1; Response Time: avg 2.53s, max 2.53s, total 2.53s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.04s, max 1.32s, total 2.07s
Domain specific: 2.9; Wrong answer: 3; Response Time: avg 1.02s, max 1.16s, total 3.06s
Instructions following: 10.0; No failed answers; Response Time: avg 932ms, max 1.00s, total 1.86s
Puzzle Solving: 6.0; Did not follow instructions: 2; Response Time: avg 2.15s, max 4.49s, total 6.45s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.51s, max 3.51s, total 3.51s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 724ms, max 724ms, total 724ms
Anti-AI Tricks: 7.3; No answer: 1, Wrong answer: 1; Response Time: avg 51.38s, max 85.28s, total 102.75s
Coding: 4.7; Timed out: 1; Response Time: avg 150.77s, max 150.77s, total 150.77s
Combined: 10.0; No failed answers; Response Time: avg 71.37s, max 71.37s, total 71.37s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 49.78s, max 49.78s, total 49.78s
Domain specific: 3.5; Wrong answer: 2, Timed out: 1; Response Time: avg 137.29s, max 137.29s, total 137.29s
Instructions following: 10.0; No failed answers; Response Time: avg 92.47s, max 92.47s, total 92.47s
Tool Calling: 10.0; No failed answers; Response Time: avg 31.74s, max 31.74s, total 31.74s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 83.95s, max 83.95s, total 83.95s
Anti-AI Tricks: 8.3; Wrong answer: 1; Response Time: avg 12.62s, max 18.61s, total 50.50s
Coding: 10.0; No failed answers; Response Time: avg 168.22s, max 168.22s, total 168.22s
Combined: 7.0; Invalid tool call: 1; Response Time: avg 83.07s, max 83.07s, total 83.07s
Data parsing and extraction: 3.5; No answer: 2; Response Time: avg 37.30s, max 54.01s, total 74.60s
Domain specific: 2.9; Wrong answer: 3; Response Time: avg 73.38s, max 101.55s, total 220.15s
Instructions following: 10.0; No failed answers; Response Time: avg 37.96s, max 47.48s, total 75.92s
Puzzle Solving: 7.7; Wrong answer: 1; Response Time: avg 60.21s, max 97.76s, total 180.63s
Tool Calling: 10.0; No failed answers; Response Time: avg 16.88s, max 16.88s, total 16.88s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 80.99s, max 80.99s, total 80.99s
Anti-AI Tricks: 7.3; Wrong answer: 2; Response Time: avg 4.75s, max 7.62s, total 19.00s
Coding: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Combined: 4.7; Timed out: 1; Response Time: avg 30.53s, max 30.53s, total 30.53s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 23.16s, max 26.55s, total 46.33s
Instructions following: 9.9; No failed answers; Response Time: avg 4.18s, max 4.46s, total 8.36s
Tool Calling: 10.0; No failed answers; Response Time: avg 17.33s, max 17.33s, total 17.33s
Wrong answer: 9; Response Time: avg 1.99s, max 5.56s, total 37.87s…
Anti-AI Tricks: 6.9; Wrong answer: 2; Response Time: avg 1.31s, max 2.08s, total 5.25s
Coding: 10.0; No failed answers; Response Time: avg 2.05s, max 2.05s, total 2.05s
Combined: 3.0; Wrong answer: 1; Response Time: avg 5.56s, max 5.56s, total 5.56s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.18s, max 1.24s, total 2.37s
Domain specific: 2.9; Wrong answer: 3; Response Time: avg 1.31s, max 1.39s, total 3.92s
General Intelligence: 10.0; No failed answers; Response Time: avg 3.41s, max 3.41s, total 3.41s
Instructions following: 6.2; Wrong answer: 1; Response Time: avg 1.15s, max 1.19s, total 2.31s
Puzzle Solving: 7.7; Wrong answer: 1; Response Time: avg 1.36s, max 1.56s, total 4.09s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.90s, max 3.90s, total 3.90s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 5.01s, max 5.01s, total 5.01s
Wrong answer: 9, Did not follow instructions: 1; Response Time: avg 1.11s, max 2.97s, total 21.13s…
Anti-AI Tricks: 7.5; Wrong answer: 2; Response Time: avg 1.07s, max 1.91s, total 4.27s
Coding: 10.0; No failed answers; Response Time: avg 1.59s, max 1.59s, total 1.59s
Combined: 3.0; Wrong answer: 1; Response Time: avg 2.73s, max 2.73s, total 2.73s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 843ms, max 907ms, total 1.69s
Domain specific: 2.9; Wrong answer: 3; Response Time: avg 762ms, max 814ms, total 2.29s
General Intelligence: 4.0; Wrong answer: 1; Response Time: avg 992ms, max 992ms, total 992ms
Instructions following: 10.0; No failed answers; Response Time: avg 859ms, max 975ms, total 1.72s
Tool Calling: 10.0; No failed answers; Response Time: avg 2.97s, max 2.97s, total 2.97s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 733ms, max 733ms, total 733ms
Anti-AI Tricks: 6.9; Extra formatting: 1, Wrong answer: 1; Response Time: avg 3.46s, max 4.38s, total 13.86s
Coding: 10.0; No failed answers; Response Time: avg 27.11s, max 27.11s, total 27.11s
Combined: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 5.54s, max 7.51s, total 11.08s
Instructions following: 9.8; No failed answers; Response Time: avg 4.63s, max 5.46s, total 9.26s
Tool Calling: 3.0; API error: 1; Response Time: avg 0ms, max 0ms, total 0ms
Wrong answer: 6, No answer: 2, Invalid tool call: 1; Response Time: avg 6.74s, max 29.11s, total 101.08s…
Anti-AI Tricks: 8.2; Wrong answer: 1; Response Time: avg 2.68s, max 3.09s, total 8.04s
Coding: 6.3; Wrong answer: 1; Response Time: avg 14.36s, max 14.36s, total 14.36s
Combined: 3.0; Wrong answer: 1; Response Time: avg 15.92s, max 15.92s, total 15.92s
Data parsing and extraction: 7.1; No answer: 1; Response Time: avg 9.34s, max 16.71s, total 18.68s
Domain specific: 4.1; Wrong answer: 2, No answer: 1; Response Time: avg 11.12s, max 29.11s, total 33.35s
General Intelligence: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Instructions following: 10.0; No failed answers; Response Time: avg 1.68s, max 2.03s, total 3.36s
Puzzle Solving: 6.5; Wrong answer: 1; Response Time: avg 1.99s, max 2.00s, total 3.97s
Tool Calling: 4.7; Invalid tool call: 1; Response Time: avg 3.39s, max 3.39s, total 3.39s
Trivia: 0.0; No failed answers; Response Time: avg 0ms, max 0ms, total 0ms
Wrong answer: 10; Response Time: avg 2.49s, max 6.65s, total 32.33s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 1.91s, max 2.74s, total 3.82s
Coding: 6.3; Wrong answer: 1; Response Time: avg 3.63s, max 3.63s, total 3.63s
Combined: 3.0; Wrong answer: 1; Response Time: avg 6.65s, max 6.65s, total 6.65s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.89s, max 1.89s, total 1.89s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 1.17s, max 1.44s, total 2.33s
General Intelligence: 4.4; Wrong answer: 1; Response Time: avg 2.26s, max 2.26s, total 2.26s
Instructions following: 10.0; No failed answers; Response Time: avg 1.67s, max 1.67s, total 1.67s
Puzzle Solving: 7.7; Wrong answer: 1; Response Time: avg 2.82s, max 3.52s, total 5.65s
Tool Calling: 10.0; No failed answers; Response Time: avg 3.33s, max 3.33s, total 3.33s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 1.11s, max 1.11s, total 1.11s
Anti-AI Tricks: 8.7; Wrong answer: 1; Response Time: avg 3.81s, max 5.65s, total 7.62s
Coding: 2.3; Did not follow instructions: 1; Response Time: avg 23.58s, max 23.58s, total 23.58s
Combined: 10.0; No failed answers; Response Time: avg 37.64s, max 37.64s, total 37.64s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 6.63s, max 6.63s, total 6.63s
Domain specific: 5.8; Timed out: 1, Wrong answer: 1; Response Time: avg 121.79s, max 121.79s, total 121.79s
Tool Calling: 2.8; No answer: 1; Response Time: avg 27.71s, max 27.71s, total 27.71s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 25.52s, max 25.52s, total 25.52s
Wrong answer: 10; Response Time: avg 4.18s, max 11.07s, total 50.12s…
Anti-AI Tricks: 4.8; Wrong answer: 3; Response Time: avg 2.37s, max 3.39s, total 4.75s
Coding: 5.6; Wrong answer: 1; Response Time: avg 8.84s, max 8.84s, total 8.84s
Combined: 3.0; Wrong answer: 1; Response Time: avg 4.98s, max 4.98s, total 4.98s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 5.78s, max 5.78s, total 5.78s
Domain specific: 3.0; Wrong answer: 3; Response Time: avg 2.24s, max 2.24s, total 2.24s
General Intelligence: 10.0; No failed answers; Response Time: avg 3.27s, max 3.27s, total 3.27s
Instructions following: 10.0; No failed answers; Response Time: avg 1.48s, max 1.48s, total 1.48s
Puzzle Solving: 7.7; Wrong answer: 1; Response Time: avg 2.05s, max 2.08s, total 4.10s
Tool Calling: 10.0; No failed answers; Response Time: avg 11.07s, max 11.07s, total 11.07s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 3.62s, max 3.62s, total 3.62s
Wrong answer: 10, Did not follow instructions: 1; Response Time: avg 2.37s, max 6.81s, total 45.03s…
Coding: 6.6; Wrong answer: 1; Response Time: avg 1.72s, max 1.72s, total 1.72s
Combined: 3.0; Wrong answer: 1; Response Time: avg 5.96s, max 5.96s, total 5.96s
Data parsing and extraction: 10.0; No failed answers; Response Time: avg 1.76s, max 2.60s, total 3.51s
Domain specific: 5.3; Wrong answer: 2; Response Time: avg 2.10s, max 3.58s, total 6.30s
General Intelligence: 4.1; Wrong answer: 1; Response Time: avg 2.33s, max 2.33s, total 2.33s
Instructions following: 6.5; Wrong answer: 1; Response Time: avg 4.26s, max 6.81s, total 8.51s
Puzzle Solving: 10.0; No failed answers; Response Time: avg 1.16s, max 1.55s, total 3.48s
Tool Calling: 10.0; No failed answers; Response Time: avg 5.40s, max 5.40s, total 5.40s
Trivia: 3.0; Wrong answer: 1; Response Time: avg 1.30s, max 1.30s, total 1.30s
A test is fully passed only if every run passed for that test. Wrong answer: 8. Did not follow instructions: 3. Response time: avg 2.23s, max 14.63s, total 40.10s …
Coding
: 10.0. No failed answers. Response time: avg 1.53s, max 1.53s, total 1.53s
Combined
: 10.0. No failed answers. Response time: avg 3.28s, max 3.28s, total 3.28s
Data parsing and extraction
: 7.3. Wrong answer: 1. Response time: avg 1.11s, max 1.47s, total 2.21s
Domain specific
: 2.9. Wrong answer: 3. Response time: avg 6.48s, max 14.63s, total 19.43s
Instructions following
: 10.0. No failed answers. Response time: avg 1.07s, max 1.07s, total 1.07s
Tool Calling
: 10.0. No failed answers. Response time: avg 1.89s, max 1.89s, total 1.89s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.58s, max 2.58s, total 2.58s
A test is fully passed only if every run passed for that test. Wrong answer: 10. Did not follow instructions: 1. Response time: avg 916ms, max 4.39s, total 17.41s …
Anti-AI Tricks
: 3.0. Wrong answer: 4. Response time: avg 582ms, max 844ms, total 2.33s
Coding
: 10.0. No failed answers. Response time: avg 1.16s, max 1.16s, total 1.16s
Combined
: 3.0. Wrong answer: 1. Response time: avg 4.39s, max 4.39s, total 4.39s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 652ms, max 660ms, total 1.30s
Domain specific
: 5.9. Wrong answer: 2. Response time: avg 495ms, max 642ms, total 1.49s
General Intelligence
: 5.0. Wrong answer: 1. Response time: avg 615ms, max 615ms, total 615ms
Instructions following
: 9.8. No failed answers. Response time: avg 672ms, max 785ms, total 1.34s
Tool Calling
: 10.0. No failed answers. Response time: avg 1.91s, max 1.91s, total 1.91s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 1.15s, max 1.15s, total 1.15s
Anti-AI Tricks
: 8.3. Wrong answer: 1. Response time: avg 1.28s, max 2.09s, total 5.13s
Coding
: 4.7. Timed out: 1. Response time: avg 7.07s, max 7.07s, total 7.07s
Combined
: 3.0. Wrong answer: 1. Response time: avg 30.53s, max 30.53s, total 30.53s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.70s, max 2.21s, total 3.41s
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 2.49s, max 4.23s, total 7.48s
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 1.08s, max 1.65s, total 2.15s
Tool Calling
: 10.0. No failed answers. Response time: avg 57.10s, max 57.10s, total 57.10s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 778ms, max 778ms, total 778ms
Anti-AI Tricks
: 6.5. Wrong answer: 2. Response time: avg 25.50s, max 37.73s, total 51.00s
Coding
: 6.7. Wrong answer: 1. Response time: avg 40.73s, max 40.73s, total 40.73s
Combined
: 10.0. No failed answers. Response time: avg 65.96s, max 65.96s, total 65.96s
Data parsing and extraction
: 3.7. Wrong answer: 2. Response time: avg 21.42s, max 21.42s, total 21.42s
Domain specific
: 5.2. Timed out: 1. Wrong answer: 1. Response time: avg 204.02s, max 204.02s, total 204.02s
Instructions following
: 9.8. No failed answers. Response time: avg 11.90s, max 11.90s, total 11.90s
Tool Calling
: 10.0. No failed answers. Response time: avg 33.30s, max 33.30s, total 33.30s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 20.13s, max 20.13s, total 20.13s
Coding
: 7.1. Wrong answer: 1. Response time: avg 14.69s, max 14.69s, total 14.69s
Combined
: 9.5. No failed answers. Response time: avg 25.49s, max 25.49s, total 25.49s
Data parsing and extraction
: 8.8. No failed answers. Response time: avg 30.54s, max 58.65s, total 61.08s
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 3.17s, max 6.59s, total 9.52s
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 8.23s, max 13.43s, total 16.45s
Puzzle Solving
: 7.6. Extra formatting: 1. Response time: avg 19.72s, max 38.42s, total 59.15s
Tool Calling
: 10.0. No failed answers. Response time: avg 5.92s, max 5.92s, total 5.92s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 15.59s, max 15.59s, total 15.59s
Anti-AI Tricks
: 8.3. No answer: 1. Response time: avg 7.85s, max 22.30s, total 31.40s
Coding
: 3.0. API error: 1. Response time: avg 0ms, max 0ms, total 0ms
Combined
: 10.0. No failed answers. Response time: avg 87.80s, max 87.80s, total 87.80s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 18.16s, max 20.65s, total 36.33s
Domain specific
: 2.9. Wrong answer: 2. Timed out: 1. Response time: avg 16.19s, max 21.56s, total 32.39s
Tool Calling
: 10.0. No failed answers. Response time: avg 39.75s, max 39.75s, total 39.75s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 55.32s, max 55.32s, total 55.32s
A test is fully passed only if every run passed for that test. Wrong answer: 11. Response time: avg 2.50s, max 6.70s, total 47.42s …
Anti-AI Tricks
: 3.0. Wrong answer: 4. Response time: avg 2.43s, max 6.70s, total 9.73s
Coding
: 10.0. No failed answers. Response time: avg 4.61s, max 4.61s, total 4.61s
Combined
: 3.0. Wrong answer: 1. Response time: avg 6.59s, max 6.59s, total 6.59s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 1.82s, max 1.97s, total 3.63s
Domain specific
: 3.6. Wrong answer: 3. Response time: avg 1.33s, max 1.53s, total 4.00s
General Intelligence
: 10.0. No failed answers. Response time: avg 3.45s, max 3.45s, total 3.45s
Instructions following
: 10.0. No failed answers. Response time: avg 1.06s, max 1.09s, total 2.12s
Puzzle Solving
: 5.2. Wrong answer: 2. Response time: avg 2.46s, max 4.23s, total 7.37s
Tool Calling
: 10.0. No failed answers. Response time: avg 3.94s, max 3.94s, total 3.94s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 1.96s, max 1.96s, total 1.96s
A test is fully passed only if every run passed for that test. Wrong answer: 9. Did not follow instructions: 2. Response time: avg 3.06s, max 6.51s, total 58.10s …
Anti-AI Tricks
: 4.8. Wrong answer: 3. Response time: avg 3.13s, max 5.90s, total 12.50s
Coding
: 10.0. No failed answers. Response time: avg 5.30s, max 5.30s, total 5.30s
Combined
: 3.0. Wrong answer: 1. Response time: avg 6.51s, max 6.51s, total 6.51s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 3.81s, max 5.69s, total 7.62s
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 2.09s, max 2.39s, total 6.26s
Instructions following
: 6.5. Wrong answer: 1. Response time: avg 1.97s, max 2.43s, total 3.93s
Tool Calling
: 10.0. No failed answers. Response time: avg 4.86s, max 4.86s, total 4.86s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.23s, max 2.23s, total 2.23s
A test is fully passed only if every run passed for that test. Wrong answer: 9. Did not follow instructions: 2. Response time: avg 10.58s, max 58.63s, total 201.03s …
Anti-AI Tricks
: 4.8. Wrong answer: 3. Response time: avg 3.97s, max 7.48s, total 15.89s
Coding
: 10.0. No failed answers. Response time: avg 7.35s, max 7.35s, total 7.35s
Combined
: 3.0. Wrong answer: 1. Response time: avg 10.01s, max 10.01s, total 10.01s
Data parsing and extraction
: 10.0. No failed answers. Response time: avg 21.64s, max 29.16s, total 43.28s
Domain specific
: 5.3. Wrong answer: 2. Response time: avg 8.58s, max 9.48s, total 25.74s
Instructions following
: 6.3. Wrong answer: 1. Response time: avg 9.59s, max 15.94s, total 19.18s
Tool Calling
: 10.0. No failed answers. Response time: avg 8.26s, max 8.26s, total 8.26s
Trivia
: 3.0. Wrong answer: 1. Response time: avg 2.38s, max 2.38s, total 2.38s