A test is fully passed only if every run passed for that test.
Wrong answer: 4. Response time: avg 5.81s, max 14.72s, total 116.25s…
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 3.48s, max 4.31s, total 13.94s
Coding: 7.3. Wrong answer: 1. Response time: avg 6.66s, max 6.94s, total 13.31s
Combined: 3.0. Wrong answer: 1. Response time: avg 3.27s, max 3.27s, total 3.27s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 9.40s, max 14.72s, total 18.80s
Domain specific: 5.3. Wrong answer: 2. Response time: avg 8.05s, max 14.40s, total 24.15s
General Intelligence: 10.0. No failed answers. Response time: avg 3.68s, max 3.68s, total 3.68s
Instructions following: 9.9. No failed answers. Response time: avg 7.02s, max 7.35s, total 14.03s
Puzzle Solving: 10.0. No failed answers. Response time: avg 5.77s, max 10.27s, total 17.32s
Tool Calling: 10.0. No failed answers. Response time: avg 4.99s, max 4.99s, total 4.99s
Trivia: 10.0. No failed answers. Response time: avg 2.75s, max 2.75s, total 2.75s
Coding: 6.7. No answer: 1. Response time: avg 54.73s, max 91.27s, total 109.46s
Combined: 4.7. Invalid tool call: 1. Response time: avg 41.03s, max 41.03s, total 41.03s
Data parsing and extraction: 6.3. Wrong answer: 1. Response time: avg 21.95s, max 24.88s, total 43.89s
Domain specific: 3.0. Timed out: 2; Wrong answer: 1. Response time: avg 19.00s, max 21.63s, total 38.01s
Tool Calling: 4.7. Did not follow instructions: 1. Response time: avg 12.05s, max 12.05s, total 12.05s
Trivia: 3.0. Wrong answer: 1. Response time: avg 22.77s, max 22.77s, total 22.77s
Anti-AI Tricks: 6.5. Wrong answer: 2. Response time: avg 892ms, max 1.38s, total 3.57s
Coding: 7.0. Did not follow instructions: 1. Response time: avg 3.39s, max 5.51s, total 6.79s
Combined: 3.0. Invalid tool call: 1. Response time: avg 3.56s, max 3.56s, total 3.56s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.66s, max 2.11s, total 3.32s
Domain specific: 10.0. No failed answers. Response time: avg 899ms, max 1.04s, total 2.70s
General Intelligence: 10.0. No failed answers. Response time: avg 922ms, max 922ms, total 922ms
Instructions following: 6.4. Wrong answer: 1. Response time: avg 893ms, max 964ms, total 1.79s
Puzzle Solving: 10.0. No failed answers. Response time: avg 1.45s, max 2.30s, total 4.36s
Tool Calling: 10.0. No failed answers. Response time: avg 2.79s, max 2.79s, total 2.79s
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.76s, max 1.76s, total 1.76s
Wrong answer: 7; Did not follow instructions: 2. Response time: avg 11.79s, max 94.06s, total 235.81s…
Anti-AI Tricks: 8.3. Wrong answer: 1. Response time: avg 4.52s, max 7.74s, total 18.10s
Coding: 6.8. Wrong answer: 1. Response time: avg 21.10s, max 28.80s, total 42.21s
Combined: 9.8. No failed answers. Response time: avg 24.13s, max 24.13s, total 24.13s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.54s, max 3.33s, total 5.08s
Domain specific: 5.9. Wrong answer: 2. Response time: avg 38.18s, max 94.06s, total 114.53s
Instructions following: 9.8. No failed answers. Response time: avg 1.88s, max 2.61s, total 3.75s
Tool Calling: 10.0. No failed answers. Response time: avg 7.71s, max 7.71s, total 7.71s
Trivia: 3.0. Wrong answer: 1. Response time: avg 4.81s, max 4.81s, total 4.81s
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 3.26s, max 6.38s, total 13.06s
Coding: 7.0. Extra formatting: 1. Response time: avg 81.67s, max 130.77s, total 163.34s
Combined: 10.0. No failed answers. Response time: avg 53.36s, max 53.36s, total 53.36s
Data parsing and extraction: 7.3. Wrong answer: 1. Response time: avg 18.81s, max 20.29s, total 37.61s
Domain specific: 5.3. Extra formatting: 2. Response time: avg 37.87s, max 84.22s, total 113.60s
Instructions following: 9.9. No failed answers. Response time: avg 2.77s, max 3.21s, total 5.54s
Tool Calling: 10.0. No failed answers. Response time: avg 16.87s, max 16.87s, total 16.87s
Trivia: 3.0. Wrong answer: 1. Response time: avg 12.46s, max 12.46s, total 12.46s
Anti-AI Tricks: 4.0. Wrong answer: 4. Response time: avg 597ms, max 866ms, total 2.39s
Coding: 5.5. Wrong answer: 1. Response time: avg 1.14s, max 1.14s, total 1.14s
Combined: 3.0. Invalid tool call: 1. Response time: avg 6.48s, max 6.48s, total 6.48s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 601ms, max 634ms, total 1.20s
Domain specific: 3.0. Wrong answer: 3. Response time: avg 611ms, max 616ms, total 1.83s
Instructions following: 6.3. Wrong answer: 1. Response time: avg 649ms, max 952ms, total 1.30s
Puzzle Solving: 7.7. Wrong answer: 1. Response time: avg 586ms, max 813ms, total 1.76s
Tool Calling: 10.0. No failed answers. Response time: avg 4.79s, max 4.79s, total 4.79s
Wrong answer: 10; Did not follow instructions: 3. Response time: avg 13.82s, max 238.89s, total 276.39s…
Anti-AI Tricks: 4.6. Wrong answer: 3. Response time: avg 1.39s, max 2.96s, total 5.56s
Coding: 6.8. Did not follow instructions: 1. Response time: avg 122.77s, max 238.89s, total 245.54s
Combined: 3.0. Wrong answer: 1. Response time: avg 3.38s, max 3.38s, total 3.38s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.32s, max 1.39s, total 2.64s
Domain specific: 5.3. Wrong answer: 2. Response time: avg 1.48s, max 1.85s, total 4.45s
Instructions following: 6.5. Wrong answer: 1. Response time: avg 1.64s, max 1.80s, total 3.28s
Tool Calling: 10.0. No failed answers. Response time: avg 4.46s, max 4.46s, total 4.46s
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.36s, max 1.36s, total 1.36s
Anti-AI Tricks: 6.5. Wrong answer: 2. Response time: avg 25.50s, max 37.73s, total 51.00s
Coding: 5.4. Wrong answer: 2. Response time: avg 47.80s, max 54.86s, total 95.59s
Combined: 10.0. No failed answers. Response time: avg 65.96s, max 65.96s, total 65.96s
Data parsing and extraction: 3.7. Wrong answer: 2. Response time: avg 21.42s, max 21.42s, total 21.42s
Domain specific: 5.2. Timed out: 1; Wrong answer: 1. Response time: avg 204.02s, max 204.02s, total 204.02s
Instructions following: 9.8. No failed answers. Response time: avg 15.64s, max 15.64s, total 15.64s
Tool Calling: 10.0. No failed answers. Response time: avg 33.30s, max 33.30s, total 33.30s
Trivia: 3.0. Wrong answer: 1. Response time: avg 20.13s, max 20.13s, total 20.13s
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 59.11s, max 168.31s, total 236.44s
Coding: 4.1. Timed out: 1; Wrong answer: 1. Response time: avg 54.23s, max 62.72s, total 108.47s
Combined: 10.0. No failed answers. Response time: avg 17.78s, max 17.78s, total 17.78s
Data parsing and extraction: 7.3. API error: 1. Response time: avg 56.99s, max 80.14s, total 113.98s
Domain specific: 5.3. Timed out: 1; Wrong answer: 1. Response time: avg 146.50s, max 234.29s, total 439.49s
Instructions following: 10.0. No failed answers. Response time: avg 63.49s, max 111.61s, total 126.98s
Puzzle Solving: 8.2. Timed out: 1. Response time: avg 27.61s, max 31.84s, total 55.21s
Tool Calling: 10.0. No failed answers. Response time: avg 10.33s, max 10.33s, total 10.33s
Trivia: 3.0. Wrong answer: 1. Response time: avg 48.98s, max 48.98s, total 48.98s
Wrong answer: 9. Response time: avg 3.31s, max 20.51s, total 66.17s…
Anti-AI Tricks: 5.2. Wrong answer: 3. Response time: avg 2.63s, max 5.57s, total 10.53s
Coding: 4.2. Wrong answer: 2. Response time: avg 3.06s, max 3.45s, total 6.12s
Combined: 3.0. Wrong answer: 1. Response time: avg 20.51s, max 20.51s, total 20.51s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.87s, max 3.54s, total 5.74s
Domain specific: 7.7. Wrong answer: 1. Response time: avg 1.22s, max 1.25s, total 3.67s
General Intelligence: 4.3. Wrong answer: 1. Response time: avg 1.62s, max 1.62s, total 1.62s
Instructions following: 9.8. No failed answers. Response time: avg 1.40s, max 1.46s, total 2.79s
Puzzle Solving: 10.0. No failed answers. Response time: avg 2.65s, max 3.59s, total 7.94s
Tool Calling: 10.0. No failed answers. Response time: avg 5.27s, max 5.27s, total 5.27s
Trivia: 3.0. Wrong answer: 1. Response time: avg 1.97s, max 1.97s, total 1.97s
Anti-AI Tricks: 8.7. Wrong answer: 1. Response time: avg 3.81s, max 5.65s, total 7.62s
Coding: 2.3. Did not follow instructions: 1. Response time: avg 23.58s, max 23.58s, total 23.58s
Combined: 10.0. No failed answers. Response time: avg 37.64s, max 37.64s, total 37.64s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 6.63s, max 6.63s, total 6.63s
Domain specific: 5.8. Timed out: 1; Wrong answer: 1. Response time: avg 121.79s, max 121.79s, total 121.79s
Tool Calling: 2.8. No answer: 1. Response time: avg 27.71s, max 27.71s, total 27.71s
Trivia: 3.0. Wrong answer: 1. Response time: avg 25.52s, max 25.52s, total 25.52s
Wrong answer: 6; Did not follow instructions: 1. Response time: avg 3.18s, max 10.87s, total 63.55s…
Anti-AI Tricks: 9.1. Did not follow instructions: 1. Response time: avg 2.39s, max 3.58s, total 9.57s
Coding: 6.8. Wrong answer: 1. Response time: avg 3.59s, max 3.93s, total 7.19s
Combined: 10.0. No failed answers. Response time: avg 10.87s, max 10.87s, total 10.87s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.60s, max 2.69s, total 5.19s
Domain specific: 2.9. Wrong answer: 3. Response time: avg 3.16s, max 3.89s, total 9.49s
General Intelligence: 10.0. No failed answers. Response time: avg 2.60s, max 2.60s, total 2.60s
Instructions following: 9.9. No failed answers. Response time: avg 2.59s, max 3.04s, total 5.17s
Puzzle Solving: 7.6. Wrong answer: 1. Response time: avg 1.95s, max 2.48s, total 5.84s
Tool Calling: 10.0. No failed answers. Response time: avg 4.55s, max 4.55s, total 4.55s
Trivia: 3.0. Wrong answer: 1. Response time: avg 3.08s, max 3.08s, total 3.08s
Wrong answer: 6; Did not follow instructions: 1. Response time: avg 3.94s, max 14.93s, total 78.74s…
Anti-AI Tricks: 9.1. Did not follow instructions: 1. Response time: avg 2.33s, max 3.89s, total 9.30s
Coding: 6.8. Wrong answer: 1. Response time: avg 3.98s, max 4.34s, total 7.95s
Combined: 10.0. No failed answers. Response time: avg 14.93s, max 14.93s, total 14.93s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 2.29s, max 2.31s, total 4.59s
Domain specific: 3.0. Wrong answer: 3. Response time: avg 4.21s, max 5.86s, total 12.62s
General Intelligence: 10.0. No failed answers. Response time: avg 3.16s, max 3.16s, total 3.16s
Instructions following: 10.0. No failed answers. Response time: avg 1.91s, max 1.93s, total 3.82s
Puzzle Solving: 7.7. Wrong answer: 1. Response time: avg 5.30s, max 9.55s, total 15.89s
Tool Calling: 10.0. No failed answers. Response time: avg 3.80s, max 3.80s, total 3.80s
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.68s, max 2.68s, total 2.68s
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 40.57s, max 110.43s, total 121.72s
Coding: 3.5. No answer: 1. Response time: avg 62.83s, max 62.83s, total 62.83s
Combined: 10.0. No failed answers. Response time: avg 29.57s, max 29.57s, total 29.57s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 15.01s, max 15.01s, total 15.01s
Domain specific: 5.3. Wrong answer: 2. Response time: avg 170.45s, max 170.45s, total 170.45s
Tool Calling: 10.0. No failed answers. Response time: avg 11.91s, max 11.91s, total 11.91s
Trivia: 3.0. Wrong answer: 1. Response time: avg 108.45s, max 108.45s, total 108.45s
Anti-AI Tricks: 6.4. API error: 1; Wrong answer: 1. Response time: avg 16.53s, max 39.91s, total 66.11s
Coding: 2.7. API error: 1; Timed out: 1. Response time: avg 51.77s, max 51.77s, total 51.77s
Combined: 10.0. No failed answers. Response time: avg 65.02s, max 65.02s, total 65.02s
Data parsing and extraction: 7.3. API error: 1. Response time: avg 23.62s, max 36.44s, total 47.24s
Instructions following: 10.0. No failed answers. Response time: avg 41.16s, max 43.56s, total 82.32s
Puzzle Solving: 5.9. API error: 1; Wrong answer: 1. Response time: avg 34.84s, max 76.46s, total 104.52s
Tool Calling: 10.0. No failed answers. Response time: avg 21.33s, max 21.33s, total 21.33s
Trivia: 3.0. Wrong answer: 1. Response time: avg 39.14s, max 39.14s, total 39.14s
Anti-AI Tricks: 4.8. Wrong answer: 3. Response time: avg 501ms, max 839ms, total 2.01s
Coding: 3.4. Wrong answer: 1. Response time: avg 1.22s, max 1.22s, total 1.22s
Combined: 3.0. Invalid tool call: 1. Response time: avg 6.04s, max 6.04s, total 6.04s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 522ms, max 537ms, total 1.04s
General Intelligence: 4.8. Wrong answer: 1. Response time: avg 659ms, max 659ms, total 659ms
Instructions following: 6.3. Wrong answer: 1. Response time: avg 445ms, max 505ms, total 889ms
Puzzle Solving: 5.3. Wrong answer: 2. Response time: avg 473ms, max 502ms, total 1.42s
Tool Calling: 10.0. No failed answers. Response time: avg 4.63s, max 4.63s, total 4.63s
Anti-AI Tricks: 5.6. Wrong answer: 3. Response time: avg 2.67s, max 5.03s, total 10.66s
Coding: 5.1. Wrong answer: 2. Response time: avg 44.82s, max 59.15s, total 89.64s
Combined: 3.0. Wrong answer: 1. Response time: avg 25.25s, max 25.25s, total 25.25s
Data parsing and extraction: 7.3. API error: 1. Response time: avg 1.23s, max 1.96s, total 2.46s
Domain specific: 5.3. API error: 1; Wrong answer: 1. Response time: avg 6.11s, max 13.72s, total 18.34s
Instructions following: 7.3. Wrong answer: 1. Response time: avg 1.38s, max 1.61s, total 2.75s
Tool Calling: 10.0. No failed answers. Response time: avg 3.50s, max 3.50s, total 3.50s
Trivia: 3.0. Wrong answer: 1. Response time: avg 5.92s, max 5.92s, total 5.92s
Wrong answer: 12; Invalid tool call: 1. Response time: avg 4.20s, max 32.57s, total 83.95s…
Anti-AI Tricks: 4.0. Wrong answer: 4. Response time: avg 2.11s, max 3.94s, total 8.46s
Coding: 4.3. Wrong answer: 2. Response time: avg 6.33s, max 9.79s, total 12.65s
Combined: 2.8. Invalid tool call: 1. Response time: avg 32.57s, max 32.57s, total 32.57s
Data parsing and extraction: 10.0. No failed answers. Response time: avg 1.08s, max 1.62s, total 2.15s
Domain specific: 2.9. Wrong answer: 3. Response time: avg 1.99s, max 3.99s, total 5.98s
General Intelligence: 5.0. Wrong answer: 1. Response time: avg 790ms, max 790ms, total 790ms
Instructions following: 9.8. No failed answers. Response time: avg 1.98s, max 2.28s, total 3.97s
Puzzle Solving: 7.7. Wrong answer: 1. Response time: avg 1.45s, max 2.09s, total 4.36s
Tool Calling: 10.0. No failed answers. Response time: avg 10.68s, max 10.68s, total 10.68s
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.34s, max 2.34s, total 2.34s
Wrong answer: 8; Did not follow instructions: 3. Response time: avg 2.27s, max 14.63s, total 43.20s…
Coding: 7.2. Wrong answer: 1. Response time: avg 2.29s, max 3.06s, total 4.58s
Combined: 10.0. No failed answers. Response time: avg 3.28s, max 3.28s, total 3.28s
Data parsing and extraction: 7.3. Wrong answer: 1. Response time: avg 1.11s, max 1.47s, total 2.21s
Domain specific: 2.9. Wrong answer: 3. Response time: avg 6.48s, max 14.63s, total 19.43s
Instructions following: 10.0. No failed answers. Response time: avg 1.07s, max 1.07s, total 1.07s
Tool Calling: 10.0. No failed answers. Response time: avg 1.89s, max 1.89s, total 1.89s
Trivia: 3.0. Wrong answer: 1. Response time: avg 2.58s, max 2.58s, total 2.58s
Coding: 3.4. No answer: 1; Timed out: 1. Response time: avg 55.33s, max 89.40s, total 110.66s
Combined: 2.8. Invalid tool call: 1. Response time: avg 65.57s, max 65.57s, total 65.57s
Data parsing and extraction: 6.3. No answer: 1. Response time: avg 1.51s, max 1.51s, total 1.51s
Domain specific: 3.5. Wrong answer: 2; No answer: 1. Response time: avg 174.55s, max 174.55s, total 174.55s
General Intelligence: 3.6. Wrong answer: 1. Response time: avg 18.14s, max 18.14s, total 18.14s
Instructions following: 6.2. Wrong answer: 1. Response time: avg 2.97s, max 2.97s, total 2.97s
Tool Calling: 10.0. No failed answers. Response time: avg 15.95s, max 15.95s, total 15.95s
Trivia: 3.0. Wrong answer: 1. Response time: avg 11.13s, max 11.13s, total 11.13s
Anti-AI Tricks: 10.0. No failed answers. Response time: avg 4.14s, max 12.41s, total 16.57s
Coding: 6.9. Wrong answer: 1. Response time: avg 64.48s, max 97.49s, total 128.97s
Combined: 10.0. No failed answers. Response time: avg 16.86s, max 16.86s, total 16.86s
Domain specific: 5.3. Extra formatting: 1; Wrong answer: 1. Response time: avg 34.53s, max 86.93s, total 103.59s
Instructions following: 9.9. No failed answers. Response time: avg 1.80s, max 1.81s, total 3.60s
Puzzle Solving: 8.2. No answer: 1. Response time: avg 20.25s, max 57.93s, total 60.76s
Tool Calling: 10.0. No failed answers. Response time: avg 7.29s, max 7.29s, total 7.29s
Trivia: 3.0. Wrong answer: 1. Response time: avg 51.29s, max 51.29s, total 51.29s
Wrong answer: 6 | Response Time (avg) 1.30s | Response Time (max) 3.92s | Response Time (total) 25.95s…
Anti-AI Tricks
: 6.5 | Wrong answer: 2 | Response Time (avg) 1.08s | Response Time (max) 1.39s | Response Time (total) 4.30s
Coding
: 6.8 | Wrong answer: 1 | Response Time (avg) 1.39s | Response Time (max) 1.63s | Response Time (total) 2.78s
Combined
: 3.0 | Wrong answer: 1 | Response Time (avg) 2.17s | Response Time (max) 2.17s | Response Time (total) 2.17s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 1.35s | Response Time (max) 1.43s | Response Time (total) 2.69s
Domain specific
: 7.7 | Wrong answer: 1 | Response Time (avg) 975ms | Response Time (max) 1.08s | Response Time (total) 2.92s
General Intelligence
: 10.0 | No failed answers | Response Time (avg) 1.04s | Response Time (max) 1.04s | Response Time (total) 1.04s
Instructions following
: 10.0 | No failed answers | Response Time (avg) 943ms | Response Time (max) 974ms | Response Time (total) 1.89s
Puzzle Solving
: 10.0 | No failed answers | Response Time (avg) 1.13s | Response Time (max) 1.29s | Response Time (total) 3.40s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 3.92s | Response Time (max) 3.92s | Response Time (total) 3.92s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 856ms | Response Time (max) 856ms | Response Time (total) 856ms
Wrong answer: 10, Did not follow instructions: 2 | Response Time (avg) 3.04s | Response Time (max) 6.51s | Response Time (total) 60.88s…
Anti-AI Tricks
: 4.8 | Wrong answer: 3 | Response Time (avg) 3.13s | Response Time (max) 5.90s | Response Time (total) 12.50s
Coding
: 6.8 | Wrong answer: 1 | Response Time (avg) 3.77s | Response Time (max) 5.30s | Response Time (total) 7.54s
Combined
: 3.0 | Wrong answer: 1 | Response Time (avg) 6.51s | Response Time (max) 6.51s | Response Time (total) 6.51s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 3.81s | Response Time (max) 5.69s | Response Time (total) 7.62s
Domain specific
: 5.3 | Wrong answer: 2 | Response Time (avg) 2.09s | Response Time (max) 2.39s | Response Time (total) 6.26s
Instructions following
: 6.5 | Wrong answer: 1 | Response Time (avg) 1.97s | Response Time (max) 2.43s | Response Time (total) 3.93s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 4.86s | Response Time (max) 4.86s | Response Time (total) 4.86s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 2.23s | Response Time (max) 2.23s | Response Time (total) 2.23s
API error: 6, Wrong answer: 3 | Response Time (avg) 56.57s | Response Time (max) 149.94s | Response Time (total) 848.59s…
Anti-AI Tricks
: 6.4 | API error: 2 | Response Time (avg) 15.12s | Response Time (max) 19.99s | Response Time (total) 45.37s
Coding
: 6.5 | API error: 1 | Response Time (avg) 99.76s | Response Time (max) 99.76s | Response Time (total) 99.76s
Combined
: 10.0 | No failed answers | Response Time (avg) 113.09s | Response Time (max) 113.09s | Response Time (total) 113.09s
Data parsing and extraction
: 6.5 | API error: 1 | Response Time (avg) 12.11s | Response Time (max) 12.11s | Response Time (total) 12.11s
Domain specific
: 5.3 | Wrong answer: 2 | Response Time (avg) 109.04s | Response Time (max) 149.94s | Response Time (total) 327.11s
General Intelligence
: 3.0 | API error: 1 | Response Time (avg) 0ms | Response Time (max) 0ms | Response Time (total) 0ms
Instructions following
: 10.0 | No failed answers | Response Time (avg) 34.36s | Response Time (max) 41.83s | Response Time (total) 68.73s
Puzzle Solving
: 7.7 | API error: 1 | Response Time (avg) 27.94s | Response Time (max) 45.06s | Response Time (total) 55.89s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 78.83s | Response Time (max) 78.83s | Response Time (total) 78.83s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 47.71s | Response Time (max) 47.71s | Response Time (total) 47.71s
Wrong answer: 11, Did not follow instructions: 2 | Response Time (avg) 2.27s | Response Time (max) 6.58s | Response Time (total) 45.50s…
Anti-AI Tricks
: 3.5 | Wrong answer: 4 | Response Time (avg) 1.80s | Response Time (max) 2.62s | Response Time (total) 7.19s
Coding
: 6.8 | Wrong answer: 1 | Response Time (avg) 2.65s | Response Time (max) 3.82s | Response Time (total) 5.30s
Combined
: 3.0 | Wrong answer: 1 | Response Time (avg) 6.58s | Response Time (max) 6.58s | Response Time (total) 6.58s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 1.39s | Response Time (max) 1.42s | Response Time (total) 2.78s
Domain specific
: 5.3 | Wrong answer: 2 | Response Time (avg) 1.78s | Response Time (max) 2.49s | Response Time (total) 5.34s
Instructions following
: 6.5 | Wrong answer: 1 | Response Time (avg) 2.51s | Response Time (max) 2.95s | Response Time (total) 5.02s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 4.39s | Response Time (max) 4.39s | Response Time (total) 4.39s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 1.63s | Response Time (max) 1.63s | Response Time (total) 1.63s
Anti-AI Tricks
: 6.6 | Timed out: 1, Wrong answer: 1 | Response Time (avg) 74.75s | Response Time (max) 182.10s | Response Time (total) 298.98s
Coding
: 6.8 | Wrong answer: 1 | Response Time (avg) 220.48s | Response Time (max) 243.66s | Response Time (total) 440.97s
Combined
: 10.0 | No failed answers | Response Time (avg) 262.83s | Response Time (max) 262.83s | Response Time (total) 262.83s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 24.27s | Response Time (max) 27.52s | Response Time (total) 48.54s
Domain specific
: 3.0 | Timed out: 3 | Response Time (avg) 0ms | Response Time (max) 0ms | Response Time (total) 0ms
Instructions following
: 10.0 | No failed answers | Response Time (avg) 17.47s | Response Time (max) 19.46s | Response Time (total) 34.93s
Puzzle Solving
: 8.2 | Wrong answer: 1 | Response Time (avg) 31.79s | Response Time (max) 50.78s | Response Time (total) 95.38s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 88.68s | Response Time (max) 88.68s | Response Time (total) 88.68s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 56.76s | Response Time (max) 56.76s | Response Time (total) 56.76s
Wrong answer: 12, Did not follow instructions: 2 | Response Time (avg) 2.86s | Response Time (max) 8.21s | Response Time (total) 57.24s…
Anti-AI Tricks
: 3.0 | Wrong answer: 4 | Response Time (avg) 2.84s | Response Time (max) 4.15s | Response Time (total) 11.35s
Coding
: 4.4 | Wrong answer: 2 | Response Time (avg) 2.58s | Response Time (max) 3.93s | Response Time (total) 5.16s
Combined
: 3.0 | Wrong answer: 1 | Response Time (avg) 4.89s | Response Time (max) 4.89s | Response Time (total) 4.89s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 2.47s | Response Time (max) 2.48s | Response Time (total) 4.95s
Domain specific
: 5.3 | Wrong answer: 2 | Response Time (avg) 1.97s | Response Time (max) 2.65s | Response Time (total) 5.92s
Instructions following
: 6.5 | Wrong answer: 1 | Response Time (avg) 2.13s | Response Time (max) 2.53s | Response Time (total) 4.27s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 8.21s | Response Time (max) 8.21s | Response Time (total) 8.21s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 2.37s | Response Time (max) 2.37s | Response Time (total) 2.37s
Anti-AI Tricks
: 8.1 | Extra formatting: 1 | Response Time (avg) 15.85s | Response Time (max) 20.83s | Response Time (total) 47.55s
Coding
: 4.1 | Timed out: 1, Wrong answer: 1 | Response Time (avg) 7.20s | Response Time (max) 13.03s | Response Time (total) 14.41s
Combined
: 9.8 | No failed answers | Response Time (avg) 75.68s | Response Time (max) 75.68s | Response Time (total) 75.68s
Data parsing and extraction
: 6.5 | API error: 1 | Response Time (avg) 0ms | Response Time (max) 0ms | Response Time (total) 0ms
Domain specific
: 5.9 | Wrong answer: 2 | Response Time (avg) 96.01s | Response Time (max) 96.01s | Response Time (total) 96.01s
Instructions following
: 10.0 | No failed answers | Response Time (avg) 4.28s | Response Time (max) 7.37s | Response Time (total) 8.55s
Puzzle Solving
: 7.7 | Wrong answer: 1 | Response Time (avg) 3.87s | Response Time (max) 5.26s | Response Time (total) 7.74s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 27.78s | Response Time (max) 27.78s | Response Time (total) 27.78s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 1.96s | Response Time (max) 1.96s | Response Time (total) 1.96s
Wrong answer: 3, Timed out: 2, No answer: 1 | Response Time (avg) 50.92s | Response Time (max) 369.32s | Response Time (total) 967.47s…
Anti-AI Tricks
: 10.0 | No failed answers | Response Time (avg) 6.20s | Response Time (max) 9.64s | Response Time (total) 24.78s
Coding
: 2.9 | No answer: 1, Timed out: 1 | Response Time (avg) 258.40s | Response Time (max) 369.32s | Response Time (total) 516.79s
Combined
: 9.6 | No failed answers | Response Time (avg) 73.55s | Response Time (max) 73.55s | Response Time (total) 73.55s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 16.51s | Response Time (max) 20.57s | Response Time (total) 33.02s
Domain specific
: 2.9 | Wrong answer: 2, Timed out: 1 | Response Time (avg) 23.62s | Response Time (max) 27.00s | Response Time (total) 47.23s
General Intelligence
: 10.0 | No failed answers | Response Time (avg) 29.76s | Response Time (max) 29.76s | Response Time (total) 29.76s
Instructions following
: 10.0 | No failed answers | Response Time (avg) 17.54s | Response Time (max) 21.25s | Response Time (total) 35.08s
Puzzle Solving
: 10.0 | No failed answers | Response Time (avg) 5.79s | Response Time (max) 6.85s | Response Time (total) 17.36s
Tool Calling
: 10.0 | No failed answers | Response Time (avg) 9.01s | Response Time (max) 9.01s | Response Time (total) 9.01s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 180.87s | Response Time (max) 180.87s | Response Time (total) 180.87s
Wrong answer: 12, Did not follow instructions: 3 | Response Time (avg) 1.15s | Response Time (max) 2.52s | Response Time (total) 23.09s…
Anti-AI Tricks
: 3.1 | Wrong answer: 4 | Response Time (avg) 929ms | Response Time (max) 1.55s | Response Time (total) 3.72s
Coding
: 6.8 | Wrong answer: 1 | Response Time (avg) 1.01s | Response Time (max) 1.19s | Response Time (total) 2.02s
Combined
: 3.0 | Wrong answer: 1 | Response Time (avg) 2.52s | Response Time (max) 2.52s | Response Time (total) 2.52s
Data parsing and extraction
: 10.0 | No failed answers | Response Time (avg) 1.30s | Response Time (max) 1.58s | Response Time (total) 2.61s
Domain specific
: 3.5 | Wrong answer: 3 | Response Time (avg) 937ms | Response Time (max) 1.25s | Response Time (total) 2.81s
Instructions following
: 6.3 | Wrong answer: 1 | Response Time (avg) 728ms | Response Time (max) 731ms | Response Time (total) 1.46s
Tool Calling
: 3.0 | Did not follow instructions: 1 | Response Time (avg) 2.32s | Response Time (max) 2.32s | Response Time (total) 2.32s
Trivia
: 3.0 | Wrong answer: 1 | Response Time (avg) 1.33s | Response Time (max) 1.33s | Response Time (total) 1.33s