#83

Grok 4.3

xAI release: 2026-05-01
Tested on: 2026-07-16 23:07
Model: x-ai/grok-4.3::medium

Summary

Grok 4.3 scores 7.1 on AI BENCHY and ranks #83. It has a reliability of 10.0, an attempt pass rate of 68.2%, a total cost of $0.779, and an average response time of 47.45s.

Metric	Value
Score	7.1
Consistency	8.6
Reliability	10.0
Total Cost (Current Price)	$0.779
Total Output Tokens	241,421
Total Input Tokens	140,031
Input Price	$1.250 / 1M tokens
Output Price	$2.500 / 1M tokens
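The total cost appears consistent with straightforward per-token billing. As a minimal sketch, assuming the total is just input tokens times the input price plus output tokens times the output price (no caching discounts, per-request fees, or minimums), the listed figures reproduce the $0.779:

```python
# Sketch: recompute "Total Cost (Current Price)" from the token totals and
# per-million-token prices on this page. Assumes plain linear pricing with
# no caching discounts, per-request fees, or minimums.

INPUT_PRICE_PER_M = 1.250   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 2.500  # USD per 1M output tokens


def total_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a run at fixed per-million-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


cost = total_cost(input_tokens=140_031, output_tokens=241_421)
print(f"${cost:.3f}")  # prints "$0.779"
```

Under that assumption, output tokens account for about $0.604 of the total (roughly 78%), so the model's output volume drives most of its cost.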

Tests Correct

Wrong Tests: 9

Attempt pass rate: 68.2%

Flaky tests

A test counts as flaky when its outcomes differed across runs: at least one pass and at least one fail.
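The flaky rule can be expressed as a small classifier. This is a sketch only: representing each test as a list of per-run booleans, and the test names used below, are hypothetical; the page does not describe the benchmark's actual data model.

```python
# Sketch: classify one test from its per-run outcomes using the stated rule
# (flaky = at least one pass and at least one fail). The list-of-booleans
# input and the test names below are hypothetical.

from typing import Sequence


def classify(outcomes: Sequence[bool]) -> str:
    """Return 'pass', 'fail', or 'flaky' for one test's run outcomes."""
    if not outcomes:
        raise ValueError("need at least one run outcome")
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"
    return "flaky"  # mixed results: at least one True and one False


runs = {
    "test_a": [True, True, True],
    "test_b": [False, False],
    "test_c": [True, False, True],
}
for name, outcomes in runs.items():
    print(name, classify(outcomes))
# test_a pass
# test_b fail
# test_c flaky
```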

Response Time

Average: 47.45s
Maximum: 216.69s
Total: 1043.83s
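The average and total response times can be cross-checked against each other. Assuming the average is the total divided by the number of timed responses, taken over the same set of responses (the page does not state that count), the implied number of timed responses is total / average:

```python
# Sketch: infer how many timed responses the average covers, assuming
# avg = total / count over the same responses. The count itself is not
# reported on this page; this is only an arithmetic consistency check.

TOTAL_S = 1043.83  # total response time in seconds
AVG_S = 47.45      # average response time in seconds

implied_count = TOTAL_S / AVG_S
print(f"{implied_count:.2f} -> about {round(implied_count)} responses")
# 22.00 -> about 22 responses
```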

Failure reasons

Wrong answer: 5
Did not follow instructions: 2
Extra formatting: 1
No answer: 1

Hamster playing table tennis

Prompt: Create a detailed SVG illustration of a hamster playing table tennis.

xAI: Grok 4.3 (medium)

[Generated SVG not reproduced.]

Cost: $0.009 | Time: 19.0s | Tokens: 3,661

Run history

Tested on	Event	Score	Reliability	Total Cost
2026-07-16 23:07	New test added (current run)	7.1	10.0	$0.779
2026-06-04 14:11	New test added	7.6	10.0	$0.614
2026-05-22 00:32	Re-test	7.8	10.0	$0.593
2026-05-01 00:40	Initial run	8.2	10.0	$0.517

This run used a different benchmark suite from earlier runs, so score and cost changes across the history may reflect suite changes as well as model behavior.

Price History

Historical pricing for this model, as reported by OpenRouter.

Date	Input Price	Output Price
2026-06-04 15:40	$1.250 / 1M	$2.500 / 1M

Charts

[Interactive charts not reproduced: Top Models by Score; Score vs Total Cost; Response Time (avg); Score vs Response Time (avg); Total Output Tokens; Score vs Total Output Tokens.]

Quick Compare

Grok 4.3 (medium) is compared against: Qwen3.5-122B-A10B (medium), GLM 5.1 (medium), Qwen3.7 Plus (none), Grok 4.20 (medium), Qwen3.5 Plus 2026-04-20 (medium), DeepSeek V3.2 (medium), KAT-Coder-Pro V2.5 (high), Kimi K2.5 (medium), Kimi K2.6 (medium), and Mercury 2 (medium).

Category Breakdown

Category	Score	Consistency
Anti-AI Tricks	10.0	10.0
Coding	5.9	7.7
Combined	6.5	10.0
Data parsing and extraction	10.0	10.0
Domain specific	5.3	7.2
General Intelligence	5.4	2.5
Instructions following	9.8	10.0
Puzzle Solving	5.9	7.2
Tool Calling	10.0	10.0
Trivia	3.0	10.0
