
When Elon Musk's AI startup xAI launched Grok 4, it claimed the title of the world’s smartest AI model—and it delivered.
Grok 4 is a next-generation large language model (LLM) released in July 2025 by xAI. At launch it topped several major benchmarks, showcasing advanced reasoning, academic mastery, coding intelligence, and real‑time knowledge access. Media widely hailed it as “the smartest AI in the world.” (Axios)
Humanity’s Last Exam (HLE) is a 2,500-question academic test spanning more than 100 subjects, from physics to philosophy. Grok 4 scored 25.4% without tools and 38.6% with tool use, while its multi-agent version, Grok 4 Heavy, reached 44.4%, far outpacing competitors like Gemini and GPT‑4. (Scientific American)
Evaluated by Artificial Analysis, Grok 4 achieved an Intelligence Index of 73, ahead of OpenAI’s o3 and Google’s Gemini 2.5 Pro (both at 70) and Claude Opus 4 (64). (Medium)
On the ARC‑AGI-2 benchmark, Grok 4 scored 16.2%, nearly double the score of Claude Opus 4. This test gauges general abstract reasoning through visual puzzles. Grok 4’s performance is a profound leap in AI problem-solving. (Facebook)
Grok 4’s specialized coding variant scored 75% on SWE-Bench, outperforming all peer models and integrating smoothly into development environments.
In the Vending‑Bench business simulation, Grok 4 more than doubled the performance of both human and AI baselines, managing inventory, pricing, and suppliers over multiple rounds and generating significantly higher revenue. (DataCamp)
Grok 4 Heavy spins up multiple AI agents for each task, letting them collaborate and converge on stronger answers. This approach yields better outcomes on complex reasoning tests like HLE and ARC‑AGI. However, it costs ten times more to operate and comes with slower performance. (DataCamp)
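To make the idea of several agents converging on one answer concrete, here is a minimal Python sketch. It is an illustration only: xAI has not published how Grok 4 Heavy combines its agents, so the majority vote used here is an assumption, and `heavy_style_answer` and the three stand-in agents are hypothetical names, not part of any xAI API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# An "agent" here is any function that maps a question to an answer string.
Agent = Callable[[str], str]


def heavy_style_answer(question: str, agents: List[Agent]) -> str:
    """Ask every agent the same question in parallel and return the most common answer.

    Assumption: simple majority voting. The real aggregation method is not public.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # most_common(1) returns the answer with the highest count;
    # on a tie, the answer that appeared first wins.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in agents; a real system would call a model here.
def agent_a(question: str) -> str:
    return "42"


def agent_b(question: str) -> str:
    return "42"


def agent_c(question: str) -> str:
    return "41"


print(heavy_style_answer("What is 6 x 7?", [agent_a, agent_b, agent_c]))
```

Every agent runs on every question, so compute scales with the number of agents, which fits the higher cost and slower responses reported for the Heavy version.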
Unlike most other AI models with fixed training data, Grok 4 can access live information from X (formerly Twitter), giving it a real-time edge in answering current events questions. (TechRadar)
It supports a 256k-token context window via its API, enough for long documents and complex reasoning tasks. Pricing is steep: roughly $3 per million input tokens and $15 per million output tokens. (Artificial Analysis)
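As a rough guide to what those rates mean per request, the arithmetic can be sketched in a few lines of Python. The prices are the approximate figures quoted above. Treating "256k" as 256,000 tokens shared between input and output is an assumption, and `estimate_cost_usd` is a hypothetical helper, not an xAI function.

```python
# Approximate rates quoted in the article (USD per million tokens).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00

# Assumption: "256k" means 256,000 tokens, counted across input plus output.
CONTEXT_WINDOW_TOKENS = 256_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request would not fit in the assumed context window")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


# Example: a 200,000-token document summarized in 4,000 tokens.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints "$0.66"
```

Output tokens cost five times as much as input tokens at these rates, so a long generated answer can cost more than the document it was given.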
Despite its breakthroughs, Grok 4 launched amid controversy:
Ideal for:
Not ideal for:
xAI has ambitious plans through 2025:
With Grok 4, xAI announced not just a new AI model but a new benchmark for intelligence testing and real-world applications. It outpaces rivals on academic, reasoning, coding, and simulation tests.
But notable caveats remain: slower speed, expensive pricing, safety concerns, and public trust issues tied to earlier controversies. And while Grok 4 dominates in narrow benchmarks, it’s not yet the everyday assistant most consumers will use.
At the same time, its emergence signals a turning point: the goalposts for AI are now higher than ever—and competition is fierce.
Model | Intelligence Index | HLE (with tools) | ARC‑AGI‑2 | Vending‑Bench Earnings |
---|---|---|---|---|
Grok 4 | 73 | 38.6% | 16.2% | ≈$4,600 (top performer) |
Elon Musk’s xAI just dropped a bombshell. Grok 4 isn’t the fastest model, but it is fundamentally smarter in structured reasoning domains. Whether it lives up to its promise in safer, everyday use remains to be seen. But one thing’s clear: the AI race has officially levelled up.