When Elon Musk's AI startup xAI launched Grok 4, it claimed the title of the world's smartest AI model, and it delivered.
Grok 4 is a next-generation large language model (LLM) that xAI released in July 2025. It immediately topped several major benchmarks, showcasing advanced reasoning, academic mastery, coding skill, and real-time knowledge access. Media widely hailed it as "the smartest AI in the world." (Axios)
Humanity's Last Exam (HLE) is a 2,500-question academic test spanning more than 100 subjects, from physics to philosophy. Grok 4 scored 25.4% without tools, rose to 38.6% with tool use, and reached 44.4% in its multi-agent version, Grok 4 Heavy, far outpacing competitors like Gemini and GPT-4. (Scientific American)
Evaluated by Artificial Analysis, Grok 4 achieved an Intelligence Index of 73, ahead of OpenAI's o3 and Google's Gemini 2.5 Pro (both at 70) and Anthropic's Claude Opus 4 (64). (Medium)
On ARC-AGI-2, a benchmark that gauges general abstract reasoning through visual puzzles, Grok 4 scored 16.2%, nearly double Claude Opus 4's score. That result marks a significant leap in AI problem-solving. (Facebook)
Grok 4's specialized coding variant scored 75% on SWE-Bench, outperforming all peer models and integrating smoothly into development environments.
In real-world business simulations (Vending-Bench), Grok 4 more than doubled both human and AI baseline performance, managing inventory, pricing, and suppliers over many rounds and generating significantly higher revenue. (DataCamp)
Grok 4 Heavy spins up multiple AI agents for each task, letting them collaborate and converge on stronger answers. This approach yields better results on complex reasoning tests like HLE and ARC-AGI. However, it costs ten times more to operate and runs more slowly. (DataCamp)
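xAI has not published exactly how Grok 4 Heavy's agents collaborate. As a rough illustration of the general idea only, the sketch below uses plain majority voting (often called self-consistency): several independent agents answer the same task and the most common answer wins. The `Agent` type and the `heavy_style_answer` function are hypothetical names invented for this example, and the toy agents stand in for real model calls.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical agent type: takes a task prompt, returns an answer string.
# In practice each agent would wrap a separate model call.
Agent = Callable[[str], str]


def heavy_style_answer(task: str, agents: Sequence[Agent]) -> str:
    """Ask every agent the same task and return the most common answer.

    Generic majority-vote aggregation for illustration only; this is NOT
    xAI's disclosed Grok 4 Heavy method.
    """
    if not agents:
        raise ValueError("need at least one agent")
    answers = [agent(task) for agent in agents]
    # most_common(1) -> [(answer, count)]; ties go to the answer seen first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Toy agents standing in for independent model runs.
    toy_agents = [lambda t: "42", lambda t: "41", lambda t: "42"]
    print(heavy_style_answer("What is 6 * 7?", toy_agents))  # prints 42
```

Each extra agent adds a full model call, so compute grows roughly in proportion to the number of agents, which illustrates why a multi-agent tier is slower and costlier to run than a single model.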
Unlike most AI models, which rely on fixed training data, Grok 4 can access live information from X (formerly Twitter), giving it a real-time edge on current-events questions. (TechRadar)
Through its API, Grok 4 supports a massive 256k-token context window, enough for long documents and complex reasoning tasks. Pricing is steep: roughly $3 per million input tokens and $15 per million output tokens. (Artificial Analysis)
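To make the pricing concrete, the sketch below estimates the cost of a single API request from its token counts. The rates and the 256k window come from the figures above; the constant names and `estimate_request_cost` are hypothetical, and treating prompt and reply tokens as sharing one window is an assumption. It does not call any xAI API.

```python
# Rates quoted in the article (USD per million tokens). These are assumptions
# for illustration; check xAI's current pricing before relying on them.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00

# "256k" context window as reported; the exact API limit is an assumption here,
# as is counting prompt and reply tokens against the same window.
CONTEXT_WINDOW_TOKENS = 256_000


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API request at the quoted per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the assumed context window")
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # A long-document request: 200k tokens in, 4k tokens out.
    print(f"${estimate_request_cost(200_000, 4_000):.2f}")  # prints $0.66
```

At these rates, a 200,000-token document with a 4,000-token reply costs about $0.60 for input plus $0.06 for output, roughly $0.66 per request. Output tokens cost five times as much as input tokens, so long replies add up quickly.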
Despite its breakthroughs, Grok 4 launched amid controversy:
Ideal for:
Not ideal for:
xAI has ambitious plans through 2025:
With Grok 4, xAI announced not just a new AI model but a new bar for intelligence testing and real-world applications. It outpaces rivals in academic, reasoning, coding, and simulation environments.
But notable caveats remain: slower speed, expensive pricing, safety concerns, and public-trust issues tied to earlier controversies. And while Grok 4 dominates narrow benchmarks, it's not yet the everyday assistant most consumers will use.
At the same time, its emergence signals a turning point: the goalposts for AI are now higher than ever, and competition is fierce.
| Model | Intelligence Index | HLE (with tools) | ARC-AGI-2 | Vending-Bench Earnings |
|---|---|---|---|---|
| Grok 4 | 73 | ~38% | 16.2% | ≈$4,600 (top performer) |
Elon Musk's xAI just dropped a bombshell. Grok 4 isn't just an incremental upgrade; it's fundamentally smarter in structured reasoning domains. Whether it lives up to its promise in safe, everyday use remains to be seen. But one thing's clear: the AI race has officially levelled up.