xAI’s Grok 4.20 Sets Honesty Record but Trails in Intelligence


TL;DR

  • New Model: xAI launched Grok 4.20 in three API variants with pricing up to 60% cheaper than Grok 3.
  • Honesty Record: Grok 4.20 achieved a 78% non-hallucination rate on the Artificial Analysis Omniscience test, the highest of any model tested.
  • Intelligence Gap: The model ranks 8th on the Intelligence Index with a score of 48, trailing leaders Gemini 3.1 Pro and GPT-5.4 at 57.
  • Enterprise Focus: All variants support multi-agent orchestration, a 2-million-token context window, and provisioned throughput in US and EU regions.

Elon Musk’s xAI launched Grok 4.20 for developers in three API variants, pricing the new model up to 60% cheaper than Grok 3 while setting a record for the lowest hallucination rate among tested AI models. As detailed on xAI’s Grok 4.20 developer page on March 24, the model ships in reasoning, non-reasoning, and multi-agent configurations, all sharing a 2-million-token context window and identical tool support.

Grok 4.20 posted a 78% non-hallucination rate on the Artificial Analysis Omniscience test, the highest of any model tested, while ranking just 8th on the same organization’s Intelligence Index with a score of 48. According to Artificial Analysis, that gap signals xAI is optimizing for reliability over raw benchmark dominance.

Grok 4.20 Offers Three Variants at Lower Prices

All three Grok 4.20 variants share identical token pricing: $20 per million input tokens and $60 per million output tokens. Compared to Grok 3, which remains available at $30 per million input tokens and $150 per million output tokens, that represents a 33% reduction on input and a 60% reduction on output.
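
The percentages follow directly from the quoted prices. A minimal Python sketch of the arithmetic, using only the figures cited above (the function and variable names are illustrative, not part of any xAI tooling):

```python
def percent_reduction(old_price: float, new_price: float) -> float:
    """Return the percentage by which new_price undercuts old_price."""
    return (old_price - new_price) / old_price * 100


# Prices in USD per million tokens, as quoted in the article.
GROK_3 = {"input": 30.0, "output": 150.0}
GROK_4_20 = {"input": 20.0, "output": 60.0}

# Compute the reduction separately for input and output tokens.
reductions = {
    kind: percent_reduction(GROK_3[kind], GROK_4_20[kind])
    for kind in ("input", "output")
}

for kind, pct in reductions.items():
    print(f"{kind}: {pct:.0f}% cheaper")
```

The input figure is 33.3% before rounding, so the headline “up to 60%” refers to output tokens only.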

Beyond the standard tier, long-context requests above 200,000 tokens are priced at $40 per million input tokens and $120 per million output tokens, double the standard rates. xAI also offers budget alternatives through grok-4-fast and grok-4-1-fast at $2 per million input tokens and $5 per million output tokens, giving developers an option that is 10x cheaper on input and 12x cheaper on output for less demanding workloads.
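
To see how the tiers affect a bill, here is a rough per-request cost estimator in Python. It rests on two assumptions the article does not confirm: that the 200,000-token threshold is measured on input tokens, and that the long-context rate then applies to the whole request rather than only to the tokens above the threshold. The function name and constants are hypothetical.

```python
# Prices in USD per million tokens, as quoted in the article.
STANDARD = {"input": 20.0, "output": 60.0}
LONG_CONTEXT = {"input": 40.0, "output": 120.0}

# Assumption: the threshold is checked against the request's input tokens.
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the assumed tier rule."""
    # Assumption: once over the threshold, the whole request is billed
    # at the long-context rate.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT
    else:
        rates = STANDARD
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print(f"standard request: ${estimate_cost(100_000, 10_000):.2f}")
    print(f"long-context request: ${estimate_cost(300_000, 10_000):.2f}")
```

Under these assumptions, tripling the input from 100,000 to 300,000 tokens raises the estimate by roughly five times rather than three, because the request crosses into the long-context tier.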

The reasoning variant is the default model call, served under the simple alias “grok-4.20.” Its non-reasoning counterpart strips out chain-of-thought processing for faster responses, while a dedicated multi-agent variant supports orchestration of up to four parallel agents in its Heavy consumer mode.