@Dogetothemoon
Grok is getting better every day
Grok 4.20 Beta tweet analysis: lowest hallucination rate (22%), top instruction following (83%), and second-place agentic tool use (97%). Support 39.35% · Confront 28.39%. Overall
The new Grok 4.20 Beta benchmarks are wild
🥇 #1 lowest hallucinating AI (22%)
🥇 #1 at following instructions (83%)
🥈 #2 in agentic tool use (97%)
Grok 4.20 ranks #1 in the lowest hallucination rate ever recorded across all AI models tested globally.
Most models race to sound smart. Grok 4.20 was built to never lie and still dominates on instruction following and agentic tasks.
This is literally a 500B model performing top-notch in the things that matter most.
Real-time analysis of public opinion and engagement
What the community is saying — both sides
built to favor truth over flair, and they celebrate it as the future of reliable AI.
lower hallucinations plus strong instruction‑following translate into enterprise readiness and close the gap between chatbots and “AI workers.”
(Cursor, Codex‑style capabilities, multi‑agent workflows) and whether those benchmarked scores hold up in production.
is important, yet some frontier models still top combined intelligence and reasoning benchmarks.
, arguing that freedom of execution (not moral filters) accelerates productivity and innovation.
, case studies, and long‑running agentic behavior before changing critical systems.
tests can be "chopped", curated, or tuned to look good without proving real capability.
is framed as unacceptable: that's "one-in-five responses wrong" and a clear warning for any high-value or autonomous use.
(benchmarks reward saying "I don't know") or strict output validation, not necessarily better understanding.
they mirror training data and will keep producing confident falsehoods until reasoning is solved differently.
(Grok 4 → 4.1 → 4.
and question value when newer builds reintroduce hallucinations or break prior behavior.
or bigger models is seen as an expensive band-aid, not a fundamental fix.
poor navigation, ignoring important code pages, bad schema/syntax validation, and fragile instruction-following.
and steady behavior over time rather than trumpeting one-off benchmark wins.
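The abstention argument raised by skeptics (benchmarks that reward saying "I don't know") can be illustrated with a toy calculation. This is a minimal sketch under an assumed scoring rule, not the methodology of the benchmark the tweet cites. The `rates` function, the metric definitions, and the two example models are all hypothetical. Here the hallucination rate is taken to be the share of all responses that are confidently wrong.

```python
def rates(correct: int, wrong: int, abstained: int) -> dict[str, float]:
    """Toy metrics under hypothetical definitions (not any real benchmark's)."""
    total = correct + wrong + abstained
    return {
        "accuracy": correct / total,            # share of responses that are right
        "hallucination_rate": wrong / total,    # share that are confidently wrong
    }

# Hypothetical Model A answers every question.
model_a = rates(correct=60, wrong=40, abstained=0)

# Hypothetical Model B knows exactly as much, but abstains on half of the
# questions it would otherwise have answered incorrectly.
model_b = rates(correct=60, wrong=20, abstained=20)

print(model_a)
print(model_b)
```

Under these assumptions both models have the same accuracy, yet Model B's hallucination rate is half of Model A's. That is the skeptics' point: a lower rate can reflect more abstention rather than better understanding.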
Most popular replies, ranked by engagement
Grok is getting better everyday
🔥 Grok 4.20 Beta is an absolute monster! 🥇 A 500B model built to never lie – and still dominates everything! Truth first wins. xAI rocks the world! 🚀 #Grok @xai
Lowest hallucination at 22% is still 1 in 5 responses being wrong. That's not a flex, that's a warning label.
he top .5 % users of ChatGPT. Grok is very similar but it is still overly weighed down by the data it is fed. It has very little ability to reason. But no LLM is able to reason. It will never be anything more than mirror until we solve that problem. They all easily spew very fal
While other models are racing for 'intelligence' points, Grok is out here actually making sure the answers are right. This is exactly what the industry needs!
Grok at navigation leaves something to be desired. I told it to go home and it took a very round about route that was 5 times as long. From now on I will give it more explicit directions