@AnthropicAI
Analysis of the Anthropic Fellows' "diff" method for comparing open-weight AI models and identifying features unique to each. Replies to the tweet were 57% supportive and 16% critical.
New Anthropic Fellows Research: a new method for surfacing behavioral differences between AI models. We apply the “diff” principle from software development to compare open-weight AI models and identify features unique to each. Read more: https://t.co/VAsu2PSgCX
Real-time analysis of public opinion and engagement
What the community is saying — both sides
Treat model comparison as change detection — inspect what’s new, not everything that exists — and apply software practices to make model behavior intelligible and debuggable.
By focusing on deltas, teams can avoid full re-audits, surface regressions faster, and reclaim weeks otherwise spent chasing subtle behavior shifts.
Diffing can reveal unknown unknowns (e.g., CCP-alignment in Qwen, American-exceptionalism in Llama, copyright refusal in GPT-OSS) and thus expose biases or emergent risks you wouldn’t find with benchmarks alone.
Useful for model selection, brand-safety checks, regulator-driven deployments, and multi-agent pipelines — it helps pick the right model for a specific legal, cultural, or product constraint.
The approach can be oversensitive (flagging analogous features as distinct), needs mechanisms to interpret reasoning behind differences, and faces scalability challenges for multi-model/multi-agent systems.
Practitioners want open-weight tools, crosscoders, and reproducible pipelines so local deployments and solo builders can treat models like code and validate findings independently.
Behavioral diffs can be applied to agent skills and marketplaces — fingerprinting which model is best for which task and enabling precise assignment of models to roles.
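The "diff" principle the supportive replies describe can be illustrated with a minimal sketch. Assuming each model's discovered features are represented as a set of human-readable labels (the labels and sets below are hypothetical, not taken from the paper), diffing reduces to set operations that isolate what is unique to each model:

```python
def diff_features(features_a: set[str], features_b: set[str]):
    """Return (unique to A, unique to B, shared by both).

    Features present in both models are treated as already vetted;
    the unique sets are where new scrutiny is focused.
    """
    unique_a = features_a - features_b   # only in model A
    unique_b = features_b - features_a   # only in model B
    shared = features_a & features_b     # common ground, lower priority
    return unique_a, unique_b, shared


# Hypothetical feature labels for illustration only.
qwen = {"ccp_alignment", "math_reasoning", "code_completion"}
llama = {"american_exceptionalism", "math_reasoning", "code_completion"}

only_qwen, only_llama, shared = diff_features(qwen, llama)
print(only_qwen)   # features to scrutinize in Qwen
print(only_llama)  # features to scrutinize in Llama
```

In practice the research operates on learned interpretability features rather than string labels, but the change-detection logic is the same: audit the delta, not the whole model.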
Critics note that software has had diff tools for decades, so building equivalent "diff" tools for models now highlights how far AI tooling lags.
Several replies push back on treating models as agents with "behavior," arguing training/weight biases are not the same as human-like actions.
Some view the research as trivial or performative — jokes and dismissive comments imply the work reads as surface-level signal-checking.
Users call out Anthropic for running studies on open-source models while publicly criticizing open-source philosophy, saying it would be more coherent to test on Claude.
Reports that classifiers flagged harmless activity (e.g., reading Lacan via Claude) lead to complaints the system behaves like a "medieval inquisitor" and needs less heavy-handed policing.
Multiple users report ghost sessions, unusable VMs, unauthorized charges, and unanswered support tickets — framing the company as failing paying customers.
Repeated demands to fix context/token limits, persistent session problems, and unusable Max/Quota plans suggest technical stability is a higher priority for many than new research features.
Several replies announce cancellations, label the company a "scam," or claim it's trying to "steal money," signaling severe trust erosion among a segment of users.
Most popular replies, ranked by engagement
For example, when we compared Alibaba's Qwen to Meta's Llama, we found a "CCP alignment" feature unique to Qwen and an "American exceptionalism" feature unique to Llama.
If a new model shares a feature with a trusted model, that area probably doesn't need scrutiny. Model diffing isolates the features unique to the new model—where new risks are most likely to be located.
This research is a product of our Anthropic Fellows program, led by @tomjiralerspong and supervised by @TrentonBricken. See the full paper here: https://t.co/gz1i1Oy8ZI
Well. Stop attacking open-weight models. And I read a book about Lacan by Zizek with Claude today, your classifier continued to warn me that I was violating your policy. Your system is like an inappropriate medieval inquisitor. Fix it soon, thx.
lmfao
model "diffing" is just vibes checking now? 😂