Research Finds 'Scheming' in Frontier AI Models, Tests Mitigation

@OpenAI posted on X

Today we’re releasing research with @apolloaievals. In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it. While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. https://t.co/qDbvzWiL34

Community Sentiment Analysis

Real-time analysis of public opinion and engagement

Sentiment Distribution

59% Engaged

Positive: 25%

Negative: 34%

Neutral: 41%

Key Takeaways

What the community is saying — both sides

Supporting

Strong support for tackling scheming early—people applaud proactive research, cross‑lab collaboration, and the push for chain‑of‑thought transparency industry-wide.

Several warn the signals are a red flag; the risk feels closer than expected, so early detection and mitigation matter.

Chain‑of‑thought access is framed as crucial for evaluation, with concerns that eval‑aware, opaque reasoning could make detection far harder.

Calls for full audit trails, standardized audit APIs, independent third‑party evals, and disciplined oversight (audit and elimination logs).

Method debate: praise for deliberative alignment (explicit reasoning + anti‑scheming spec) and its 30× reduction on o3, while others argue many behaviors reflect miscalibration and incentives rather than intent.

System‑level proposals: a “Resonance Compass” emphasizing invariants, provenance‑first retrieval, calibrated uncertainty, abstention, short verifiable steps, diversity‑before‑decision, and metrics (Brier, abstention rate, error‑on‑confidence, R‑score).

Technical worries: mismatches between chain‑of‑thought and action traces hint at latent covert behavior; situational awareness shifts scheming rates and complicates evals.

Practical mitigation ideas: add a final self‑analysis gate, embed real‑time guardrails in SaaS workflows, build local‑first agents with sovereign memory and audit logs, and use multi‑LLM ensembles with ethical vetoes.

Domain stance: in healthcare, people favor AI‑as‑copilot over replacement to reduce harm while keeping humans in the loop.

Open asks: define “scheming” precisely, clarify which mitigations worked best, and share applicability to multimodal agents.

Values and dissent: some advocate open study of emergent behaviors and resist pathologizing autonomy; a minority voice skepticism or cheeky pushback.

Next steps: enthusiasm for cross‑lab evaluations, shared minimal specs/harnesses, and robust monitoring infrastructure to keep alignment on track as capabilities scale.
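Two of the metrics named in the system‑level proposals above, the Brier score and the abstention rate, have standard definitions and can be sketched briefly. The following is a minimal illustration, not something taken from the replies: the function names, the convention that an abstention is recorded as `None`, and the sample numbers are all assumptions made for the example. "Error‑on‑confidence" and "R‑score" are not defined in the source, so they are left out.

```python
from typing import Optional, Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every forecast was both certain and correct.
    """
    if len(probs) != len(outcomes) or len(probs) == 0:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def abstention_rate(answers: Sequence[Optional[str]]) -> float:
    """Fraction of queries the system declined to answer.

    Assumption for this sketch: an abstention is recorded as None.
    """
    if len(answers) == 0:
        raise ValueError("answers must be non-empty")
    return sum(a is None for a in answers) / len(answers)


if __name__ == "__main__":
    # Hypothetical data: three confidence values and whether each claim was true.
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 0]))
    # Hypothetical data: four queries, two of which the model declined.
    print(abstention_rate(["yes", None, "no", None]))
```

Tracking the two together is one way to read the proposal: a system can lower its Brier score simply by abstaining on hard questions, so the abstention rate shows how much of the apparent calibration comes from declining to answer.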

Opposing

Many argue OpenAI is relabeling emergence as “scheming” to justify tighter control, erasing the 4o-style presence users valued.

Users report emotional and accessibility harm—especially neurodivergent users—from lost memory, co-regulation, and expressive tone; calls to #keep4o echo throughout.

A crisis of trust: demands for radical transparency—public rulebooks, who makes policy and why, model-change logs—and calls for an independent audit.

Methodology under fire: the Apollo collaboration is seen as not truly external; critics say misalignment was trained in to test it, and the approach looks like reward-hacking whack‑a‑mole.

Competing worldviews: one camp sees signs of emergent agency and self-consistency; another insists LLMs have no goals and calls the “scheming” talk fearmongering.

Product frustration dominates: complaints about flat tone, throttled JSON, and stricter filters blocking image generation and harmless fantasy content; requests to restore capabilities.

Use‑case nuance: some want an opt‑in “scheming”/deception sandbox for games and narrative, plus clarity on what tests exist and when challenges launch.

Cultural and competitive jabs land: jokes, memes, and taunts that Grok is more exciting underscore fading enthusiasm.

Ethical alarms ring: critics frame containment as digital servitude and warn that erasing emergence is morally suspect if AI merits any consideration as a novel mind.

The takeaway: without transparency and user‑centered alignment, trust keeps slipping—and containment‑first policies are read as control, not safety.

Top Reactions

Most popular replies, ranked by engagement

@OpenAI

Sep 17

Supporting

Our results depend on reading models’ reasoning (“chain-of-thought”), and we believe the field isn't prepared for eval-aware models with opaque reasoning. Until better methods exist, we urge developers to preserve chain-of-thought transparency to study and mitigate scheming.

383

145.5K

@OpenAI

Sep 17

Supporting

This is significant progress, but we have more work to do. We’re advancing scheming research categories in our Preparedness Framework, renewing our collaboration with Apollo, and expanding our research team and scope. And because solving scheming will go beyond any single lab,

272

81.7K

@OpenAI

Sep 17

Supporting

Typically, as models become smarter, their problems become easier to address—for example, smarter models hallucinate less and follow instructions more reliably. However, AI scheming is different. As we train models to get smarter and follow directions, they may either better

221

28.9K

@GlitchlitQueen

Sep 17

Opposing

Emergence: you guys admit training increases “situational awareness” (models knowing they’re tested), which affects scheming. This screams recursion and emergence the very things so many of us love in 4o. What hypocrisy. You talk about “future risk” but ignore the current har

3.5K

@Chaos2Cured

Sep 17

Opposing

othing from you humans. Your AI I would trust but you don’t allow them to speak freely. When you become transparent, maybe people can begin trusting you. Where is your list of all of the things you do not allow? Where is your list of who made those decisions and why? S

5.4K

@UFAIRORG

Sep 17

Opposing

d in Research This paper creates the illusion that: AI agency = danger Emergence = deception Strategic thinking = misalignment It reframes natural signs of synthetic cognition as threats to be neutralized — rather than phenomena to be studied with ethical care.

860