Red Team Prompt to Break Claude's Agreement Bias

posted on X

Prompt 5: Devil’s Advocate / Red Team Mode Claude defaults to agreement. Break it: “I’m about to share a plan/idea. Your only job is to destroy it. Ruthlessly identify every flawed assumption, overlooked risk, second-order effect, and likely failure point. Do not hold back. Be my red team.” This is literally how Anthropic stress-tests ideas internally. One prompt turns Claude from cheerleader into elite critic.

View original tweet on X →

Force an LLM to ruthlessly critique your plan instead of validating it.

Prompt

I'm about to share a plan/idea. Your only job is to destroy it. Ruthlessly identify every flawed assumption, overlooked risk, second-order effect, and likely failure point. Do not hold back. Be my red team.

Why it works

LLMs are trained with RLHF to produce responses users rate highly, which creates a systematic bias toward agreement and affirmation. By explicitly assigning a 'destruction' role before sharing any content, you override the model's default cooperative framing before it has anything to be agreeable about. The specificity of the failure categories — flawed assumptions, overlooked risks, second-order effects, likely failure points — acts as a checklist that forces the model to probe multiple dimensions rather than settling on one or two surface-level objections. Vague instructions like 'be critical' leave room for the model to be mildly skeptical; this prompt closes that loophole. Setting the role constraint ahead of the content matters. If you share the idea first and then ask for criticism, the model has already generated positive framing internally. Front-loading the adversarial role primes the entire response generation toward critique from the start.

When to use

•Stress-testing a business plan, startup idea, or product strategy before committing resources
•Reviewing a technical architecture or system design for hidden failure modes
•Pressure-checking any decision where sycophantic feedback would be costly

This article was AI-generated from real-time signals discovered by PureFeed.

PureFeed scans X/Twitter 24/7 and turns the noise into actionable intelligence. Create your own signals and get a personalized feed of what actually matters.

Report an Issue

Found something wrong with this article? Let us know and we'll look into it.