
Conductor: Orchestrating AI Agents — ICLR 2026 Research

ICLR 2026: A 7B Conductor model trained with RL to orchestrate frontier models into dynamic natural-language workflows, setting new highs on LiveCodeBench and GPQA-Diamond.

@SakanaAILabs posted on X

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 https://t.co/Wnh9ZACmLm

What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team.

We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies:

1/ Which agent to call
2/ What specific subtask to give them (acting as an expert prompt engineer)
3/ What previous messages they can see in their context window

Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems.

The results are very promising: the 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost.

One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference.
This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research, which we announced a few days earlier, this foundational work powers our new multi-agent system: Sakana Fugu! (https://t.co/36Ud311KCP) 🐡 OpenReview: https://t.co/e5WqTleQNL (ICLR 2026)
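The workflow format the thread describes (each step names a worker, a subtask prompt, and which prior messages that worker may see) can be sketched in a few lines. This is a minimal illustration of the idea only: the `Step` schema, the agent names, and `run_workflow` are hypothetical, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Step:
    agent: str           # which worker model to call
    subtask: str         # the subtask prompt written by the Conductor
    visible: list[int]   # indices of earlier steps whose outputs this worker sees

def run_workflow(steps, call_agent):
    """Execute steps in order, passing each worker only the context the
    Conductor granted it. `call_agent(name, prompt)` stands in for a real
    model API call."""
    outputs = []
    for step in steps:
        context = "\n".join(outputs[i] for i in step.visible)
        outputs.append(call_agent(step.agent, f"{context}\n{step.subtask}".strip()))
    return outputs[-1]  # treat the final step's output as the answer

# Toy usage with a fake model pool: a planner-executor-verifier pipeline
# like the one the tweet says emerges for hard coding problems.
def fake_agent(name, prompt):
    return f"[{name}] handled: {prompt.splitlines()[-1]}"

plan = [
    Step("planner", "Break the problem into steps.", visible=[]),
    Step("executor", "Carry out the plan.", visible=[0]),
    Step("verifier", "Check the executor's answer.", visible=[0, 1]),
]
print(run_workflow(plan, fake_agent))
```

Recursive test-time scaling would then amount to letting a step's `agent` be the Conductor itself, with the prior workflow's outputs in its `visible` list.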


Community Sentiment Analysis

Real-time analysis of public opinion and engagement

Sentiment Distribution

Engaged: 90%
Positive: 70%
Negative: 20%
Neutral: 10%

Key Takeaways

What the community is saying — both sides

Supporting

1

Manager-first paradigm:

Many replies argue that a manager model beats a worker model: putting effort into management/orchestration yields higher leverage than improving individual solvers.

2

Routing → orchestration:

The key shift is from picking a model (a lookup) to coordinating a sequence of models with information passing; doing that coordination in natural language makes the protocol human-readable and debuggable, which aids auditing and intervention.
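The routing-versus-orchestration distinction this takeaway draws can be made concrete with a small contrast. All model names and helper functions below are hypothetical, purely to show the difference in shape between the two approaches.

```python
def classify(question):
    # Toy classifier: routes code-looking questions to a code model.
    return "code" if "function" in question else "general"

def route(question, table):
    """Router: a single lookup -- one model answers the whole question."""
    model = table.get(classify(question), "default-model")
    return [(model, question)]

def orchestrate(question):
    """Orchestrator: a coordinated sequence of (model, subtask) calls with
    information passing, written in plain language a human can audit or
    edit. {prev} marks where an earlier step's output is inserted."""
    return [
        ("planner-model", f"Plan how to solve: {question}"),
        ("solver-model", "Using the plan in {prev}, produce a solution."),
        ("verifier-model", "Verify the solution in {prev}; fix any errors."),
    ]

table = {"code": "code-model", "general": "chat-model"}
print(route("Write a function to sort a list", table))
print(orchestrate("Write a function to sort a list"))
```

The routed plan is one opaque call; the orchestrated plan is a legible multi-step protocol, which is why replies frame the move as routing → orchestration rather than a better router.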

3

Manage, don’t just solve:

Training a model to act as a manager (high-level delegation and coordination of specialists) is praised as an effective way to handle complexity, mirroring how successful human teams operate.

4

Enthusiastic endorsement:

Short replies — “Congrats,” “That’s huge,” “Very cool” — signal broad positive reception and community validation of the approach.

Opposing

1

Release the weights and training data:

The project shouldn't keep the model and its training data secret when it builds on and benefits from the community's work (e.g., Hugging Face); users expect reproducibility and fair sharing.

2

Performance isn't proven:

The paper hasn't demonstrated the model is meaningfully better than a simple lookup table; critics want clear benchmarks and ablation evidence before accepting claims.

Top Reactions

Most popular replies, ranked by engagement


@SakanaAILabs

Supporting

Learning to Orchestrate Agents in Natural Language with the Conductor Paper: https://t.co/31QhVGCSzq OpenReview: https://t.co/QBzmnGuvif Fugu Blog: https://t.co/TFDdOQzSqJ 🐡

42 · 1 · 4.3K

@Xenea_io

Supporting

manager model > worker model. management has always been the highest leverage role, even in AI apparently

11 · 1 · 828

@LenSeaside

Opposing

Please don't tell me you aren't releasing the weights or the training data for the conductor!? Think of all everybody does and puts on huggingface. And you think this small model is your little golden egg secret?

6 · 1 · 638

@LenSeaside

Opposing

I've been through the whole paper and you literally haven't even shown it's better than a lookup table.

3 · 0 · 231

@inflectivAI

Supporting

Training a model to act as a manager rather than a solver is a brilliant approach to handling complexity. It mimics how successful human teams operate by focusing on the high-level delegation and coordination of specialized experts.

1 · 0 · 249

This article was AI-generated from real-time signals discovered by PureFeed.

