Debate2Create: Robot Co-design via Multi-Agent LLM Debate

Kevin Qiu (1, 2), Marek Cygan (1, 3)
(1) University of Warsaw, (2) IDEAS NCBR, (3) Nomagic
ICML 2026
Overview of the Debate2Create framework. Design and control agents debate robot morphologies and rewards, physics-based evaluation scores each candidate, and optional pluralistic judges provide multi-objective critiques for later rounds.

TL;DR: Debate2Create uses multi-agent LLM debate to produce MuJoCo morphologies and reward functions; candidates are trained with RL in Brax/MuJoCo and ranked by simulator return rather than LLM preference.

Abstract

We introduce Debate2Create (D2C), a multi-agent LLM framework that formulates robot co-design as structured, iterative debate grounded in physics-based evaluation: each morphology-reward pair is trained with reinforcement learning and scored by simulator return. A design agent and a control agent engage in a thesis-antithesis-synthesis loop, while optional pluralistic LLM judges provide multi-objective feedback to steer exploration. Across five MuJoCo locomotion benchmarks, D2C reaches up to 3.2x the default score on Ant and roughly 9x on Swimmer, outperforming prior LLM-based methods and black-box optimization. Iterative debate yields 18-35% gains over compute-matched zero-shot generation, and D2C-generated rewards transfer to default morphologies in 4/5 tasks. Our results demonstrate that structured multi-agent debate offers an effective alternative to hand-designed objectives for joint morphology-reward optimization.
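
The propose-evaluate-refine loop described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: `run_debate`, the three callables (`propose_design`, `propose_reward`, `evaluate`), and the plain-text feedback string are all hypothetical names chosen here. In a real system the first two would wrap LLM calls and `evaluate` would train a policy and return the simulator score.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a morphology description and reward source code.
Candidate = Tuple[str, str]


def run_debate(
    propose_design: Callable[[Optional[str]], str],
    propose_reward: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], float],
    rounds: int,
) -> Tuple[Candidate, float, List[float]]:
    """Run a toy debate loop and return the best pair, its score, and the
    best-so-far score after each round."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    best: Optional[Candidate] = None
    best_score = float("-inf")
    history: List[float] = []
    feedback: Optional[str] = None  # no feedback before the first round

    for _ in range(rounds):
        # Design agent proposes a body; control agent proposes a reward for it.
        design = propose_design(feedback)
        reward = propose_reward(design, feedback)

        # Ranking is driven by the external score, not by an LLM preference.
        score = evaluate(design, reward)
        if score > best_score:
            best, best_score = (design, reward), score
        history.append(best_score)

        # Assumed feedback format: a short text summary fed to the next round.
        feedback = f"last score {score:.2f}, best so far {best_score:.2f}"

    assert best is not None
    return best, best_score, history
```

Passing the agents and the evaluator as callables keeps the loop testable with stubs, which is useful because a real evaluation involves a full RL training run.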

Results

D2C is evaluated as a joint morphology-reward search method. Candidate rewards are used for policy training, while all methods are compared with the same external simulator score, so the reported gains measure task performance rather than generated-reward magnitude.
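
To make the distinction between the training reward and the comparison score concrete, here is a small sketch. The `Result` record, its field names, and the toy numbers are hypothetical. The point is only that selection uses the shared external score, so a candidate whose generated reward happens to have a large numeric scale gains no advantage.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Result:
    name: str
    train_return: float    # return under the generated reward (arbitrary scale)
    external_score: float  # shared simulator score used for comparison


def select_best(results: Iterable[Result]) -> Result:
    """Pick the candidate with the highest external score."""
    return max(results, key=lambda r: r.external_score)


# Toy values for illustration only, not results from the paper.
results = [
    Result("candidate_a", train_return=5000.0, external_score=1.2),
    Result("candidate_b", train_return=40.0, external_score=2.9),
]
best = select_best(results)
print(best.name)
```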

Overall Performance

The main comparison asks whether structured debate improves over fixed-body reward design, morphology-only search, prior LLM co-design, and the original MuJoCo designs under a shared evaluation protocol. D2C achieves the strongest default-normalized scores across the five paper benchmarks, with the largest gains on Ant and Swimmer. The paper’s ablations further show that D2C rewards improve the default morphology in 4/5 tasks, while full morphology-reward co-design gives the best score in 4/5 tasks.

Default-normalized performance across MuJoCo environments

Normalized performance across Ant, HalfCheetah, Hopper, Swimmer, and Walker2d. For each method, the best design-reward pair discovered during search is retrained and normalized by the default baseline.
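
A minimal sketch of the normalization follows, assuming it is a simple ratio of the retrained score to the default design's score (the "3.2x the default" phrasing suggests a ratio, but the exact formula is not stated on this page). The function name and the toy inputs are illustrative.

```python
def default_normalized(score: float, default_score: float) -> float:
    """Return score as a multiple of the default design's score.

    Assumes a ratio-style normalization; a ratio is only meaningful
    when the baseline is positive, so other values are rejected.
    """
    if default_score <= 0:
        raise ValueError("default_score must be positive for a ratio")
    return score / default_score


# Toy numbers for illustration only.
print(default_normalized(6.0, 2.0))
```

Under this convention a value of 1.0 means parity with the default design, and values above 1.0 mean the discovered pair scores higher.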

Debate Dynamics

The round-by-round curves separate the effect of iterative feedback from simply sampling more candidates. With the same 80-candidate search budget, D2C continues improving after the initial proposals and outperforms compute-matched zero-shot generation in later rounds.

Best-so-far normalized scores over debate rounds

Best-so-far default-normalized scores over debate rounds. Iterative debate improves over rounds and outperforms compute-matched zero-shot generation in the paper experiments.
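
A best-so-far curve is a running maximum over the per-round best scores. The sketch below assumes scores are already grouped by round. The helper name `best_so_far` and the input layout are hypothetical, and how the 80-candidate budget is split across rounds is not specified here.

```python
from itertools import accumulate
from typing import List, Sequence


def best_so_far(round_scores: Sequence[Sequence[float]]) -> List[float]:
    """Return the running maximum of the best score seen up to each round.

    Each inner sequence holds the normalized scores of one round's
    candidates and must be non-empty (max() raises ValueError otherwise).
    """
    per_round_best = [max(scores) for scores in round_scores]
    return list(accumulate(per_round_best, max))


# Toy scores for three rounds, for illustration only.
curve = best_so_far([[1.0, 1.5], [1.2, 1.1], [2.0, 0.5]])
print(curve)
```

Because the curve is a running maximum, it can never decrease. A flat segment means a round produced no candidate better than the current best.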

Morphology Comparison

The design gallery makes the co-design output concrete. Compared with the default and baseline morphologies, D2C makes task-specific edits to limb lengths, torso proportions, and joint placement rather than only changing reward code.

Representative final robot morphologies discovered by Debate2Create

Side-by-side morphology comparison across tasks and methods. The bottom row shows D2C designs discovered jointly with reward functions; the rows above show the original Default/Eureka body, Bayesian Optimization, and RoboMoRe variants from the paper comparison.

Citation

If you use Debate2Create, please cite the paper:

@inproceedings{qiu2026debate2create,
  title={Debate2Create: Robot Co-design via Multi-Agent {LLM} Debate},
  author={Qiu, Kevin and Cygan, Marek},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2026},
  note={To appear}
}