We introduce Factored Scaling Curves (FSC), which model how policy performance scales with data for different environmental factors and can be extrapolated to guide principled data collection.
Generalist imitation-learning policies trained on large datasets show great promise for diverse manipulation tasks. However, robust generalization requires data gathered under many environmental-factor variations (e.g., camera pose, table height, distractors), and collecting an exhaustive dataset that covers all of them is prohibitively expensive. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance changes as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. Through extensive simulated and real-world experiments, covering both training-from-scratch and fine-tuning scenarios, we show that FSC boosts success rates on real-world tasks in new environments by up to 26% compared with existing data-collection strategies. Finally, we demonstrate that FSC can efficiently guide data collection using an offline metric, eliminating the need for costly large-scale real-world evaluations.
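To make the idea of extrapolating per-factor curves concrete, here is a minimal sketch and not the paper's implementation. It assumes the policy's error rate for each factor follows a power law in the number of demonstrations that vary that factor, and it assumes a simple greedy allocation rule. The function names, the measurements, and the budget are all hypothetical, and only single factors (no factor pairs) are modeled.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err ~ c * n**(-alpha) by least squares in log-log space.

    The power-law form is an assumption made for this sketch.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)  # (c, alpha)

def predicted_error(c, alpha, n):
    """Extrapolate the fitted curve to n demonstrations."""
    return c * n ** (-alpha)

def allocate(curves, current_n, budget, step):
    """Greedily hand out `step` demos at a time to the factor whose
    fitted curve predicts the largest drop in error."""
    n = dict(current_n)
    alloc = {factor: 0 for factor in curves}
    for _ in range(budget // step):
        gains = {
            factor: predicted_error(c, a, n[factor])
            - predicted_error(c, a, n[factor] + step)
            for factor, (c, a) in curves.items()
        }
        best = max(gains, key=gains.get)
        n[best] += step
        alloc[best] += step
    return alloc

# Hypothetical measurements: error rate after training with n demos
# that vary a single factor (synthetic numbers, not results from the paper).
demos = np.array([10, 20, 40, 80])
measured = {
    "camera_pose": 0.8 * demos ** -0.5,  # error falls quickly with data
    "lighting": 0.3 * demos ** -0.1,     # error falls slowly with data
}

curves = {f: fit_power_law(demos, err) for f, err in measured.items()}
alloc = allocate(curves, {f: 80 for f in curves}, budget=100, step=20)
print(alloc)
```

With these synthetic curves, most of the budget goes to the factor whose error is still falling steeply, which is the behavior the curves are meant to expose. A real pipeline would replace the synthetic error rates with measured policy performance (or an offline proxy for it) and could use a different curve family or allocation rule.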
We evaluate our method on three challenging real-world tasks: Put Mouse in Drawer, Fold Towel, and Put Tomato in Plate. Six environment factors are considered: camera pose, lighting, distractor objects, table texture, object pose, and robot initial pose.
We test FSC in both training-from-scratch settings (using Diffusion Policy) and fine-tuning settings (using π0). Policies trained on data collected with FSC solve these tasks even under compounded environment-factor variations, outperforming baseline data-collection strategies by up to 26%. FSC-Proxy, which guides data collection with an offline metric, achieves nearly the same success rate as FSC while eliminating the need for any on-hardware policy execution.
We consider eight environment factors: camera pose, lighting, distractor objects, table texture, background, object pose, robot initial pose, and table height.
@misc{zha2025guidingdatacollectionfactored,
title={Guiding Data Collection via Factored Scaling Curves},
author={Lihan Zha and Apurva Badithela and Michael Zhang and Justin Lidard and Jeremy Bao and Emily Zhou and David Snyder and Allen Z. Ren and Dhruv Shah and Anirudha Majumdar},
year={2025},
eprint={2505.07728},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.07728},
}