Scientific discovery often operates in a regime where evaluations are expensive, data are limited, and useful interventions must remain close to known good candidates. In this talk, I present a matched-data approach to molecular property enhancement that is naturally connected to reinforcement learning. The key idea is to construct local pairs in which one molecule is both nearby and better than another, and to train a model to learn these local improving moves. Iterating this operator yields a practical strategy for lead optimization.

I will argue that this method is best viewed as a critic-free, offline, local policy-improvement procedure rather than full RL. This framing clarifies its relationship to supervised fine-tuning and direct preference optimization: all three methods learn from paired data, but matched training uses locality as an additional inductive bias, allowing each pair to convey directional information about how to improve. I will also discuss extensions based on generative modeling over matched datasets, self-training, and robust out-of-distribution generalization, and conclude with opportunities for active data collection and uncertainty-aware planning in scientific design.
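To make the pairing criterion concrete, the following is a minimal sketch of how matched pairs might be constructed from an offline dataset. It assumes molecules are represented as feature vectors with Euclidean distance as the locality measure; the function name, the distance threshold, and the toy data are all illustrative assumptions, not details from the talk.

```python
import numpy as np

def build_matched_pairs(X, y, radius):
    """Hypothetical matched-pair construction: collect ordered index pairs
    (i, j) where X[j] is within `radius` of X[i] (locality) and y[j] > y[i]
    (improvement). Each pair is a local improving move a model could learn."""
    pairs = []
    for i in range(len(X)):
        for j in range(len(X)):
            if i == j:
                continue
            close = np.linalg.norm(X[j] - X[i]) <= radius
            better = y[j] > y[i]
            if close and better:
                pairs.append((i, j))
    return pairs

# Toy stand-ins for molecular features and the property to enhance.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                 # 50 "molecules", 8 features
y = X[:, 0] + 0.1 * rng.normal(size=50)      # toy property value
pairs = build_matched_pairs(X, y, radius=3.0)
```

A model trained to map each source molecule `X[i]` to its matched, better neighbor `X[j]` learns a local improvement operator; applying it repeatedly is the iterated optimization strategy described above.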