
Working on something closer to scientific computing and AI for science was a refreshing change of scene from the usual LLM grind. We're excited to introduce our #AISTATS2026 paper, RLMesh, which uses reinforcement learning to adaptively select mesh points for training neural PDE surrogates.

The backstory: partial differential equations (PDEs) govern an enormous range of physical phenomena, from fluid dynamics and heat transfer to subsurface flow and weather systems. Solving them numerically is foundational across science and engineering, but also expensive: classical solvers discretize the domain onto a fine mesh, and the cost scales steeply with resolution.

This is why neural surrogates like the Fourier Neural Operators (FNOs) pioneered by @zongyili_nyu have been so exciting: they learn the solution operator directly from data and produce predictions dramatically faster at inference time. But training these surrogates still requires thousands of expensive solver runs on fine grids, and the standard approach trains on the full solution field for every instance, even in regions where the solution is smooth and carries little learning signal.

So our key question was: what if the surrogate doesn't need to train on the full solution field, and what if we could learn which points to train on?

We frame mesh point selection as a reinforcement learning (RL) problem. For each PDE instance, an RL agent sequentially picks a small budget of grid locations where the solver should be queried. The policy learns to place points where they actually matter:
- For Burgers' equation, it learns to focus on shock fronts;
- For Darcy flow, it learns to concentrate along high-contrast permeability channels and boundary layers.

In practice, the RL policy takes the PDE input and sequentially selects mesh points, the solver provides solutions at those sparse locations, and the FNO trains on the accumulated non-uniform data.
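To make the selection loop concrete, here is a minimal NumPy sketch of sequential point selection under a query budget. Everything here is illustrative, not the paper's implementation: in RLMesh the policy is a learned RL agent, while this sketch swaps in a simple gradient-magnitude heuristic as a stand-in scorer, and the `tanh` field is a toy Burgers-like input with a sharp front.

```python
import numpy as np

def select_points_sequentially(policy, pde_input, budget):
    """Sequentially pick `budget` grid indices, one at a time.

    `policy` scores the remaining candidate locations given the PDE input
    and the points chosen so far (a stand-in for the learned RL agent).
    """
    chosen = []
    candidates = list(range(pde_input.shape[0]))
    for _ in range(budget):
        scores = policy(pde_input, chosen, candidates)
        best = candidates[int(np.argmax(scores))]
        chosen.append(best)
        candidates.remove(best)
    return chosen

def gradient_policy(pde_input, chosen, candidates):
    """Toy scorer: prefer locations where the input field changes sharply."""
    grad = np.abs(np.gradient(pde_input))
    return grad[candidates]

# Toy 1D input with a steep front near x = 0.5 (Burgers-like).
x = np.linspace(0.0, 1.0, 101)
u0 = np.tanh(40.0 * (0.5 - x))
picked = select_points_sequentially(gradient_policy, u0, budget=5)
# The selected indices cluster around the front (index ~50).
```

A solver would then be queried only at `picked`, and the surrogate trained on those sparse observations.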
One design choice worth highlighting: we can't retrain the FNO after every RL episode to compute a reward signal; that would make the training loop impossibly slow. So we use a kernel ridge regression proxy instead. It retrains in under a second while maintaining ~0.99 correlation with actual FNO error trends.

Across Burgers (shock dynamics), Darcy flow (discontinuous coefficients), and Lorenz-96 (a chaotic lattice dynamical system), RLMesh consistently outperforms all heuristic baselines (uniform, random, gradient-based, variance-based, intensity-based) under identical query budgets. On Burgers, we reach a given accuracy with roughly 33–50% fewer solver calls. In wall-clock simulation time the gap widens further: ~40s to reach an RMSE that baselines need 80–150s+ for. (In our main accuracy curves, we use an oracle uniform-grid solver to isolate the effect of point selection; for wall-clock simulation time, we use a non-uniform solver to reflect the real savings.)

One thing I find conceptually appealing: prior active learning work for neural PDE surrogates focuses on which instances to simulate, always on a full grid. We're asking a complementary, orthogonal question: where within each instance should we query? In principle, combining both could push data efficiency even further.

On a personal note, I really enjoyed this collaboration with Yang Meng (@justinmeng19), Yuxin's (@yuxinch) new PhD student who drove much of the effort, and the rest of the team @propitious1235 @ChongLiuCS @WillettBecca.

Find out more: arxiv.org/pdf/2603.02066
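P.S. For the curious, here is a minimal NumPy sketch of the kernel-ridge proxy reward idea: score a candidate point set by how well a cheap kernel ridge regression fit on just those points reconstructs the full field. The RBF kernel, lengthscale, regularization, and toy field below are our illustrative choices, not the paper's actual setup.

```python
import numpy as np

def krr_fit_predict(x_train, y_train, x_test, lengthscale=0.05, lam=1e-3):
    """1D kernel ridge regression with an RBF kernel (NumPy only)."""
    def rbf(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * lengthscale**2))
    K = rbf(x_train, x_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return rbf(x_test, x_train) @ alpha

def proxy_reward(x_grid, u_true, selected_idx):
    """Score a point set by the negative RMSE of a KRR fit trained only on
    the selected locations: a fast stand-in for retraining the surrogate."""
    sel = np.asarray(selected_idx)
    pred = krr_fit_predict(x_grid[sel], u_true[sel], x_grid)
    return -float(np.sqrt(np.mean((pred - u_true) ** 2)))

# Toy Burgers-like field with a sharp front near x = 0.5.
x = np.linspace(0.0, 1.0, 101)
u = np.tanh(40.0 * (0.5 - x))

sparse = proxy_reward(x, u, [0, 25, 50, 75, 100])  # 5-point budget
dense = proxy_reward(x, u, np.arange(101))         # full grid, for reference
# More informative point sets earn higher (less negative) proxy rewards.
```

The appeal is that the fit costs one small linear solve per episode, so the reward can be recomputed thousands of times during RL training.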
















