Nathan S. de Lara

Toronto, Ontario

nathan[lastname]1@gmail.com

Hi, I’m Nathan, an incoming PhD student at Carnegie Mellon University hoping to work on off-policy learning. This summer, I’m interning on the Pre-Training team at Mistral in San Francisco.

Previously, I did my Master of Science (MSc) at the University of Toronto in the Robot Vision and Learning Lab, supervised by Prof. Florian Shkurti. Before that, I spent four great years at McGill University, where I was fortunate to work with Prof. Doina Precup and Prof. Russell Steele.

The question driving my research: how can reinforcement learning (RL) scale better than behaviour cloning (BC) for pre-training?

Recent work suggests that RL should outperform BC when trained on large, noisy datasets containing suboptimal demonstrations, a setting that closely resembles most of the data available on the internet. Yet BC, not RL, remains the default choice for large-scale pre-training. My research goal is to help bring RL to the point where it reliably surpasses BC.

News

Jul 04, 2026	Our paper SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer was accepted to ICML 2026! 🎊
Sep 17, 2025	Our paper STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation was accepted to NeurIPS 2025 as a spotlight! 🎉
Sep 01, 2024	Started my Master of Science at the University of Toronto with Prof. Florian Shkurti
Aug 05, 2024	Presented work on the representation collapse experienced by recurrent networks in continual reinforcement learning at the I Can’t Believe It’s Not Better Workshop: Failure Modes of Sequential Decision-Making in Practice, hosted at RLC
Apr 15, 2024	Graduated from McGill University with a Bachelor of Arts in the Honours Mathematics and Computer Science Program!

Selected Publications

SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer

Nathan Samuel Lara and Florian Shkurti

In International Conference on Machine Learning (ICML 2026), 2026

STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation

Hossein Goli, Michael Gimelfarb, Nathan Samuel Lara, and 3 more authors

In Advances in Neural Information Processing Systems (NeurIPS 2025, Spotlight), 2025

Recurrent Policies Are Not Enough for Continual Reinforcement Learning

Nathan Samuel Lara, Veronica Chelu, and Doina Precup

In I Can’t Believe It’s Not Better Workshop: Failure Modes of Sequential Decision-Making in Practice (RLC 2024), 2024

Towards safe mechanical ventilation treatment using deep offline reinforcement learning

Flemming Kondrup, Thomas Jiralerspong, Elaine Lau, and 5 more authors

In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, 2023

Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers must continuously adjust ventilator settings for each patient, a challenging and time-consuming task. Hence, it would be beneficial to develop an automated decision support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offline Deep Reinforcement Learning (DRL) agent that learns to predict the optimal ventilator parameters for a patient to promote 90-day survival. We design a clinically relevant intermediate reward that encourages continuous improvement of patient vitals and addresses the challenge of sparse rewards in RL. We find that DeepVent recommends ventilation parameters within safe ranges, as outlined in recent clinical trials. The CQL algorithm offers additional safety by mitigating overestimation of the values of out-of-distribution states and actions. We evaluate our agent using Fitted Q Evaluation (FQE) and demonstrate that it outperforms physicians from the MIMIC-III dataset.