

Jaival Patel

@patjaival
robotics @MDA_Space, engsci @UofT






hot take: aerospace could grow quicker if we trusted reinforcement learning more in the domain. there are many ways to make RL safety-critical:

- control barrier functions: keep the policy inside a mathematically safe region (surrogate model?). the RL agent optimizes performance, but a barrier layer blocks actions that violate constraints.
- run-time safety filters
- shielded RL: a policy proposes actions, and a classical controller or rule-based shield overrides the unsafe ones.
- hybrid RL + classical control (what i worked on last month)
- uncertainty-aware policies: using ensembles, Bayesian models, or confidence estimates so the system knows when it is uncertain. (im looking into this right now for commercial robotic systems and it is giving me good results so far)

in the end, RL doesn't have to replace classical GNC. it just needs to sit inside the constraints: RL inside a safety-critical control architecture.
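to make the barrier-layer idea concrete, here is a minimal sketch in Python. everything in it is an assumption for illustration, not something from the post: a 1D single-integrator system (x' = u), a safe set x <= x_max, a barrier h(x) = x_max - x, and hypothetical names like `cbf_filter` and `aggressive_policy`. for this toy system the barrier condition h' >= -alpha * h reduces to a simple upper bound on the action, so the filter is just a clamp.

```python
def cbf_filter(x, u_proposed, x_max=1.0, alpha=2.0):
    """Return the action closest to u_proposed that satisfies the barrier condition.

    System (assumed): x' = u. Barrier: h(x) = x_max - x, safe when h >= 0.
    Condition h' >= -alpha * h becomes -u >= -alpha * (x_max - x),
    i.e. u <= alpha * (x_max - x).
    """
    u_bound = alpha * (x_max - x)
    return min(u_proposed, u_bound)


def rollout(policy, x0=0.0, dt=0.01, steps=500):
    """Simulate the filtered closed loop with forward Euler integration."""
    x = x0
    xs = [x]
    for _ in range(steps):
        u = cbf_filter(x, policy(x))  # policy proposes, barrier layer corrects
        x += dt * u
        xs.append(x)
    return xs


def aggressive_policy(x):
    """Stand-in for a trained RL policy: always pushes hard toward the limit."""
    return 5.0


trajectory = rollout(aggressive_policy)
print(max(trajectory) <= 1.0)  # the state never crosses x_max
```

the policy is free to do whatever it wants while it is far from the boundary, and the filter only bites near the limit. a real system would solve a small QP per step over a multi-dimensional action instead of taking a scalar `min`.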




just read this paper and imo a big gap in controls eng is that we design an MPC, LQR, or RL policy, then end up wasting a lot of time fighting matlab/simulink codegen: bloated output, manual simplifications, sensitive constraints, and none of it gives any visibility into the embedded latency/memory of the system. what we actually need is a proper compiler: one CLI that takes a control law and outputs optimized C/Rust code for embedded targets. im thinking it should have symbolic simplification, auto-linearization, solid constraint handling, and built-in latency benchmarking. (arxiv.org/abs/2103.1457)
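a toy version of the "control law in, C out" idea, as a rough sketch only. it assumes the simplest possible control law, static state feedback u = -K x, and the function name `emit_c_state_feedback` is hypothetical. the only "simplification" here is dropping terms whose gain is exactly zero, and the multiply count in the generated comment is a static operation count, not a measured latency.

```python
def emit_c_state_feedback(K, func_name="control_law"):
    """Emit C source for u = -K x, skipping terms whose gain is exactly zero.

    K is a list of rows (one row per actuator, one column per state).
    """
    n_u, n_x = len(K), len(K[0])
    body = []
    multiplies = 0
    for i, row in enumerate(K):
        # Fold the minus sign of u = -K x into each constant at generation time.
        terms = [f"{-float(k)!r}f * x[{j}]" for j, k in enumerate(row) if k != 0]
        multiplies += len(terms)
        rhs = " + ".join(terms) if terms else "0.0f"
        body.append(f"    u[{i}] = {rhs};")
    header = [
        f"/* u = -K x, {multiplies} multiplies (static count, not a measured latency) */",
        f"void {func_name}(const float x[{n_x}], float u[{n_u}]) {{",
    ]
    return "\n".join(header + body + ["}"])


# Example gain matrix (made-up numbers): two states, two actuators, sparse K.
code = emit_c_state_feedback([[1.5, 0.0], [0.0, -2.0]])
print(code)
```

a dense 2x2 gain would cost 4 multiplies, and the sparse one above emits 2. a real tool would need symbolic simplification over nonlinear expressions, constraint handling, and on-target timing, none of which this sketch attempts.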




after 3 months of continuous crashing, i finally got rl to land a rocket by itself! yes, the complete 6dof dynamics: translation + rotation, variable mass, tvc, disturbances, all of it handled by the rl policy itself. the core issue is that landing is a constrained braking problem, not open-ended control. rl fails because the feasible solution manifold is extremely narrow. once the search space was shaped properly, rl converged. i tried various rl policies and architectures to figure all this out. full technical analysis here, check it out!: jaivalpatel.com/research-work.…
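a quick way to see why landing is a braking problem with a narrow feasible set is a 1D constant-mass toy model. this is not the 6dof setup from the post: it is an illustrative sketch that ignores rotation, mass loss, and disturbances, and the function names and numbers are made up. stopping exactly at the ground from altitude h with downward speed v needs a constant net deceleration of v^2 / (2h), and the engine can supply at most T_max / m - g.

```python
G = 9.81  # gravitational acceleration, m/s^2


def required_decel(speed_down, altitude):
    """Constant net deceleration needed to reach zero speed exactly at the ground."""
    if altitude <= 0:
        raise ValueError("altitude must be positive")
    return speed_down ** 2 / (2.0 * altitude)


def max_net_decel(max_thrust, mass):
    """Largest net deceleration available at full thrust, after fighting gravity."""
    return max_thrust / mass - G


def is_feasible(speed_down, altitude, max_thrust, mass):
    """True if a full-thrust burn started now can still stop before the ground."""
    a_max = max_net_decel(max_thrust, mass)
    return a_max > 0 and required_decel(speed_down, altitude) <= a_max


def ignition_altitude(speed_down, max_thrust, mass):
    """Lowest altitude at which a full-thrust burn still stops at the ground."""
    a_max = max_net_decel(max_thrust, mass)
    if a_max <= 0:
        raise ValueError("thrust-to-weight ratio must exceed 1")
    return speed_down ** 2 / (2.0 * a_max)


# Hypothetical vehicle: 1000 kg, 30 kN max thrust, falling at 50 m/s.
print(is_feasible(50.0, 100.0, 30000.0, 1000.0))  # burn from 100 m
print(is_feasible(50.0, 50.0, 30000.0, 1000.0))   # burn from 50 m
print(ignition_altitude(50.0, 30000.0, 1000.0))
```

with these made-up numbers the latest safe ignition altitude is roughly 62 m: a burn from 100 m is feasible and a burn from 50 m is not. below that line no action sequence recovers, so a policy exploring at random spends most of its time in states that are already lost.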


