Jaival Patel (@patjaival) - Twitter Profile

Pinned Tweet

after 3 months of continuous crashing, i finally got rl to land a rocket by itself! yes, the complete 6dof dynamics: translation + rotation, variable mass, tvc, disturbances, all of it done by the rl itself. the core issue is that landing is a constrained braking problem, not open-ended control. rl fails because the feasible solution manifold is extremely narrow. once the search space was shaped properly, rl converged. i tried various rl policies and architectures to figure all this out. full technical analysis here, check it out: jaivalpatel.com/research-work.…
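A minimal sketch of the "constrained braking" framing, not the author's 6DOF setup. It assumes 1D vertical descent, constant mass, and a single burn that runs until touchdown; the function names and all numbers are illustrative assumptions. Under those assumptions, the ignition altitudes that end in a zero-speed touchdown form a bounded window set by the throttle limits.

```python
def stopping_distance(speed, net_decel):
    """Distance needed to brake from `speed` to rest at a constant net deceleration."""
    if net_decel <= 0:
        raise ValueError("net deceleration must be positive to stop at all")
    return speed ** 2 / (2.0 * net_decel)


def ignition_window(speed, thrust_accel_min, thrust_accel_max, g=9.81):
    """Return (lowest, highest) ignition altitude for a zero-speed touchdown.

    Assumption: the engine can be throttled anywhere between the two
    thrust accelerations, and gravity g is constant.
    """
    # Latest possible ignition: full throttle all the way down.
    lowest = stopping_distance(speed, thrust_accel_max - g)
    # Earliest useful ignition: minimum throttle all the way down.
    net_min = thrust_accel_min - g
    highest = stopping_distance(speed, net_min) if net_min > 0 else float("inf")
    return lowest, highest
```

With a 100 m/s ignition speed, g = 10 m/s², and thrust acceleration limited to 15 to 30 m/s², `ignition_window(100, 15, 30, g=10)` gives `(250.0, 1000.0)`: igniting below 250 m cannot stop in time, and igniting above 1000 m comes to rest before the ground even at minimum throttle.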

2

1

19

1.3K

Jaival Patel retweeted

X Freeze@XFreeze·20h

NASA just officially unveiled their master plan for a permanent Moon Base at the lunar South Pole.

This is not just about flags and footprints. NASA is moving to establish an enduring, sustained human presence, and they are heavily relying on commercial innovators to build it.

The roadmap is highly aggressive:
• Phase 1: Heavy robotic missions and commercial payload deliveries
• Phase 2: Semi-permanent infrastructure, including fission surface power and lunar drones
• Phase 3: A sustained, permanent human outpost

The most important takeaway is NASA explicitly stated this base is the ultimate proving ground to prepare humanity for missions to Mars.

While legacy aerospace companies are still struggling to reliably get a small capsule to the ISS, NASA is setting the stage for massive lunar infrastructure....which is exactly the kind of heavy-lift planetary deployment SpaceX’s Starship was designed for.

The multi-planetary economy is officially kicking off.

789

1.7K

10.4K

14.9M

Jaival Patel@patjaival·2d

in aero, control problems rarely give you clean dynamics and unlimited compute. reentry is a great example: nonlinear dynamics, changing aero effects, tight constraints, and little margin for error (close to none, actually). that’s where learning-augmented MPC sparked my interest. MPC gives structure and constraint handling. learning can help when the model is incomplete. so, for the coming weeks, i'll be looking into combining these ideas: learning-augmented MPC for reentry attitude control. very excited to see what i come up with!
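One common reading of "learning-augmented MPC" is a nominal model plus a residual fitted from data, with constraint handling left to the MPC. The sketch below assumes a scalar system, a linear residual, and brute-force enumeration over a small discrete action set. `fit_residual`, `mpc_action`, and every coefficient are hypothetical and are not taken from the author's work.

```python
from itertools import product


def fit_residual(states, actions, next_states, nominal):
    """Least-squares fit of a linear residual c*x to the nominal model's one-step errors."""
    errors = [xn - nominal(x, u) for x, u, xn in zip(states, actions, next_states)]
    num = sum(x * e for x, e in zip(states, errors))
    den = sum(x * x for x in states)
    return num / den if den > 0 else 0.0


def mpc_action(x0, model, candidates, horizon, x_limit):
    """Enumerate control sequences and return the first action of the cheapest feasible one.

    A sequence is infeasible if any predicted state violates |x| <= x_limit.
    Returns None when no sequence is feasible.
    """
    best_cost, best_u = float("inf"), None
    for seq in product(candidates, repeat=horizon):
        x, cost, feasible = x0, 0.0, True
        for u in seq:
            x = model(x, u)
            if abs(x) > x_limit:
                feasible = False
                break
            cost += x * x + 0.1 * u * u  # quadratic state and effort cost
        if feasible and cost < best_cost:
            best_cost, best_u = cost, seq[0]
    return best_u


# Hypothetical example: the nominal model underestimates the state gain.
nominal = lambda x, u: 0.9 * x + 0.5 * u
truth = lambda x, u: 1.1 * x + 0.5 * u
xs, us = [1.0, 2.0, -1.0], [0.0, 1.0, 0.5]
c = fit_residual(xs, us, [truth(x, u) for x, u in zip(xs, us)], nominal)
augmented = lambda x, u: nominal(x, u) + c * x  # nominal + learned residual
first_action = mpc_action(1.0, augmented, [-1.0, 0.0, 1.0], horizon=3, x_limit=2.0)
```

Enumeration only scales to tiny problems; a real controller would hand the augmented model to a numerical optimizer, but the split of roles stays the same.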

0

31

Jaival Patel retweeted

Zelda@zeldapoem·3d

Pinch me, I can't believe someone wrote about lab notebooks. Unbelievably cool

26

752

8.4K

213.6K

Jaival Patel@patjaival·20 May

fleshed out this thought a lot more. realized that it's more about reworking the mental framework and defining a proper control system than anything else. check it out here: jaivalpatel.com/research-notes…

Jaival Patel@patjaival

hot take: aerospace could grow quicker if we trusted reinforcement learning more in the domain. there are many ways we can impose safety-critical RL:
- control barrier functions: keeping the policy inside a mathematically safe region (surrogate model?). the RL agent can optimize performance, but a barrier layer can block actions that violate constraints.
- runtime safety filters
- shielded RL: a policy proposes actions, but a classical controller or rule-based shield overrides unsafe ones.
- hybrid RL + classical control (what i worked on last month)
- uncertainty-aware policies: using ensembles, Bayesian models, or confidence estimates so the system knows when it is uncertain. (im looking into this right now for commercial robotic systems and it is giving me good results so far)

in the end, RL doesn't have to replace classical GNC. it just needs to sit inside the constraints: it's RL inside a safety-critical control architecture.

0

1

80

Jaival Patel@patjaival·20 May

@lawrencefeng17 curious if this extends to physical RL. if a model is pretrained early on rich dynamics priors like contact, friction, actuator limits, etc, does that improve retention and robustness to the physical environment and its variables? i would assume so

1

0

1

193

Lawrence Feng@lawrencefeng17·19 May

1/ To retain post-training capabilities after further fine-tuning, mix that data into pretraining. The effect can be invisible until fine-tuning begins; early exposure may not help post-training performance, but it changes what persists. How a model learns a task matters.

6

24

86

26.5K

Jaival Patel@patjaival·20 May

@virkvarjun @bracketbot @sincethestudy interesting research scope. good luck!!

1

0

1

86

Arjun Virk@virkvarjun·19 May

Life Update: I've moved to SF to build the future of robotics learning @bracketbot with @sincethestudy. My research focuses on unlocking continual learning for robotics policies. More soon.

17

3

131

8K

Jaival Patel@patjaival·20 May

@yacineMTB 🙋‍♂️

0

6

kache@yacineMTB·19 May

Am I the only person in the world working on robotics instead of large language models

267

16

1.3K

59.1K

Jaival Patel@patjaival·18 May

built a complete compiler toolkit for turning high-level control policies into reliable, testable control systems. brings together typed policy frontends, explicit compiler contracts, and backend-ready architecture. link: jaivalpatel.com/research-work.…

Jaival Patel@patjaival

just read this paper and imo a big gap in controls eng is that we design an MPC, LQR, or RL policy, then end up wasting a lot of time fighting matlab/simulink codegen: bloated output, manual simplifications, sensitive constraints, none of which give any visibility into the embedded latency/memory of the system. what we actually need is a proper compiler: one CLI that takes a control law and outputs an optimized C/Rust script for embedded targets. im thinking that each script should have symbolic simplification, auto-linearization, solid constraint handling, and built-in latency benchmarking. (arxiv.org/abs/2103.1457)
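A toy sketch of the "control law in, C out" idea, assuming a static state-feedback law u = -Kx. `emit_state_feedback_c` is a hypothetical name and is not part of the author's toolkit or any existing tool; the only "simplification" it performs is dropping zero gains so they cost nothing on the target.

```python
def emit_state_feedback_c(name, gain):
    """Emit C source for u = -K x, skipping terms whose gain is exactly zero."""
    n_out, n_in = len(gain), len(gain[0])
    lines = [f"void {name}(const float x[{n_in}], float u[{n_out}]) {{"]
    for i, row in enumerate(gain):
        # One multiply per nonzero gain; the sign of -K is folded into the literal.
        terms = [f"{-float(k)!r}f * x[{j}]" for j, k in enumerate(row) if k != 0]
        rhs = " + ".join(terms) if terms else "0.0f"
        lines.append(f"    u[{i}] = {rhs};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical 2x2 gain with two structural zeros.
source = emit_state_feedback_c("lqr_step", [[2.0, 0.0], [0.0, -0.5]])
```

Here `source` contains one multiply per output instead of two. Constraint handling, linearization, and latency benchmarking from the tweet are out of scope for this sketch.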

0

2

85

Jaival Patel retweeted

altan tutar@altantutar·13 May

This is insane! Actor Labs fine-tuned π0.5 (physical intelligence's flagship VLA model) and deployed it on a real excavator. They just raised $4M led by Eniac Ventures, with Hyperion, Hummingbird, 2048 Ventures, and Nova Global. From founder @laneburgett: "We've collected a massive corpus of real-world data with natural language labels from operators in the industry and are using it to create some really cool policies. Here's our first demo of it successfully completing a task with just 200 trajectories."

7

24

186

15.7K

Jaival Patel@patjaival·15 May

building a 7dof robot that gets sent to space in two months was not on my summer bucket-list but oh well

0

3

133

Jaival Patel@patjaival·13 May

@dhruvr_43 LOL good luck bro my prayers are w u for that course. it fried me 😭

0

1

12

dhruvr_43@dhruvr_43·13 May

@patjaival Taking analog control systems rn to be able to understand this big brain shit

1

0

2

52

Jaival Patel@patjaival·13 May

hot take: aerospace could grow quicker if we trusted reinforcement learning more in the domain. there are many ways we can impose safety-critical RL:
- control barrier functions: keeping the policy inside a mathematically safe region (surrogate model?). the RL agent can optimize performance, but a barrier layer can block actions that violate constraints.
- runtime safety filters
- shielded RL: a policy proposes actions, but a classical controller or rule-based shield overrides unsafe ones.
- hybrid RL + classical control (what i worked on last month)
- uncertainty-aware policies: using ensembles, Bayesian models, or confidence estimates so the system knows when it is uncertain. (im looking into this right now for commercial robotic systems and it is giving me good results so far)

in the end, RL doesn't have to replace classical GNC. it just needs to sit inside the constraints: it's RL inside a safety-critical control architecture.
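The barrier-layer and shield ideas above can be sketched for the simplest possible case. This assumes a 1D single integrator x_next = x + u*dt with the safe set x <= x_max, so the barrier is h(x) = x_max - x, and the discrete-time condition h(x_next) >= (1 - alpha) * h(x) reduces to u <= alpha * h(x) / dt. `barrier_filter` and the numbers are illustrative assumptions, not the author's implementation.

```python
def barrier_filter(x, u_proposed, x_max, dt, alpha=0.5):
    """Clip a proposed velocity command so the barrier h(x) = x_max - x stays non-negative.

    Enforces h(x_next) >= (1 - alpha) * h(x) for x_next = x + u * dt,
    which is equivalent to u <= alpha * h(x) / dt.
    """
    h = x_max - x
    u_bound = alpha * h / dt
    return min(u_proposed, u_bound)


def rollout(policy, x0, x_max, dt, steps):
    """Run a policy through the filter and return the visited states."""
    x, visited = x0, [x0]
    for _ in range(steps):
        u = barrier_filter(x, policy(x), x_max, dt)
        x = x + u * dt
        visited.append(x)
    return visited


# A deliberately reckless policy that always pushes toward the limit.
states = rollout(lambda x: 100.0, x0=0.0, x_max=10.0, dt=0.1, steps=30)
```

Actions far from the limit pass through unchanged, so the policy is free to optimize performance there; near the limit the filter takes over, and the state approaches `x_max` without crossing it.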

Jaival Patel@patjaival

after 3 months of continuous crashing, i finally got rl to land a rocket by itself! yes, the complete 6dof dynamics: translation + rotation, variable mass, tvc, disturbances, all of it done by the rl itself. the core issue is that landing is a constrained braking problem, not open-ended control. rl fails because the feasible solution manifold is extremely narrow. once the search space was shaped properly, rl converged. i tried various rl policies and architectures to figure all this out. full technical analysis here, check it out: jaivalpatel.com/research-work.…

1

0

4

366

Jaival Patel@patjaival·9 May

being the only intern on ur team has to be exciting, motivating, and scary all at the same time

0

6

138

Jaival Patel@patjaival·6 May

@sakshambatraa @michael_trbo fire

0

1

68

saksham@sakshambatraa·6 May

reinventing Groq's LPU with @michael_trbo we got instruction driven data movement working between SRAM memory blocks and MXM compute!!

11

17

75

6.3K

Jaival Patel@patjaival·6 May

@rory_mg nordspace truly redefining the canadian aerospace sector. we love it

1

0

1

131

rory 🍁@rory_mg·5 May

why launch one rocket from Canadian soil when you can launch three? images of the Hadfield engine coming shortly...

6

20

175

8.4K

Jaival Patel@patjaival·6 May

@adithya_s_k holy fire work

0

1

146

Adithya S K@adithya_s_k·5 May

Excited to release the Ultimate guide to RL environments! Definitions of RL environments differ wildly in the LLM era, so we spent the last month building several RL environments across 6 different frameworks, domains and complexities to map out which are easiest to build with and which can be scaled to 1000s.

51

158

1.2K

222.9K
