Raktim Gautam Goswami

54 posts

@raktimgg

PhD Student @nyuniversity | Prev @AIatMeta | Electrical Engineer from @IITHyderabad

Joined November 2019
154 Following, 180 Followers
Raktim Gautam Goswami retweeted
Basile Terver @BasileTerv987
Introducing EB-JEPA ⚡ An open-source library making JEPAs accessible, trainable on a single GPU in hours! 🚀
🔗 Paper: arxiv.org/abs/2602.03604
💻 Code: github.com/facebookresear…
Raktim Gautam Goswami @raktimgg
@DavidJFan @AIatMeta @ylecun Thank you, David! It was great working with you as well. I really appreciated all your support and guidance, and I learned a lot from working together. And yes, we’ll definitely have to keep the tradition going and try out some new restaurants next time 😄
David Fan @DavidJFan
@raktimgg @AIatMeta @ylecun Congrats Raktim!! It was a pleasure to have you here in the NYC office 😁 I learned so much from you about robotics and was really impressed with your attention to detail and persistence in getting things to work. Also enjoyed having some authentic Indian food :)
Raktim Gautam Goswami @raktimgg
Just wrapped up my 7-month internship at Meta FAIR @AIatMeta. It was a deeply valuable experience, and I’m thankful for everything I learned along the way. During the internship, I worked on world models for dexterous manipulation under the guidance of @ylecun. (1/4)
Raktim Gautam Goswami @raktimgg
Looking forward to continuing research in this domain and hopefully collaborating again in the future. Grateful for the journey and excited about what’s next. (4/4)
Raktim Gautam Goswami @raktimgg
As 2025 comes to an end, I feel thankful for these opportunities and humbled by how much there is still to learn. I’m excited to continue this journey in 2026 and hopefully contribute further to the AI and robotics community. (4/4)
Raktim Gautam Goswami @raktimgg
None of this would have been possible without my co-authors, collaborators, and mentors; I’m deeply grateful for their support. I’m also thankful for the opportunity to intern at Meta FAIR this past year; I will share more about it in a separate post. (3/4)
Raktim Gautam Goswami @raktimgg
Concluding 2025 on a grateful note. This past year, I had the privilege of presenting my work at top AI and robotics conferences: NeurIPS, CVPR, ICRA, IROS, and WACV. From robot parades to cutting-edge AI talks, each conference offered invaluable learning and inspiration. (1/4)
Raktim Gautam Goswami @raktimgg
@tomchapin We haven’t released the code yet, which is why the link isn’t working. We’re working on getting it ready and hope to share it soon.
Raktim Gautam Goswami @raktimgg
@AlexiGlad It can indeed be expensive if we use too many frames. But in practice, we can cover a relatively long time span with a small number of frames by using a low sampling frequency. For example, in our work we sample 8 frames from video sequences of ~4 seconds, and this already performs well.
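A rough back-of-the-envelope sketch of the trade-off in this reply: striding through a clip keeps the token sequence short while still spanning the full ~4 seconds. The 30 fps camera rate, the 224×224 input size, and the 14-pixel patch size (as in the DINOv2 ViT-*/14 backbones) are assumptions for illustration, not details stated in the thread, and the helper names are hypothetical.

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples evenly strided frame indices from a clip of num_frames."""
    if num_samples <= 0 or num_frames < num_samples:
        raise ValueError("need 0 < num_samples <= num_frames")
    stride = num_frames // num_samples  # frames skipped between two samples
    return [i * stride for i in range(num_samples)]


def effective_rate_hz(fps: float, num_frames: int, num_samples: int) -> float:
    """Sampling frequency after striding (lower rate -> longer span per frame)."""
    return fps / (num_frames // num_samples)


def num_tokens(num_sampled_frames: int, image_size: int = 224, patch_size: int = 14) -> int:
    """Sequence length if every patch token of every sampled frame is kept."""
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    return num_sampled_frames * patches_per_side ** 2


FPS = 30  # ASSUMED camera frame rate; not stated in the thread
clip = int(4 * FPS)  # a ~4 second clip -> 120 frames
idx = sample_frame_indices(clip, 8)

print(idx)                            # [0, 15, 30, 45, 60, 75, 90, 105]
print(effective_rate_hz(FPS, clip, 8))  # 2.0
print(num_tokens(8), num_tokens(clip))  # 2048 30720
```

Under these assumed numbers, 8 strided frames give 2,048 patch tokens instead of 30,720 for the full clip, at an effective rate of 2 Hz.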
Alexi Gladstone @AlexiGlad
@raktimgg Got it. Isn’t that super expensive, then? Meaning your world model likely cannot span too many frames?
Raktim Gautam Goswami @raktimgg
@AlexiGlad The sentence "observations ... features" refers to the diffusion policy baseline, where we only use average-pooled DINOv2 features. For DexWM, we use all patch tokens similar to DINO-WM. In my experience, pooled features work worse than full patch tokens for such world models.
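A minimal NumPy sketch of the distinction drawn in this reply, using random arrays as hypothetical stand-ins for DINOv2 patch tokens (the shapes assume a ViT-B/14 on 224×224 frames and are not taken from the paper). Average pooling collapses each frame to a single vector and is invariant to patch order, which is one way to see why it discards the spatial layout that the full set of patch tokens, as used in DINO-WM, retains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for DINOv2 outputs: random values, NOT real features.
# Shapes assume ViT-B/14 on 224x224 frames: 16x16 = 256 patches of 768 dims.
T, N, D = 8, 256, 768  # sampled frames, patches per frame, feature dim
patch_tokens = rng.standard_normal((T, N, D))  # float64

# Baseline-style input: one average-pooled vector per frame.
pooled = patch_tokens.mean(axis=1)  # shape (T, D)

# DINO-WM-style input: keep every patch token of every frame.
full = patch_tokens  # shape (T, N, D)

# Shuffle the patch order within each frame (same permutation for all frames).
perm = rng.permutation(N)
shuffled = patch_tokens[:, perm, :]

print(pooled.shape, full.shape)  # (8, 768) (8, 256, 768)
# Pooling ignores patch order, so the shuffled frames give the same vector...
print(np.allclose(shuffled.mean(axis=1), pooled))  # True
# ...whereas the full token grid changes, i.e. it keeps the spatial layout.
print(np.allclose(shuffled, full))  # False
```

Keeping all 256 tokens per frame multiplies the sequence length by 256 relative to pooling, which is the cost side of this trade-off.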
Alexi Gladstone @AlexiGlad
"observations are encoded using the average-pooled DINOv2 patch features" Just to clarify: are you guys predicting the average patch features, meaning you predict one patch per frame? Did you ever try predicting the CLS token instead of the average-pooled features? Did you ever have issues because of only predicting a single average-pooled feature rather than all the patch tokens like DINO-WM?
Raktim Gautam Goswami @raktimgg
Paper Title: World Models Can Leverage Human Videos for Dexterous Manipulation