Amlan Kar

557 posts


@amlankar95

Researcher @NVIDIAAI Spatial Intelligence Lab. Computer Vision PhD @UofT/@VectorInst. Previously @IITKanpur. I like data. Opinions here are all mine.

Toronto, Ontario · Joined March 2018
845 Following · 1.3K Followers
Amlan Kar retweeted
Ruilong Li @ruilong_li
Special moment to see something I’ve worked on so closely come to life! Today we announce Alpadreams — a world model that lets you explore ♾endlessly♾️ in ⚡real time⚡. Video: me (left) and Alpamayo policy (right) driving in Alpadreams at #GTC26. research.nvidia.com/labs/sil/proje…
2 replies · 19 reposts · 98 likes · 9.9K views
Amlan Kar retweeted
Zan Gojcic @ZGojcic
A new generation in AV simulation is here! We are announcing AlpaDreams, a real-time, interactive generative world model for AV simulation! Just a year ago it took minutes to generate a few seconds of video; today it is real time and interactive! research.nvidia.com/labs/sil/proje…
5 replies · 26 reposts · 103 likes · 17.3K views
Amlan Kar retweeted
Andrej Karpathy @karpathy
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end of the run) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
[image]
1.1K replies · 3.7K reposts · 28.3K likes · 10.9M views
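The commit-on-improvement loop the tweet describes can be sketched in a few lines. This is a hypothetical stand-in, not the repo's actual code: `evaluate` plays the role of one fixed-budget (5-minute) training run reporting final validation loss, and `propose` plays the role of the agent editing the training script; keeping a candidate only when loss drops mirrors the "accumulate git commits as it finds better settings" behavior.

```python
import math
import random

def autoresearch_loop(evaluate, propose, init_cfg, n_runs=20, seed=0):
    """Each iteration stands in for one fixed-budget training run;
    a proposed config is kept ("committed") only if it lowers val loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = init_cfg, evaluate(init_cfg)
    for _ in range(n_runs):
        cfg = propose(best_cfg, rng)   # agent proposes a tweak to the setup
        loss = evaluate(cfg)           # run training, read final val loss
        if loss < best_loss:           # commit only improvements
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Toy stand-ins: pretend val loss is minimized at lr = 0.01.
def evaluate(cfg):
    return (math.log10(cfg["lr"]) + 2.0) ** 2

def propose(cfg, rng):
    return {"lr": cfg["lr"] * 10 ** rng.uniform(-0.5, 0.5)}

best_cfg, best_loss = autoresearch_loop(evaluate, propose, {"lr": 1e-3})
```

Comparing "research progress of different prompts" then amounts to running this loop with different `propose` policies and plotting each run's loss history.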
Amlan Kar retweeted
Jon Barron @jon_barron
@gkopanas I love that review. I do genuinely think a great way to evaluate research contributions would be to add the new paper to an agent's context window and see what delta the agent can get on some OSS codebase's performance.
0 replies · 1 repost · 14 likes · 1.9K views
Amlan Kar retweeted
Sven Elflein @s_elflein
🚀 Exciting news! We’re introducing VGG-T³: a scalable model for offline feed-forward 3D reconstruction that finally tackles the "quadratic bottleneck." Ever wanted to have VGGT reconstruct a 1,000-image scene in seconds instead of 10 minutes and use it for visual localization?
[GIF]
7 replies · 69 reposts · 468 likes · 32.4K views
Amlan Kar retweeted
Xindi Wu @cindy_x_wu
New #NVIDIA Paper. We introduce Motive, a motion-centric, gradient-based data attribution method that traces which training videos help or hurt video generation. By isolating temporal dynamics from static appearance, Motive identifies the training videos that shape motion. 🔗 research.nvidia.com/labs/sil/proje… 1/10
11 replies · 112 reposts · 542 likes · 73.2K views
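Gradient-based data attribution of the kind the tweet describes is commonly scored as a dot product between a training example's gradient and a query example's gradient (TracIn-style); Motive's exact formulation is not given here, so this is a generic sketch with made-up numbers. A positive score means the training example pushed the model toward the query behavior ("helps"); a negative score means it pushed away ("hurts").

```python
import numpy as np

def attribution_scores(train_grads, query_grad):
    """Generic gradient-dot-product attribution (illustrative, not
    Motive's actual method): one score per training example."""
    return train_grads @ query_grad

# Rows: per-training-example gradients; vector: gradient on a query clip.
train_grads = np.array([[1.0, 0.0],
                        [0.5, 0.5],
                        [-1.0, 0.2]])
query_grad = np.array([1.0, 1.0])
scores = attribution_scores(train_grads, query_grad)  # [1.0, 1.0, -0.8]
```

The "motion-centric" part would then amount to computing these gradients on a motion-sensitive objective rather than on raw appearance, so appearance-only matches stop dominating the scores.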
Amlan Kar retweeted
Or Litany @orlitany
🚗📡Radar is the unsung hero of AV perception: widespread in cars, yet overlooked in simulation. Introducing RadarGen: Realistic radar synthesis from cameras using diffusion. Massive kudos to my fantastic team at @TechnionLive and @NVIDIAAI radargen.github.io
Quoting Tomer Borreda @TomerBorreda:

📢 RadarGen: Automotive Radar Point Cloud Generation from Cameras Can we generate realistic radar point clouds solely from camera images? 🚗📡 We introduce RadarGen, a diffusion-based framework that synthesizes radar returns aligned with visual scenes. radargen.github.io

1 reply · 9 reposts · 37 likes · 5.2K views
Amlan Kar retweeted
Jack Zhang @jackzzhang
Can we apply gradient descent to discrete changes? In our new #SIGGRAPHAsia paper, we show that gradient descent can work on shape grammars, as in CAD and procedural modeling, but only if the grammars are designed correctly!
6 replies · 43 reposts · 262 likes · 64.2K views
Amlan Kar retweeted
Jack Merullo @jack_merullo_
How is memorized data stored in a model? We disentangle MLP weights in LMs and ViTs into rank-1 components based on their curvature in the loss, and find representational signatures of both generalizing structure and memorized training data.
[image]
8 replies · 61 reposts · 508 likes · 46.6K views
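The core operation, splitting a weight matrix into rank-1 components, can be illustrated with SVD. The paper ranks components by loss curvature; as a stand-in, SVD orders them by singular value, and this toy example is not the authors' code, only a sketch of what "disentangle into rank-1 components" means mechanically.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # stand-in for one MLP weight matrix

# Every matrix is an exact sum of rank-1 terms sigma_i * u_i v_i^T.
U, s, Vt = np.linalg.svd(W)
components = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]

# Sanity check: the rank-1 pieces reassemble the original weights,
# so individual pieces can be inspected (or ablated) independently.
W_rebuilt = np.sum(components, axis=0)
```

Attributing memorization then becomes a question of which individual components, rather than whole weight matrices, carry the signatures of specific training data.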
Amlan Kar retweeted
Rob Wiblin @robertwiblin
I don't think that greater attention to the 'political economy' of superintelligence is going to make ordinary people feel better about pushing ahead @tylercowen.
The most salient effect of superintelligence (plus vast numbers of robots) is that most people lose political and economic leverage. Governments and owners of capital are freed up to ignore them at much reduced economic or military cost.
The people who are sincerely trying to figure out a way to maintain a pluralistic social equilibrium with power fairly widely distributed in the presence of ever-improving superintelligent machines seem to have few positive results to report thus far.
Whether you even face more of a threat from your own government before or after such a ban seems at best unclear. If you think parts of your government are indifferent to you, then they can always nationalise and use superior access to superintelligent machines against you after a private company develops them (if the company doesn't do so first).
Playing for time with a ban might well be the best of a bad set of options for a random person with few savings and little faith in businesses, or politicians, or broader political-economic forces, not to simply bulldoze them.
All in all, an outstanding topic for a paper, but the severe challenge we face avoiding a dangerous concentration of power should be treated equally seriously along all the different possible paths. (Links to: x.com/deanwball/stat… )
[image]
6 replies · 10 reposts · 112 likes · 18.7K views
Amlan Kar retweeted
Hannes Stark @HannesStaerk
Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..
[image]
18 replies · 262 reposts · 991 likes · 299.4K views
Amlan Kar retweeted
Phillip Isola @phillip_isola
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492
1/9
10 replies · 119 reposts · 695 likes · 67.1K views
Amlan Kar retweeted
Shubham Tulsiani @shubhtuls
[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains e.g. text-to-image (10% FID reduction), protein generation (improved designability).
6 replies · 99 reposts · 946 likes · 60.6K views
Amlan Kar retweeted
Sherwin Bahmani @sherwinbahmani
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!):
Project Page: research.nvidia.com/labs/toronto-a…
Code and Models: github.com/nv-tlabs/lyra
Paper: arxiv.org/abs/2509.19296
20 replies · 65 reposts · 258 likes · 65.5K views
Amlan Kar retweeted
Hezhen Hu @ CVPR2026 @AlexHu0212
AI3DCC Workshop @ICCVConference
We are excited to announce that the 3rd International Workshop on AI for 3D Content Creation (AI3DCC) will take place on October 20th, 2025 (8:00–12:30) in conjunction with ICCV 2025, Honolulu.
✨ This year, we are honored to have five distinguished keynote speakers from both academia and industry.
🖼️ We will also host an interactive poster session, offering students and researchers the opportunity to present their latest work and engage with the community. Self-nominations for posters are welcome. forms.gle/P3YUEVFWYV7xFU…
📄 Learn more at: ai3dcc.github.io
[image]
3 replies · 8 reposts · 32 likes · 5.6K views
Amlan Kar retweeted
Ruofan Liang @RfLiang
💡 Introducing LuxDiT: a diffusion transformer (DiT) that estimates realistic scene lighting from a single image or video. It produces accurate HDR environment maps, addressing a long-standing challenge in computer vision. 🔗Paper: arxiv.org/abs/2509.03680
3 replies · 55 reposts · 271 likes · 20.1K views
Amlan Kar retweeted
Yue Wang @yuewang314
🚀 Join Us: Research Internships in Embodied Intelligence
The USC Geometry, Vision, and Learning Lab (usc-gvl.github.io) is seeking highly motivated interns to push the frontiers of AI, robotics, and 3D computer vision. You’ll work on large-scale VLA models, hardware–software co-design for robotic data collection, humanoids, and cutting-edge 3D computer vision research.
🔍 Research Areas
- Robot Learning — large-scale algorithm training for embodied agents
- Hardware–Software Co-Design — building next-gen robotic sensing and actuation platforms
- 3D Reconstruction & Perception — neural scene representations, SLAM, and generative 3D modeling
- Deep Learning at Scale — vision-language-action model development and optimization
🛠 Desired Expertise — we welcome candidates with experience in one or more of the following:
- Robot learning algorithm development & training pipelines
- Hardware design for robotic platforms and sensor integration
- 3D reconstruction, NeRFs, and geometric deep learning
- Large-scale deep learning (PyTorch, JAX, distributed training)
- Computer vision & multimodal learning (images, videos, language, actions)
🌟 What You’ll Do
- Design and train large VLAs for robotic decision-making
- Develop novel hardware–software systems for efficient, high-quality robotic data collection
- Implement and benchmark state-of-the-art 3D perception and reconstruction algorithms
- Collaborate with a multidisciplinary team spanning AI, robotics, and computer vision
📍 Commitment
- Duration: >= 3 months
- Weekly commitment: >= 20 hours, ideally 40 hours
- Start date: 09/2025
📩 How to Apply
- USC students: apply here forms.gle/YCqXRF3wnksNCw…
- Non-USC applicants: apply here forms.gle/wLmPS3bZNGPtSX…
7 replies · 24 reposts · 200 likes · 23.4K views
Amlan Kar retweeted
Jiahui Huang @huangjh_hjh
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 research.nvidia.com/labs/toronto-a…
13 replies · 100 reposts · 450 likes · 61.9K views