Amlan Kar

557 posts


@amlankar95

Researcher @NVIDIAAI Spatial Intelligence Lab. Computer Vision PhD @UofT/@VectorInst. Previously @IITKanpur. I like data. Opinions here are all mine.

Toronto, Ontario · Joined March 2018
845 Following · 1.3K Followers
Amlan Kar retweeted
Ruilong Li @ruilong_li
Special moment to see something I’ve worked on so closely come to life! Today we announce Alpadreams — a world model that lets you explore ♾endlessly♾️ in ⚡real time⚡. Video: me (left) and Alpamayo policy (right) driving in Alpadreams at #GTC26. research.nvidia.com/labs/sil/proje…
2 replies · 19 reposts · 98 likes · 9.9K views
Amlan Kar retweeted
Zan Gojcic @ZGojcic
A new generation in AV simulation is here! We are announcing AlpaDreams, a real-time, interactive generative world model for AV simulation! Just a year ago it took minutes to generate a few seconds of video; today it is real time and interactive! research.nvidia.com/labs/sil/proje…
5 replies · 26 reposts · 103 likes · 17.3K views
Amlan Kar retweeted
Andrej Karpathy @karpathy
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (lower validation loss by the end of the run) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
[image]
1.1K replies · 3.7K reposts · 28.3K likes · 10.9M views
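The commit-on-improvement loop the tweet describes can be sketched in a few lines. This is a hypothetical stand-in, not the repo's actual code: `evaluate` plays the role of one fixed-budget (5-minute) training run reporting final validation loss, and `propose` plays the role of the agent editing the training script; keeping a candidate only when loss drops mirrors the "accumulate git commits as it finds better settings" behavior.

```python
import math
import random

def autoresearch_loop(evaluate, propose, init_cfg, n_runs=20, seed=0):
    """Each iteration stands in for one fixed-budget training run;
    a proposed config is kept ("committed") only if it lowers val loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = init_cfg, evaluate(init_cfg)
    for _ in range(n_runs):
        cfg = propose(best_cfg, rng)   # agent proposes a tweak to the setup
        loss = evaluate(cfg)           # run training, read final val loss
        if loss < best_loss:           # commit only improvements
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Toy stand-ins: pretend val loss is minimized at lr = 0.01.
def evaluate(cfg):
    return (math.log10(cfg["lr"]) + 2.0) ** 2

def propose(cfg, rng):
    return {"lr": cfg["lr"] * 10 ** rng.uniform(-0.5, 0.5)}

best_cfg, best_loss = autoresearch_loop(evaluate, propose, {"lr": 1e-3})
```

Comparing "research progress of different prompts" then amounts to running this loop with different `propose` policies and plotting each run's loss history.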
Amlan Kar retweeted
Jon Barron @jon_barron
@gkopanas I love that review. I do genuinely think a great way to evaluate research contributions would be to add the new paper to an agent's context window and see what delta the agent can get on some OSS codebase's performance.
0 replies · 1 repost · 14 likes · 1.9K views
Amlan Kar retweeted
Sven Elflein @s_elflein
🚀 Exciting news! We’re introducing VGG-T³: a scalable model for offline feed-forward 3D reconstruction that finally tackles the "quadratic bottleneck." Ever wanted to have VGGT reconstruct a 1,000-image scene in seconds instead of 10 minutes and use it for visual localization?
[GIF]
7 replies · 69 reposts · 468 likes · 32.4K views
Amlan Kar retweeted
Xindi Wu @cindy_x_wu
New #NVIDIA Paper. We introduce Motive, a motion-centric, gradient-based data attribution method that traces which training videos help or hurt video generation. By isolating temporal dynamics from static appearance, Motive identifies the training videos that shape motion. 🔗 research.nvidia.com/labs/sil/proje… 1/10
11 replies · 112 reposts · 542 likes · 73.2K views
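Gradient-based data attribution of the kind the tweet describes is commonly scored as a dot product between a training example's gradient and a query example's gradient (TracIn-style); Motive's exact formulation is not given here, so this is a generic sketch with made-up numbers. A positive score means the training example pushed the model toward the query behavior ("helps"); a negative score means it pushed away ("hurts").

```python
import numpy as np

def attribution_scores(train_grads, query_grad):
    """Generic gradient-dot-product attribution (illustrative, not
    Motive's actual method): one score per training example."""
    return train_grads @ query_grad

# Rows: per-training-example gradients; vector: gradient on a query clip.
train_grads = np.array([[1.0, 0.0],
                        [0.5, 0.5],
                        [-1.0, 0.2]])
query_grad = np.array([1.0, 1.0])
scores = attribution_scores(train_grads, query_grad)  # [1.0, 1.0, -0.8]
```

The "motion-centric" part would then amount to computing these gradients on a motion-sensitive objective rather than on raw appearance, so appearance-only matches stop dominating the scores.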
Amlan Kar retweeted
Or Litany @orlitany
🚗📡Radar is the unsung hero of AV perception: widespread in cars, yet overlooked in simulation. Introducing RadarGen: Realistic radar synthesis from cameras using diffusion. Massive kudos to my fantastic team at @TechnionLive and @NVIDIAAI radargen.github.io
Quoting Tomer Borreda @TomerBorreda:

📢 RadarGen: Automotive Radar Point Cloud Generation from Cameras Can we generate realistic radar point clouds solely from camera images? 🚗📡 We introduce RadarGen, a diffusion-based framework that synthesizes radar returns aligned with visual scenes. radargen.github.io

1 reply · 9 reposts · 37 likes · 5.2K views
Amlan Kar retweeted
Jack Zhang @jackzzhang
Can we apply gradient descent to discrete changes? In our new #SIGGRAPHAsia paper, we show that gradient descent can work on shape grammars, as in CAD and procedural modeling, but only if the grammars are designed correctly!
6 replies · 43 reposts · 262 likes · 64.2K views
Amlan Kar retweeted
Jack Merullo @jack_merullo_
How is memorized data stored in a model? We disentangle MLP weights in LMs and ViTs into rank-1 components based on their curvature in the loss, and find representational signatures of both generalizing structure and memorized training data.
[image]
8 replies · 61 reposts · 508 likes · 46.6K views
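The core operation, splitting a weight matrix into rank-1 components, can be illustrated with SVD. The paper ranks components by loss curvature; as a stand-in, SVD orders them by singular value, and this toy example is not the authors' code, only a sketch of what "disentangle into rank-1 components" means mechanically.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # stand-in for one MLP weight matrix

# Every matrix is an exact sum of rank-1 terms sigma_i * u_i v_i^T.
U, s, Vt = np.linalg.svd(W)
components = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]

# Sanity check: the rank-1 pieces reassemble the original weights,
# so individual pieces can be inspected (or ablated) independently.
W_rebuilt = np.sum(components, axis=0)
```

Attributing memorization then becomes a question of which individual components, rather than whole weight matrices, carry the signatures of specific training data.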
Amlan Kar retweeted
Rob Wiblin @robertwiblin
I don't think that greater attention to the 'political economy' of superintelligence is going to make ordinary people feel better about pushing ahead @tylercowen.
The most salient effect of superintelligence (plus vast numbers of robots) is that most people lose political and economic leverage. Governments and owners of capital are freed up to ignore them at much reduced economic or military cost.
The people who are sincerely trying to figure out a way to maintain a pluralistic social equilibrium with power fairly widely distributed in the presence of ever-improving superintelligent machines seem to have few positive results to report thus far.
Whether you even face more of a threat from your own government before or after such a ban seems at best unclear. If you think parts of your government are indifferent to you, then they can always nationalise and use superior access to superintelligent machines against you after a private company develops them (if the company doesn't do so first).
Playing for time with a ban might well be the best of a bad set of options for a random person with few savings and little faith in businesses, or politicians, or broader political-economic forces, not to simply bulldoze them.
All in all, an outstanding topic for a paper, but the severe challenge we face avoiding a dangerous concentration of power should be treated equally seriously along all the different possible paths. (Links to: x.com/deanwball/stat… )
[image]
6 replies · 10 reposts · 112 likes · 18.7K views
Amlan Kar retweeted
Hannes Stark @HannesStaerk
Excited to release BoltzGen which brings SOTA folding performance to binder design! The best part of this project has been collaborating with many leading biologists who tested BoltzGen at an unprecedented scale, showing success on many novel targets and pushing its limits! 🧵..
[image]
18 replies · 262 reposts · 991 likes · 299.4K views
Amlan Kar retweeted
Phillip Isola @phillip_isola
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492
1/9
10 replies · 119 reposts · 695 likes · 67.1K views
Amlan Kar retweeted
Shubham Tulsiani @shubhtuls
[1/N] We present a plug-and-play mechanism to controllably steer inference of any diffusion/flow model towards a sharper or flatter sampling distribution, resulting in improvements across domains e.g. text-to-image (10% FID reduction), protein generation (improved designability).
6 replies · 99 reposts · 946 likes · 60.6K views
Amlan Kar retweeted
Sherwin Bahmani @sherwinbahmani
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a camera-conditioned video diffusion model can be transformed into a generative reconstruction model that directly outputs a high-quality 3D Gaussian Splatting representation through self-distillation, without requiring real-world training data. Check out our results in the video (wait for dynamic scenes in the second half!):
Project Page: research.nvidia.com/labs/toronto-a…
Code and Models: github.com/nv-tlabs/lyra
Paper: arxiv.org/abs/2509.19296
20 replies · 65 reposts · 258 likes · 65.5K views
Amlan Kar retweeted
Hezhen Hu @ CVPR2026 @AlexHu0212
AI3DCC Workshop @ICCVConference
We are excited to announce that the 3rd International Workshop on AI for 3D Content Creation (AI3DCC) will take place on October 20th, 2025 (8:00–12:30) in conjunction with ICCV 2025, Honolulu.
✨ This year, we are honored to have five distinguished keynote speakers from both academia and industry.
🖼️ We will also host an interactive poster session, offering students and researchers the opportunity to present their latest work and engage with the community. Self-nominations for posters are welcome. forms.gle/P3YUEVFWYV7xFU…
📄 Learn more at: ai3dcc.github.io
[image]
3 replies · 8 reposts · 32 likes · 5.6K views
Amlan Kar retweeted
Ruofan Liang @RfLiang
💡 Introducing LuxDiT: a diffusion transformer (DiT) that estimates realistic scene lighting from a single image or video. It produces accurate HDR environment maps, addressing a long-standing challenge in computer vision. 🔗Paper: arxiv.org/abs/2509.03680
3 replies · 55 reposts · 271 likes · 20.1K views
Amlan Kar retweeted
Yue Wang @yuewang314
🚀 Join Us: Research Internships in Embodied Intelligence
The USC Geometry, Vision, and Learning Lab (usc-gvl.github.io) is seeking highly motivated interns to push the frontiers of AI, robotics, and 3D computer vision. You’ll work on large-scale VLA models, hardware–software co-design for robotic data collection, humanoids, and cutting-edge 3D computer vision research.
🔍 Research Areas
- Robot Learning — large-scale algorithm training for embodied agents
- Hardware–Software Co-Design — building next-gen robotic sensing and actuation platforms
- 3D Reconstruction & Perception — neural scene representations, SLAM, and generative 3D modeling
- Deep Learning at Scale — vision-language-action model development and optimization
🛠 Desired Expertise — we welcome candidates with experience in one or more of the following:
- Robot learning algorithm development & training pipelines
- Hardware design for robotic platforms and sensor integration
- 3D reconstruction, NeRFs, and geometric deep learning
- Large-scale deep learning (PyTorch, JAX, distributed training)
- Computer vision & multimodal learning (images, videos, language, actions)
🌟 What You’ll Do
- Design and train large VLAs for robotic decision-making
- Develop novel hardware–software systems for efficient, high-quality robotic data collection
- Implement and benchmark state-of-the-art 3D perception and reconstruction algorithms
- Collaborate with a multidisciplinary team spanning AI, robotics, and computer vision
📍 Commitment
- Duration: >= 3 months
- Weekly commitment: >= 20 hours, ideally 40 hours
- Start date: 09/2025
📩 How to Apply
- USC students: apply here forms.gle/YCqXRF3wnksNCw…
- Non-USC applicants: apply here forms.gle/wLmPS3bZNGPtSX…
7 replies · 24 reposts · 200 likes · 23.4K views
Amlan Kar retweeted
Jiahui Huang @huangjh_hjh
[1/N] 🎥 We've made available a powerful spatial AI tool named ViPE: Video Pose Engine, to recover camera motion, intrinsics, and dense metric depth from casual videos! Running at 3–5 FPS, ViPE handles cinematic shots, dashcams, and even 360° panoramas. 🔗 research.nvidia.com/labs/toronto-a…
13 replies · 100 reposts · 450 likes · 61.9K views