Prafull Sharma

357 posts


@prafull7

Postdoc @MIT with Josh Tenenbaum and Phillip Isola. PhD @MIT with Bill Freeman and Fredo Durand. Undergrad @Stanford.

Cambridge, MA · Joined September 2010
808 Following · 1.4K Followers
Pinned Tweet
Prafull Sharma (@prafull7)
Graduated with a PhD in Computer Science @MIT! Grateful to my advisors and teachers who helped me learn and grow in this journey! Thanks to all my friends and family members for their support.
[image attached]
116 replies · 46 retweets · 2K likes · 97.3K views
Prafull Sharma retweeted
Shivam Duggal (@ShivamDuggal4)
Tokenization & generation power large models. But are they really separate? Tokenization = generation under strong observability. UNITE: an end-to-end training framework where one shared Generative Encoder (GE) performs both tokenization and latent denoising. Paper: arxiv.org/abs/2603.22283
[image attached]
4 replies · 78 retweets · 396 likes · 58.4K views
Prafull Sharma retweeted
Phillip Isola (@phillip_isola)
Sharing “Neural Thickets”. We find: in large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works. Paper: arxiv.org/abs/2603.12228 Web: thickets.mit.edu 1/
[image attached]
26 replies · 125 retweets · 918 likes · 136.2K views
Prafull Sharma retweeted
Yulu Gan (@yule_gan)
Simply adding Gaussian noise to LLMs (one step: no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt.

To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs.

What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed, a regime we term Neural Thickets.

Paper: arxiv.org/pdf/2603.12228
Code: github.com/sunrainyg/Rand…
Website: thickets.mit.edu
[image attached]
88 replies · 432 retweets · 3K likes · 675K views
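The recipe the tweet describes (one step of Gaussian noise, then ensemble the best candidates) is simple enough to sketch. The toy version below is illustrative only: the function name, the vector "weights," and the stand-in score function are assumptions, not the paper's implementation.

```python
import random

def randopt(weights, score_fn, sigma=0.01, n_candidates=8, seed=0):
    """One-step Gaussian search: sample noisy copies of the weights,
    score each candidate on the task, and keep the best performers
    for ensembling. No gradients, no learning rate, no iterations."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        noisy = [w + rng.gauss(0.0, sigma) for w in weights]
        candidates.append((score_fn(noisy), noisy))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [w for _, w in candidates[: n_candidates // 2]]

# Toy stand-in for a task score: closeness to a target vector.
target = [0.5, -0.2, 0.1]
score = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))

pretrained = [0.49, -0.18, 0.12]  # already near a good solution
ensemble = randopt(pretrained, score)
```

The "Neural Thickets" claim is that for real LLMs this neighborhood is dense with task experts, so even a gradient-free search like this can find improvements.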
Prafull Sharma retweeted
Lance Ying (@LanceYing42)
Today we present a new framework for measuring human-like general intelligence in machines (what some people call AGI). Conventional AI benchmarks today assess only narrow capabilities in a limited range of human activities.

We propose that a more promising way to evaluate human-like general intelligence in AI systems is through a particularly strong form of general game playing: studying how and how well they play and learn to play all conceivable human games, what we call the "Multiverse of Human Games".

Taking a first step towards this vision, we introduce the AI GameStore, a scalable and open-ended platform that uses LLMs with humans in the loop to automatically construct standardized and containerized variants of popular human games on digital gaming platforms.

As a proof of concept, we generated 100 such games based on the top charts of the Apple App Store and Steam, and evaluated seven frontier vision-language models (VLMs) on short episodes of play. The best models achieved less than 10% of the human average score on the majority of the games.

Check out our website to play the games, see how agents play, and build agents to solve them!
[image attached]
4 replies · 28 retweets · 112 likes · 18.8K views
Prafull Sharma retweeted
Akarsh Kumar (@akarshkumar0101)
Check out our new Digital Red Queen work! Core War is a programming game where assembly programs fight against each other for control of a Turing-complete virtual machine. We ask what happens when an LLM drives an evolutionary arms race in this domain. We find that as you run our DRQ algorithm for longer, the resulting programs become more generally robust, while also showing evidence of convergence across independent runs, a sign of convergent evolution!
Sakana AI (@SakanaAILabs)

Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs. Blog: sakana.ai/drq

Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War.

Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors.

We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust. Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors, mirroring how distinct species in nature often evolve similar traits to solve the same problems.

Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.

This project is a collaboration between MIT and Sakana AI led by @akarshkumar0101.

Full Paper (Website): pub.sakana.ai/drq/
Full Paper (arXiv): arxiv.org/abs/2601.03335
Code: github.com/SakanaAI/drq/

4 replies · 19 retweets · 103 likes · 21.9K views
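The arms-race loop described above can be sketched in miniature. Everything below is a toy stand-in: in DRQ the mutation step is an LLM rewriting Redcode assembly and the battle is a real Core War match, whereas here a warrior is just a dict with a scalar "strength".

```python
import random

def battle(a, b):
    # Stand-in for a Core War match; in reality both warriors run on a
    # shared virtual machine and try to crash each other.
    return a if a["strength"] >= b["strength"] else b

def mutate(warrior, rng):
    # Stand-in for the LLM proposing a modified assembly program.
    return {"code": warrior["code"] + "*",
            "strength": warrior["strength"] + rng.uniform(-1.0, 2.0)}

def digital_red_queen(generations=30, seed=0):
    champion = {"code": "imp", "strength": 1.0}
    archive = [champion]  # past champions a challenger must still beat
    rng = random.Random(seed)
    for _ in range(generations):
        challenger = mutate(champion, rng)
        # Promotion requires beating a majority of *past* champions, not
        # just the current one: the pressure behind general robustness.
        wins = sum(battle(challenger, past) is challenger for past in archive)
        if wins > len(archive) / 2:
            champion = challenger
            archive.append(champion)
    return champion, archive

champion, archive = digital_red_queen()
```

Evaluating against the whole archive, rather than only the latest opponent, is what keeps the loop from cycling and pushes programs toward generally robust behavior.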
Prafull Sharma retweeted
Phillip Isola (@phillip_isola)
Over the past year, my lab has been working on fleshing out theory/applications of the Platonic Representation Hypothesis. Today I want to share two new works on this topic:
Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired rep learning: arxiv.org/abs/2510.08492
1/9
10 replies · 119 retweets · 695 likes · 67.1K views
Prafull Sharma retweeted
Lance Ying (@LanceYing42)
A hallmark of human intelligence is the capacity for rapid adaptation: solving new problems quickly under novel and unfamiliar conditions. How can we build machines to do so?

In our new preprint, we propose that any general intelligence system must have an adaptive world model, i.e., it must be able to rapidly construct or refine its internal representation through interaction and exploration, a process we call "world model induction". We propose a roadmap for evaluating adaptive world models in machines based on a special class of games we call "novel games".
[image attached]
15 replies · 101 retweets · 511 likes · 69K views
Prafull Sharma retweeted
Phillip Isola (@phillip_isola)
Our computer vision textbook is now available for free online here: visionbook.mit.edu We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
34 replies · 605 retweets · 2.9K likes · 181.6K views
Prafull Sharma retweeted
Hyojin Bahng (@hyojinbahng)
Image-text alignment is hard, especially as multimodal data gets more detailed. Most methods rely on human labels or proprietary feedback (e.g., GPT-4V). We introduce:
1. CycleReward: a new alignment metric focused on detailed captions, trained without human supervision.
2. CyclePrefDB: 866K preference pairs from cycle consistency.
📄 Paper: arxiv.org/abs/2506.02095
🌐 Project page: cyclereward.github.io
💻 Code: github.com/hjbahng/cycler… (1/🧵)
[image attached]
4 replies · 36 retweets · 197 likes · 37.7K views
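The cycle-consistency idea behind those preference pairs can be sketched in a few lines. In this toy version (the vector "image" and the back-projection are stand-ins; the real pipeline runs an actual image-to-text-to-image cycle), a caption is preferred when it lets you reconstruct more of the original image:

```python
def cycle_score(image, caption):
    """Cycle-consistency reward: map the caption back into image space
    and measure how much of the original it recovers (negative squared
    error). Here the "back-mapping" is a toy zero-padded projection."""
    recon = caption + [0.0] * (len(image) - len(caption))
    return -sum((a - b) ** 2 for a, b in zip(image, recon))

def preference_pair(image, caption_a, caption_b):
    """Build one (chosen, rejected) pair with no human labels."""
    sa = cycle_score(image, caption_a)
    sb = cycle_score(image, caption_b)
    return (caption_a, caption_b) if sa >= sb else (caption_b, caption_a)

image = [0.9, -0.4, 0.7, 0.2]
detailed = image[:3]  # a caption preserving three attributes
coarse = image[:1]    # a caption preserving only one
chosen, rejected = preference_pair(image, detailed, coarse)
```

Because the supervision signal is reconstruction quality rather than a human judgment, pairs like this can be generated at the 866K scale the tweet mentions without annotators.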
Prafull Sharma retweeted
Akarsh Kumar (@akarshkumar0101)
Excited to share our position paper on the Fractured Entangled Representation (FER) Hypothesis!

We hypothesize that the standard paradigm of training networks today, while producing impressive benchmark results, is still failing to create a well-organized internal representation of its output behavior. Instead, the internal representation seems more like "spaghetti": different concepts are fractured and entangled together, rather than employing a more holistic representational strategy that properly captures the regularities in the data.

FER may be one of the reasons for the idiosyncratic failure modes of modern foundation models in OOD generalization, creativity, and continual learning.

Check out Ken's tweet for more information! (Then keep reading below)
Kenneth Stanley (@kenneth0stanley)

Could a major opportunity to improve representation in deep learning be hiding in plain sight? Check out our new position paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis.

The idea stems from a little-known observation about networks trained to output a single image: when they are discovered through an unconventional open-ended search process, their representations are incredibly elegant and exhibit astonishing modular decomposition. In contrast, when SGD (successfully) learns to output the same image, its underlying representation is fractured and entangled, an absolute mess!

This stark difference in the underlying representation of the same "good" output behavior carries deep lessons for deep learning. It shows you cannot judge a book by its cover: an LLM with all the right responses could similarly be a mess under the hood. But also, surprisingly, it shows us that it doesn't have to be this way! Without the unique examples in this paper that were discovered through open-ended search, we might assume neural representation has to be a mess. These results show that is clearly untrue. We can now imagine something better because we can actually see it is possible.

We give several reasons why this matters: generalization, creativity, and learning are all potentially impacted. The paper shows examples to back up these concerns, but in brief, there is a key insight: representation is not only important for what you're able to do now, but for where you can go from there. The ability to imagine something new (and where your next step in weight space can bring you) depends entirely upon how you represent the world. Generalization, creativity, and learning itself depend upon this critical relationship. Notice the difference in appearance between the nearby images to the skull in weight space shown in the top-left and top-right image strips of the attached graphic. The difference in semantics is stark.

The insight that representation could be better opens up a lot of new paths and opportunities for investigation. It raises new urgency to understand the representation underlying foundation models and LLMs while exposing all kinds of novel avenues for potentially improving them, from making learning processes more open-ended to manipulating architectures and algorithms.

Don't mistake this paper as providing comfort for AI pessimists. By exposing a novel set of stark and explicit differences between conventional learning and something different, it can act as an accelerator of progress as opposed to a tool of pessimism. At the least, the discussion it provokes should be quite illuminating.

5 replies · 38 retweets · 251 likes · 35.7K views
Prafull Sharma retweeted
Shaden (@Sa_9810)
Excited to share our ICLR 2025 paper, I-Con, a unifying framework that ties together 23 methods across representation learning, from self-supervised learning to dimensionality reduction and clustering. Website: aka.ms/i-con A thread 🧵 1/n
1 reply · 24 retweets · 93 likes · 12.2K views
Prafull Sharma retweeted
Jeremy Bernstein (@jxbz)
I just wrote my first blog post in four years! It is called "Deriving Muon". It covers the theory that led to Muon and how, for me, Muon is a meaningful example of theory leading practice in deep learning (1/11)
[image attached]
13 replies · 137 retweets · 1K likes · 123.4K views
Prafull Sharma retweeted
Vincent Sitzmann (@vincesitzmann)
We wrote a new video diffusion paper! @kiwhansong0 and @BoyuanChen0 and co-authors did absolutely amazing work here. Apart from really working, the method of "variable-length history guidance" is really cool and based on some deep truths about sequence generative modeling…
Boyuan Chen (@BoyuanChen0)

Announcing Diffusion Forcing Transformer (DFoT), our new video diffusion algorithm that generates ultra-long videos of 800+ frames. DFoT enables History Guidance, a simple add-on to any existing video diffusion models for a quality boost. Website: boyuan.space/history-guidan… (1/7)

3 replies · 8 retweets · 122 likes · 12.9K views
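History Guidance itself is specified in the DFoT paper; the sketch below shows only the generic classifier-free-guidance pattern such an add-on builds on, with a stubbed-out denoiser (the function names and the stub are assumptions, not the paper's method): steer the model's prediction toward what conditioning on a longer history implies.

```python
def history_guided(denoise, frame, short_history, full_history, weight=1.5):
    """Classifier-free-guidance-style combination of two denoising
    predictions: one conditioned on little (or no) history, one on the
    full available history. `denoise(frame, history)` stands in for one
    denoising step of a video diffusion model."""
    base = denoise(frame, short_history)
    cond = denoise(frame, full_history)
    # Push the prediction in the direction the extra history suggests.
    return [b + weight * (c - b) for b, c in zip(base, cond)]

# Stub denoiser: the "prediction" just shifts by how many history
# frames were visible, enough to see the guidance arithmetic.
denoise = lambda frame, hist: [x + len(hist) for x in frame]
out = history_guided(denoise, [0.0], [], [1, 2], weight=1.5)
```

Varying how much history the conditional branch sees is what makes this an add-on compatible with existing video diffusion models, per the quoted tweet.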
Prafull Sharma retweeted
Andrej Karpathy (@karpathy)
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
1.4K replies · 3.6K retweets · 33.5K likes · 6.9M views
Prafull Sharma retweeted
Phillip Isola (@phillip_isola)
As a kid I was fascinated by the Search for Extraterrestrial Intelligence (SETI). Now we live in an era when it's becoming meaningful to search for "extraterrestrial life" not just in our universe but in simulated universes as well. This project provides new tools toward that dream:
Sakana AI (@SakanaAILabs)

Introducing ASAL: Automating the Search for Artificial Life with Foundation Models. sakana.ai/asal/

Artificial Life (ALife) research holds key insights that can transform and accelerate progress in AI. By speeding up ALife discovery with AI, we accelerate our understanding of emergence, evolution, and intelligence, core principles that can inspire the next generation of AI systems! We proudly collaborated with MIT, OpenAI, Swiss AI Lab IDSIA, and Ken Stanley on this exciting project.

Full Paper (Website): pub.sakana.ai/asal/
Full Paper (arXiv): asal.sakana.ai/paper/
Code: github.com/SakanaAI/asal/

In this work, we propose a new algorithm called Automated Search for Artificial Life ("ASAL") to automate the discovery of artificial life using vision-language foundation models. Instead of tediously hand-designing every tiny rule of an ALife simulation, simply describe the space of simulations to search over, and ASAL will automatically discover the most interesting and open-ended artificial lifeforms!

Because of the generality of foundation models, ASAL can discover new lifeforms across a diverse range of seminal ALife simulations, including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. ASAL even discovered novel cellular automata rules that are more open-ended and expressive than the original Conway's Game of Life.

We believe this new paradigm may reignite ALife research by overcoming the bottleneck of manually designed simulations, thus advancing beyond the limits of human ingenuity.

4 replies · 17 retweets · 212 likes · 28.8K views
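The search loop ASAL describes can be sketched in miniature. Below, the "simulation space" is a toy 1-D cellular automaton indexed by a rule seed, and the foundation-model score is replaced by a crude stand-in (counting distinct states); in ASAL the scorer is a vision-language model judging rendered frames against a text prompt, so everything here is an illustrative assumption.

```python
import random

def simulate(rule_seed, steps=32, size=16):
    """Toy ALife substrate: a 1-D binary cellular automaton whose
    update rule is determined by `rule_seed`."""
    rng = random.Random(rule_seed)
    rule = [rng.randint(0, 1) for _ in range(8)]
    state = [0] * size
    state[size // 2] = 1
    history = [state]
    for _ in range(steps):
        # Each cell's next value depends on its 3-cell neighborhood
        # (wrapping at the edges), looked up in the 8-entry rule table.
        state = [rule[4 * state[i - 1] + 2 * state[i] + state[(i + 1) % size]]
                 for i in range(size)]
        history.append(state)
    return history

def interestingness(history):
    # Stand-in for the VLM score (e.g. CLIP similarity to a text prompt
    # like "a lifelike, open-ended pattern"): distinct-state count as a
    # crude diversity proxy.
    return len({tuple(s) for s in history})

# Search the simulation space for the most "interesting" rule.
best_seed = max(range(64), key=lambda s: interestingness(simulate(s)))
```

Swapping the hand-written `interestingness` for a foundation-model judgment is the step that lets the same loop search far richer substrates (Boids, Lenia, NCA) than any hand-coded metric could.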
Prafull Sharma retweeted
Shivam Duggal (@ShivamDuggal4)
Current vision systems use fixed-length representations for all images. In contrast, human intelligence or LLMs (e.g., OpenAI o1) adjust compute budgets based on the input. Since different images demand different processing and memory, how can we enable vision systems to be adaptive? 🧵
10 replies · 64 retweets · 476 likes · 92.7K views
Prafull Sharma retweeted
Vin Agarwal (@vin_agarwal)
Had a lot of fun working on this. Stay tuned for more research on how human listeners reverse-engineer the physics of the world using the sounds they hear.
Josh McDermott (@JoshHMcDermott)

We just wrote a primer on how the physics of sound constrains auditory perception: authors.elsevier.com/a/1jzSR3QW8S6E… Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.

1 reply · 3 retweets · 29 likes · 2.4K views
Prafull Sharma retweeted
Josh McDermott (@JoshHMcDermott)
We just wrote a primer on how the physics of sound constrains auditory perception: authors.elsevier.com/a/1jzSR3QW8S6E… Covers sound propagation and object interactions, and touches on their relevance to music and film. I enjoyed working on this with @vin_agarwal and James Traer.
[image attached]
5 replies · 38 retweets · 123 likes · 13.3K views