Ishaan Gulrajani
@__ishaan
Hi! I’m a machine learning researcher @openai. Previously @stanford @facebook @google @mila_quebec

Never forget @karpathy training a recurrent neural net (precursor to transformers) to imitate @paulg in 2015—a thing of syntactic and semantic beauty:
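For readers who want the idea in code: below is a minimal char-level RNN language-model sketch in PyTorch, in the spirit of that 2015 experiment. It is not Karpathy's original char-rnn; the corpus path, model size, and training loop are illustrative assumptions.

```python
# Minimal char-level RNN LM sketch (illustrative, not the original char-rnn).
import torch
import torch.nn as nn

text = open("pg_essays.txt").read()   # hypothetical local dump of the essays
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)
    def forward(self, x, state=None):
        h, state = self.rnn(self.embed(x), state)
        return self.head(h), state

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
seq_len, batch = 128, 32

for step in range(1000):                       # toy training loop
    ix = torch.randint(0, len(data) - seq_len - 1, (batch,)).tolist()
    x = torch.stack([data[i:i + seq_len] for i in ix])
    y = torch.stack([data[i + 1:i + seq_len + 1] for i in ix])   # next-char targets
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sample: feed the model its own predictions one character at a time.
x, state, out = torch.tensor([[stoi[chars[0]]]]), None, []
for _ in range(500):
    logits, state = model(x, state)
    p = torch.softmax(logits[0, -1], dim=-1)
    x = torch.multinomial(p, 1).view(1, 1)
    out.append(itos[x.item()])
print("".join(out))
```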

NEW! Part two of a #KempnerInstitute blog series: @blake__bordelon, @ABAtanasov & @CPehlevan propose a simple, solvable model in which many aspects of #LLMs are already present. Read more: bit.ly/3RqYMhX #neuralnetworks #AI

What does it mean for an image, video, or text to be 𝑟𝑒𝑎𝑙𝑖𝑠𝑡𝑖𝑐? Despite how far we've come in 𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑖𝑛𝑔 realistic data, 𝑞𝑢𝑎𝑛𝑡𝑖𝑓𝑦𝑖𝑛𝑔 realism is still a poorly understood problem. I've shared my thoughts on how to correctly quantify realism here: arxiv.org/abs/2403.04493 #icml2024 #genai #compression

Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life. Create and edit your videos with AI. Rolling out to new users on web and Discord, starting today. Sign up at pika.art

6 years ago today, "Attention Is All You Need" went up on arXiv! Happy birthday, Transformer! 🎂 Fun facts:
- Transformer did not invent attention, but pushed it to the extreme. The first attention paper was published 3 years prior (2014) under an unassuming title: "Neural Machine Translation by Jointly Learning to Align and Translate", from Yoshua Bengio's lab. It combines an RNN with "context vectors" (i.e. attention; a minimal sketch follows below). Many of you likely haven't heard of this paper, but it's one of the greatest milestones in NLP and has been cited 29K times (compared to the Transformer's 77K).
- Neither the Transformer paper nor the original attention paper framed the architecture as a general-purpose sequence computer. Instead, both were conceived as solutions to one narrow, specific problem: machine translation. It's remarkable that AGI (some day soon) can trace its origin to the humble Google Translate. 😅
- The Transformer was published at NeurIPS 2017, one of the top AI conferences worldwide. Yet it didn't even get an oral presentation, let alone awards. There were 3 best papers at NeurIPS that year; combined, they have 529 citations as of today.
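To make the "context vectors" concrete: below is a minimal NumPy sketch of the additive attention from Bahdanau et al. (2014), which computes a context vector as a softmax-weighted sum of encoder states. The weight matrices and toy data are illustrative assumptions; the surrounding encoder/decoder RNNs of the full translation model are omitted.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def context_vector(decoder_state, encoder_states, W_dec, W_enc, v):
    """Additive ("Bahdanau") attention sketch.

    decoder_state:  (d,)    current decoder hidden state s_{t-1}
    encoder_states: (T, d)  encoder hidden states h_1..h_T
    Returns c_t = sum_j alpha_{tj} * h_j and the weights alpha.
    """
    # score(s, h_j) = v^T tanh(W_dec s + W_enc h_j)
    scores = np.array([v @ np.tanh(W_dec @ decoder_state + W_enc @ h)
                       for h in encoder_states])
    alpha = softmax(scores)                  # attention weights, sum to 1
    return alpha @ encoder_states, alpha     # weighted sum of encoder states

# Toy example: 5 source positions, hidden size 8 (numbers are arbitrary).
rng = np.random.default_rng(0)
d, T = 8, 5
H = rng.normal(size=(T, d))                  # encoder states
s = rng.normal(size=d)                       # decoder state
W_dec, W_enc, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
c, alpha = context_vector(s, H, W_dec, W_enc, v)
print(alpha.round(3), c.shape)               # weights over source positions, (8,)
```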

New paper with @tatsu_hashimoto! Likelihood-Based Diffusion Language Models: arxiv.org/abs/2305.18619 Likelihood-based training is a key ingredient of current LLMs. Despite this, diffusion LMs haven't shown any nontrivial likelihoods on standard LM benchmarks. We fix this! 🧵
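For context on what "likelihood-based training" means here: below is a minimal PyTorch sketch of the standard autoregressive maximum-likelihood objective (next-token cross-entropy), the quantity such benchmarks report. It is generic illustration code, not the paper's diffusion objective; the tiny model and random tokens are placeholders.

```python
import torch
import torch.nn.functional as F

def lm_loss(model, tokens):
    """tokens: (batch, seq) integer ids. Returns mean NLL in nats per token."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from tokens <= t
    logits = model(inputs)                            # (batch, seq-1, vocab)
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1))
    return nll                                        # perplexity = exp(nll)

# Usage with any module mapping ids -> logits; a tiny placeholder model here.
vocab, dim = 100, 32
model = torch.nn.Sequential(torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab))
tokens = torch.randint(0, vocab, (4, 16))
print(lm_loss(model, tokens).item())
```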