rohan anil

9.6K posts

@_arohan_

aspiring to understand deep learning

Joined December 2017
2.2K Following · 38K Followers
rohan anil reposted
Sundar Pichai @sundarpichai
Google is now the first cloud provider to integrate 1 GW of flexible demand into long-term utility contracts. Our ability to shift or reduce our energy demand when it’s needed can help utility companies balance supply/demand and plan for future capacity needs. This is a big milestone for responsible data center growth and helps keep costs lower for local communities. blog.google/innovation-and…
81 replies · 105 reposts · 869 likes · 41.4K views
rohan anil @_arohan_
@Miles_Brundage Is the problem you're seeing that the model overthinks? Have you tried asking it to give you intermediate steps?
1 reply · 0 reposts · 0 likes · 114 views
Miles Brundage @Miles_Brundage
Did the heat wave do this to u, Claudey
[image]
1 reply · 0 reposts · 7 likes · 802 views
Miles Brundage @Miles_Brundage
Pls get Claude Code working reliably before shipping new features, pls pls pls
5 replies · 3 reposts · 55 likes · 5K views
rohan anil @_arohan_
Soon self-driving cars everywhere, I know.
0 replies · 0 reposts · 4 likes · 847 views
rohan anil @_arohan_
Writing code character by character has become equivalent to riding horses. You can instead learn to drive a car to get places. And occasionally do what people in the Middle Ages did when Claude is down.
4 replies · 1 repost · 23 likes · 2.3K views
Jerry Tworek @MillionInt
AI labs need a wallfacer project. An AI researcher not having to explain themselves to anyone, performing seemingly random actions with a hidden, inscrutable agenda to create a SOTA model in a way no one would deem possible.
26 replies · 11 reposts · 372 likes · 43K views
rohan anil reposted
Polymarket @Polymarket
JUST IN: Nvidia CEO Jensen Huang calls on tech leaders to "be careful not to scare people" regarding AI.
272 replies · 123 reposts · 2.1K likes · 143.7K views
rohan anil reposted
Ravid Shwartz Ziv @ziv_ravid
New episode of the Information Bottleneck! We talked with @StefanoErmon about why he thinks diffusion LLMs will replace autoregressive ones. Stefano co-invented DDIM, FlashAttention, DPO, and score-based diffusion models. He's a Stanford professor and now runs @Inception_AI, where they built Mercury II. We go deep but also cover the bigger picture: the startup journey, PhD vs industry, and where AI is heading.

A few things that stuck with me:
- He thinks of autoregressive models as typewriters and diffusion models as editors. One goes left to right. The other starts messy and refines.
- Mercury II (their text diffusion model) already beats the fastest autoregressive models on latency-critical stuff such as voice agents, code suggestions, anything where you have a tight time budget. And it does it because diffusion generates tokens in parallel instead of one at a time.
- We also got into whether AI will actually replace software engineers (his answer: no), PhD vs industry advice, and what it was like going from an ICML best paper to raising money.
8 replies · 20 reposts · 194 likes · 17.5K views
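The typewriter-vs-editor contrast above can be sketched as a toy latency argument. This is a hypothetical illustration, not Mercury's actual algorithm: the key point is only that an autoregressive decoder makes one model call per token, while a discrete-diffusion-style decoder makes one parallel call per denoising step, so its call count is independent of sequence length.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive_decode(length, counter):
    # "Typewriter": one model call per token, strictly left to right.
    out = []
    for _ in range(length):
        out.append(random.choice(VOCAB))  # stand-in for a model's next-token sample
        counter[0] += 1                   # one sequential model call
    return out

def diffusion_decode(length, num_steps, counter):
    # "Editor": start fully masked, then re-guess every position in
    # parallel; model calls scale with num_steps, not with length.
    out = ["<mask>"] * length
    for _ in range(num_steps):
        out = [random.choice(VOCAB) for _ in out]  # parallel refinement of all tokens
        counter[0] += 1                            # one parallel model call
    return out

ar_calls, diff_calls = [0], [0]
autoregressive_decode(32, ar_calls)
diffusion_decode(32, 4, diff_calls)
print(ar_calls[0], diff_calls[0])  # 32 sequential calls vs 4 parallel passes
```

With a tight time budget (voice agents, code suggestions), the 32-vs-4 gap in sequential model calls is the whole argument; real diffusion LMs of course use a learned denoiser rather than random re-guessing.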
rohan anil reposted
Amanda Askell @AmandaAskell
Perhaps I should get married again so that the media has a more recent man they can reference any time they mention me or my work.
265 replies · 75 reposts · 2.9K likes · 320.7K views
rohan anil reposted
Ksenia_TuringPost @TheTuringPost
At this nerdiest of all nerdy sessions 💞, Jeff Dean said he doesn’t think we’re running out of data.

"I think there’s still an enormous amount of data in the world that we haven’t really used yet for training these models. We train on some video data, for example, but there’s a lot more video out there, along with associated audio, that we’re not necessarily making full use of yet. I also think real-world robotics data, and autonomous vehicle data, is going to be fairly plentiful.

And then synthetic data is another resource. If you can generate really interesting, high-quality data, then you can effectively inject more compute and get more training data that way. Now, of course, there’s a reasonable question here: aren’t you eventually just regurgitating the same stuff? If you train on data, then use that model to generate synthetic data, are you just making another version of what you already had? Maybe to some extent. But I still think it can help, especially if the model generating the synthetic data is itself very powerful. At least so far, that does seem to be useful.

And beyond that, there are also a lot of techniques we’re not really using much right now that used to be very common in other domains, like convolutional image models years ago. Things like data augmentation are interesting. That’s one way to think about synthetic data. Techniques to prevent overfitting are also interesting. You can use dropout, distillation, and other forms of regularization. So I think there’s still a lot of opportunity to make models better with more compute and more passes over the data, without necessarily running into overfitting."

A fascinating conversation between @JeffDean and @BillDally @NVIDIAGTC
[image]
5 replies · 23 reposts · 153 likes · 24.9K views
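Dean's point that data augmentation is "one way to think about synthetic data" can be sketched with a toy example. This is a hypothetical illustration (word-dropout is just one cheap augmentation; the function and corpus are invented), showing how a little compute mints extra training examples from one real one.

```python
import random

def augment(example, n_variants, rng, drop_p=0.2):
    # Toy word-dropout augmentation: spend compute to mint several
    # "synthetic" variants of one real training example.
    words = example.split()
    variants = []
    for _ in range(n_variants):
        kept = [w for w in words if rng.random() > drop_p]
        variants.append(" ".join(kept or words))  # never emit an empty example
    return variants

rng = random.Random(0)  # seeded for reproducibility
corpus = ["there is still an enormous amount of data in the world"]
synthetic = [v for ex in corpus for v in augment(ex, 4, rng)]
print(len(corpus) + len(synthetic))  # 1 real example + 4 synthetic variants
```

The regurgitation worry from the quote applies here too: the variants carry no new information, but as extra passes over perturbed data they can still regularize, much like dropout.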
rohan anil @_arohan_
@yifan_zhang_ I liked it a lot - a cool title for this gem would be The Deep Remembers!
0 replies · 0 reposts · 11 likes · 2.5K views
Yifan Zhang @yifan_zhang_
Oh, Rohan the Legend, liked it! Definitely worth reading!
[image]
Yifan Zhang@yifan_zhang_

Some simple notes on the Residual Stream Duality in Modern Transformer Architectures. Hope you will enjoy it! Project Page: github.com/yifanzhang-pro… Takeaway: Math-wise, depth attention is dual to sequence ShortSWA. System-wise, it is not. Sequence-axis ShortSWA fits today's kernels and caches, while depth-axis attention adds cross-layer state and extra systems overhead. TLDR: Use either sequence ShortSWA or Deep Delta Learning! (github.com/yifanzhang-pro…)

1 reply · 1 repost · 38 likes · 5.3K views
rohan anil @_arohan_
Time is life’s most valuable resource.
2 replies · 3 reposts · 57 likes · 3.2K views
rohan anil @_arohan_
This is funny! Although in hindsight I think we should give due credit to all the new works that improve on it and scale, and think about deploying it in a real training run (solving for memory growth). An advantage Google had was that there were extremely strong folks left alone to think for a longer time, thus able to ascend in creative directions like this.
1 reply · 0 reposts · 42 likes · 3.8K views
Simo Ryu @cloneofsimo
lmao
> google cooks paper, "meh its probably not gonna work, pass"
> chinese lab cooks exact same thing one year later, everyone gets super hyped
EVERY SINGLE TIME
[image]
Ali Behrouz@behrouz_ali

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: arxiv.org/abs/2502.06785. As far as I understood, here there is no innovation to be excited about, and yet surprisingly there is no citation and discussion about DCA! The level of redundancy in LLM research and then the hype on X is getting worse and worse! DeepCrossAttention is built based on the intuition that depth-wise cross-attention allows for richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach.

21 replies · 27 reposts · 501 likes · 52.1K views
rohan anil @_arohan_
Actually these ideas go back a lot earlier as well. But let’s celebrate all of the papers and the current results too 🤝 Memory 📈
Ali Behrouz @behrouz_ali (quoted tweet, same as above)
3 replies · 5 reposts · 75 likes · 11K views
rohan anil reposted
The Nobel Prize @NobelPrize
“Timing is very important. You need to pick hard problems to solve and be ambitious with them. But you've also got to pick the right time when the world and the context that you're in is the right kind of environment for those ideas to flourish.” In his official Nobel Prize interview, Demis Hassabis discussed how his aspirations as a young gaming programmer were ahead of their time. Watch our official interview: bit.ly/41DGkXr
[image]
85 replies · 460 reposts · 3.5K likes · 269.2K views
rohan anil reposted
Patrick Collison @patrickc
• According to the story, the dog's cancer has not been cured.
• Absent all regulatory and manufacturing constraints, we could not just synthesize magic mRNA cancer cures. The technology is very promising, but it's not yet any kind of panacea.
• The emergent system of regulators and manufacturers is indeed far too conservative, and small-scale experimentation is much harder than it should be.
More people should read the first part of The Rise and Fall of Modern Medicine. Recommend @RuxandraTeslo, @PatrickHeizer for more.
153 replies · 297 reposts · 4.3K likes · 843.7K views
MumbaiPanda @DarwinianVyas
@_arohan_ If trends hold, Sonnet 4.5 is estimated to have near 100% retrieval at 1B context
1 reply · 0 reposts · 3 likes · 94 views