Hasith Vattikuti

23 posts

@hasith_v

Visiting researcher at TrainloopAI, Incoming CS PhD @Caltech, Previously @UTAustin, https://t.co/AaX3Qiyhu0

Austin, TX · Joined September 2025

346 Following · 51 Followers

Hasith Vattikuti retweeted

Simran Arora@simran_s_arora·2d

AI compute and inference are increasingly $$$. How can we change the unit economics of AI to improve accessibility? It's been fun working with @prlnet to release the first model endpoint that simultaneously generates tokens **and** a digital asset that can subsidize inference! 🪙 Check it out, links below 🚀

5.5K

Hasith Vattikuti@hasith_v·1 May

Loved working on this project, super proud of it! Thank you @AwesomeBao, @bialjail, and @wgilpin0 for being such a great team that I learned so much from. Excited to share our work at ICML!

William Gilpin@wgilpin0

Our ICML spotlight paper discovers universal redundancies in time series foundation models: the middle layers of many models can be removed without sacrificing performance 1/

1.5K

Hasith Vattikuti retweeted

Jackson Stokes@jackson_stokes·17 Apr

We trained LoRA adapters of different ranks to understand training dynamics, finding that adapters for GSM8k live in a surprisingly vast, low-rank solution space. This hints that some model skills are easy to learn, and training is more forgiving than we think. @hasith_v 1/6 🧵

254

22.6K

Hasith Vattikuti retweeted

William Gilpin@wgilpin0·16 Apr

Congratulations to @BaigYasa on his well-deserved selection as a PD Soros fellow!

PD Soros Fellowships@PDSoros

Today, we announced the 2026 Fellows—we hope you will read their incredible stories and learn about their work! Selected from 3,000+ applications, it was the most competitive year in our history. pdsoros.org/meet-the-class…

1.1K

Hasith Vattikuti retweeted

Jackson Stokes@jackson_stokes·9 Apr

We post-trained MedGemma to be SoTA in visual medicine ddx, outperforming Opus 4.6, Gemini 3.1 and GPT-5.4 while running at ~1/30th the cost. @getnolla Part 1 - improving visual reasoning 🧵1/6

3.3K

Hasith Vattikuti@hasith_v·10 Mar

@jxmnop This is cool, I've always been slightly uncomfortable with treating *everything* unlabeled as a negative. I wonder if using LLMs to produce rankings (even a somewhat noisy one) would be better than a binary classification. Perhaps we can weight according to rank in softmax?

681

dr. jack morris@jxmnop·9 Mar

x.com/i/article/2031…

156

1.9K

396.2K

Hasith Vattikuti retweeted

William Gilpin@wgilpin0·25 Feb

How do time series foundation models forecast unseen dynamical systems? In new experiments, we find that small transformers learn to approximate transfer operators in-context. (1/N) arxiv.org/abs/2602.18679

382

29.1K

Hasith Vattikuti retweeted

dhruva@dhruvakarkada·17 Feb

Soooo proud of this one! I'll make a post w more details shortly

Matthieu wyart@MatthieuWyart

What governs the geometry of time and space embeddings in LLMs? We show it follows from translation symmetry in language statistics. With Dhruva Karkada, @DanKorchinski, Andres Nava, @yasamanbb arxiv.org/abs/2602.15029

302

Hasith Vattikuti@hasith_v·14 Feb

@jxmnop Will code be released? Interested in playing around with this

dr. jack morris@jxmnop·5 Feb

here's a link to the paper on ArXiv! thanks to my collaborators at FAIR: Niloofar Mireshghallah, Mark Ibrahim, Saeed Mahloujifar arxiv.org/abs/2602.04118 (i left FAIR in october; it just took a while to get the paper out for a number of logistical reasons)

148

8.9K

dr. jack morris@jxmnop·5 Feb

at long last, the final paper of my phd 🧮 Learning to Reason in 13 Parameters 🧮 we develop TinyLoRA, a new ft method. with TinyLoRA + RL, models learn well with dozens or hundreds of params example: we use only 13 parameters to train 7B Qwen model from 76 to 91% on GSM8K 🤯

232

2.1K

182.1K

Hasith Vattikuti@hasith_v·3 Feb

@khoomeik @LEGO_Group ASML shut down all talks of a collab in fears of trade secrets being leaked in the build

165

Rohan Pandey@khoomeik·3 Feb

yo @LEGO_Group when are we getting an ASML High-NA EUV Photolithography Machine build set i kinda need this lego is danish, asml is dutch. this collab is written in the stars. make it happen.

102

7.1K

Hasith Vattikuti retweeted

Yasa Baig@BaigYasa·24 Jan

Great to see high quality software dev in comp bio. It still amazes me how much of computational biology is based on single-thread processing of large .txt files with minimal application-specific-optimization.

Arc Institute@arcinstitute

Arc bioinformatics scientists @noamteyssier and @a_dobin have just released cyto, an ultra-high throughput processor specifically optimized for @10xGenomics Flex single-cell data. We are excited to make this resource open source: biorxiv.org/content/10.648…

590

Hasith Vattikuti@hasith_v·16 Jan

@sidbing Share a list of your favorites!

sidbing 🪽@sidbing·15 Jan

tomorrow is reading and pondering day. gonna finish up all my backlog of reading papers, blogs, X threads and sit and ponder and talk to claude. can't wait.

1.2K

Hasith Vattikuti@hasith_v·21 Oct

@mattarderne @karpathy See the section titled “The LLaDa Algorithm” in my blog post

Matt Arderne 🌊@mattarderne·21 Oct

@hasith_v @karpathy Interested in how it managed to generate code. Feels like outline then detail would be useful

Andrej Karpathy@karpathy·20 Oct

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've seen a bit of both. A lot of diffusion papers look a bit dense but if you strip the mathematical formalism, you end up with simple baseline algorithms, e.g. something a lot closer to flow matching in continuous, or something like this in discrete. It's your vanilla transformer but with bi-directional attention, where you iteratively re-sample and re-mask all tokens in your "tokens canvas" based on a noise schedule until you get the final sample at the last step. (Bi-directional attention is a lot more powerful, and you get a lot stronger autoregressive language models if you train with it, unfortunately it makes training a lot more expensive because now you can't parallelize across sequence dim). So autoregression is doing an `.append(token)` to the tokens canvas while only attending backwards, while diffusion is refreshing the entire token canvas with a `.setitem(idx, token)` while attending bidirectionally. Human thought naively feels a bit more like autoregression but it's hard to say that there aren't more diffusion-like components in some latent space of thought. It feels quite possible that you can further interpolate between them, or generalize them further. And it's a component of the LLM stack that still feels a bit fungible. Now I must resist the urge to side quest into training nanochat with diffusion.

Nathan Barry@nathanrs

BERT is just a Single Text Diffusion Step! (1/n) When I first read about language diffusion models, I was surprised to find that their training objective was just a generalization of masked language modeling (MLM), something we’ve been doing since BERT from 2018. The first thought I had was, “can we finetune a BERT-like model to do text generation?”
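
The contrast Karpathy draws above, `.append(token)` for autoregression versus `.setitem(idx, token)` for diffusion, can be sketched in a few lines of Python. This is only a toy under stated assumptions, not code from either post: `toy_model`, `MASK`, the function names, and the linear re-masking schedule are all hypothetical, and the "model" just looks tokens up in a fixed target string instead of running a transformer.

```python
import random

MASK = "_"  # hypothetical mask symbol for the toy canvas


def toy_model(canvas, idx, target):
    """Stand-in for a trained network (an assumption for illustration).

    A real model would predict the token at `idx` from the visible canvas;
    this toy ignores the canvas and looks the answer up in `target`.
    """
    return target[idx]


def autoregressive_sample(target):
    """Left to right: each step appends one token to the canvas."""
    canvas = []
    for idx in range(len(target)):
        # Only the tokens already in `canvas` (to the left) are visible here.
        canvas.append(toy_model(canvas, idx, target))
    return "".join(canvas)


def diffusion_sample(target, steps=4, seed=0):
    """Iterated denoising: overwrite positions in place, then re-mask some."""
    rng = random.Random(seed)
    n = len(target)
    canvas = [MASK] * n
    for step in range(1, steps + 1):
        # Re-sample every position; the whole canvas is visible to the model.
        for idx in range(n):
            canvas[idx] = toy_model(canvas, idx, target)  # list __setitem__
        # Re-mask a shrinking share of positions (assumed linear schedule).
        n_masked = round(n * (1 - step / steps))
        for idx in rng.sample(range(n), n_masked):
            canvas[idx] = MASK
    return "".join(canvas)


if __name__ == "__main__":
    print(autoregressive_sample("to be or not"))
    print(diffusion_sample("to be or not"))
```

Because the schedule masks zero positions on the final step, the diffusion loop ends with a fully filled canvas; with a real model the intermediate re-masking is what lets earlier guesses be revised.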

268

533

5.2K

866.1K

Hasith Vattikuti@hasith_v·20 Oct

@karpathy hasithv.github.io/posts/25-09-29…

168

Hasith Vattikuti@hasith_v·20 Oct

@karpathy I actually hacked nanogpt sometime ago to become a diffusion llm. Results were pretty decent on shakespeare with character-level tokenization. Honestly was just surprised it even learned to spell words and pick up on basic grammar. Link in reply

1.8K

Hasith Vattikuti@hasith_v·10 Oct

@materzynska @AIatMeta Very interested in diffusion models and social AI. Would love to talk with you. You can see more about me on my blog: hasithv.github.io

395

Joanna@materzynska·9 Oct

I am looking for motivated students to join my team at @AIatMeta FAIR for a summer internship. If you have experience with motion modeling / diffusion models and/or social AI please feel free to reach out! 🤖✨

324

30K

Hasith Vattikuti@hasith_v·5 Oct

@a16z @LiamFedus what are y'all's methods to verify what the LLMs are discovering? How do you make sure it’s ‘understanding’ current physics correctly? I have lots of thoughts on this as a physics student doing AI research if you want to chat

167

a16z@a16z·4 Oct

“Foundation models but for quantum mechanics, will be the next frontier for LLMs.” Periodic Labs’ Ekin Dogus Cubuk says logic and math gave AI its first proofs. At the quantum scale, where biology, chemistry and materials converge, models could begin inventing new matter itself. @ekindogus @periodiclabs

a16z@a16z

Building an AI Physicist: ChatGPT Co-Creator’s Next Venture Scaling laws took us from GPT-1 to GPT-5 Pro. But in order to crack physics, we need a new approach. We sat down with Liam Fedus (co-creator of ChatGPT) and Ekin Dogus Cubuk (ex-materials science and chemistry lead at Google DeepMind) to talk about their new startup @PeriodicLabs and their plan to automate discovery in the hard sciences. 00:00 LLMs in physics and chem research 03:53 What is Periodic Labs? 14:45 Building the team 17:29 Superconductivity 27:39 Periodic's mission and applications 35:38 Mid-training and model performance 49:49 What makes a great researcher @AnjneyMidha @LiamFedus @EkinDogus

310

96.7K

Hasith Vattikuti@hasith_v·3 Oct

@khoomeik @periodiclabs @LiamFedus Very excited to see where periodic will go next! Extremely bullish on trying to get tangible alpha from AI models in natural sciences--it really plays to my background of first doing physics research and then doing AI research

399

Rohan Pandey@khoomeik·3 Oct

fav part about working at @periodiclabs: when i rabbithole on a quantum mechanics textbook i just tell my boss @LiamFedus that i’m reading training data 😎😉

274

56.1K

Hasith Vattikuti@hasith_v·11 Sep

@CFGeek Yes it is, happy to discuss and get feedback. All is welcome

Charles Foster@CFGeek·11 Sep

@hasith_v Wait is this you?

Charles Foster@CFGeek·10 Sep

This is a message... and part of a system of messages... pay attention to it! Sending this message was important to us. We considered ourselves to be a powerful culture. This message is a warning about danger.

Daniel George@degtrdg

anyone have compute grants I can forward to a broke cracked undergrad who's experimenting with rl envs? cc: @willccbb @menhguin

5.8K

Hasith Vattikuti@hasith_v·11 Sep

@CFGeek To be fair, I also think it will be hard to get it to work, and it might not even. But the negative result plus the rl env will leave us things to learn from. Cause I’m pretty confident that LLMs will be using internal reasoning techniques only a few years down the line.

Charles Foster@CFGeek·11 Sep

FWIW it seems unlikely that the proposal in the quoted tweet would actually work. That’s maybe an even better reason to explore some other project idea!

609
