Anshuman Suri

845 posts

@iamgroot42

Research @datologyai | Previously Postdoc @KhouryCollege, Ph.D. @UVA | Interested in data quality x security & privacy.

San Francisco, CA · Joined February 2016
855 Following · 681 Followers
Anshuman Suri retweeted
Pratyush Maini
Pratyush Maini@pratyushmaini·
If I had to compress my PhD into one idea, it is this: "The data a model sees early in training leaves an imprint on its representations that is very hard to undo later." This thread runs through Rephrasing the Web, Safety Pretraining, and TOFU. This is the Finetuner’s Fallacy 🧵
21
55
729
54.8K
Anshuman Suri
Anshuman Suri@iamgroot42·
@Utd_Y14 My guess is that they decouple Paul's own role from 'Children of Dune' and somehow merge it with hers, and also looking at Paul's teeth in one of the posters, merge the Worm's role into what Paul will be in this one (given they're pitching it as a "conclusion")?
0
0
2
6K
Y?
Y?@Utd_Y14·
Compelled to see how they handle Chani, given this is Messiah they’re adapting. The whole relationship between Paul and Chani there is based on Chani being a plank of wood who is fine with being Paul’s concubine, in contrast to the films, where she’s got a sense of urgency and actively holds Paul accountable. Plus, she dips at the end of Part Two.
Film Updates@FilmUpdates

Zendaya in the new poster for ‘DUNE: PART THREE’ In theaters December 18.

47
211
8.6K
984.9K
Anshuman Suri retweeted
Ari Morcos
Ari Morcos@arimorcos·
Love seeing the timeline wake up to the fact that data is the most underinvested area in ML. But let’s set the record straight: the world’s premier data research company isn't hypothetical. It already exists. It’s called @datologyai, and we’ve been building it for 2.5 years. 🧵
Bobby Samuels@BobbySamuels

x.com/i/article/2030…

9
25
128
26.9K
Sara Hooker
Sara Hooker@sarahookr·
Woah. Just saw kalshi settles bets based on lmarena. Do they realize how prone to manipulation lmarena results are? Like, if I placed a big enough bet, I could just pay annotators to skew the market towards the model I wanted to win.
14
5
94
9.9K
Anshuman Suri
Anshuman Suri@iamgroot42·
memes apart, another banger by @datologyai, pushing yet another pareto frontier (this time for multilingual curation)! 📈 read more at datologyai.com/blog/berweb-in… for a whole lot of knowledge nuggets led by @RicardoMonti9 @KaleighMentzer @agcrnz 💪
Ricardo Monti@RicardoMonti9

1/ People often think better multilingual models must come at the cost of English performance. Not true. The constraint isn’t capacity, it’s data quality, and we can fix it. Today @datologyAI shares ÜberWeb: a year of multilingual curation lessons, scaled to 20T+ tokens.

1
0
12
575
Ricardo Monti
Ricardo Monti@RicardoMonti9·
1/ People often think better multilingual models must come at the cost of English performance. Not true. The constraint isn’t capacity, it’s data quality, and we can fix it. Today @datologyAI shares ÜberWeb: a year of multilingual curation lessons, scaled to 20T+ tokens.
7
31
150
37.5K
DatologyAI
DatologyAI@datologyai·
New research! ÜberWeb: multilingual data curation across 13 languages and 20 trillion tokens. The "curse of multilinguality" is largely a data quality problem, and it's fixable. tl;dr: we get 4-10x training efficiency improvements over models like Qwen3 and Tiny Aya
4
12
80
11.2K
Anshuman Suri retweeted
Kaleigh Mentzer
Kaleigh Mentzer@KaleighMentzer·
🌎Making your model multilingual doesn't have to sacrifice English performance—you just need better data. @agcrnz, @RicardoMonti9, and I have been working on curating the best possible multilingual data with the team @datologyai, and it works! Check out the results 👇
Ricardo Monti@RicardoMonti9

1/ People often think better multilingual models must come at the cost of English performance. Not true. The constraint isn’t capacity, it’s data quality, and we can fix it. Today @datologyAI shares ÜberWeb: a year of multilingual curation lessons, scaled to 20T+ tokens.

0
13
31
2.9K
JosH100
JosH100@josh_wills·
me @ work
6
5
29
1.7K
Sara Hooker
Sara Hooker@sarahookr·
Ok. It is time. I have time on Saturday and Sunday night to explore New Delhi ahead of the summit. Let’s go Delhi food recommendations. 🔥
142
4
414
57K
Anshuman Suri retweeted
Ricardo Monti
Ricardo Monti@RicardoMonti9·
this week I have observed first hand the elite meme game possessed by @iamgroot42... truly a generational talent
0
1
5
626