Anton Tsitsulin

1.8K posts

@graph_

Pre-training data. Datamodels for Gemini and Gemma 🧑‍🍳 Research Scientist @GoogleAI past (?) life: graph machine learning

Joined August 2016
479 Following · 2.8K Followers
Miles Brundage @Miles_Brundage
I think I would really enjoy Burning Man
For five minutes
Who's building this
16 replies · 2 reposts · 60 likes · 4.9K views
Anton Tsitsulin @graph_
@ziv_ravid Wait, I’m confused: what do you think is the FPR of a literal string match on 2 substrings like “this paper examines a critical problem” and “authors claim to focus on a core challenge of”?
2 replies · 0 reposts · 16 likes · 952 views
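For context on the check being debated above: a literal match on two specific substrings flags a review only if both canned phrases appear verbatim. Below is a minimal sketch of that kind of check; the phrase list is taken from the tweet, while the function name is a hypothetical illustration and not any actual detector used by ICML. The point of the question above is that requiring both long phrases word-for-word makes false positives on independently written reviews very unlikely.

```python
# Hypothetical sketch of a literal two-substring match (illustrative names only).
CANNED_PHRASES = (
    "this paper examines a critical problem",
    "authors claim to focus on a core challenge of",
)

def matches_both_phrases(review_text: str) -> bool:
    """True only if every canned phrase appears verbatim (case-insensitive)."""
    text = review_text.lower()
    return all(phrase in text for phrase in CANNED_PHRASES)
```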
Ravid Shwartz Ziv @ziv_ravid
I (still) wasn't affected by the ICML review policy, which desk-rejected all the papers of reviewers who used LLMs to write their reviews (and didn't explicitly mention it) 😱, but this is a bad decision and not a good way to handle AI reviews. First, AI detectors are not reliable enough, with many false positives. Second, if it's a good review, why should I care that AI wrote it? We're using AI assistants everywhere in our day-to-day lives. What is the next step? To ban AI coding agents? I understand the motivation to prevent low-quality reviews, but this is not the way to improve them.
29 replies · 4 reposts · 200 likes · 38.7K views
Zhengzhong Tu @_vztu
So @icmlconf just desk-rejected all the papers whose authors were detected using LLMs for reviews. Insane
12 replies · 9 reposts · 238 likes · 67.5K views
Anton Tsitsulin @graph_
ICML is brutal with desk rejections and AC nudges this year, kinda loving it
4 replies · 1 repost · 62 likes · 10.1K views
Anton Tsitsulin @graph_
@treeinnauvis @navvye It’s literally an objective function to be minimized. A specific algorithm to do that (Lloyd’s) is, well, an algorithm. The distinction is important because algorithmic improvements are also extremely important, and usually they are the ones that deliver OOM breakthroughs
0 replies · 0 reposts · 3 likes · 128 views
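For reference, the distinction in the reply above is the standard one: "k-means" names the optimization problem below (partition points into k clusters so as to minimize within-cluster squared distances to the cluster means), while Lloyd's algorithm is one particular heuristic that alternates assignments and mean updates to locally minimize it.

```latex
\[
\min_{C_1,\dots,C_k}\; \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x .
\]
```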
Jay @treeinnauvis
@graph_ @navvye I still didn’t get it. Why is k-means a “problem”?
1 reply · 0 reposts · 0 likes · 111 views
Anton Tsitsulin @graph_
@inductionheads We both sit 5m away from the authors, slightly easier to discover the literature this way..
0 replies · 0 reposts · 3 likes · 134 views
Super Dario @inductionheads
Are you not surprised the GOAT Ali B is the only one keeping up with the literature?
Ali Behrouz @behrouz_ali

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: arxiv.org/abs/2502.06785. As far as I understood, here there is no innovation to be excited about, and yet surprisingly there is no citation and discussion about DCA! The level of redundancy in LLM research and then the hype on X is getting worse and worse! DeepCrossAttention is built based on the intuition that depth-wise cross-attention allows for richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach.

3 replies · 0 reposts · 27 likes · 4.1K views
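For readers who haven't seen the quoted paper (DeepCrossAttention, arxiv.org/abs/2502.06785): the sentence about depth-wise interactions refers to letting a block at depth t read from the outputs of all earlier layers rather than only the immediately preceding one. The snippet below is a generic, illustrative sketch of that general idea with made-up names; it is not the DCA implementation, and the softmax parameterization here is an assumption made for the sketch.

```python
# Generic illustration of depth-wise mixing of earlier layer outputs.
# NOT the DCA method from arxiv.org/abs/2502.06785; names are illustrative.
import numpy as np

def mix_earlier_layers(layer_outputs, depth_logits):
    """Combine the outputs of layers 0..t-1 into the input of layer t.

    layer_outputs: list of t arrays, each of shape [seq_len, d_model]
    depth_logits:  array of shape [t], learnable scores over depth (assumed)
    """
    weights = np.exp(depth_logits - depth_logits.max())
    weights /= weights.sum()                       # softmax over depth
    stacked = np.stack(layer_outputs, axis=0)      # [t, seq_len, d_model]
    return np.tensordot(weights, stacked, axes=1)  # [seq_len, d_model]

# Tiny usage example: three earlier layers, biased toward the most recent one.
h = [np.random.randn(4, 8) for _ in range(3)]
x_next = mix_earlier_layers(h, np.array([0.0, 0.0, 2.0]))
```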
hope hopes hoping @hopes_revenge
when your friend who works at a frontier lab tells you how much time is left
[attached image]
17 replies · 26 reposts · 931 likes · 24.8K views
keysmashbandit @keysmashbandit
This is pmo @united, that's not how a flow chart works
[attached image]
12 replies · 7 reposts · 1.5K likes · 86.8K views
Anton Tsitsulin @graph_
~all reviewers in my ICML batch submitted their reviews by the deadline, AGI is really here
1 reply · 0 reposts · 9 likes · 1.7K views
𝔐𝔽𝓩 @mean_field_zane
When everyone else goes Irish, you should go Russian.
[three attached images]
3 replies · 0 reposts · 38 likes · 2.1K views
Anton Tsitsulin @graph_
Train Dreams but it’s about a scaleai contractor
0 replies · 0 reposts · 1 like · 227 views