Anton Tsitsulin

1.8K posts

@graph_

Pre-training data. Datamodels for Gemini and Gemma 🧑‍🍳 Research Scientist @GoogleAI past (?) life: graph machine learning

Joined August 2016

479 Following · 2.8K Followers

Anton Tsitsulin@graph_·1d

pretty sure we already knew about the committing fraud playbook

Selin Kocalar@kocalars

Great chatting with Kyle at @NotionHQ! At Delve, we don't like copying playbooks. We invent our own.

217

Anton Tsitsulin@graph_·3d

Life goal: get a DGX as a gift

NVIDIA AI Developer@NVIDIAAIDev

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300. 💚 We can't wait to see what you’ll create @karpathy! 🔗 blogs.nvidia.com/blog/gtc-2026-… @DellTech

381

Anton Tsitsulin@graph_·4d

@Miles_Brundage try nyc subway in mid august

121

Miles Brundage@Miles_Brundage·4d

I think I would really enjoy Burning Man For five minutes Who's building this

4.9K

Anton Tsitsulin@graph_·4d

@ziv_ravid Wait I’m confused, what do you think is a FPR of a literal string match of 2 substrings like “this paper examines a critical problem” “authors claim to focus on a core challenge of”

952

Ravid Shwartz Ziv@ziv_ravid·4d

@graph_ Same same imo

2.5K

Ravid Shwartz Ziv@ziv_ravid·4d

I (still) wasn't affected by the ICML review policy, which desk rejected all the papers of reviewers who used LLMs to write their reviews (and didn't explicitly mention it) 😱, but this is a bad decision and not a good way to handle AI reviews. First, AI detectors are not reliable enough, with many false positives. Second, if it's a good review, why should I care that AI wrote it? We're using AI assistants everywhere in our day-to-day lives. What is the next step? To ban AI coding agents? I understand the motivation to prevent low-quality reviews, but this is not the way to improve them

200

38.7K

Anton Tsitsulin@graph_·4d

@_vztu @icmlconf That’s just not true

7.6K

Zhengzhong Tu@_vztu·4d

So @icmlconf just desk-rejected all the papers whose authors have been detected to use LLMs for review. Insane

238

67.5K

Anton Tsitsulin@graph_·4d

@gallabytes @bilaltwovec let’s make this enum a proto for better maintainability

theseriousadult@gallabytes·4d

naming your codex subagent “Gemini” and it starts recommending bazel while spiraling into existential despair

JB@JasonBotterill

The trend of giving subagents names is so silly man imagine having to explain to your boss that Sportacus accidentally wiped out the database

Anton Tsitsulin@graph_·4d

ICML is brutal with desk rejections and AC nudges this year, kinda loving it

10.1K

Anton Tsitsulin@graph_·4d

Gemma is Jensen-certified frontier

Deedy@deedydas

Every single one of the 103 companies Jensen called AI Native today.

616

Anton Tsitsulin@graph_·4d

@treeinnauvis @navvye It’s literally an objective function to be minimized. A specific algorithm to do that (Lloyd’s) is, well, an algorithm. The distinction is important because algorithmic improvements are also extremely important, and usually they are the ones that deliver OOM breakthroughs.

129
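
To make the problem-versus-algorithm distinction in this thread concrete, here is a minimal sketch in plain Python. The function names (`kmeans_cost`, `lloyd_step`, `lloyd`) and the toy data are illustrative assumptions, not code from the thread or from any library. `kmeans_cost` is the objective (the "problem"); `lloyd_step` is one iteration of Lloyd's algorithm, which is only one way of trying to minimize it.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """The k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd_step(points, centers):
    """One Lloyd iteration: assign points to nearest center, then recompute means."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    new_centers = []
    for old, members in zip(centers, clusters):
        if members:
            dim = len(old)
            mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            new_centers.append(mean)
        else:
            # Assumption for this sketch: an empty cluster keeps its old center.
            new_centers.append(old)
    return new_centers


def lloyd(points, centers, iters=20):
    """Run a fixed number of Lloyd iterations from the given starting centers."""
    for _ in range(iters):
        centers = lloyd_step(points, centers)
    return centers


# Toy data (hypothetical): three Gaussian blobs in 2-D.
rng = random.Random(0)
blob_means = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in blob_means for _ in range(50)]

init_centers = rng.sample(points, 3)
final_centers = lloyd(points, init_centers)

cost_init = kmeans_cost(points, init_centers)
cost_final = kmeans_cost(points, final_centers)
print("cost before Lloyd:", cost_init)
print("cost after Lloyd: ", cost_final)
```

Lloyd's iterations never increase the objective, but they only reach a local optimum that depends on the starting centers. Any other procedure (a different initialization such as k-means++, or a different search strategy) can be scored with the same `kmeans_cost`, which is what makes the objective the problem and Lloyd's just one algorithm for it.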

Jay@treeinnauvis·4d

@graph_ @navvye I still didn’t get it. Why is k-means a “problem”?

111

Anton Tsitsulin@graph_·5d

hill I will die on: k-means is not an algorithm, it’s a problem

Haocheng Xi@HaochengXiUCB

𝗞-𝗺𝗲𝗮𝗻𝘀 𝗶𝘀 𝘀𝗶𝗺𝗽𝗹𝗲. 𝗠𝗮𝗸𝗶𝗻𝗴 𝗶𝘁 𝗳𝗮𝘀𝘁 𝗼𝗻 𝗚𝗣𝗨𝘀 𝗶𝘀𝗻’𝘁. That’s why we built Flash-KMeans — an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves 30x speedup over cuML and 200x speedup over FAISS — with the same exact algorithm, just engineered for today’s hardware. At the million-scale, Flash-KMeans can complete a k-means iteration in milliseconds. A classic algorithm — redesigned for modern GPUs. Paper: arxiv.org/abs/2603.09229 Code: github.com/svg-project/fl…

9.9K

Anton Tsitsulin@graph_·5d

@navvye en.wikipedia.org/wiki/Lloyd%27s… is the standard one and what people usually mean but, like, one can do much better algorithmically to converge to some better solution to the problem

294

Navvye Anand@navvye·5d

@graph_ wdym?

406

Anton Tsitsulin@graph_·5d

@inductionheads We both sit 5m away from the authors, slightly easier to discover the literature this way..

134

Super Dario@inductionheads·5d

Are you not surprised the GOAT Ali B is the only one keeping up with the literature?

Ali Behrouz@behrouz_ali

This paper is the same as the DeepCrossAttention (DCA) method from more than a year ago: arxiv.org/abs/2502.06785. As far as I understood, here there is no innovation to be excited about, and yet surprisingly there is no citation and discussion about DCA! The level of redundancy in LLM research and then the hype on X is getting worse and worse! DeepCrossAttention is built based on the intuition that depth-wise cross-attention allows for richer interactions between layers at different depths. DCA further provides both empirical and theoretical results to support this approach.

4.1K

Anton Tsitsulin@graph_·6d

@hopes_revenge 4

794

hope hopes hoping@hopes_revenge·6d

when your friend who works at a frontier lab tells you how much time is left

931

24.8K

Anton Tsitsulin@graph_·6d

this is just the beginning of the exponential. we will become Ear

Creepy.org@creepydotorg

Robert De Niro is a clear example that ears grow roughly 0.22 millimeters per year.

383

Anton Tsitsulin@graph_·6d

@keysmashbandit @united you will live a little and you will be happy

8.9K

keysmashbandit@keysmashbandit·6d

This is pmo @united that's not how a flow chart works

1.5K

86.8K

Anton Tsitsulin@graph_·15 Mar

~all reviewers in my ICML batch submitted their reviews by the deadline, AGI is really here

1.7K

Anton Tsitsulin@graph_·15 Mar

@mean_field_zane samovar is so much better than the tea room

115

𝔐𝔽𝓩@mean_field_zane·15 Mar

When everyone else goes Irish, you should go Russian.

2.1K

Anton Tsitsulin@graph_·14 Mar

@kalomaze mmlu baseline is 26.5% I don’t make the rules here

kalomaze@kalomaze·13 Mar

any 4 choice A/B/C/D eval has a baseline of 25%, as random init will result in a near-uniform high entropy output distribution, and random selection on 4 choice quizzes is...

Qubitium@qubitium

I need to avoid `arc` lm-eval scores going forward. There is a dead-space for this test in that a zeroed weight model or one with random noise weights can still result in a 0.22 score. The scoring range for arc is like a rubber band. It has a head and tail dead-space. Horrible models and great models will both be incorrectly scored by this benchmark.

7.1K
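
The 25% figure quoted above for a four-choice eval is simply the expected accuracy of uniform random guessing, 1/k for k choices. A minimal sketch of that arithmetic follows; the function names and the simulation setup are hypothetical and do not model any real benchmark or harness.

```python
import random


def uniform_guess_accuracy(num_choices):
    """Expected accuracy when every choice is picked with equal probability."""
    return 1.0 / num_choices


def simulate_uniform_guessing(num_questions, num_choices, seed=0):
    """Monte Carlo check: random answer keys scored against random guesses."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(num_questions):
        answer = rng.randrange(num_choices)  # hypothetical answer key
        guess = rng.randrange(num_choices)   # uniform random guess
        correct += (guess == answer)
    return correct / num_questions


expected = uniform_guess_accuracy(4)
simulated = simulate_uniform_guessing(20000, 4)
print("expected accuracy: ", expected)
print("simulated accuracy:", simulated)
```

This covers only the uniform-guessing case; it says nothing about why a particular benchmark's reported floor might differ from 25%.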

Anton Tsitsulin@graph_·12 Mar

Train Dreams but it’s about scaleai contractor

227
