Yoav Gur Arieh
@GurYoav
CS PhD Student at Tel Aviv University | AI researcher at doubleAI | Researching LLM interpretability
76 posts · Joined August 2019 · 180 Following · 173 Followers

Pinned Tweet
Yoav Gur Arieh @GurYoav ·
🧠 To reason over text and track entities, we find that language models use three types of 'pointers'! They were thought to rely only on a positional one—but when many entities appear, that system breaks down. Our new paper shows what these pointers are and how they interact 👇
Yoav Gur Arieh @GurYoav ·
I'll be at #ICLR2026 presenting our paper on how LLMs perform entity binding! 📍Catch our poster on Friday 15:15-17:45 at Pavilion 4 (4306). I’ll be around all week, happy to chat or grab coffee. Feel free to reach out!
[Quoted tweet: the pinned thread above]

Yoav Gur Arieh @GurYoav ·
Academia is only dying because these people keep trying to kill it
Yoav Gur Arieh retweeted
Antonin Poché @Antonin_Poche ·
🔥Super excited to share our new demo website for 🪄Interpreto!
🖼️It is basically an explanation gallery showcasing attribution and concept-based explanations for classification and generation.
🎮Play with it: for-sight-ai.github.io/interpreto-dem…
We will keep improving it, so stay tuned!
Yoav Gur Arieh retweeted
Amnon Shashua @AmnonShashua ·
DoubleAI's AI system just beat a decade of expert GPU engineering.

WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.

cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world's top performance engineers. @_doubleAI_'s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). Today, we released the hyper-optimized version on GitHub — install it with no change to your code.

The numbers:
- 3.6x average speedup over human experts
- 100% of kernels benefit from speedup
- 55% see more than 2x improvement

But hasn't AI already achieved expert-level status — winning gold medals at IMO, outperforming top programmers on CodeForces? Not quite. Those wins share three hidden crutches: abundant training data, trivial validation, and short reasoning chains. Where all three hold, today's AI shines. Remove any one of them and it falls apart (as Shai Shalev-Shwartz wrote in his post).

GPU performance engineering breaks all three. Data is scarce. Correctness is hard to validate. And performance comes from a long chain of interacting choices — memory layout, warp behavior, caching, scheduling, graph structure. Even state-of-the-art agents like Claude Code, Codex, and Gemini CLI fail dramatically here, often producing incorrect implementations even when handed cuGraph's own test suite.

Scaling alone can't break this barrier. It took new algorithmic ideas — our Diligent framework for learning from extremely small datasets, our PAC-reasoning methodology for verification when ground truth isn't available, and novel agentic search structures for navigating deep decision chains.

This is the beginning of Artificial Expert Intelligence (AEI) — not AGI, but something the world needs more: systems that reliably surpass human experts in the domains where expertise is rarest, slowest, and most valuable. If AI can surpass the world's best GPU engineers, which domain falls next?

For the full blog: doubleai.com/research/doubl…
cuGraph: docs.rapids.ai/api/cugraph/st…
Winning Gold at IMO 2025: arxiv.org/abs/2507.15855
Codeforces benchmarks: rdworldonline.com/openai-release…
@shai_s_shwartz post: x.com/shai_s_shwartz…
From Reasoning to Super-Intelligence: A Search-Theoretic Perspective: arxiv.org/abs/2507.15865
Artificial Expert Intelligence through PAC-reasoning: arxiv.org/abs/2412.02441
Yoav Gur Arieh retweeted
Shai Shalev-Shwartz @shai_s_shwartz ·
1/ Software was eating the world - and now AI is eating software. AI already beats humans at math/coding (IMO, CodeForces). Right? So let's test the strongest coding agents on a real domain: optimizing cuGraph (GPU graph analytics kernels). Spoiler:
* The strongest coding agents crash.
* And @_doubleAI_ built WarpSpeed - an AI that beat a decade of expert-engineered GPU kernels. 🧵
Mark Tenenholtz @marktenenholtz ·
If you haven't tried a Model Council since GPT-5.2/Opus 4.6/Gemini 3.1 have been out, stop everything you're doing and try it. Unreal how thorough responses are with the combined intelligence of these 3 models.
Yoav Gur Arieh @GurYoav ·
I feel like my experience with Claude recently has been similar to Carol's experience in Pluribus
Yoav Gur Arieh retweeted
Andrew Lee @a_jy_l ·
😻New preprint! As an interp researcher, I often ask “why did the model attend to this token?” We study this by decomposing the query-key (QK) space into interpretable low-rank subspaces. When these subspaces of Qs and Ks align, the model produces high attention scores. 1/N
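The decomposition the tweet describes — splitting attention logits into contributions from low-rank query-key subspaces — can be sketched with a plain SVD. This is my own toy illustration of the general idea, not the paper's actual method; all sizes and variable names here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size

# Random stand-ins for one head's query and key projections.
W_Q = rng.normal(size=(d, d)) / np.sqrt(d)
W_K = rng.normal(size=(d, d)) / np.sqrt(d)

# The attention logit is a bilinear form: score(x_q, x_k) = x_q @ M @ x_k,
# where M = W_Q @ W_K.T captures the whole QK interaction.
M = W_Q @ W_K.T

# SVD splits M into rank-1 subspaces; each singular triplet (u_i, s_i, v_i)
# is one paired (query-direction, key-direction) subspace.
U, S, Vt = np.linalg.svd(M)

x_q = rng.normal(size=d)  # residual stream at the query position
x_k = rng.normal(size=d)  # residual stream at the key position

full_score = x_q @ M @ x_k
# Per-subspace contribution: how much x_q aligns with u_i times how much
# x_k aligns with v_i, scaled by s_i. High scores come from aligned subspaces.
contribs = S * (x_q @ U) * (Vt @ x_k)
assert np.allclose(full_score, contribs.sum())
```

The sum of per-subspace contributions exactly reconstructs the logit, so one can rank subspaces by `contribs` to ask which directions made the model attend to a given token.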
Yoav Gur Arieh retweeted
Mor Geva @megamor2 ·
📣📣New preprint AI consciousness has never been more timely or polarized. Some call LLMs stochastic parrots🦜 Others warn about model welfare and existential risks ⚠️ In an interdisciplinary collaboration with @GoldsteinYAriel and @Liad_Mudrik, led by my student @noam_steinmetz, we use interpretability tools to test a key consciousness indicator from neuroscientific theories. Our results show indications of belief-guided agency and meta-cognitive capacity in LLMs 🧵1/
Yoav Gur Arieh @GurYoav ·
Working at a cafe, and the parents next to me just handed their two-year-old a phone playing hours of AI-generated cat videos. What are we doing here?
Yoav Gur Arieh @GurYoav ·
The skills required to conduct research and the ones required to design a good poster presenting that research are orthogonal. I didn't sign up to be a designer!!
Yoav Gur Arieh @GurYoav ·
@yoavgo Going off whether I'd want China able to do whatever it wants to other countries, like it does to its neighbors, plus whether the dominant global power pushes for democracy/autocracy around the world. They also have a lot to offer the world, so who knows! I'm just more fearful than optimistic.
(((ل()(ل() 'yoav))))👾 @yoavgo ·
there is a lot of discourse around fears that China will "take over the US" in terms of "world leadership" and derivatives like AI, tech, the economy, etc. But what are people actually fearing? Why is it considered "a bad thing"? What are the China traits that are "scary"?
Yoav Gur Arieh @GurYoav ·
@yoavgo Yeah, but I feel like in the past that bullying power was mainly directed towards more economic integration and global rule following (generally), which benefited the US, but also everyone else (EU, China). My perspective is that China's bullying so far seems more zero sum.
Yoav Gur Arieh @GurYoav ·
@yihaoli_0302 Thanks! And def agree regarding probing. Didn't have enough time, but that would be a very interesting follow-up. Unfortunately I won't be at NeurIPS, but would love to discuss this more! Text/visual binding sounds very interesting :)
Yihao Li @yihaoli_0302 ·
Really nice work! Binding in language can be more challenging than in vision when it comes to handling longer contexts and larger numbers of entities, so they require different mechanisms like pointers/positional indexing. I really like how you systematically evaluated the range of possible mechanisms. Method-wise, I think binding in language could also benefit from probing, alongside counterfactual interventions. For instance, probing for "which group index does the model believe the query belongs to?" or "which entity does the model think is lexically paired?" could complement the causal interventions and offer a more fine-grained view of how these mechanisms are represented internally. I'm also very interested in binding in VLMs, especially how language and visual information about the same entity get bound together. I'd love to chat more!
Yihao Li @yihaoli_0302 ·
🧵[1/8] Excited to share our NeurIPS 2025 Spotlight paper “Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?” ✨ To add to the broader discussion of binding in neural networks, we ask whether and how Vision Transformers perform object binding (the ability to bind an object’s many features together as a coherent whole).💡 📄 Paper: openreview.net/pdf?id=5BS6gBb…
Yoav Gur Arieh @GurYoav ·
@yoavgo Also of course the US has also been a negative force (Afghanistan, Iraq), and more so recently. But I think it's hard to argue that their effect on the 20th century hasn't been net positive.
Yoav Gur Arieh @GurYoav ·
@yoavgo That makes China being on top and potentially calling certain shots via econ/tech superiority scary. Also, they're bullies to their neighbors, and that might expand to more countries if unchecked (e.g. around the South China Sea, Australia, etc.).
Yoav Gur Arieh @GurYoav ·
Me waiting for my ICLR reviewers to respond to my rebuttal