Elad Segal
@eladsegal

Deep Learning Research Engineer @NVIDIA

Joined October 2012
425 Following · 154 Followers

82 posts
Elad Segal retweeted
Bryan Catanzaro (@ctnzr)
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!
[image]
62 replies · 205 reposts · 1.2K likes · 206.3K views

Elad Segal retweeted
Oleksii Kuchaiev (@kuchaev)
We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model. Open weights and post-training data. huggingface.co/nvidia/Llama-3… We started with Llama-405B, pruned it via NAS, then applied reasoning-focused post-training: SFT + RL in FP8.
[image]
24 replies · 123 reposts · 702 likes · 166.4K views

Elad Segal retweeted
AK (@_akhaliq)
Nvidia just dropped FFN Fusion: Rethinking Sequential Computation in Large Language Models
[image]
5 replies · 88 reposts · 514 likes · 36.6K views

Elad Segal retweeted
Itay Levy (@itayoush)
Very excited about the release of the Llama Nemotron Super 49B model 🚀 #GTC25 Using distillation-based NAS (Puzzle), we achieved a 5X throughput gain! After SFT and RL, this model tops reasoning benchmarks among open 70B models.
[image]
1 reply · 1 repost · 8 likes · 391 views

Elad Segal retweeted
Ohav (@ohavba)
"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents! Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️
2 replies · 7 reposts · 28 likes · 4.3K views
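The monitoring-and-intervention idea above can be sketched in miniature: a monitor inspects each action an agent proposes and swaps out disallowed ones before they execute. Everything here (function names, the keyword policy) is a hypothetical illustration, not the paper's actual implementation.

```python
# Hypothetical action policy: substrings the monitor refuses to let through.
BLOCKED_KEYWORDS = {"delete_all", "transfer_funds"}

def monitor(action: str) -> str:
    """Return the action unchanged if it passes the policy, else a safe no-op."""
    if any(kw in action for kw in BLOCKED_KEYWORDS):
        return "noop  # intervention: action blocked by monitor"
    return action

def run_agent(proposed_actions: list[str]) -> list[str]:
    """Run a stream of proposed actions with every step passing through the monitor."""
    return [monitor(a) for a in proposed_actions]
```

In a real multi-agent setting the monitor would be an LLM or learned classifier rather than a keyword list, but the control flow (intercept, judge, intervene) is the same.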
Elad Segal retweeted
Mor Geva (@megamor2)
How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New preprint led by my student @GurYoav with dream team @Roym4498, Chen Agassy, and Atticus Geiger 🧵1/
[GIF]
6 replies · 25 reposts · 114 likes · 7.4K views

Elad Segal retweeted
Mor Geva (@megamor2)
What's in an attention head? 🤯 We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨ A new preprint with @AmitElhelo 🧵 (1/10)
[image]
5 replies · 56 reposts · 295 likes · 25.6K views

Elad Segal retweeted
Ben Bogin (@ben_bogin)
📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️ arxiv.org/pdf/2409.07440
[image]
5 replies · 19 reposts · 72 likes · 19.8K views

Elad Segal retweeted
Ori Yoran (@OriYoran)
Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4 based agent gets just 25% accuracy! assistantbench.github.io
[GIF]
7 replies · 48 reposts · 175 likes · 43.8K views

Elad Segal retweeted
Maor Ivgi (@maorivg)
1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇 @megamor2 @JonathanBerant @OriYoran
[image]
2 replies · 31 reposts · 123 likes · 16.7K views
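The looping behavior contrasted with hallucination above can be measured mechanically. A minimal sketch (a hypothetical helper, not the paper's analysis code) that flags consecutive repeated n-grams in a generated token sequence:

```python
def has_loop(tokens: list[str], n: int = 3, repeats: int = 3) -> bool:
    """Return True if some n-gram repeats `repeats` times back-to-back."""
    for i in range(len(tokens) - n * repeats + 1):
        gram = tokens[i:i + n]
        # Check whether the next (repeats - 1) windows are identical copies.
        if all(tokens[i + k * n:i + (k + 1) * n] == gram for k in range(repeats)):
            return True
    return False
```

Running a detector like this over model generations is one way to quantify how often a model degenerates into loops versus producing fluent (possibly hallucinated) text.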
Elad Segal retweeted
Guy Dar (@guy_dar1)
🇲🇽 Excited to share our work was accepted to #NAACL2024 main conference!! 🇲🇽 ICL has been hypothesized to perform GD implicitly in its parameters. But is there good evidence for that? 🧐 Depends what you mean exactly!!
1 reply · 9 reposts · 50 likes · 10.5K views

Elad Segal retweeted
Ben Bogin (@ben_bogin)
Can we leverage pre-existing coding abilities of LLMs to improve semantic parsing and compositional generalization? 🚨 Our new paper shows dramatic improvements when LLMs are prompted with Python rather than DSLs, along with helpful domain descriptions! bit.ly/code-semparse
[image]
2 replies · 13 reposts · 65 likes · 10.3K views
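To picture the contrast the tweet describes, here is a toy example of the same utterance targeted as a DSL parse versus as Python code; the utterance, function names, and prompt formats are invented for illustration and are not the paper's actual prompts. The intuition is that pretrained LLMs have seen vastly more Python than any one DSL.

```python
UTTERANCE = "book a flight from NYC to LA on Friday"

# Target program written in a bespoke DSL (a made-up s-expression format).
dsl_prompt = f"""utterance: {UTTERANCE}
parse: (book_flight (from "NYC") (to "LA") (date "Friday"))"""

# The same meaning expressed as an ordinary Python call, which the model
# can pattern-match against its large pretraining exposure to Python.
python_prompt = f"""# utterance: {UTTERANCE}
book_flight(origin="NYC", destination="LA", date="Friday")"""
```

In a few-shot prompt, several such utterance/program pairs (plus a short domain description of the available functions) precede the test utterance.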
Elad Segal retweeted
Elad Simchayoff (@Elad_Si)
Watch and share with the world. A special project by @N12News. This video contains footage taken by the young partygoers at the Nova Music Festival prior to the 7.10 terror attack. You’ll see only a handful of the 260 victims and dozens of those abducted or still missing doing what they came there to do: party, dance, live. Hours later, the festival became a blood-filled scene of unspeakable crimes.
459 replies · 2.5K reposts · 4.9K likes · 758.8K views

Elad Segal retweeted
Visegrád 24 (@visegrad24)
Eyal Waldman is an Israeli billionaire and high-tech magnate (founder of Mellanox). He built R&D centres in the West Bank and Gaza Strip to employ Palestinian developers in order to build better Israeli-Palestinian relations. Hamas murdered his daughter Daniel at the music festival.
[image]
1.3K replies · 8.1K reposts · 22.4K likes · 3.5M views

Elad Segal retweeted
Hananya Naftali (@HananyaNaftali)
MUST WATCH: A British author explains the truth about "proportionality". Well done!
904 replies · 6.5K reposts · 17.3K likes · 1.4M views

Elad Segal retweeted
(((ل()(ل() 'yoav))))👾
Hello colleagues and fellows. Over the past few days I was shocked to learn that people in our community don't share what I consider to be basic human values. Please help me restore faith in our community by signing this. forms.gle/2yi1WP9RNSHPHn…
[image]
34 replies · 47 reposts · 303 likes · 122K views

Elad Segal retweeted
Ori Yoran (@OriYoran)
Retrieval-augmented LMs are not robust to irrelevant context. Retrieving entirely irrelevant context can throw off the model, even when the answer is encoded in its parameters! In our new work, we make RALMs more robust to irrelevant context. arxiv.org/abs/2310.01558 🧵[1/7]
[image]
1 reply · 24 reposts · 138 likes · 18.3K views
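One simple way to picture making a retrieval-augmented LM more robust is to drop retrieved passages that look irrelevant before they ever reach the prompt. The sketch below uses a crude lexical-overlap score purely for illustration; it is not the method from the paper, where a learned relevance judge would stand in for `overlap_score`.

```python
def overlap_score(question: str, passage: str) -> float:
    """Crude lexical relevance: fraction of question words found in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def filter_context(question: str, passages: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only passages that clear the relevance threshold before prompting."""
    return [p for p in passages if overlap_score(question, p) >= threshold]
```

If every passage is filtered out, the model falls back to answering from its parametric knowledge alone, which is exactly the case where irrelevant context would otherwise have thrown it off.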