Elad Segal
@eladsegal

Deep Learning Research Engineer @NVIDIA

Joined October 2012
425 Following · 154 Followers

82 posts
Elad Segal retweeted
Bryan Catanzaro (@ctnzr)
Announcing NVIDIA Nemotron 3 Super!
💚 120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚 36 on AAIndex v4
💚 up to 2.2X faster than GPT-OSS-120B in FP4
💚 Open data, open recipe, open weights
Models, Tech report, etc. here: research.nvidia.com/labs/nemotron/…
And yes, Ultra is coming!
[image]
62 replies · 205 reposts · 1.2K likes · 206.3K views

Elad Segal retweeted
Oleksii Kuchaiev (@kuchaev)
We are excited to release Llama-Nemotron-Ultra! This is a reasoning ON/OFF, dense 253B model. Open weights and post-training data. huggingface.co/nvidia/Llama-3… We started with Llama-405B, pruned it via NAS, then applied reasoning-focused post-training: SFT + RL in FP8.
[image]
24 replies · 123 reposts · 702 likes · 166.4K views

Elad Segal retweeted
AK (@_akhaliq)
Nvidia just dropped FFN Fusion: Rethinking Sequential Computation in Large Language Models
[image]
5 replies · 88 reposts · 514 likes · 36.6K views

Elad Segal retweeted
Itay Levy (@itayoush)
Very excited about the release of the Llama Nemotron Super 49B model 🚀 #GTC25 Using distillation-based NAS (Puzzle), we achieved a 5X throughput gain! After SFT and RL, this model tops reasoning benchmarks among open 70B models.
[image]
1 reply · 1 repost · 8 likes · 391 views

Elad Segal retweeted
Ohav (@ohavba)
"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents! Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️
2 replies · 7 reposts · 28 likes · 4.3K views
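The monitoring-and-intervention idea above can be sketched in miniature: a monitor inspects each action an agent proposes and swaps out disallowed ones before they execute. Everything here (function names, the keyword policy) is a hypothetical illustration, not the paper's actual implementation.

```python
# Hypothetical action policy: substrings the monitor refuses to let through.
BLOCKED_KEYWORDS = {"delete_all", "transfer_funds"}

def monitor(action: str) -> str:
    """Return the action unchanged if it passes the policy, else a safe no-op."""
    if any(kw in action for kw in BLOCKED_KEYWORDS):
        return "noop  # intervention: action blocked by monitor"
    return action

def run_agent(proposed_actions: list[str]) -> list[str]:
    """Run a stream of proposed actions with every step passing through the monitor."""
    return [monitor(a) for a in proposed_actions]
```

In a real multi-agent setting the monitor would be an LLM or learned classifier rather than a keyword list, but the control flow (intercept, judge, intervene) is the same.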
Elad Segal retweeted
Mor Geva (@megamor2)
How can we interpret LLM features at scale? 🤔 Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs! We propose efficient output-centric methods that better predict how steering a feature will affect model outputs. New preprint led by my student @GurYoav with dream team @Roym4498, Chen Agassy, and Atticus Geiger 🧵1/
[GIF]
6 replies · 25 reposts · 114 likes · 7.4K views

Elad Segal retweeted
Mor Geva (@megamor2)
What's in an attention head? 🤯 We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨ A new preprint with @AmitElhelo 🧵 (1/10)
[image]
5 replies · 56 reposts · 295 likes · 25.6K views

Elad Segal retweeted
Ben Bogin (@ben_bogin)
📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️ arxiv.org/pdf/2409.07440
[image]
5 replies · 19 reposts · 72 likes · 19.8K views

Elad Segal retweeted
Ori Yoran (@OriYoran)
Can AI agents solve realistic, time-consuming web tasks such as “Which gyms near me have fitness classes on the weekend, before 7AM?” We introduce AssistantBench, a benchmark with 214 such tasks. Our new GPT-4 based agent gets just 25% accuracy! assistantbench.github.io
[GIF]
7 replies · 48 reposts · 175 likes · 43.8K views

Elad Segal retweeted
Maor Ivgi (@maorivg)
1/7 🚨 What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning. 🧵👇 @megamor2 @JonathanBerant @OriYoran
[image]
2 replies · 31 reposts · 123 likes · 16.7K views
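The looping behavior contrasted with hallucination above can be measured mechanically. A minimal sketch (a hypothetical helper, not the paper's analysis code) that flags consecutive repeated n-grams in a generated token sequence:

```python
def has_loop(tokens: list[str], n: int = 3, repeats: int = 3) -> bool:
    """Return True if some n-gram repeats `repeats` times back-to-back."""
    for i in range(len(tokens) - n * repeats + 1):
        gram = tokens[i:i + n]
        # Check whether the next (repeats - 1) windows are identical copies.
        if all(tokens[i + k * n:i + (k + 1) * n] == gram for k in range(repeats)):
            return True
    return False
```

Running a detector like this over model generations is one way to quantify how often a model degenerates into loops versus producing fluent (possibly hallucinated) text.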
Elad Segal retweeted
Guy Dar (@guy_dar1)
🇲🇽 Excited to share our work was accepted to #NAACL2024 main conference!! 🇲🇽 ICL has been hypothesized to perform GD implicitly in its parameters. But is there good evidence for that? 🧐 Depends what you mean exactly!!
1 reply · 9 reposts · 50 likes · 10.5K views

Elad Segal retweeted
Ben Bogin (@ben_bogin)
Can we leverage pre-existing coding abilities of LLMs to improve semantic parsing and compositional generalization? 🚨 Our new paper shows dramatic improvements when LLMs are prompted with Python rather than DSLs, along with helpful domain descriptions! bit.ly/code-semparse
[image]
2 replies · 13 reposts · 65 likes · 10.3K views
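To picture the contrast the tweet describes, here is a toy example of the same utterance targeted as a DSL parse versus as Python code; the utterance, function names, and prompt formats are invented for illustration and are not the paper's actual prompts. The intuition is that pretrained LLMs have seen vastly more Python than any one DSL.

```python
UTTERANCE = "book a flight from NYC to LA on Friday"

# Target program written in a bespoke DSL (a made-up s-expression format).
dsl_prompt = f"""utterance: {UTTERANCE}
parse: (book_flight (from "NYC") (to "LA") (date "Friday"))"""

# The same meaning expressed as an ordinary Python call, which the model
# can pattern-match against its large pretraining exposure to Python.
python_prompt = f"""# utterance: {UTTERANCE}
book_flight(origin="NYC", destination="LA", date="Friday")"""
```

In a few-shot prompt, several such utterance/program pairs (plus a short domain description of the available functions) precede the test utterance.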
Elad Segal retweeted
Elad Simchayoff (@Elad_Si)
Watch and share with the world. A special project by @N12News. This video contains footage taken by the young partygoers at the Nova Music Festival prior to the 7.10 terror attack. You’ll see only a handful of the 260 victims and dozens of those abducted or still missing doing what they came there to do: party, dance, live. Hours later, the festival became a blood-filled scene of unspeakable crimes.
459 replies · 2.5K reposts · 4.9K likes · 758.8K views

Elad Segal retweeted
Visegrád 24 (@visegrad24)
Eyal Waldman is an Israeli billionaire and high-tech magnate (founder of Mellanox). He built R&D centres in the West Bank and Gaza Strip to employ Palestinian developers in order to build better Israeli-Palestinian relations. Hamas murdered his daughter Daniel at the music festival.
[image]
1.3K replies · 8.1K reposts · 22.4K likes · 3.5M views

Elad Segal retweeted
Hananya Naftali (@HananyaNaftali)
MUST WATCH: A British author explains the truth about "proportionality". Well done!
904 replies · 6.5K reposts · 17.3K likes · 1.4M views

Elad Segal retweeted
(((ل()(ل() 'yoav))))👾
Hello colleagues and fellows. Over the past few days I was shocked to learn that people in our community don't share what I consider to be basic human values. Please help me restore faith in our community by signing this. forms.gle/2yi1WP9RNSHPHn…
[image]
34 replies · 47 reposts · 303 likes · 122K views

Elad Segal retweeted
Ori Yoran (@OriYoran)
Retrieval-augmented LMs are not robust to irrelevant context. Retrieving entirely irrelevant context can throw off the model, even when the answer is encoded in its parameters! In our new work, we make RALMs more robust to irrelevant context. arxiv.org/abs/2310.01558 🧵[1/7]
[image]
1 reply · 24 reposts · 138 likes · 18.3K views
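One simple way to picture making a retrieval-augmented LM more robust is to drop retrieved passages that look irrelevant before they ever reach the prompt. The sketch below uses a crude lexical-overlap score purely for illustration; it is not the method from the paper, where a learned relevance judge would stand in for `overlap_score`.

```python
def overlap_score(question: str, passage: str) -> float:
    """Crude lexical relevance: fraction of question words found in the passage."""
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def filter_context(question: str, passages: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only passages that clear the relevance threshold before prompting."""
    return [p for p in passages if overlap_score(question, p) >= threshold]
```

If every passage is filtered out, the model falls back to answering from its parametric knowledge alone, which is exactly the case where irrelevant context would otherwise have thrown it off.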