Roger Waleffe

28 posts

Roger Waleffe

@RWaleffe

Computer Sciences PhD student at the University of Wisconsin-Madison

Katılım Haziran 2020

25 Takip Edilen81 Takipçiler

Roger Waleffe retweetledi

Bryan Catanzaro@ctnzr·15 Ara

Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.

English

222

1.2K

505.1K

Roger Waleffe retweetledi

Bryan Catanzaro@ctnzr·18 Ağu

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate. Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus. Links to the models, datasets, and tech report are here: research.nvidia.com/labs/adlr/NVID…

English

231

1.4K

276K

Roger Waleffe retweetledi

Bryan Catanzaro@ctnzr·22 Mar

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs. * Hybrid architecture means up to 3X faster at the same accuracy * Trained in FP8 * Great for VLMs * Weights and instruct versions to come soon. research.nvidia.com/labs/adlr/nemo…

English

101

630

201.1K

Bryan Catanzaro@ctnzr·13 Haz

A 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset: * 7% attention, the rest is Mamba2 * MMLU jumps from 50 to 53.6% * Training efficiency is the same * Inference cost is much less arxiv.org/pdf/2406.07887

English

435

118.8K

Roger Waleffe@RWaleffe·13 Haz

@WesleyYue @ctnzr The authors of the RULER benchmark observed something similar for some Transformers they tested.

English

Roger Waleffe@RWaleffe·13 Haz

@WesleyYue @ctnzr See our discussion here:

English

Roger Waleffe retweetledi

Theo Rekatsinas@thodrek·9 May

Data pruning to reduce pertaining costs is hot, but fancy pruning can take just as long to select data as to train on all of it! Patrik, @Rwaleffe, and @vmageirakos's work at #ICLR2024 tomorrow shows how a simple, low-cost tweak to random sampling outperforms trendy methods!

Roger Waleffe@RWaleffe

Not convinced about using random sampling for data pruning? Consider twice! In our recent work, we introduce Repeated Sampling of Random Subsets: arxiv.org/abs/2305.18424, where we sample a subset of data at each epoch of training instead of only once at the beginning!

English

2.4K

Roger Waleffe@RWaleffe·31 Tem

@DisseminatePod Thanks for having me on the podcast Jack!

English

Roger Waleffe retweetledi

Disseminate: The Computer Science Research Podcast@DisseminatePod·31 Tem

🚨 "MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks" with @RWaleffe is available now! 🎧 Listen on Spotify ➡️ open.spotify.com/show/6IQIF9oRS… ☕️ Support the podcast ➡️ buymeacoffee.com/disseminate

English

886

Roger Waleffe@RWaleffe·1 Haz

@rishiyer Thanks for sharing this!! Continuous random exploration had also been part of our motivation.

English

134

Rishabh Iyer@rishiyer·1 Haz

@RWaleffe Nice work! We had also observed this some time back. The reason this kind of random sampling scales better (we call it random online) is that it explores new subsets very quickly. In recent work (MILO: arxiv.org/pdf/2301.13287…) we combined this idea with representation sampling..

English

308

Roger Waleffe@RWaleffe·1 Haz

English

19K

Roger Waleffe@RWaleffe·1 Haz

@BlackHC Regardless of which ‘viewpoint’ one chooses to look at our method with, this algorithm had yet to be studied extensively (empirically and theoretically).

English

Roger Waleffe@RWaleffe·1 Haz

@BlackHC If the sampling of S’ across rounds is done without replacement (instead of with replacement), then our method can also be seen as training on the full dataset but with early stopping after a few epochs (discussed in the paper). This version is particularly useful for analysis.

English

107

Roger Waleffe@RWaleffe·1 Haz

Joint work with Patrik Okanovic @vmageirakos Kostis Nikolakakis @aminkarbasi @DKalogerias @nmervegurel @thodrek

Indonesia

665

Roger Waleffe@RWaleffe·1 Haz

See the preprint here: arxiv.org/pdf/2305.18424… for extensive evaluations together with the convergence analysis and discussion on its generalization.

English

693

Roger Waleffe retweetledi

PyKEEN@keenuniverse·7 Eyl

Marius, another amazing KGE (and more) library is now auto-formatting its code with black as of github.com/marius-team/ma… 🚀 @JasonMohoney @RWaleffe nice job :)

English

Keşfet

@nvidia @WesleyYue @ctnzr @vmageirakos @DisseminatePod @rishiyer @BlackHC @aminkarbasi