Ramón Calvo

20 posts

@noctrog

PhD student under @francoisfleuret. Prev. Robotics ESOP @eth, intern @NVIDIA, @sony

Switzerland · Joined July 2015

981 Following · 140 Followers

Pinned post

Ramón Calvo @noctrog · Sep 10

I'm open sourcing a reimplementation of DINO (v1) in JAX/Flax NNX. It trains a ViT-S on ImageNet-1k in ~10 h on 8× RTX 4090 to 68.8% k-NN top-1 accuracy.
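
DINO (v1) trains a student network to match a teacher's output distribution across different augmented views, with the teacher updated as a moving average of the student. Below is a minimal sketch of that loss, the centering step and the EMA teacher update in plain JAX. It is not the author's Flax NNX code, which the post does not show. The temperatures and momenta are illustrative defaults, and the real method sums this loss over many crop pairs (multi-crop), which is omitted here.

```python
import jax
import jax.numpy as jnp


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from centered, sharpened teacher outputs to student outputs.

    Both logit arrays have shape (batch, out_dim) and come from two different
    augmented views of the same images.
    """
    # Teacher: subtract the running center, sharpen with a low temperature,
    # and stop gradients (the teacher is updated by EMA, not backprop).
    teacher_probs = jax.nn.softmax((teacher_logits - center) / teacher_temp, axis=-1)
    teacher_probs = jax.lax.stop_gradient(teacher_probs)
    # Student: log-probabilities at a higher temperature.
    student_logp = jax.nn.log_softmax(student_logits / student_temp, axis=-1)
    return -jnp.mean(jnp.sum(teacher_probs * student_logp, axis=-1))


def update_center(center, teacher_logits, momentum=0.9):
    # Running mean of teacher outputs; centering discourages collapse.
    return momentum * center + (1.0 - momentum) * jnp.mean(teacher_logits, axis=0)


def ema_update(teacher_params, student_params, momentum=0.996):
    # Teacher weights follow an exponential moving average of the student's.
    return jax.tree_util.tree_map(
        lambda t, s: momentum * t + (1.0 - momentum) * s,
        teacher_params, student_params)


# Toy usage with random logits standing in for the two networks' outputs.
s = jax.random.normal(jax.random.PRNGKey(0), (4, 16))
t = jax.random.normal(jax.random.PRNGKey(1), (4, 16))
center = jnp.zeros(16)
loss, grad_s = jax.value_and_grad(dino_loss)(s, t, center)
center = update_center(center, t)
```

The `stop_gradient` on the teacher side is what makes this self-distillation instead of a symmetric loss. Gradients flow only into the student, and the teacher moves only through `ema_update`.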

Ramón Calvo reposted

Accepted papers at TMLR @TmlrPub · Jan 10

Leveraging the True Depth of LLMs. Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret. Action editor: Changyou Chen. openreview.net/forum?id=JccJ6… #parallelized #benchmark #llms

Ramón Calvo @noctrog · Sep 10

If you go through the code and think something can be further optimized, please let me know! (I'm sure there's a better way of doing context parallelism...)

Ramón Calvo @noctrog · Sep 10

First time doing JAX. I was really amazed at how simple it was to implement data parallelism. I also appreciate that everything in the ecosystem (Grain, Orbax, ...) is built around distributed training from the ground up.
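
For readers new to JAX, here is a minimal sketch of what data parallelism can look like with `jax.sharding`. It assumes a toy linear-regression model with a hypothetical learning rate `LR`, not the DINO training loop. The batch is split across a one-axis device mesh, the parameters are replicated, and the training step is written as if it ran on a single device. It also runs on a single CPU device.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

LR = 0.1  # hypothetical learning rate for this toy example

# A 1-D device mesh: every visible device holds one slice of the batch.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
batch_sharding = NamedSharding(mesh, P("data"))  # split the leading (batch) axis
replicated = NamedSharding(mesh, P())             # full copy on every device


def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


@jax.jit
def train_step(params, x, y):
    # Written as single-device code; XLA partitions it according to the input
    # shardings and inserts the cross-device reduction needed for the mean.
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - LR * g, params, grads)
    return params, loss


# Toy data; the batch size is a multiple of the device count.
batch = 8 * jax.device_count()
x = jax.random.normal(jax.random.PRNGKey(0), (batch, 3))
y = x @ jnp.array([1.0, -2.0, 0.5])
x, y = jax.device_put(x, batch_sharding), jax.device_put(y, batch_sharding)
params = jax.device_put({"w": jnp.zeros(3), "b": jnp.zeros(())}, replicated)

initial_loss = loss_fn(params, x, y)
for _ in range(100):
    params, loss = train_step(params, x, y)
```

This approach needs no explicit `psum` or per-device code. Sharding annotations on the inputs are enough for the compiler to distribute the work.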

Ramón Calvo @noctrog · Aug 25

@eastskykang @crl_ethz Congratulations Dongho!

Dongho Kang @eastskykang · Aug 25

I have successfully defended my dissertation "Animal Motion Imitation For Adaptive and Lifelike Control of Legged Robots" at ETH Zurich. A huge thanks to my supervisors, committee members, amazing collaborators, and peers at CRL @crl_ethz who made this possible!

Ramón Calvo @noctrog · Feb 15

@gwenzek In our implementation, MHA heads are "concatenated" in the sense that all heads are processed by the same call to the attention kernel on each GPU. Note that since layers are merged in pairs, and TP needs n_gpus = 2n with n ≥ 1, each GPU only processes heads from MHA1 or from MHA2.
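
This sketch shows why a merged pair of attention layers can run as one attention call. Both layers of a pair receive the same input, so stacking their Q/K/V projections along the head axis gives a single attention with twice the heads. Stacking the output projections row-wise then sums the two layers' outputs. The weight names (`wq`, `wk`, `wv`, `wo`), the lack of masking, biases and normalization, and the toy sizes are all hypothetical. This illustrates the idea and is not the reference implementation.

```python
import jax
import jax.numpy as jnp


def attention(q, k, v):
    # q, k, v: (heads, seq, head_dim); plain softmax attention, no causal mask.
    scores = jnp.einsum("hqd,hkd->hqk", q, k) / jnp.sqrt(q.shape[-1])
    return jnp.einsum("hqk,hkd->hqd", jax.nn.softmax(scores, axis=-1), v)


def split_heads(t, n_heads):
    # (seq, n_heads * head_dim) -> (n_heads, seq, head_dim)
    seq, width = t.shape
    return t.reshape(seq, n_heads, width // n_heads).transpose(1, 0, 2)


def merge_heads(t):
    # (n_heads, seq, head_dim) -> (seq, n_heads * head_dim)
    n_heads, seq, head_dim = t.shape
    return t.transpose(1, 0, 2).reshape(seq, n_heads * head_dim)


def mha(x, p, n_heads):
    q, k, v = (split_heads(x @ p[name], n_heads) for name in ("wq", "wk", "wv"))
    return merge_heads(attention(q, k, v)) @ p["wo"]


def fused_pair(x, p1, p2, n_heads):
    # Both layers read the same x, so their projections can be stacked along
    # the head axis and processed by ONE attention call over 2 * n_heads heads.
    def cat(name):
        return jnp.concatenate([p1[name], p2[name]], axis=1)

    q, k, v = (split_heads(x @ cat(name), 2 * n_heads) for name in ("wq", "wk", "wv"))
    # Stacking the output projections row-wise sums the two layers' outputs.
    wo = jnp.concatenate([p1["wo"], p2["wo"]], axis=0)
    return merge_heads(attention(q, k, v)) @ wo


def init(key, d_model, n_heads, head_dim):
    kq, kk, kv, ko = jax.random.split(key, 4)
    shape_in, shape_out = (d_model, n_heads * head_dim), (n_heads * head_dim, d_model)
    return {"wq": 0.1 * jax.random.normal(kq, shape_in),
            "wk": 0.1 * jax.random.normal(kk, shape_in),
            "wv": 0.1 * jax.random.normal(kv, shape_in),
            "wo": 0.1 * jax.random.normal(ko, shape_out)}


x = jax.random.normal(jax.random.PRNGKey(0), (5, 16))  # (seq, d_model)
p1 = init(jax.random.PRNGKey(1), 16, 4, 4)
p2 = init(jax.random.PRNGKey(2), 16, 4, 4)
fused = fused_pair(x, p1, p2, n_heads=4)
separate = mha(x, p1, 4) + mha(x, p2, 4)
```

In the fused layout, heads `0..n_heads-1` belong to the first layer and the rest to the second. Slicing that head axis into contiguous chunks for tensor parallelism therefore keeps layer boundaries intact when the chunk size divides `n_heads`.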

Guillaume Wenzek @gwenzek · Feb 15

@noctrog Isn't that a complicated way of concatenating heads of two layers?

Ramón Calvo @noctrog · Feb 14

What is the true depth of an LLM? Together with @DanielePaliotta, @MatPagliardini, M. Jaggi and @francoisfleuret we show that LLMs may have a smaller effective depth, and that it can be exploited to increase inference speed in multi-GPU settings! arxiv.org/abs/2502.02790 (1/N)
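
This sketch contrasts running a pair of residual blocks sequentially with running them in parallel. It assumes each block is a function `f_i` whose output is added to the residual stream. The function names and the toy linear blocks are hypothetical, and attention/MLP internals are left out. The parallel form replaces `f2(x + f1(x))` with `f2(x)`, so the two blocks no longer wait on each other. Exploiting this only makes sense for pairs where that substitution changes the output little.

```python
import jax.numpy as jnp


def sequential_pair(x, f1, f2):
    # Standard residual stack: block 2 reads block 1's output.
    h = x + f1(x)
    return h + f2(h)


def parallel_pair(x, f1, f2):
    # Both blocks read the same input, so they can run concurrently
    # (e.g. on different GPUs) and their residual updates are summed.
    return x + f1(x) + f2(x)


# Toy linear "blocks" so the gap between the two forms is easy to see.
A = 0.5 * jnp.eye(4)


def f1(v):
    return v @ A


def f2(v):
    return v @ (0.2 * A)


x = jnp.arange(4.0)
gap = sequential_pair(x, f1, f2) - parallel_pair(x, f1, f2)
```

For linear blocks, the gap is exactly `f2(f1(x))`, the interaction term that parallel execution drops. For real nonlinear blocks it has no closed form and has to be measured.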

Ramón Calvo @noctrog · Feb 14

I would like to thank @dj_jiben for the thoughtful discussions and help with some plots! :)

Ramón Calvo @noctrog · Feb 14

You can find the reference LP implementation here: github.com/noctrog/effect… (10/10)

Ramón Calvo reposted

François Fleuret @francoisfleuret · Feb 8

With the awesome @noctrog, @DanielePaliotta, @MatPagliardini, and Martin Jaggi. @sciences_UNIGE @ICepfl TL;DR: you can shuffle the middle layers of a transformer without retraining it. We take advantage of that to compute layers in parallel. arxiv.org/abs/2502.02790
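
The shuffling experiment can be sketched as follows: keep the first and last layers fixed, randomly permute a window of middle layers, and run the stack in the new order. This is a hypothetical harness. The window bounds, function names and toy blocks are assumptions. The toy blocks just add a constant, so every order gives the same result and the check is deterministic. Real transformer layers do not commute exactly, so one would instead compare the shuffled model's outputs or benchmark scores against the original order.

```python
import jax
import jax.numpy as jnp


def shuffled_order(n_layers, start, end, key):
    # Keep layers [0, start) and [end, n_layers) in place; permute the middle.
    middle = jax.random.permutation(key, jnp.arange(start, end))
    return jnp.concatenate([jnp.arange(start), middle, jnp.arange(end, n_layers)])


def run_layers(x, layers, order):
    # Apply residual blocks in the given (possibly shuffled) order.
    for i in order.tolist():
        x = x + layers[i](x)
    return x


def make_layer(i):
    # Hypothetical toy block: adds a constant, so every order gives the same result.
    return lambda v: jnp.full_like(v, float(i))


layers = [make_layer(i) for i in range(12)]
order = shuffled_order(12, start=3, end=9, key=jax.random.PRNGKey(0))
out = run_layers(jnp.zeros(2), layers, order)
```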

Ramón Calvo reposted

Eloi Alonso @EloiAlonso1 · Oct 11

As a comparison to #GameNGen, our model was trained on only 0.5% of the number of frames, with 1 GPU (compared to 128 TPUs). And our code, model and data are completely open-source! You can play it on your local machine. github.com/eloialonso/dia… (3/n)

Ramón Calvo reposted

François Fleuret @francoisfleuret · Oct 11

Diffusion world models! With @EloiAlonso1 @AdamJelley2 and @micheli_vincent and colleagues. Counter-Strike, trained on a 4090, 5M frames. You can install it and *play* in it at 10fps. @UNIGEnews @sciences_UNIGE

Eloi Alonso @EloiAlonso1

Ever wanted to play Counter-Strike in a neural network? These videos show people playing (with keyboard & mouse) in 💎 DIAMOND's diffusion world model, trained to simulate the game Counter-Strike: Global Offensive. 💻 Download and play it yourself → github.com/eloialonso/dia… 🧵
