Ramón Calvo

20 posts

@noctrog

PhD student under @francoisfleuret. Prev. Robotics ESOP @eth, intern @NVIDIA, @sony

Switzerland Joined July 2015
981 Following 140 Followers
Pinned Tweet
Ramón Calvo
Ramón Calvo@noctrog·
I'm open sourcing a reimplementation of DINO (v1) in JAX/Flax/NNX. It trains a ViT-S on ImageNet-1k in ~10h on 8x RTX 4090 to 68.8% k-NN top-1 accuracy.
1
0
6
385
Ramón Calvo
Ramón Calvo@noctrog·
If you go through the code and think something can be further optimized, please let me know! (I'm sure there's a better way of doing context parallelism...)
0
0
0
93
Ramón Calvo
Ramón Calvo@noctrog·
First time doing Jax. I was really amazed at how simple it was to implement Data Parallelism. I also appreciate that everything in the ecosystem (grain, orbax, ...) is built around distributed training from the ground up.
1
0
0
104
Dongho Kang
Dongho Kang@eastskykang·
I have successfully defended my dissertation "Animal Motion Imitation For Adaptive and Lifelike Control of Legged Robots" at ETH Zurich. A huge thanks to my supervisors, committee members, amazing collaborators, and peers at CRL @crl_ethz who made this possible!
5
2
80
5.4K
Ramón Calvo
Ramón Calvo@noctrog·
@gwenzek In our implementation, MHA heads are “concatenated” in the sense that all heads are processed by the same call to the attention kernel on each GPU. Note that since layers are merged in pairs, and TP needs n_gpus = 2*n where n >= 1, each GPU will only process heads from MHA1 or from MHA2.
0
0
0
130
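A minimal sketch of the head placement described in the reply above, in plain Python. It is not taken from the author's code: the function name `assign_heads`, the choice to give the first half of the GPUs to MHA1 and the second half to MHA2, and the even split of heads within each half are all assumptions made for illustration. It only shows that with n_gpus = 2*n, every GPU ends up holding heads from exactly one of the two merged layers.

```python
def assign_heads(num_heads, n_gpus):
    """Map each GPU index to a list of (layer, head) pairs.

    Hypothetical layout: GPUs [0, n) hold MHA1 heads, GPUs [n, 2n) hold MHA2 heads.
    """
    # The reply states that TP needs n_gpus = 2*n with n >= 1.
    if n_gpus < 2 or n_gpus % 2 != 0:
        raise ValueError("n_gpus must be 2*n with n >= 1")
    gpus_per_layer = n_gpus // 2
    # Assumption: heads of one layer split evenly across that layer's GPUs.
    if num_heads % gpus_per_layer != 0:
        raise ValueError("num_heads must divide evenly across the GPUs of one layer")
    heads_per_gpu = num_heads // gpus_per_layer

    assignment = {}
    for gpu in range(n_gpus):
        layer = "MHA1" if gpu < gpus_per_layer else "MHA2"
        start = (gpu % gpus_per_layer) * heads_per_gpu
        assignment[gpu] = [(layer, h) for h in range(start, start + heads_per_gpu)]
    return assignment


if __name__ == "__main__":
    # Two merged layers with 6 heads each, spread over 4 GPUs.
    for gpu, heads in assign_heads(num_heads=6, n_gpus=4).items():
        print(gpu, heads)
```

Under this layout, a single attention-kernel call per GPU covers all of that GPU's heads, and no GPU mixes heads from the two layers.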
Guillaume Wenzek
Guillaume Wenzek@gwenzek·
@noctrog Isn't that a complicated way of concatenating heads of two layers?
1
0
0
10
Ramón Calvo
Ramón Calvo@noctrog·
I would like to thank @dj_jiben for the thoughtful discussions and help with some plots! :)
0
0
2
205
Ramón Calvo retweeted
Eloi Alonso
Eloi Alonso@EloiAlonso1·
As a comparison to #GameNGen, our model was trained on only 0.5% of the number of frames, with 1 GPU (compared to 128 TPUs). And our code, model and data are completely open-source! You can play it on your local machine. github.com/eloialonso/dia… (3/n)
3
8
193
22.6K
Ramón Calvo retweeted