Prabhu Teja
@TejaPrab

77 posts

Applied Scientist at AWS. PhD from EPFL/Idiap.

Joined August 2009
1.5K Following · 163 Followers

Pinned Tweet
Prabhu Teja@TejaPrab·
Paper on #LocalSGD at #TMLR with co-authors @lukas_balles and @cedapprox. Previous theoretical work showed that Local SGD (L-SGD) converges like #SGD, whereas practical work shows that Local SGD lags in performance. We study why this is the case. 🧵1/4
Accepted papers at TMLR@TmlrPub

On the Choice of Learning Rate for Local SGD Lukas Balles, Prabhu Teja S, Cedric Archambeau. Action editor: Robert Gower. openreview.net/forum?id=DPvwr… #sgd #synchronization #bottleneck

François Fleuret@francoisfleuret·
Re-parametrization trick for GMMs anybody?
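For mixture models, the discrete component choice blocks the usual pathwise gradient; one common workaround is a Gumbel-softmax relaxation over the components combined with the standard Gaussian reparametrization per component. A minimal NumPy sketch of the sampling path only (no autograd here; all names are illustrative, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Relaxed one-hot sample over mixture components (Gumbel-softmax)."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def sample_gmm_reparam(logits, mus, sigmas, tau=0.5):
    """GMM sample whose randomness is pushed into noise variables:
    soft component weights times per-component reparametrized Gaussians."""
    w = gumbel_softmax(logits, tau)            # soft component selection
    eps = rng.standard_normal(size=mus.shape)  # reparametrized: mu + sigma * eps
    return np.sum(w * (mus + sigmas * eps))

x = sample_gmm_reparam(np.array([0.1, 1.0, -0.5]),
                       np.array([-2.0, 0.0, 3.0]),
                       np.array([0.5, 1.0, 0.2]))
```

With a small temperature `tau` the relaxed weights approach a hard one-hot choice; in an autograd framework both noise sources would leave the gradient path intact.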
Prabhu Teja@TejaPrab·
So, we find that scaling only the LR of L-SGD is not always a practical choice. There are a lot of details and experiments that I can't fit here, so go through the paper for all of those, and get in touch if you want to discuss something! 🧵4/4
Prabhu Teja@TejaPrab·
We study the optimal LR of SGD and L-SGD. Our finding: the optimal LR for L-SGD differs from that of SGD, and when using it, the performance of L-SGD matches that of SGD. Also, even at the optimal LR, L-SGD is faster in wall-clock time only when the cost of communication far exceeds that of computation. 🧵3/4
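For context, a toy sketch of the Local SGD mechanism the thread discusses: each worker takes several local SGD steps between communication rounds, and the iterates are averaged at each round. This is an illustrative toy on a quadratic, not the paper's code; `lr` is the knob whose optimal value the paper studies:

```python
import numpy as np

def local_sgd(grad, w0, lr, workers=4, local_steps=8, rounds=50, noise=0.1, seed=0):
    """Minimal Local SGD on a shared scalar objective: each worker runs
    `local_steps` noisy SGD steps, then iterates are averaged (one comm round)."""
    rng = np.random.default_rng(seed)
    w = np.full(workers, w0, dtype=float)
    for _ in range(rounds):
        for _ in range(local_steps):
            w -= lr * (grad(w) + noise * rng.standard_normal(workers))
        w[:] = w.mean()  # communication: average iterates across workers
    return w[0]

# Toy quadratic f(w) = 0.5 * w^2, gradient = w; minimum at 0.
w_star = local_sgd(lambda w: w, w0=5.0, lr=0.1)
```

Fewer communication rounds per step is exactly why L-SGD can win on wall-clock time when communication dominates computation.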
Prabhu Teja retweeted
Giovanni Zappella@610v4nn1__·
Do you want to play around with continual learning, transformers, prompting, etc.? Renate v0.5 gives you some of the most recent algorithms (e.g., S-Prompts) github.com/awslabs/Renate
Prabhu Teja retweeted
Lukas Balles@lukas_balles·
We just released v0.3.0 of Renate, our PyTorch library for continual learning. We now have improved support for NLP applications and provide methods for distribution shift detection, which can help you decide when to update your model. github.com/awslabs/Renate
Prabhu Teja retweeted
Giovanni Zappella@610v4nn1__·
Renate v0.2 is finally out 🌟⭐️🤩 If you are interested in continual learning, you can see a summary of the new features here: github.com/awslabs/Renate…
Prabhu Teja retweeted
Mohammad Mahdi Johari@mm_johari·
So happy that our paper was selected as a highlight at @CVPR 2023 (10% of the accepted papers, 2.5% of all submissions)
Mohammad Mahdi Johari@mm_johari

Thrilled to announce that my paper with @francoisfleuret and @CamCarta is accepted at @CVPR. NeRF-based RGB-D SLAM that improves state-of-the-art by x10 in speed, and 50% in reconstruction and camera localization accuracy. @Idiap_ch @unige_en @amsOSRAM idiap.ch/paper/eslam

Prabhu Teja retweeted
François Fleuret@francoisfleuret·
I have an open PhD position in representation learning and interpretability for Reinforcement Learning in my group at the University of Geneva. fleuret.org/francois/hirin…
Prabhu Teja retweeted
François Fleuret@francoisfleuret·
One paper at BMVC with Evann Courdier and @TejaPrab: "PAUMER: Patch Pausing Transformer for Semantic Segmentation." To reduce computation, as we move forward through the attention layers, we stop updating tokens from which a confident prediction can already be made.
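A rough sketch of the patch-pausing idea, assuming a per-token classifier head whose softmax confidence decides which tokens stop being updated. This is illustrative only — all names are hypothetical, and in the actual paper paused tokens are handled inside the attention computation, which this toy ignores:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pause_tokens(tokens, layers, classify, threshold=0.9):
    """Push tokens (N, D) through `layers`, but stop updating ("pause")
    any token whose intermediate class prediction is already confident."""
    active = np.ones(len(tokens), dtype=bool)
    for layer in layers:
        tokens[active] = layer(tokens[active])               # update active tokens only
        conf = softmax(classify(tokens), axis=-1).max(axis=-1)
        active &= conf < threshold                           # pause confident tokens
    return tokens

# Toy run: features double as class logits; token 0 is confident from the start.
toks = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
out = pause_tokens(toks, layers=[lambda t: t + 1.0] * 2, classify=lambda t: t)
```

Here token 0 is updated once, becomes confident, and then skips the remaining layer, while token 1 keeps being updated — the compute saving grows with the fraction of paused tokens.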
François Fleuret@francoisfleuret·
The side effects being that, (1) since the storage may be shared, modifying e.g. a "slice" may change the original tensor, and (2) some strides may correspond to a non-contiguous organization in memory, e.g. after a transpose.
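Both side effects are easy to demonstrate with NumPy (PyTorch behaves analogously with `.t()` and `.contiguous()`):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # contiguous row-major storage
s = a[:, 1]                      # a "slice": a view that shares a's storage...
s[0] = 99                        # ...so writing through it modifies `a` too
assert a[0, 1] == 99

t = a.T                          # transpose: same storage, swapped strides
assert not t.flags["C_CONTIGUOUS"]   # hence non-contiguous in memory
c = np.ascontiguousarray(t)          # an explicit copy restores contiguity
assert c.flags["C_CONTIGUOUS"]
```

The same stride trick is what makes slicing and transposing O(1): only the (shape, strides) metadata changes, never the data buffer.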
François Fleuret@francoisfleuret·
A recent exchange on Twitter made me realize that many may be using tensors without knowing how they are represented in memory, and how it makes some operations super fast. This is standard stuff common to many tensor libraries, including our beloved @numpy_team and @PyTorch.
Big ಮೂಡ್@Manasanity·
I too am a woman in STEM coz I know words like random forest and fuzzy logic
Prabhu Teja@TejaPrab·
We (@francoisfleuret & I) are really excited to present our paper "Test time Adaptation through Perturbation Robustness" at @NeurIPSConf workshop DistShift! Join us for a chat on 13th Dec from 10PM-12AM (CET), 1-3PM PST, 2.30AM-4.30AM IST (14th Dec). openreview.net/forum?id=GbBeI…
François Fleuret@francoisfleuret

Test-time training is getting some traction indeed. @TejaPrab and I have a paper on that very topic at @NeurIPSConf DistShift workshop: fleuret.org/francois/publi…

Prabhu Teja@TejaPrab·
@liu_yuejiang @francoisfleuret @NeurIPSConf Thank you! We found this a non-trivial question to answer. Some SSL methods enforce invariance of features to augmentations, but fine-tune a classifier on those representations. However, we find something slightly different: invariance to augmentations isn't independent of correct classification. (1/2)