Ali Athar
@aliathar94
56 posts
Applied Scientist at Amazon. Prev: Research Scientist at ByteDance; PhD from RWTH Aachen; MSc from TUM.
Joined February 2010
272 Following · 258 Followers
Ali Athar @aliathar94:
For those of you looking to extend your Video-LLMs with spatial intelligence capabilities, this dataset is a potential game-changer. ViCaS is the largest human-annotated video dataset that provides both captions and grounded segmentation masks. (3/4)
Ali Athar @aliathar94:
In our CVPR'25 paper, we introduced the ViCaS dataset, which contains 20,000+ videos with detailed video captions as well as pixel-precise, phrase-grounded masks for selected objects. (1/4)
Ali Athar @aliathar94:
@giffmana Never said it was a perfect solution😅 Although given PyTorch's popularity, the average grad student these days is probably quite familiar with PyTorch API mechanics.
Ali Athar @aliathar94:
@CVPR @_vztu Any ETA (even approximate) on when the results will be out?
Zhengzhong Tu @_vztu:
😟Hey friends, star/reply to this tweet if you're also waiting for @CVPR decisions
Ali Athar @aliathar94:
@AljosaOsep @Pandoro_o I think the confusion arose because the deadline was written as 15 Nov, 2 AM CT (the conference venue's timezone), and people just assumed it ran until the end of the day Pacific time without thinking much about the timezone.
Aljosa @AljosaOsep:
@Pandoro_o Damn, that happened? 😱 I should have tweeted this yesterday!
Aljosa @AljosaOsep:
For everyone stressed with #CVPR2025 deadline: imagine learning yesterday that the deadline is today, and not Friday (true story).
Ali Athar retweeted
Jonathon Luiten @JonathonLuiten:
📣📣 Hiring a PhD-Intern 📣📣 Work with me on Dynamic 3D Gaussians at the Meta Boston office for 6 months in summer 2025! Apply here: metacareers.com/jobs/105497412… + write me your questions / link your most relevant work via email or twitter.
Quoted tweet (Jonathon Luiten @JonathonLuiten):
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis dynamic3dgaussians.github.io We model the world as a set of 3D Gaussians that move & rotate over time. This extends Gaussian Splatting to dynamic scenes, with accurate novel-view synthesis and dense 3D trajectories.

Ali Athar retweeted
Karim Knaebel @karimknaebel:
Check out our work on fine-tuning image-conditional diffusion models for depth and normal estimation. Widely used diffusion models can be improved with single-step inference and task-specific fine-tuning, allowing us to gain better accuracy while being 200x faster! ⚡ 🧵 (1/6)
István Sárándi @Istvan_Sarandi:
It's been a pleasure to work on this with @GerardPonsMoll1. I think we found a really effective formulation for training strong, large-scale pose and shape models. Here are some more qualitative results on tough, in-the-wild YouTube dance videos.
Quoted tweet (Gerard Pons-Moll @GerardPonsMoll1):

For 3D pose, some use different keypoint sets, others SMPL and other body models. It's a mess! With Neural Localizer Fields, we can choose the output format at test time, allowing training on any of them. Results are real-time and SOTA across the board. arxiv.org/pdf/2407.07532 @Istvan_Sarandi

Ali Athar @aliathar94:
@gabriberton This is already the case for some codebases I've recently worked with: the image means and stds are saved as buffers with the model checkpoint and applied inside the forward pass.
Gabriele Berton @gabriberton:
I wish CV models took non-normalized images as input and normalization were part of forward(). It would avoid silent normalization bugs that are hard to detect, because we never expect normalization to cause bugs (and >90% of the time we use the ImageNet mean/std).
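The pattern discussed in this thread could be sketched as follows. This is a minimal hypothetical example (the `NormalizedBackbone` class and the tiny conv layer are illustrative, not from any codebase mentioned here): the ImageNet mean/std are registered as buffers, so they are saved in the checkpoint and applied inside `forward()`.

```python
import torch
import torch.nn as nn

class NormalizedBackbone(nn.Module):
    """Accepts raw [0, 1] images; normalization lives inside the model."""

    def __init__(self):
        super().__init__()
        # register_buffer: stored in the state_dict and moved by .to(device),
        # but not a trainable parameter.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        # Normalization is part of the forward pass, so it cannot be
        # silently forgotten or applied twice in a data pipeline.
        x = (x - self.mean) / self.std
        return self.conv(x)

model = NormalizedBackbone()
out = model(torch.rand(2, 3, 32, 32))
```

Because the mean/std are buffers, `model.state_dict()` carries them alongside the weights, which matches the checkpointing behavior described in the reply above.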
Ali Athar @aliathar94:
@gabriberton That's a big qualifier IMO. The computation graph will be largely shared if the inputs pass through a common encoder/backbone network, which is often the case. If the graph is only partially shared, is PyTorch smart enough to release only the part not needed by the other losses?
Gabriele Berton @gabriberton:
Limitations: this only works when you have more than one loss (with disentangled computation graphs). Bonus: the more losses you have, the more memory you'll save.
Gabriele Berton @gabriberton:
This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of summing the losses and then calling backward(), compute backward() on each loss separately (which frees its computational graph). Results will be exactly identical.
Jehanzeb Mirza @jmie_mirza:
09.04.2024: I managed to defend my PhD thesis, titled 'Unsupervised Adaptation to Distribution Shifts'. Extremely thankful to Prof. Horst Bischof and Prof. @SergeBelongie for making the trip to Graz and agreeing to serve on the committee. @BelongieLab
Ali Athar @aliathar94:
As one journey ends, another begins! For the next phase in life, I've moved to the Bay Area and taken up a Research Scientist position at @BytedanceTalk, where I'll continue to work on exciting research problems related to video understanding.
Ali Athar @aliathar94:
Successfully defended my PhD at the @RWTHVisionLab! I'm thankful to my supervisor, colleagues, family, and to God for this incredible 5-year experience. Aside from the professional and research experience, I'll cherish the personal bonds I made here for a long time to come.