David Nordström

116 posts

David Nordström

@davnords

PhD student @ Chalmers. Computer vision and deep learning. Code: https://t.co/R54G3HJNlo

Joined February 2022
104 Following · 24 Followers
Chris Offner@chrisoffner3d·
"Trained exclusively on synthetic data, [WAFT-Stereo] achieves the best BP-0.5 on the ETH3D benchmark among all existing submissions, corresponding to an 81% error reduction over the strongest established zero-shot baseline."
Chris Offner tweet media
David Nordström@davnords·
@chrisoffner3d @Parskatt @eric_dexheimer @jianyuan_wang My initial thinking was that it was something like this, i.e. the same as the NLL loss in RoMa v2; guess we will see what they are cooking once legal has been sorted out :). It is a little unclear how you would do this for arbitrary sequence lengths, though.
Chris Offner@chrisoffner3d·
@Parskatt @eric_dexheimer @davnords @jianyuan_wang So just the same contrastive loss (InfoNCE) as in MASt3R except at the patch level instead of the pixel level? The talk mentions this slide in the context of replacing unnecessary (and computationally expensive) heads with multi-task losses, so I'd expect something different.
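For reference, the patch-level variant being speculated about could be sketched as a standard InfoNCE over matched patch descriptors. Everything here is an assumption for illustration (the helper name, shapes, and temperature), not anything confirmed from MASt3R or the talk:

```python
import torch
import torch.nn.functional as F

def patch_infonce(desc_a, desc_b, temperature=0.07):
    """Symmetric InfoNCE over matched patch descriptors.

    desc_a, desc_b: (N, D) descriptors for N corresponding patches in
    two views; row i of desc_a matches row i of desc_b. MASt3R applies
    this kind of loss per pixel; the tweet speculates about per patch.
    """
    a = F.normalize(desc_a, dim=-1)
    b = F.normalize(desc_b, dim=-1)
    logits = a @ b.t() / temperature              # (N, N) similarities
    targets = torch.arange(a.shape[0], device=a.device)
    # Each patch must pick its match among all patches of the other
    # view, in both directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```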
Chris Offner tweet media
Chris Offner@chrisoffner3d·
Already the next question during Christian's talk on attention-based cross-view patch matching was "If that's what it's doing, can't we just directly supervise the attention maps to strictly enforce correct matches?" This is very characteristic of the 3D vision community. ;)
Chris Offner@chrisoffner3d

@anand_bhattad I'd rephrase it to "We _think_ we know what the algorithm should be doing." because, if we fully knew what it should be doing, we wouldn't need ML. I love this interpretability work but it runs the risk of seducing people into imposing classical methods onto learned models.
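The question raised in the talk, directly supervising the attention maps with ground-truth matches, could be sketched roughly like this, assuming access to the raw cross-view attention weights and known patch correspondences (all names and shapes here are hypothetical, not any paper's actual loss):

```python
import torch
import torch.nn.functional as F

def attention_match_loss(attn, gt_match, valid):
    # attn:     (Nq, Nk) cross-view attention weights, rows sum to 1
    # gt_match: (Nq,) index of the ground-truth matching key patch
    # valid:    (Nq,) bool mask for query patches with a known match
    log_p = torch.log(attn.clamp_min(1e-9))            # avoid log(0)
    nll = F.nll_loss(log_p, gt_match, reduction="none")
    # Average the negative log-likelihood over valid queries only.
    return (nll * valid).sum() / valid.sum().clamp_min(1)
```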

David Nordström@davnords·
@gabriberton I guess this one should not work. Really nice tip though, will certainly come in handy.
David Nordström@davnords·
@gabriberton Does this work for LightGlue-type loss where you supervise each layer or is this then the case of "entangled" terms?
Gabriele Berton@gabriberton·
I can't stress enough how useful this trick has been for me over all these years. It reduces GPU memory by a factor of N, where N is the number of losses, at literally no cost (same speed, exactly the same results down to the last decimal digit). For example ... [1/2]
Gabriele Berton@gabriberton

This simple PyTorch trick will cut your GPU memory use in half / double your batch size (for real). Instead of summing the losses and then computing backward once, it's better to compute the backward on each loss as you go (which frees its computational graph). Results will be exactly identical.
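A minimal sketch of the trick, under the assumption that the loss terms do not share intermediate activations (if they do, the first backward frees the shared graph and the second call fails, the "entangled" case asked about in the replies). The toy setup is invented for illustration:

```python
import torch

# Toy parameter shared by two loss terms whose forward graphs are
# independent apart from the leaf tensor w (hypothetical setup).
w = torch.ones(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

def two_losses(w):
    return (w * x).sum() ** 2, (w - x).pow(2).sum()

# Variant A: sum the losses, one backward call. Both computational
# graphs stay alive in memory until that single call.
la, lb = two_losses(w)
(la + lb).backward()
grad_summed = w.grad.clone()

# Variant B: backward per loss. Each call frees that loss's graph
# immediately, and gradients accumulate in w.grad.
w.grad = None
la, lb = two_losses(w)
la.backward()
lb.backward()
grad_per_loss = w.grad.clone()
```

In a real training loop, Variant B is what lets the graph of each loss (e.g. each supervised layer) be freed as soon as its backward runs, which is where the memory saving comes from.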

Gabriele Berton@gabriberton·
I have joined @GoogleDeepMind! I'll be training VLMs, and I'll still keep posting about the latest developments in AI, computer vision, and LLMs. So no more posts on PyTorch tricks. I might post about JAX. Stay tuned...
Gabriele Berton tweet media
David Nordström@davnords·
@lucasmaes_ E.g., even the claim "Prior JEPA methods avoid collapse through heuristics or tricks" seems overly strong, considering that the same claim was made in LeJEPA, which constitutes prior work.
David Nordström@davnords·
@lucasmaes_ Great work, and very nice with such a small model. However, weren't JEPAs already 'easy to train' after LeJEPA was introduced (in a previous paper)? I don't think this is clearly stated in the thread, nor in the paper.
Lucas Maes@lucasmaes_·
JEPAs are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning in <1 second. 📑: le-wm.github.io
David Nordström reposted
Gabriele Berton@gabriberton·
VisMatch is on PyPI! VisMatch is a wrapper for image matching models like LightGlue, RoMa-v2, MASt3R, LoFTR, and 50+ more! To run image matching on any 2 images, it's literally as simple as: pip install vismatch && vismatch-match --inputs img0 img1 --matcher choose_any [1/4]
Gabriele Berton tweet media
David Nordström@davnords·
@ducha_aiki I should probably read the paper... but I like ImLoc, so something like it with feedforward reconstruction methods would be nice.
David Nordström@davnords·
@ducha_aiki Hehe, sounds that way... Though if you think about it in a similar way to ImLoc, they might mean that you never have the full scene reconstructed, but instead just take query images and create a map near the query?
Dmytro Mishkin 🇺🇦@ducha_aiki·
Am I stupid, or is the idea to "make offline processing online"?
Dmytro Mishkin 🇺🇦 tweet media
Zan Gojcic@ZGojcic·
We're releasing DiffusionHarmonizer, an online diffusion enhancer bridging neural reconstruction and photorealistic simulation by correcting artifacts and harmonizing inserted objects so they truly belong in the scene: matching shadows, lighting & color. research.nvidia.com/labs/sil/proje…
William Holmberg@WilliamHolmbe19·
When an influencer with millions of followers drops a video of your app and your app is not production ready... and your cloud bill reaches 1k+ overnight LOL
William Holmberg tweet media
David Nordström reposted
William Holmberg@WilliamHolmbe19·
Alright we are live!!! Fly anywhere on earth!
Chuhan Zhang@ChuhanZhang5·
D4RT is now accepted at #CVPR2026 with full scores (straight 6s) from all the reviewers! Deeply grateful to the reviewers for their time, thoughtful feedback, and for seeing the value in this work. Hope to see everyone in Denver. 🏔️
Chuhan Zhang@ChuhanZhang5

A SINGLE encoder + decoder for all the 4D tasks! We release 🎯 D4RT (Dynamic 4D Reconstruction and Tracking). 📍 A simple, unified interface for 3D tracking, depth, and pose 🌟 SOTA results on 4D reconstruction & tracking 🚀 Up to 100x faster pose estimation than prior works

Dmytro Mishkin 🇺🇦@ducha_aiki·
#CVPR2026 reviewing -- this year my usefulness score is zero, meaning that my absence would not have changed any paper's outcome.
Alexandre Morgand@Almorgand·
"YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting" TL;DR: a unified 3D Gaussian splatting model that reconstructs high-quality scene geometry and camera poses from unposed/uncalibrated images in a single forward pass.