Edgar Sucar

120 posts

@SucarEdgar

Postdoc @Oxford_VGG | PhD Dyson Robotics Lab at Imperial College

Oxford, UK · Joined April 2017
1.6K Following · 887 Followers
Pinned Tweet
Edgar Sucar @SucarEdgar ·
Introducing V-DPM, for 4D reconstruction of in-the-wild videos. We build on top of VGGT, using Dynamic Point Maps for jointly representing 3D and motion. Joint work with: @EldarIsTyping , @LaiZihang , and Andrea Vedaldi. @Oxford_VGG. Check out the demo and code 👇
8 replies · 34 reposts · 311 likes · 25.2K views
Edgar Sucar @SucarEdgar ·
Fantastic results Stan! Great to see evidence of the usefulness of 3D representation/data for novel-view synthesis, when used in the right place.
Stan Szymanowicz @StanSzymanowicz
🍺 LagerNVS (CVPR 2026) 🍺

LagerNVS is a generalizable, feed-forward, real-time Novel View Synthesis network which
- performs rendering in real time,
- generalizes to in-the-wild data,
- works with and without known source cameras,
- sets a new state-of-the-art among deterministic methods,
- can be paired with a diffusion decoder for generative extrapolation.

LagerNVS shows that 3D biases are useful for Novel View Synthesis, but explicit 3D representations are not required to achieve them. We use 3D biases in (1) architecture design and (2) pre-training:

(1) In NVS with explicit 3D representations (3DGS, NeRF), reconstruction is typically difficult and slow, but rendering is much faster and simpler. We mimic this process in the network design: we use a large (1B-parameter) encoder and a small, lightweight decoder (ViT-B). This allows increasing the network capacity while still achieving real-time rendering.

(2) The encoder, initialized from VGGT, was pre-trained with 3D reconstruction objectives, making the initial features 3D-aware.

Both substantially improve performance.

Project page: szymanowiczs.github.io/lagernvs
Code: github.com/facebookresear…
Paper: arxiv.org/abs/2603.20176
Models: huggingface.co/collections/fa…

Work done with @jianyuan_wang @MinghaoChen23 Christian Rupprecht and Andrea Vedaldi
1 reply · 0 reposts · 9 likes · 1K views
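The design point in the LagerNVS thread above — pay for a heavy encoder once per scene so that rendering each novel view stays cheap — can be sketched in a few lines. This is a toy illustration, not the LagerNVS implementation: the class names and the relative cost numbers below are made up for the example.

```python
# Toy sketch of encode-once / render-many compute asymmetry.
# All sizes and costs are hypothetical illustrative numbers.

class HeavyEncoder:
    """Stands in for the large (~1B-parameter), VGGT-initialized encoder."""
    cost_units = 100  # hypothetical relative compute cost, paid once per scene

    def encode(self, source_views):
        # Produce a latent scene representation from all source views jointly.
        return {"latent": sum(source_views), "n_views": len(source_views)}


class LightDecoder:
    """Stands in for the small, lightweight (ViT-B-scale) rendering head."""
    cost_units = 1  # hypothetical relative compute cost, paid once per view

    def render(self, scene, target_camera):
        # Decode the cached scene latent for one novel viewpoint.
        return scene["latent"] * target_camera


def render_novel_views(source_views, target_cameras):
    enc, dec = HeavyEncoder(), LightDecoder()
    scene = enc.encode(source_views)                          # paid once
    images = [dec.render(scene, c) for c in target_cameras]  # paid per view
    total_cost = enc.cost_units + dec.cost_units * len(target_cameras)
    return images, total_cost


images, cost = render_novel_views([1, 2, 3], target_cameras=[0.5, 1.0, 2.0])
print(cost)  # encoding dominates: 100 + 3 * 1 = 103
```

Because the encoder cost is amortized over all rendered views, network capacity can grow in the encoder while per-view rendering stays real-time — the same asymmetry explicit 3D pipelines (slow reconstruction, fast rendering) exhibit.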
Jacob Rintamaki @jacobrintamaki ·
POST BELOW: I'm making a "real-world robotics gc" for anyone interested in buying, deploying, and building robots. If you’re in construction, retail, logistics, manufacturing, eldercare, energy, or data centers, please come on in! Comment or DM.
77 replies · 13 reposts · 112 likes · 11.8K views
Edgar Sucar retweeted
Kirill Mazur @makezur ·
Introducing 4D Primitive-Mâché (4DPM), a new method for replayable 4D reconstruction from monocular videos. We split dynamic scenes into 3D primitives and recover their motion. 4DPM can infer object positions even after they leave view. Joint work with @marwan_ptr @AjdDavison
5 replies · 25 reposts · 175 likes · 32.2K views
Edgar Sucar @SucarEdgar ·
@vincesitzmann @ducha_aiki @CSProfKGD Will images also go away? They are also "hand-crafted": regular grid, constant resolution, global exposure, visible spectrum. And they have noise, same as a depth camera. Maybe their current advantage is rather scale: there is much more of them than "3D data".
0 replies · 0 reposts · 1 like · 53 views
Vincent Sitzmann @vincesitzmann ·
@ducha_aiki @CSProfKGD Training time as well! In fact, I would make a more drastic statement that for "embodied intelligence", i.e., building intelligent robots, all expert-crafted 3D structure will go away soon, including the very concept of point clouds and camera poses, whether predicted by NN or no
5 replies · 2 reposts · 23 likes · 3.6K views
Edgar Sucar retweeted
Nando de Freitas @NandoDF ·
The only bitter lesson is that LLMs have succeeded beyond any expert's expectations.

Underpinning LLMs is the idea of scaling, which is too often misunderstood as more parameters. Scaling is about using massive compute effectively to maximise the throughput of data ingestion into the learning process to obtain more capable models. We are still far from hitting the limits of this. We are still compute hungry because there is a ton more we could achieve if only we had more compute, from experimental ablations to data acquisition and curation.

Scaling is largely about data and evals. The models are now trained on almost all the web and equally large (but growing) self-generated synthetic data. Sifting through such vast quantities of data (the whole of human creation) requires formidable engineering and intelligent ideas. This is what differentiates most models.

AI is finally in the hands of billions of users, and with it come billions of tasks: every reasonable user need. This scaling in tasks and evaluations is many orders of magnitude larger than pre-LLMs.

Having the right architecture matters, but we know several alternatives could all work well, e.g. replacing attention in Transformers with RNNs and interleaving such layers with local layers. What matters is fine ablations to maximise hardware usage. This is the realm of sophisticated high-precision engineering. It encompasses semiconductor design, datacenter design, distributed systems, MFU, etc. There is fascinating work on flow matching, JEPA, sparser MoEs, etc., that is all consistent with scaling.

I'm terrible at predictions, but in this we have stayed the course. There have been pleasant surprises like the effectiveness of reasoning, which, while allowing for fewer parameters, still demands even more compute. Sparser multimodal MoEs will also allow for better continual learning. This is an old idea, e.g. arxiv.org/pdf/1108.3298, which is finally being done at scale.

Successful scaling is mostly about organising people into effective teams for research, development and production. They have to be teams of happy and ambitious people who put the team first. Yes, tech VCs and CEOs: work-life balance matters for prolonged success, something I think @demishassabis did really well at @GoogleDeepMind and which I promote at @MicrosoftAI.

Bitter lesson: it really is all about scaling and hard work by thousands of amazing people. Hardly bitter, but hopeful and inspiring.
Richard Sutton @RichardSSutton
@GaryMarcus @ylecun @demishassabis You were never alone, Gary, though you were the first to bite the bullet, to fight the good fight, and to make the argument well, again and again, for the limitations of LLMs. I salute you for this good service!

39 replies · 72 reposts · 685 likes · 195.4K views
Edgar Sucar @SucarEdgar ·
Good essay drawing an analogy between the stone soup tale and AI misconceptions. More emphasis is placed on individual AI models and the teams/algorithms that made them than on the collective effort to generate big data, the most important ingredient of the soup. simons.berkeley.edu/news/stone-sou…
0 replies · 1 repost · 7 likes · 579 views
Edgar Sucar @SucarEdgar ·
@wkentaro_ Depth covariance, SuperPrimitive: less optimisation for the pixels in a single depth image. DUSt3R: multi-view. Still a way to go...
1 reply · 0 reposts · 4 likes · 210 views
Edgar Sucar @SucarEdgar ·
SLAM bitter lesson: methods that do less "test-time optimisation" will eventually trump the methods that do more.
4 replies · 0 reposts · 20 likes · 2.2K views
Edgar Sucar @SucarEdgar ·
@chrisoffner3d In language maybe; in 3D I don't think so. The interface between a faster optimisation loop at test time and a slower training loop is still interesting.
0 replies · 0 reposts · 2 likes · 135 views
Chris Offner @chrisoffner3d ·
@SucarEdgar I thought we’re in the “scale up test time compute” regime now.
1 reply · 0 reposts · 4 likes · 318 views
Edgar Sucar retweeted
José M. Carranza @josemtzcarranza ·
We had two great keynotes delivered by young researchers: Dr. Saiph Savage @saiphcita and Dr. Edgar Sucar @SucarEdgar, who, besides being excellent researchers, are proudly Mexican!
1 reply · 2 reposts · 5 likes · 705 views
Edgar Sucar retweeted
Yash Bhalgat @ysbhalgat ·
Rare opportunity to have a conversation with Alyosha and AZ together :) PC: @MikeShou1
1 reply · 6 reposts · 46 likes · 5.8K views
Javier Civera @jcivera ·
I would expect the output of a visual-language model to be "WTF????!!!!!" 🤣🤣
1 reply · 0 reposts · 7 likes · 768 views
Edgar Sucar retweeted
Anagh Malik @anagh_malik ·
Delighted to share the first project of my PhD, "Transient Neural Radiance Fields for Lidar View Synthesis and 3D Reconstruction". We show unprecedented capabilities of synthesizing novel lidar scans from as few as 2 input views! 🖥️anaghmalik.com/TransientNeRF
6 replies · 40 reposts · 179 likes · 44.3K views
Edgar Sucar @SucarEdgar ·
Impressive! Alcaraz, the new Wimbledon champion 🎾🎾
0 replies · 0 reposts · 5 likes · 982 views