Diego Porres

6.7K posts


@PDillis

Guatemalan 🇬🇹 physicist, Postdoc @CVC_UAB, researching autonomous driving.

Barcelona · Joined October 2011
361 Following · 989 Followers
Pinned Tweet
Diego Porres@PDillis·
Fun weekend project: explore the latent space with your hand! Using SD-XL turbo and some pre-defined prompts, this is a proof of concept, but we plan to do so much more. Stay tuned :)
0
1
8
546
Diego Porres retweeted
David Serrano-Lozano@serra9lozano·
Super-Resolution has been a widely studied field, yet its metrics haven’t kept pace with the methods. Check out RQI, a new perceptual metric that better aligns with human judgment. RQI will be presented at #CVPR2026. Code coming soon, enabling better SR models!
Javi Vazquez-Corral@j_vazquezcorral

🎉 🖥️ Our paper "Bridging the Perception Gap in Image Super-Resolution Evaluation" has been accepted to #CVPR2026! Work led by Shaolin Su, together with Josep Maria Rocafort, @dxue321, @serra9lozano, and Lei Sun @CVC_UAB @UABBarcelona @INSAITinstitute

0
1
6
267
Diego Porres retweeted
Yuki@y_m_asano·
Start of Day 2 of the @ELLISforEurope PhD school! First, Robert Geirhos from @GoogleDeepMind on his personal top 10 lessons for future researchers. Relevant advice for folks in academia and industry!
1
5
46
2.2K
Diego Porres@PDillis·
@atulit_gaur You’re training end-to-end driving models and val loss is useless
0
0
0
89
atulit@atulit_gaur·
a question to ask in ml interviews: for three consecutive epochs you don't see any meaningful decrease in val loss, but then on the fourth epoch you do. why?
23
2
251
53.1K
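One classic answer to the interview question above is a stepped learning-rate schedule: the LR is too large for any progress until the scheduler cuts it. A toy sketch of that mechanism (my illustration, not necessarily the poster's intended answer):

```python
def run(epochs=5, x=3.0):
    """Gradient descent on f(x) = x**2 with a step LR schedule."""
    losses = []
    for epoch in range(epochs):
        lr = 1.0 if epoch < 3 else 0.1   # scheduler cuts the LR at epoch 3
        for _ in range(10):              # ten updates per "epoch"
            x = x - lr * 2 * x           # x <- x - lr * f'(x)
        losses.append(x ** 2)
    return losses

losses = run()
# epochs 0-2: at lr = 1.0 the iterate just flips sign each step, so the
# loss is perfectly flat; at epoch 3 the smaller LR finally lets it drop
```

Other valid answers exist (momentum kicking in, a lucky shuffle of the data, warmup ending); the schedule is just the most common one.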
Diego Porres@PDillis·
I thought the term world models was misused before (> 2 years ago), but nowadays I'd even venture to say that term has lost all meaning
0
2
3
85
Diego Porres@PDillis·
@AlsikkanTV This gave me whiplash to Twitter from the 2010s, thank you
0
0
1
615
Chris Oldman@AlsikkanTV·
just met someone named Cheddar Larson and she said her sister is a famous actress but she wouldn’t say who
331
2.3K
92.6K
2.6M
davinci@leothecurious·
it's refreshing when two different hypotheses i've been excited about get validated in a single paper. tl;dr: convolutional inductive biases in early stages of visual processing, and latent prediction of global semantic features from local spatial context, can both aid in achieving higher sample efficiency on visual tasks.
2
8
98
6.5K
Phillip Isola@phillip_isola·
As models advance and surpass certain human abilities, “human-level” advances too, as we can use them as tools. So yes a model might do better math/coding/etc than I could have done in 2025. But they still are behind where I could be in 2026! This thought gives me some hope :)
9
7
152
11.3K
Diego Porres@PDillis·
@quantbagel Nice! These are offline metrics though, have you been able to deploy these into a real/sim robot? Note if your inference speed is much higher, it might also perform better (maybe in general, maybe only on tasks that require finer motion).
0
0
0
74
Lucas@quantbagel·
Robot action models shouldn't need 256 vision tokens per frame. Pi0.5 spends 400M parameters on SigLIP just to see. We replaced it with a 4.4M encoder that outputs 5 tokens — and action quality barely changes. 91x smaller. 51x fewer tokens. 7.3x faster inference.
23
31
355
18.7K
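The idea in the tweet above (a handful of pooled tokens per frame instead of hundreds) can be sketched with a hypothetical toy encoder. Everything here is an assumption for illustration: the random projection stands in for learned weights, and none of the names or sizes come from Pi0.5 or the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_encode(img, n_tokens=5, dim=64, patch=32):
    """Compress one frame into n_tokens feature tokens (toy sketch)."""
    H, W, C = img.shape
    # patchify: (H/patch * W/patch, patch*patch*C) -> here (49, 3072)
    patches = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    # random projection stands in for a learned linear embedding
    W_proj = rng.standard_normal((patches.shape[1], dim)) * 0.02
    feats = patches @ W_proj                       # (49, dim)
    # pool groups of patch features down to n_tokens tokens
    groups = np.array_split(feats, n_tokens, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])  # (n_tokens, dim)

tokens = tiny_encode(rng.standard_normal((224, 224, 3)))
```

The point of the sketch is only the shape arithmetic: a 224×224 frame becomes 5 tokens rather than 256, which is where the token and compute savings come from.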
Joan Rodriguez@joanrod_ai·
Introducing @QuiverAI, a new AI lab and product company focused on frontier vector design. We’ve raised an $8.3M seed round led by @a16z, with support from amazing angels and investors. Our first model, Arrow-1.0, generates SVGs from images and text. It’s available now in public beta at app.quiver.ai
305
293
4.8K
1.3M
Diego Porres@PDillis·
@jon_barron You can always mess with some weights of the network for it to fail by the right amount
0
0
0
75
Jon Barron@jon_barron·
Unfortunately these gifs were time-gated to 2025 because they're contingent on 1) models being bad at reproducing the input, which got fixed, 2) platforms subsidizing many independent generations, and 3) human willingness to do a repetitive manual task, which is now agent-work.
3
0
10
5.6K
Kamal Gupta@kamalgupta09·
The post triggered a lot of 3D vision folks but it is right on the money. Had a similar epiphany regarding robot learning ~2 years ago after a long chat with @ashishkr9311. 3D priors may give you short-term efficiency gains, but long term, going from video to action and allowing big models to learn their own intermediate representations is the right direction.
Vincent Sitzmann@vincesitzmann

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…

4
1
62
9.5K
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Image Generation with a Sphere Encoder: a few-step image generation method that maps images to a spherical latent space, trained with simple reconstruction + consistency losses
5
23
175
9.3K
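A minimal sketch of the core operation a spherical latent space implies, normalizing encoder outputs onto the unit hypersphere. This is an assumption about the paper's setup; its actual encoder and losses are not reproduced here:

```python
import numpy as np

def to_sphere(z, eps=1e-8):
    """Project latent vectors onto the unit hypersphere (toy sketch)."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

# a batch of 4 raw 16-dim latents, e.g. raw encoder outputs
z = np.random.default_rng(0).standard_normal((4, 16))
z_sphere = to_sphere(z)
# every latent now has (approximately) unit norm, i.e. lies on S^15
```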
Charlie Snell@sea_snell·
People who do model merging are like the flat earthers of deep learning
18
12
300
40.9K
Gabriele Berton@gabriberton·
This would force the features extracted by the model to be domain agnostic, i.e. the features contain no info about the domain, making them more robust on the target domain. Cool stuff
2
0
2
518
Gabriele Berton@gabriberton·
A little more info on Domain Adaptation: the task is that you have a labelled train set from one "source" domain (e.g. daytime images) and an unlabelled set from the test/target domain (e.g. night images). [1/N]
Gabriele Berton@gabriberton

Writing this gave me flashbacks of when CLIP came out. Part of my lab was working on Domain Adaptation, i.e. adapting models to unseen domains. CLIP killed that field: CLIP has seen everything, so suddenly there was a model with no unseen domain. [1/2]

4
3
69
10.5K
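One standard way to get the domain-agnostic features described in this thread is a gradient reversal layer (DANN-style): identity in the forward pass, negated gradient in the backward pass, so the feature extractor learns to fool a domain classifier. A minimal numpy sketch of just that layer, not necessarily the thread's exact method:

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer (toy sketch).

    Forward pass is the identity; the backward pass negates (and
    scales by lam) the incoming gradient, so whatever sits upstream
    is trained to *maximize* the domain classifier's loss.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                        # identity on the way up

    def backward(self, grad_out):
        return -self.lam * grad_out     # flip sign on the way down

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
```

In a real framework this would be implemented as a custom autograd function sitting between the feature extractor and the domain classifier head.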
hengcherkeng@hengcherkeng·
@rsasaki0109 It might just as well be that the agent writes the paper, makes rebuttals, polishes, and makes the final submission. Then the agent books air tickets, attends the conference, and writes reports. The human author did nothing except pay the electricity bills
1
1
8
798
rsasaki0109@rsasaki0109·
Paper2Rebuttal / RebuttalAgent: AI-Powered Academic Paper Rebuttal Assistant github.com/AutoLab-SAI-SJ…
RebuttalAgent is an AI-powered multi-agent system that helps researchers craft high-quality rebuttals for academic paper reviews. The system analyzes reviewer comments, searches relevant literature, generates rebuttal strategies, and produces formal rebuttal letters, all through an interactive human-in-the-loop workflow.
Key Features:
📄 Automatic Paper Parsing: Converts PDF papers to structured text using Docling
🔍 Issue Extraction: Breaks down reviewer comments into actionable issues with priority levels
📚 Literature Search: Automatically searches arXiv for relevant supporting papers
💡 Strategy Generation: Creates data-driven rebuttal strategies (not sophistry!)
✍️ Rebuttal Writing: Generates formal, conference-ready rebuttal letters
🔄 Human Feedback Loop: Iteratively refine strategies based on author input
4
39
306
24.2K
Birchlabs@Birchlabs·
after hours of debugging, got to the bottom of why training was diverging: this whole time I was doing gradient ascent
27
20
984
42.4K
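The bug above comes down to a single sign. A toy illustration on a quadratic loss, where one stray `+` turns gradient descent into ascent and the loss diverges instead of shrinking:

```python
def step(x, lr=0.1, ascend=False):
    """One gradient step on the loss x**2; ascend=True is the bug."""
    grad = 2 * x                           # d/dx of x**2
    return x + lr * grad if ascend else x - lr * grad

x_good = x_bad = 1.0
for _ in range(20):
    x_good = step(x_good)                  # descent: loss shrinks
    x_bad = step(x_bad, ascend=True)       # the bug: loss blows up
# x_good decays by 0.8 per step; x_bad grows by 1.2 per step
```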
Diego Porres@PDillis·
@Algomancer StyleGAN models did this with the disentangled latent space W. From what I've tested, they reach almost the same family of distributions, but this is still to be finished
0
0
2
145
Adam Hibble@Algomancer·
Question for my Flow Matching / Diffusion pilled friends. I've been doing this for years but never seen it on my feed. (I haven't actively looked for it, so if you know any reference papers, please share; it kinda just seemed obvious.) I use it for my diffusion/flow matching prior VAEs, but it works fine in rectified flow / mean flow / etc. recipes where you're focused on reducing the number of function evaluations. Do people ever learn the prior/starting distribution? i.e. where the noise distribution (prior) is learned rather than fixed to N(0, I). (Quick toy example below from some of my adversarial flow matching experiments so you know what I mean.) The intuition being that optimal transport cost depends on the choice of source distribution. A learned prior reduces the total transport distance by better aligning with the data geometry. github.com/Algomancer/Adv…
22
20
282
26K
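The transport-cost intuition above can be checked numerically: pairing data with a prior fit to it shortens the straight-line paths used in flow matching. Here the "learned" prior is simply a moment-matched Gaussian, an assumption for illustration, not the linked repo's method:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=0.5, size=(10_000, 2))  # data far from origin

# fixed source distribution: N(0, I)
fixed = rng.standard_normal(data.shape)
# "learned" source: a Gaussian fit to the data by moment matching
mu, sigma = data.mean(axis=0), data.std(axis=0)
learned = mu + sigma * rng.standard_normal(data.shape)

# straight-line flow matching pairs x0 ~ prior with x1 ~ data; compare
# the expected squared transport distance E||x1 - x0||^2 for each prior
cost_fixed = np.mean(np.sum((data - fixed) ** 2, axis=1))
cost_learned = np.mean(np.sum((data - learned) ** 2, axis=1))
# cost_learned << cost_fixed: the learned prior sits on the data geometry
```

With random (not optimal-transport) pairings this only bounds the effect, but it shows why aligning the source with the data reduces how far the velocity field has to move each sample.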