Diego Porres

6.7K posts


@PDillis

Guatemalan 🇬🇹 physicist, Postdoc @CVC_UAB, researching autonomous driving.

Barcelona · Joined October 2011
361 Following · 989 Followers
Pinned Tweet
Diego Porres@PDillis·
Fun weekend project: explore the latent space with your hand! Using SD-XL turbo and some pre-defined prompts, this is a proof of concept, but we plan to do so much more. Stay tuned :)
0
1
8
546
Diego Porres retweeted
David Serrano-Lozano@serra9lozano·
Super-Resolution has been a widely studied field, yet its metrics haven’t kept pace with the methods. Check out RQI, a new perceptual metric that better aligns with human judgment. RQI will be presented at #CVPR2026. Code coming soon, enabling better SR models!
Javi Vazquez-Corral@j_vazquezcorral

🎉 🖥️ Our paper "Bridging the Perception Gap in Image Super-Resolution Evaluation" has been accepted to #CVPR2026! Work led by Shaolin Su, together with Josep Maria Rocafort, @dxue321, @serra9lozano, and Lei Sun @CVC_UAB @UABBarcelona @INSAITinstitute

0
1
6
267
Diego Porres retweeted
Yuki@y_m_asano·
Start of Day 2 of the @ELLISforEurope PhD school! First, Robert Geirhos from @GoogleDeepMind on his personal top 10 lessons for future researchers. Relevant advice for folks in academia and industry!
1
5
46
2.2K
Diego Porres@PDillis·
@atulit_gaur You’re training end-to-end driving models and val loss is useless
0
0
0
89
atulit@atulit_gaur·
a question to ask in ml interviews: for three consecutive epochs you don't see any meaningful decrease in val loss, but then on the fourth epoch you do. why?
23
2
251
53.1K
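One classic answer to the interview question above is a stepped learning-rate schedule: the LR is too large for any progress until the scheduler cuts it. A toy sketch of that mechanism (my illustration, not necessarily the poster's intended answer):

```python
def run(epochs=5, x=3.0):
    """Gradient descent on f(x) = x**2 with a step LR schedule."""
    losses = []
    for epoch in range(epochs):
        lr = 1.0 if epoch < 3 else 0.1   # scheduler cuts the LR at epoch 3
        for _ in range(10):              # ten updates per "epoch"
            x = x - lr * 2 * x           # x <- x - lr * f'(x)
        losses.append(x ** 2)
    return losses

losses = run()
# epochs 0-2: at lr = 1.0 the iterate just flips sign each step, so the
# loss is perfectly flat; at epoch 3 the smaller LR finally lets it drop
```

Other valid answers exist (momentum kicking in, a lucky shuffle of the data, warmup ending); the schedule is just the most common one.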
Diego Porres@PDillis·
I thought the term world models was misused before (> 2 years ago), but nowadays I'd even venture to say that term has lost all meaning
0
2
3
85
Diego Porres@PDillis·
@AlsikkanTV This gave me whiplash to Twitter from the 2010s, thank you
0
0
1
615
Chris Oldman@AlsikkanTV·
just met someone named Cheddar Larson and she said her sister is a famous actress but she wouldn’t say who
331
2.3K
92.6K
2.6M
davinci@leothecurious·
it's refreshing when two different hypotheses i've been excited about get validated in a single paper. tl;dr: convolutional inductive biases in early stages of visual processing, and latent prediction of global semantic features from local spatial context, can both aid in achieving higher sample efficiency on visual tasks.
2
8
98
6.5K
Phillip Isola@phillip_isola·
As models advance and surpass certain human abilities, “human-level” advances too, as we can use them as tools. So yes a model might do better math/coding/etc than I could have done in 2025. But they still are behind where I could be in 2026! This thought gives me some hope :)
9
7
152
11.3K
Diego Porres@PDillis·
@quantbagel Nice! These are offline metrics though, have you been able to deploy these into a real/sim robot? Note if your inference speed is much higher, it might also perform better (maybe in general, maybe only on tasks that require finer motion).
0
0
0
74
Lucas@quantbagel·
Robot action models shouldn't need 256 vision tokens per frame. Pi0.5 spends 400M parameters on SigLIP just to see. We replaced it with a 4.4M encoder that outputs 5 tokens — and action quality barely changes. 91x smaller. 51x fewer tokens. 7.3x faster inference.
23
31
355
18.7K
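The idea in the tweet above (a handful of pooled tokens per frame instead of hundreds) can be sketched with a hypothetical toy encoder. Everything here is an assumption for illustration: the random projection stands in for learned weights, and none of the names or sizes come from Pi0.5 or the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_encode(img, n_tokens=5, dim=64, patch=32):
    """Compress one frame into n_tokens feature tokens (toy sketch)."""
    H, W, C = img.shape
    # patchify: (H/patch * W/patch, patch*patch*C) -> here (49, 3072)
    patches = img.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    # random projection stands in for a learned linear embedding
    W_proj = rng.standard_normal((patches.shape[1], dim)) * 0.02
    feats = patches @ W_proj                       # (49, dim)
    # pool groups of patch features down to n_tokens tokens
    groups = np.array_split(feats, n_tokens, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])  # (n_tokens, dim)

tokens = tiny_encode(rng.standard_normal((224, 224, 3)))
```

The point of the sketch is only the shape arithmetic: a 224×224 frame becomes 5 tokens rather than 256, which is where the token and compute savings come from.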
Joan Rodriguez@joanrod_ai·
Introducing @QuiverAI, a new AI lab and product company focused on frontier vector design. We’ve raised an $8.3M seed round led by @a16z, with support from amazing angels and investors. Our first model, Arrow-1.0, generates SVGs from images and text. It’s available now in public beta at app.quiver.ai
305
293
4.8K
1.3M
Diego Porres@PDillis·
@jon_barron You can always mess with some weights of the network for it to fail by the right amount
0
0
0
75
Jon Barron@jon_barron·
Unfortunately these gifs were time-gated to 2025 because they're contingent on 1) models being bad at reproducing the input, which got fixed, 2) platforms subsidizing many independent generations, and 3) human willingness to do a repetitive manual task, which is now agent-work.
3
0
10
5.6K
Kamal Gupta@kamalgupta09·
The post triggered a lot of 3D vision folks but it is right on the money. Had a similar epiphany regarding robot learning ~2 years ago after a long chat with @ashishkr9311. 3D priors may give you short-term efficiency gains, but long term, going from video to action and allowing big models to learn their own intermediate representations is the right direction.
Vincent Sitzmann@vincesitzmann

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bitter_le…

4
1
62
9.5K
Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·
Image Generation with a Sphere Encoder: a few-step image generation method that maps images to a spherical latent space, trained with simple reconstruction + consistency losses
5
23
175
9.3K
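A minimal sketch of the core operation a spherical latent space implies, normalizing encoder outputs onto the unit hypersphere. This is an assumption about the paper's setup; its actual encoder and losses are not reproduced here:

```python
import numpy as np

def to_sphere(z, eps=1e-8):
    """Project latent vectors onto the unit hypersphere (toy sketch)."""
    return z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)

# a batch of 4 raw 16-dim latents, e.g. raw encoder outputs
z = np.random.default_rng(0).standard_normal((4, 16))
z_sphere = to_sphere(z)
# every latent now has (approximately) unit norm, i.e. lies on S^15
```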
Charlie Snell@sea_snell·
People who do model merging are like the flat earthers of deep learning
18
12
300
40.9K
Gabriele Berton@gabriberton·
This would force the features extracted by the model to be domain agnostic, i.e. the features contain no info about the domain, making them more robust on the target domain. Cool stuff
2
0
2
518
Gabriele Berton@gabriberton·
A little more info on Domain Adaptation: the task is that you have a labelled train set from one "source" domain (e.g. daytime images) and an unlabelled set from the test/target domain (e.g. night images). [1/N]
Gabriele Berton@gabriberton

Writing this gave me flashbacks of when CLIP came out. Part of my lab was working on Domain Adaptation, i.e. adapting models to unseen domains. CLIP killed that field: CLIP has seen everything, so suddenly there was a model with no unseen domain. [1/2]

4
3
69
10.5K
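One standard way to get the domain-agnostic features described in this thread is a gradient reversal layer (DANN-style): identity in the forward pass, negated gradient in the backward pass, so the feature extractor learns to fool a domain classifier. A minimal numpy sketch of just that layer, not necessarily the thread's exact method:

```python
import numpy as np

class GradReverse:
    """Gradient reversal layer (toy sketch).

    Forward pass is the identity; the backward pass negates (and
    scales by lam) the incoming gradient, so whatever sits upstream
    is trained to *maximize* the domain classifier's loss.
    """
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                        # identity on the way up

    def backward(self, grad_out):
        return -self.lam * grad_out     # flip sign on the way down

grl = GradReverse(lam=0.5)
x = np.array([1.0, -2.0])
```

In a real framework this would be implemented as a custom autograd function sitting between the feature extractor and the domain classifier head.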
hengcherkeng@hengcherkeng·
@rsasaki0109 It might just as well be that the agent writes the paper, makes rebuttals, polishes, and makes the final submission. Then the agent books air tickets, attends the conference, and writes reports. The human author did nothing except pay the electricity bills
1
1
8
798
rsasaki0109@rsasaki0109·
Paper2Rebuttal / RebuttalAgent: AI-Powered Academic Paper Rebuttal Assistant github.com/AutoLab-SAI-SJ…
RebuttalAgent is an AI-powered multi-agent system that helps researchers craft high-quality rebuttals for academic paper reviews. The system analyzes reviewer comments, searches relevant literature, generates rebuttal strategies, and produces formal rebuttal letters, all through an interactive human-in-the-loop workflow.
Key Features:
📄 Automatic Paper Parsing: Converts PDF papers to structured text using Docling
🔍 Issue Extraction: Breaks down reviewer comments into actionable issues with priority levels
📚 Literature Search: Automatically searches arXiv for relevant supporting papers
💡 Strategy Generation: Creates data-driven rebuttal strategies (not sophistry!)
✍️ Rebuttal Writing: Generates formal, conference-ready rebuttal letters
🔄 Human Feedback Loop: Iteratively refine strategies based on author input
4
39
306
24.2K
Birchlabs@Birchlabs·
after hours of debugging, got to the bottom of why training was diverging: this whole time I was doing gradient ascent
27
20
984
42.4K
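The bug above comes down to a single sign. A toy illustration on a quadratic loss, where one stray `+` turns gradient descent into ascent and the loss diverges instead of shrinking:

```python
def step(x, lr=0.1, ascend=False):
    """One gradient step on the loss x**2; ascend=True is the bug."""
    grad = 2 * x                           # d/dx of x**2
    return x + lr * grad if ascend else x - lr * grad

x_good = x_bad = 1.0
for _ in range(20):
    x_good = step(x_good)                  # descent: loss shrinks
    x_bad = step(x_bad, ascend=True)       # the bug: loss blows up
# x_good decays by 0.8 per step; x_bad grows by 1.2 per step
```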
Diego Porres@PDillis·
@Algomancer StyleGAN models did this with the disentangled latent space W. From what I've tested, they reach almost the same family of distributions, but this is still to be finished
0
0
2
145
Adam Hibble@Algomancer·
Question for my Flow Matching / Diffusion pilled friends. I've been doing this for years but never seen it on my feed. (I haven't actively looked for it, so if you know any reference papers, please share; it kinda just seemed obvious.) I use it for my diffusion/flow matching prior VAEs, but it works fine in rectified flow / mean flow / etc. recipes where you're focused on reducing the number of function evaluations. Do people ever learn the prior/starting distribution? i.e. where the noise distribution (prior) is learned rather than fixed to N(0, I). (Quick toy example below from some of my adversarial flow matching experiments so you know what I mean.) The intuition being that optimal transport cost depends on the choice of source distribution. A learned prior reduces the total transport distance by better aligning with the data geometry. github.com/Algomancer/Adv…
22
20
282
26K
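The transport-cost intuition above can be checked numerically: pairing data with a prior fit to it shortens the straight-line paths used in flow matching. Here the "learned" prior is simply a moment-matched Gaussian, an assumption for illustration, not the linked repo's method:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=0.5, size=(10_000, 2))  # data far from origin

# fixed source distribution: N(0, I)
fixed = rng.standard_normal(data.shape)
# "learned" source: a Gaussian fit to the data by moment matching
mu, sigma = data.mean(axis=0), data.std(axis=0)
learned = mu + sigma * rng.standard_normal(data.shape)

# straight-line flow matching pairs x0 ~ prior with x1 ~ data; compare
# the expected squared transport distance E||x1 - x0||^2 for each prior
cost_fixed = np.mean(np.sum((data - fixed) ** 2, axis=1))
cost_learned = np.mean(np.sum((data - learned) ** 2, axis=1))
# cost_learned << cost_fixed: the learned prior sits on the data geometry
```

With random (not optimal-transport) pairings this only bounds the effect, but it shows why aligning the source with the data reduces how far the velocity field has to move each sample.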