Ryan Ramos

68 posts

@ryan_c_ramos

PhD student @ IsLab, Osaka University

Joined November 2016
27 Following · 25 Followers
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Couldn’t be more grateful for this #ICCV2025 collaboration! It turns out that some vision models have actually been encoding images’ processing and acquisition parameters (e.g., JPEG compression settings, camera model) this whole time. For more info check out @stojnvla’s thread!
Vladan Stojnić@stojnvla

Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?

1
0
0
123
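Claims that frozen vision features encode acquisition parameters are typically tested with a linear probe. Below is a minimal sketch of that protocol; the feature matrix and quality labels are synthetic stand-ins fabricated for illustration (the real experiment would probe frozen features from a pretrained encoder against true JPEG quality factors):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 128

# Hypothetical JPEG quality factors and synthetic "frozen features" that
# leak the quality along one direction, standing in for encoder outputs.
quality = rng.integers(10, 100, size=n).astype(float)
leak_direction = rng.normal(size=d)
feats = rng.normal(size=(n, d)) + np.outer(quality / 100.0, leak_direction)

# Linear probe: least-squares regression from frozen features to the
# acquisition parameter. High correlation between predictions and labels
# means the parameter is linearly decodable from the representation.
X = np.c_[feats, np.ones(n)]
w, *_ = np.linalg.lstsq(X, quality, rcond=None)
pred = X @ w
r = np.corrcoef(pred, quality)[0, 1]
print(f"probe correlation: {r:.2f}")
```

If the probe recovers the parameter well above chance, the representation "knows" it, even though the encoder was never trained on that label.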
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Feel free to visit IS1-081 if you’re at MIRU! An extended version was accepted at an ICCV workshop! Here we investigate whether bias in pre-trained CLIPs transfers to downstream tasks when you train models like LLaVA. Co-authored w/ @yusuke_hirota, Yuta Nakashima, & Noa Garcia
Ryan Ramos tweet media
0
0
0
152
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
Who killed non-contrastive image-text pretraining? @AlecRad and @_jongwook_kim with the below Fig2 in CLIP. Who collected the 7 Dragonballs and asked Shenron to resurrect it? Yours truly, in this new paper of ours. Generative captioning is not only competitive, it seems better!
Lucas Beyer (bl16) tweet media (×3)
18
87
571
212.5K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@jxmnop This sounds really interesting! If I'm getting this correctly, the method should not need paired data, right? And the goal's to translate embeddings from enc A to their equivalent representations in enc B's space? But you have embeddings from B, even if they're not paired data?
0
0
0
56
dr. jack morris
dr. jack morris@jxmnop·
As an exercise in open science, gonna tweet the research problem I’m stuck on: I want to align two text embedding spaces in an unsupervised way.

The motivation is that in my previous vec2text work, we have to know the embedding model and be able to query it. This is fine in today’s world where most people use OpenAI ada embeddings, but when people move on to a better model, my inversion models won’t work anymore. So I want to take embeddings from an *unknown* embedder and map them somehow to a space I know, like the OpenAI embedding space, then decode them.

Sounds hard, right? It definitely is. But my crazy idea is that all text embedding models are learning something very similar: embeddings lie on a low-dimensional manifold, and so given enough samples we should be able to align them. This is supported by some past research on unsupervised bilingual word embedding alignment (which works really well!) and also this fascinating line of research on “relative representations”, where representing embeddings by their distances to known anchor points makes embeddings compatible between different spaces.

So I learned there’s this whole class of problems called “optimal transport” that’s exactly this: the mathematical study of how to find the optimal mapping between two vector spaces. Sounds perfect, right? Sadly it doesn’t work very well, at least out of the box. Given a thousand paired samples from two different embedding models A and B, the Sinkhorn algorithm can get about 1% accuracy (10x above random). Gromov-Wasserstein, which tries to preserve cosine similarity, can get a little bit better. If I use embeddings from two models from the same family I can get 20%.

I tried using relative representations. This requires 100 or so paired anchor points from both embedders, which is also a bottleneck. But using 100-dim relative representations, Sinkhorn gets 70% accuracy with no hparam tuning, which is pretty good. But no one has figured out how to find anchor points without any supervision yet (although I think it’s probably possible).

Also, a supervised linear mapping between the two embedding spaces works super well, can get 90%+ accuracy, and I can invert the remapped embeddings with pretty good BLEU score, but that’s cheating too (also the true mapping is certainly nonlinear). Both these algorithms again require paired samples, which is unrealistic.

I want to be able to invert a random database of text embeddings without any paired samples. With enough entries I think it should be possible, just like we can infer an arbitrary substitution cipher if we have enough encrypted data.

Anyway, that’s my progress so far! I am now extremely stuck. If you have any ideas please message me or reply to the thread.
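The “relative representations” trick from the thread is simple enough to sketch in a few lines. In this toy setup, space B is a synthetic stand-in: a random orthogonal rotation of space A (real encoders differ far more than a rotation, which is one reason the thread reports 70% rather than near-perfect matching). Given paired anchors, each embedding is re-expressed as its cosine similarities to the anchors, which is invariant to rotation, so nearest-neighbor matching across the two spaces becomes easy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_anchors = 200, 64, 100

# Space A: random embeddings. Space B: the same points under a random
# orthogonal rotation, mimicking a second encoder with different coordinates.
A = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
B = A @ Q

def relative_rep(X, anchors):
    """Represent each row of X by its cosine similarities to the anchor rows."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

# The paired anchor points are exactly the supervision bottleneck the
# thread mentions: we assume rows 0..99 are known to correspond.
rel_A = relative_rep(A[n_anchors:], A[:n_anchors])
rel_B = relative_rep(B[n_anchors:], B[:n_anchors])

# Match each A item to its nearest neighbor in relative-representation space.
sims = rel_A @ rel_B.T
matches = sims.argmax(axis=1)
accuracy = (matches == np.arange(n - n_anchors)).mean()
print(f"matching accuracy: {accuracy:.2f}")
```

Because cosine similarity is rotation-invariant, the two relative representations coincide here and matching is near-perfect; with real, independently trained encoders the correspondence is only approximate, which is where the hard unsupervised-alignment problem begins.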
51
22
313
71.8K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@CFGeek Interesting, admittedly I'm not up-to-date on the legislative side of things outside of what I pick up around here, maybe I'll check these out too. Didn't even realize LeCun was there either. Thanks!
0
0
1
24
Charles Foster
Charles Foster@CFGeek·
Just started watching this on a whim, but I found the opening statements (all of them, not just that of LeCun) quite interesting, enough that I'll definitely continue watching the rest! youtube.com/live/WgNBDjNY0…
YouTube video
YouTube
1
0
6
316
Wenhu Chen
Wenhu Chen@WenhuChen·
Is it a common problem that Hugging Face transformer models (LLaMA) will generate different outputs with different batch sizes, even after the attention mask is applied to the padding tokens? I have searched around and haven't found any solution.
12
11
106
45.4K
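For what it’s worth, with a correct key-side mask, padding is mathematically a no-op, as a toy single-head attention in NumPy shows: the padded and unpadded outputs agree exactly for the real tokens. The usual explanation for the behavior asked about above is therefore not the mask itself but numerics: different batch sizes can dispatch different fused GPU kernels, and floating-point summation-order differences compound across layers and can flip a greedy argmax. (The toy below is an illustration, not the Hugging Face implementation.)

```python
import numpy as np

def masked_attention(q, k, v, key_mask):
    """Single-head attention; key_mask is 1 for real tokens, 0 for padding."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(key_mask[None, :] == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d, seq_len, n_pad = 8, 3, 2
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
v = rng.normal(size=(seq_len, d))

# No padding at all.
out_plain = masked_attention(q, k, v, np.ones(seq_len))

# The same sequence right-padded, with the pad keys masked out.
k_pad = np.vstack([k, np.zeros((n_pad, d))])
v_pad = np.vstack([v, np.zeros((n_pad, d))])
mask = np.array([1] * seq_len + [0] * n_pad)
out_padded = masked_attention(q, k_pad, v_pad, mask)

print(np.allclose(out_plain, out_padded))  # masking makes padding exact here
```

In exact arithmetic the masked pad positions contribute zero weight, so any real divergence across batch sizes has to come from implementation-level floating-point effects rather than from the masking math.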
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@YiTayML Sorry, I'm sure this has been discussed before and I just can't remember where, but what's the explanation for 3B enc-dec models being equivalent to 1B decoder-only models again? Is this in terms of training compute? Inference cost?
0
0
0
287
Yi Tay
Yi Tay@YiTayML·
Hot take 🔥: Lots of buzz these days about new foundation open-source models, but what if I told you there have been no real advances since 2019's T5 models 😀 Take a look at this table from this new InstructEval paper: arxiv.org/abs/2306.04757. Some thoughts/observations:

1. Flan-T5 beats everything, including Alpaca (LLaMA-based), Flan-Alpaca, Mosaic-Chat/MPT, Dolly.
2. If you arrange this table in terms of "compute-match", encoder-decoder should have been in a different (lower) weight class. Basically, Flan-T5 3B is like a 1B+ decoder and Flan-UL2 is more like an 8B+ model. With this perspective, the gap is so dramatically huge that it's not even funny.
3. Flan-UL2 basically wrecks Alpaca-LoRA 30B despite being so much smaller and effectively 4x less compute.
4. This is not entirely about the Flan series models; it's more about the base models! The point is that the base T5 models are already ridiculously strong. 1 trillion tokens, just blatantly repeating C4 to heart's content. There's also mT5 and umT5, which are strongly multilingual and ridiculously good. The base models are not long context, but Flan mitigates this.
5. The weakness is that T5/UL2 models are not diverse and are only C4-trained; that means they probably don't do well at code/math or whatever (code eval 0 score below lol). But it's scary how strong a C4-only baseline from 2019 is performing.
6. If you look at fastchat-t5 on the chatbot arena by @lmsysorg, the 3B fastchat model does as well as MPT-7B et al. despite being only 3B (and if you paid attention till now, you know that's a 1B+-equivalent decoder-only model). That's really insane if you think of a 1B+ model on a leaderboard of all these new "OSS LLM advances".
7. It's likely that at compute-match, T5 >> LLaMA. The only problem is that we don't have T5 models at 30B and 65B.

You're welcome.
Yi Tay tweet media
46
194
1.1K
502.9K
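One common back-of-envelope reading of the "compute-match" point (my gloss, not Yi Tay's exact accounting, so treat it as an assumption): with roughly half of an encoder-decoder's parameters in the encoder, only the decoder half runs per generated token, so under the standard ~2N forward-FLOPs-per-parameter-per-token rule of thumb, a 3B enc-dec decodes at about the cost of a 1.5B decoder-only model. Both the 50/50 split and the 2N rule are simplifications:

```python
def fwd_flops_per_token(n_params):
    # Rule of thumb: a forward pass costs ~2 FLOPs per parameter per token.
    return 2 * n_params

ENC_FRACTION = 0.5  # assumed encoder share of an enc-dec model's parameters

dec_only = fwd_flops_per_token(1.5e9)                    # 1.5B decoder-only
enc_dec = fwd_flops_per_token(3e9 * (1 - ENC_FRACTION))  # 3B enc-dec, decoder half only

print(dec_only == enc_dec)  # per-generated-token cost matches under these assumptions
```

This ignores the one-time encoder pass over the input (which amortizes across generated tokens) and folds attention FLOPs into the 2N approximation, which is why the comparison is a weight-class argument rather than an exact equivalence.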
younes
younes@yb2698·
A huge day for open source! 🔥 You can now load models from @huggingface in 4bit precision using load_in_4bit and bitsandbytes library, with no performance degradation. Announcement notes here: huggingface.co/blog/4bit-tran… Useful resources below
younes tweet media
Tim Dettmers@Tim_Dettmers

QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: arxiv.org/abs/2305.14314 Code+Demo: github.com/artidoro/qlora Samples: colab.research.google.com/drive/1kK6xasH… Colab: colab.research.google.com/drive/17XEqL1J…

5
162
722
142.2K
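The API announced above boils down to a quantization config passed at load time. A sketch per the linked blog post; the model id is a placeholder, and actually running this needs a GPU with bitsandbytes installed, so treat it as a configuration fragment rather than a tested snippet:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute, as described in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

The simpler `load_in_4bit=True` keyword mentioned in the tweet is shorthand for the default version of this config; the explicit `BitsAndBytesConfig` form exposes the NF4/double-quant knobs.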
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@younesbelkada @ArtidoroPagnoni @GuggerSylvain @sourab_m First of all, congrats and thanks to everyone involved! Low precision inference/training is really important for people like me. On the QLoRA GitHub page it says that "fast 4-bit inference" is still a WIP; I assume that's also the case with its HuggingFace integration?
1
0
1
52
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
I just corrected a typo inside the image using twitter blue. Money well spent.
meow🚭 tweet media
3
10
173
6.5K
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
Mock @ESYudkowsky and he will be coming for your loss curves. Had to kill mine this morning.
meow🚭 tweet media
3
5
42
10.4K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Perhaps one day GLOM will take off and these verification methods will become obsolete
Ryan Ramos tweet media (×2)
0
0
1
52
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Understandable ofc
Ryan Ramos tweet media
0
0
0
37
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
the best part of the Moratorium was when he said 'IT'S MORATORIUM TIME' and moratoriumed all over those guys.
meow🚭 tweet media
1
3
14
1.4K
Ryan Ramos retweeted
François Fleuret
François Fleuret@francoisfleuret·
How come diffusion models for language modeling are nowhere to be seen?
15
1
49
39.5K
Ryan Ramos retweeted
Mathieu
Mathieu@miniapeur·
Mathieu tweet media
4
46
395
25.6K