Ryan Ramos

68 posts

@ryan_c_ramos

PhD student @ IsLab, Osaka University

Joined November 2016
27 Following · 25 Followers
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Couldn’t be more grateful for this #ICCV2025 collaboration! It turns out that some vision models have actually been encoding images’ processing and acquisition parameters (e.g., JPEG compression settings, camera model) this whole time. For more info check out @stojnvla’s thread!
Vladan Stojnić@stojnvla

Have you ever asked yourself how much your favorite vision model knows about image capture parameters (e.g., the amount of JPEG compression, the camera model, etc.)? Furthermore, could these parameters influence its semantic recognition abilities?

1
0
0
123
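Claims that frozen vision features encode acquisition parameters are typically tested with a linear probe. Below is a minimal sketch of that protocol; the feature matrix and quality labels are synthetic stand-ins fabricated for illustration (the real experiment would probe frozen features from a pretrained encoder against true JPEG quality factors):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 128

# Hypothetical JPEG quality factors and synthetic "frozen features" that
# leak the quality along one direction, standing in for encoder outputs.
quality = rng.integers(10, 100, size=n).astype(float)
leak_direction = rng.normal(size=d)
feats = rng.normal(size=(n, d)) + np.outer(quality / 100.0, leak_direction)

# Linear probe: least-squares regression from frozen features to the
# acquisition parameter. High correlation between predictions and labels
# means the parameter is linearly decodable from the representation.
X = np.c_[feats, np.ones(n)]
w, *_ = np.linalg.lstsq(X, quality, rcond=None)
pred = X @ w
r = np.corrcoef(pred, quality)[0, 1]
print(f"probe correlation: {r:.2f}")
```

If the probe recovers the parameter well above chance, the representation "knows" it, even though the encoder was never trained on that label.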
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Feel free to visit IS1-081 if you’re at MIRU! An extended version was accepted at an ICCV workshop! Here we investigate whether bias in pre-trained CLIPs transfers to downstream tasks when you train models like LLaVA. Co-authored w/ @yusuke_hirota, Yuta Nakashima, & Noa Garcia
Ryan Ramos tweet media
0
0
0
152
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
Who killed non-contrastive image-text pretraining? @AlecRad and @_jongwook_kim with the below Fig2 in CLIP. Who collected the 7 Dragonballs and asked Shenron to resurrect it? Yours truly, in this new paper of ours. Generative captioning is not only competitive, it seems better!
Lucas Beyer (bl16) tweet media (×3)
18
87
571
212.5K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@jxmnop This sounds really interesting! If I'm getting this correctly, the method should not need paired data, right? And the goal's to translate embeddings from enc A to their equivalent representations in enc B's space? But you have embeddings from B, even if they're not paired data?
0
0
0
56
dr. jack morris
dr. jack morris@jxmnop·
As an exercise in open science, gonna tweet the research problem I’m stuck on: I want to align two text embedding spaces in an unsupervised way.

The motivation is that in my previous vec2text work, we have to know the embedding model and be able to query it. This is fine in today’s world where most people use OpenAI ada embeddings, but when people move on to a better model, my inversion models won’t work anymore. So I want to take embeddings from an *unknown* embedder and map them somehow to a space I know, like the OpenAI embedding space, then decode them.

Sounds hard, right? It definitely is. But my crazy idea is that all text embedding models are learning something very similar: embeddings lie on a low-dimensional manifold, and so given enough samples we should be able to align them. This is supported by some past research on unsupervised bilingual word embedding alignment (which works really well!) and also this fascinating line of research on “relative representations”, where representing embeddings by their distances to known anchor points makes embeddings compatible between different spaces.

So I learned there’s this whole class of problems called “optimal transport” that’s exactly this: the mathematical study of how to find the optimal mapping between two vector spaces. Sounds perfect, right? Sadly it doesn’t work very well, at least out of the box. Given a thousand paired samples from two different embedding models A and B, the Sinkhorn algorithm can get about 1% accuracy (10x above random). Gromov-Wasserstein, which tries to preserve cosine similarity, can get a little bit better. If I use embeddings from two models from the same family I can get 20%.

I tried using relative representations. This requires 100 or so paired anchor points from both embedders, which is also a bottleneck. But using 100-dim relative representations, Sinkhorn gets 70% accuracy with no hparam tuning, which is pretty good. But no one has figured out how to find anchor points without any supervision yet (although I think it’s probably possible).

Also, a supervised linear mapping between the two embedding spaces works super well, can get 90%+ accuracy, and I can invert the remapped embeddings with pretty good BLEU score, but that’s cheating too (also the true mapping is certainly nonlinear). Both these algorithms again require paired samples, which is unrealistic.

I want to be able to invert a random database of text embeddings without any paired samples. With enough entries I think it should be possible, just like we can infer an arbitrary substitution cipher if we have enough encrypted data.

Anyway, that’s my progress so far! I am now extremely stuck. If you have any ideas please message me or reply to the thread.
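The “relative representations” trick from the thread is simple enough to sketch in a few lines. In this toy setup, space B is a synthetic stand-in: a random orthogonal rotation of space A (real encoders differ far more than a rotation, which is one reason the thread reports 70% rather than near-perfect matching). Given paired anchors, each embedding is re-expressed as its cosine similarities to the anchors, which is invariant to rotation, so nearest-neighbor matching across the two spaces becomes easy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_anchors = 200, 64, 100

# Space A: random embeddings. Space B: the same points under a random
# orthogonal rotation, mimicking a second encoder with different coordinates.
A = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
B = A @ Q

def relative_rep(X, anchors):
    """Represent each row of X by its cosine similarities to the anchor rows."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    An = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return Xn @ An.T

# The paired anchor points are exactly the supervision bottleneck the
# thread mentions: we assume rows 0..99 are known to correspond.
rel_A = relative_rep(A[n_anchors:], A[:n_anchors])
rel_B = relative_rep(B[n_anchors:], B[:n_anchors])

# Match each A item to its nearest neighbor in relative-representation space.
sims = rel_A @ rel_B.T
matches = sims.argmax(axis=1)
accuracy = (matches == np.arange(n - n_anchors)).mean()
print(f"matching accuracy: {accuracy:.2f}")
```

Because cosine similarity is rotation-invariant, the two relative representations coincide here and matching is near-perfect; with real, independently trained encoders the correspondence is only approximate, which is where the hard unsupervised-alignment problem begins.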
51
22
313
71.8K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@CFGeek Interesting, admittedly I'm not up-to-date on the legislative side of things outside of what I pick up around here, maybe I'll check these out too. Didn't even realize LeCun was there either. Thanks!
0
0
1
24
Charles Foster
Charles Foster@CFGeek·
Just started watching this on a whim, but I found the opening statements (all of them, not just that of LeCun) quite interesting, enough that I'll definitely continue watching the rest! youtube.com/live/WgNBDjNY0…
YouTube video
YouTube
1
0
6
316
Wenhu Chen
Wenhu Chen@WenhuChen·
Is it a common problem that Hugging Face transformer models (LLaMA) will generate different outputs with different batch sizes, even after the attention mask is applied to the padding tokens? I have searched around and haven't found any solution.
12
11
106
45.4K
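For what it’s worth, with a correct key-side mask, padding is mathematically a no-op, as a toy single-head attention in NumPy shows: the padded and unpadded outputs agree exactly for the real tokens. The usual explanation for the behavior asked about above is therefore not the mask itself but numerics: different batch sizes can dispatch different fused GPU kernels, and floating-point summation-order differences compound across layers and can flip a greedy argmax. (The toy below is an illustration, not the Hugging Face implementation.)

```python
import numpy as np

def masked_attention(q, k, v, key_mask):
    """Single-head attention; key_mask is 1 for real tokens, 0 for padding."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(key_mask[None, :] == 1, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d, seq_len, n_pad = 8, 3, 2
q = rng.normal(size=(seq_len, d))
k = rng.normal(size=(seq_len, d))
v = rng.normal(size=(seq_len, d))

# No padding at all.
out_plain = masked_attention(q, k, v, np.ones(seq_len))

# The same sequence right-padded, with the pad keys masked out.
k_pad = np.vstack([k, np.zeros((n_pad, d))])
v_pad = np.vstack([v, np.zeros((n_pad, d))])
mask = np.array([1] * seq_len + [0] * n_pad)
out_padded = masked_attention(q, k_pad, v_pad, mask)

print(np.allclose(out_plain, out_padded))  # masking makes padding exact here
```

In exact arithmetic the masked pad positions contribute zero weight, so any real divergence across batch sizes has to come from implementation-level floating-point effects rather than from the masking math.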
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@YiTayML Sorry, I'm sure this has been discussed before and I just can't remember where, but what's the explanation for 3B enc-dec models being equivalent to 1B decoder-only models again? Is this in terms of training compute? Inference cost?
0
0
0
287
Yi Tay
Yi Tay@YiTayML·
Hot take 🔥: Lots of buzz these days about new foundation open-source models, but what if I told you there have been no real advances since 2019's T5 models 😀 Take a look at this table from this new InstructEval paper: arxiv.org/abs/2306.04757. Some thoughts/observations:

1. Flan-T5 beats everything, including Alpaca (LLaMA-based), Flan-Alpaca, Mosaic-Chat/MPT, Dolly.
2. If you arrange this table in terms of "compute-match", encoder-decoder should have been in a different (lower) weight class. Basically, Flan-T5 3B is like a 1B+ decoder and Flan-UL2 is more like an 8B+ model. With this perspective, the gap is so dramatically huge that it's not even funny.
3. Flan-UL2 basically wrecks Alpaca-LoRA 30B despite being so much smaller and effectively 4x less compute.
4. This is not entirely about the Flan series models; it's more about the base models! The point is that the base T5 models are already ridiculously strong. 1 trillion tokens, just blatantly repeating C4 to heart's content. There's also mT5 and umT5, which are strongly multilingual and ridiculously good. The base models are not long context, but Flan mitigates this.
5. The weakness is that T5/UL2 models are not diverse and are only C4-trained; that means they probably don't do well at code/math or whatever (code eval 0 score below lol). But it's scary how strong a C4-only baseline from 2019 is performing.
6. If you look at fastchat-t5 on the chatbot arena by @lmsysorg, the 3B fastchat model does as well as MPT-7B et al. despite being only 3B (and if you paid attention till now, you know that's a 1B+-equivalent decoder-only model). That's really insane if you think of a 1B+ model on a leaderboard of all these new "OSS LLM advances".
7. It's likely that at compute-match, T5 >> LLaMA. The only problem is that we don't have T5 models at 30B and 65B.

You're welcome.
Yi Tay tweet media
46
194
1.1K
502.9K
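One common back-of-envelope reading of the "compute-match" point (my gloss, not Yi Tay's exact accounting, so treat it as an assumption): with roughly half of an encoder-decoder's parameters in the encoder, only the decoder half runs per generated token, so under the standard ~2N forward-FLOPs-per-parameter-per-token rule of thumb, a 3B enc-dec decodes at about the cost of a 1.5B decoder-only model. Both the 50/50 split and the 2N rule are simplifications:

```python
def fwd_flops_per_token(n_params):
    # Rule of thumb: a forward pass costs ~2 FLOPs per parameter per token.
    return 2 * n_params

ENC_FRACTION = 0.5  # assumed encoder share of an enc-dec model's parameters

dec_only = fwd_flops_per_token(1.5e9)                    # 1.5B decoder-only
enc_dec = fwd_flops_per_token(3e9 * (1 - ENC_FRACTION))  # 3B enc-dec, decoder half only

print(dec_only == enc_dec)  # per-generated-token cost matches under these assumptions
```

This ignores the one-time encoder pass over the input (which amortizes across generated tokens) and folds attention FLOPs into the 2N approximation, which is why the comparison is a weight-class argument rather than an exact equivalence.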
younes
younes@yb2698·
A huge day for open source! 🔥 You can now load models from @huggingface in 4bit precision using load_in_4bit and bitsandbytes library, with no performance degradation. Announcement notes here: huggingface.co/blog/4bit-tran… Useful resources below
younes tweet media
Tim Dettmers@Tim_Dettmers

QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: arxiv.org/abs/2305.14314 Code+Demo: github.com/artidoro/qlora Samples: colab.research.google.com/drive/1kK6xasH… Colab: colab.research.google.com/drive/17XEqL1J…

5
162
722
142.2K
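The API announced above boils down to a quantization config passed at load time. A sketch per the linked blog post; the model id is a placeholder, and actually running this needs a GPU with bitsandbytes installed, so treat it as a configuration fragment rather than a tested snippet:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bf16 compute, as described in the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

The simpler `load_in_4bit=True` keyword mentioned in the tweet is shorthand for the default version of this config; the explicit `BitsAndBytesConfig` form exposes the NF4/double-quant knobs.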
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
@younesbelkada @ArtidoroPagnoni @GuggerSylvain @sourab_m First of all, congrats and thanks to everyone involved! Low precision inference/training is really important for people like me. On the QLoRA GitHub page it says that "fast 4-bit inference" is still a WIP; I assume that's also the case with its HuggingFace integration?
1
0
1
52
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
I just corrected a typo inside the image using twitter blue. Money well spent.
meow🚭 tweet media
3
10
173
6.5K
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
Mock @ESYudkowsky and he will be coming for your loss curves. Had to kill mine this morning.
meow🚭 tweet media
3
5
42
10.4K
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Perhaps one day GLOM will take off and these verification methods will become obsolete
Ryan Ramos tweet media (×2)
0
0
1
52
Ryan Ramos
Ryan Ramos@ryan_c_ramos·
Understandable ofc
Ryan Ramos tweet media
0
0
0
37
Ryan Ramos retweeted
meow🚭
meow🚭@meowbooksj·
the best part of the Moratorium was when he said 'IT'S MORATORIUM TIME' and moratoriumed all over those guys.
meow🚭 tweet media
1
3
14
1.4K
Ryan Ramos retweeted
François Fleuret
François Fleuret@francoisfleuret·
How come diffusion models for language modeling are nowhere to be seen?
15
1
49
39.5K
Ryan Ramos retweeted
Mathieu
Mathieu@miniapeur·
Mathieu tweet media
4
46
395
25.6K