André Susano Pinto
@ASusanoPinto

Machine learning research @GoogleAI. Opinions mine.

Zurich, Switzerland · Joined July 2018
101 Following · 590 Followers
25 posts
André Susano Pinto retweeted
Michael Tschannen @mtschannen
Check out our detailed report about *Jet* 🌊 - a simple, transformer-based normalizing flow architecture without bells and whistles. Jet is an important part of JetFormer's engine ⚙️ As a standalone model it is very tame and behaves predictably (e.g. when scaling it up).
Alexander Kolesnikov@__kolesnikov__

With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class. Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️

André Susano Pinto @ASusanoPinto
Making simple new things requires attention to detail, from numeric precision to unexpected bugs deep in the stack. But now there is a precedent that includes a paper, numbers, and code. Hope it helps people go hammer some nails 🔨
André Susano Pinto retweeted
merve @mervenoyann
Welcome PaliGemma 2! 🤗 Google released PaliGemma 2, the best vision-language model family, coming in various sizes (3B, 10B, 28B), based on Gemma 2 and SigLIP, with day-0 transformers support 🎁 Saying this model is amazing would be an understatement, keep reading ✨
André Susano Pinto retweeted
Andreas Steiner @AndreasPSteiner
🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes. 1/7
André Susano Pinto @ASusanoPinto
@YugeTen @__kolesnikov__ We already knew we would like it. But we didn't know how :) The NF comes with two properties: it is invertible and has a computable logdet. Together they make it impossible to cheat by mapping all latents to a trivial point and then obtaining a perfect loss on the AR model of that trivial output.
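The anti-collapse argument above can be made concrete with a minimal sketch of the change-of-variables likelihood. This toy element-wise affine flow is purely illustrative (Jet itself uses ViT-based coupling blocks); the function names are my own, not the paper's API. The key point: if the flow tried to collapse all inputs to a trivial point (scale → 0), the log-determinant term diverges to -∞ and the NLL blows up, so a perfect downstream loss on collapsed latents cannot pay off.

```python
import numpy as np

# Toy normalizing flow: element-wise affine map z = a * x + b.
# Illustrative sketch only, not Jet's actual architecture.

def forward(x, a, b):
    """Map data x to latent z; return z and log|det J|."""
    z = a * x + b
    log_det = np.sum(np.log(np.abs(a)))  # Jacobian is diag(a)
    return z, log_det

def inverse(z, a, b):
    """Exact inverse, guaranteed by invertibility of the map."""
    return (z - b) / a

def nll(x, a, b):
    """Negative log-likelihood under a standard normal prior on z,
    via the change-of-variables formula:
        log p(x) = log N(z; 0, I) + log|det J|."""
    z, log_det = forward(x, a, b)
    d = x.size
    log_prior = -0.5 * np.sum(z ** 2) - 0.5 * d * np.log(2.0 * np.pi)
    return -(log_prior + log_det)
```

Shrinking `a` toward zero maps every input near the constant `b` (a "trivial point"), but the `log|det J|` penalty grows without bound, which is exactly why the model cannot cheat.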
Yuge Shi (Jimmy) @YugeTen
🫨 Cool work sneaking in NF to unlock end-to-end training! In 2022 I interned with @ASusanoPinto and @__kolesnikov__ and I kept asking "BUT WHY DO WE HAVE TO TRAIN A VQVAE FIRST" and they were both like "CHILD YOU MUST LEARN THIS IS THE WAY" -- I learned. I guess they didn't 🤔
Alexander Kolesnikov@__kolesnikov__

I always dreamed of a model that simultaneously 1. optimizes NLL of raw pixel data, 2. generates competitive high-res. natural images, 3. is practical. But it seemed too good to be true. Until today! Our new JetFormer model (arxiv.org/abs/2411.19722) ticks on all of these. 🧵

André Susano Pinto @ASusanoPinto
Did you ever try to get an auto-regressive transformer to operate in a continuous latent space that is not fixed ahead of time but learned end-to-end from scratch? Enter JetFormer: arxiv.org/abs/2411.19722 -- joint work with a dream team: @mtschannen and @__kolesnikov__
Michael Tschannen@mtschannen

Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)? We have been pondering this during summer and developed a new model: JetFormer 🌊🤖 arxiv.org/abs/2411.19722 A thread 👇 1/

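The end-to-end training described in this thread can be sketched as a single exact likelihood: a flow maps pixels to continuous latents, an AR model scores the latents, and the data log-likelihood is log p(x) = log p_AR(f(x)) + log|det df/dx|, so both components train jointly. A minimal sketch, assuming stand-in names (`flow_forward`, `ar_log_prob`) that are illustrative, not the paper's API:

```python
import numpy as np

def flow_forward(x, scale):
    """Stand-in invertible map (element-wise scaling) with exact logdet."""
    z = scale * x
    log_det = x.size * np.log(np.abs(scale))
    return z, log_det

def ar_log_prob(z):
    """Toy autoregressive factorization: each dimension is scored by a
    unit Gaussian centered on the mean of the previous dimensions."""
    log_p = 0.0
    for i in range(z.size):
        mu = z[:i].mean() if i > 0 else 0.0
        log_p += -0.5 * (z[i] - mu) ** 2 - 0.5 * np.log(2.0 * np.pi)
    return log_p

def end_to_end_nll(x, scale):
    """Exact NLL of raw pixels: AR likelihood of the learned latents
    plus the flow's change-of-variables correction."""
    z, log_det = flow_forward(x, scale)
    return -(ar_log_prob(z) + log_det)
```

Because the logdet term enters the loss, the flow is rewarded for shaping latents the AR model can predict, yet cannot collapse them to a constant, which is what makes training end-to-end from scratch viable without a pretrained tokenizer.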
André Susano Pinto @ASusanoPinto
Feels great to start adding diversity to the available pre-trained visual representations. Especially when it has considerable impact on problems where examples are scarce or hard to collect.
Maxim Neumann@neu_maxim

We've looked into representation learning for #RemoteSensing with different datasets and fine-tuning using in-domain data. See paper with datasets and models included 🔋: arxiv.org/abs/1911.06721 with @ASusanoPinto, @XiaohuaZhai and @neilhoulsby.

André Susano Pinto retweeted
Google AI @GoogleAI
We’re pleased to release the Visual Task Adaptation Benchmark (VTAB), a diverse, realistic, and challenging protocol to measure progress towards universal visual representations. Learn all about it below. goo.gle/2Noutb9
André Susano Pinto retweeted
TensorFlow @TensorFlow
A new, multilingual version of the Universal Sentence Encoder (USE) model is now available on #TFHub! Check it out here → bit.ly/2J7ZJuX