

Carlos Esteves
96 posts

@_machc
Research Scientist @GoogleAI #GoogleResearch. PhD in CS @GRASPlab, @Penn. Interested in computer vision and machine learning.



Our new paper, "Spectral Image Tokenizer", is on arXiv! We train a tokenizer on DWT coefficients that enables autoregressive coarse-to-fine image generation, w/ applications to multiscale text-to-image, and text-guided editing. w/ @kiamada, @msuhail153 arxiv.org/abs/2412.09607
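To give a feel for what "coarse-to-fine" means for DWT coefficients, here is a minimal sketch using a 1D Haar wavelet in pure Python. It is only an illustration, not the paper's tokenizer: the function names `haar_decompose` and `haar_reconstruct` are hypothetical, the averaging/differencing Haar variant is an assumption, and the actual work operates on images with a learned tokenizer.

```python
def haar_decompose(signal, levels):
    """Return bands ordered coarse to fine: [approximation, coarsest detail, ..., finest detail].

    Assumes len(signal) is divisible by 2**levels.
    """
    detail_bands = []
    approx = list(signal)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / 2 for a, b in pairs]   # what averaging throws away
        approx = [(a + b) / 2 for a, b in pairs]   # half-resolution average
        detail_bands.append(detail)
    # Detail bands were collected fine to coarse; reverse so coarse comes first.
    return [approx] + detail_bands[::-1]


def haar_reconstruct(bands, num_detail_bands):
    """Rebuild a signal from the approximation plus the first num_detail_bands detail bands."""
    approx = list(bands[0])
    for detail in bands[1:1 + num_detail_bands]:
        finer = []
        for a, d in zip(approx, detail):
            finer += [a + d, a - d]                # invert the average/difference step
        approx = finer
    return approx


signal = [4, 6, 10, 12, 8, 6, 5, 5]
bands = haar_decompose(signal, levels=2)
for k in range(3):
    # Each extra band doubles the resolution of the preview.
    print(k, haar_reconstruct(bands, k))
```

Reading the bands in order gives a low-resolution preview first and refines it with each additional band, which is the ordering an autoregressive model over such coefficients could exploit.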

Don't miss the upcoming session on "Spectral Image Tokenizer" presented by @_machc, Research Scientist at @Google, on Wednesday January 22nd! Huge thanks to @AhmadMustafaAn1 for coordinating this event! 💫 Learn more: cohere.com/events/cohere-…













Applying computer vision models designed for planar images to data projected on spherical surfaces is challenging. Here we present an open-source library in JAX to solve the challenges of rotation and regular sampling for state-of-the-art performance → goo.gle/46z3vD7



ASIC: Aligning Sparse in-the-wild Image Collections
abs: arxiv.org/abs/2303.16201
project page: kampta.github.io/asic/


Scaling Spherical CNNs
paper page: huggingface.co/papers/2306.05…

Spherical CNNs generalize CNNs to functions on the sphere by using spherical convolutions as the main linear operation. The most accurate and efficient way to compute spherical convolutions is in the spectral domain (via the convolution theorem), which is still costlier than the usual planar convolutions. For this reason, applications of spherical CNNs have so far been limited to small problems that can be approached with low model capacity.

In this work, we show how spherical CNNs can be scaled for much larger problems. To achieve this, we make critical improvements including novel variants of common model components, an implementation of core operations to exploit hardware accelerator characteristics, and application-specific input representations that exploit the properties of our model. Experiments show our larger spherical CNNs reach state-of-the-art on several targets of the QM9 molecular benchmark, which was previously dominated by equivariant graph neural networks, and achieve competitive performance on multiple weather forecasting tasks.
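The convolution theorem mentioned above can be illustrated with a much simpler analog: circular convolution on a 1D ring, computed once directly and once by multiplying Fourier coefficients. This is a sketch under stated assumptions, not the paper's method. The spherical case relies on spherical harmonic transforms rather than the plain DFT used here, and the helper names (`dft`, `circular_conv_direct`, `circular_conv_spectral`) are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; the inverse divides by n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_conv_direct(f, g):
    """Circular convolution computed from its definition in the signal domain."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]


def circular_conv_spectral(f, g):
    """Same convolution via the convolution theorem: pointwise product of spectra."""
    product = [a * b for a, b in zip(dft(f), dft(g))]
    # Inputs are real, so the imaginary parts are only rounding noise.
    return [v.real for v in dft(product, inverse=True)]


f = [1, 2, 3, 4]
g = [1, 0, 0, 1]
print(circular_conv_direct(f, g))
print([round(v, 6) for v in circular_conv_spectral(f, g)])
```

Both routes give the same result up to floating-point rounding. In practice the spectral route pays off when a fast transform is available, which is part of why the cost of the transform matters for spherical CNNs.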


