Vaibhav (VB) Srivastav @reach_vb
PaliGemma - Open Vision Model from Google! 💎
> 3B-parameter model - SigLIP + Gemma 2B
> Supports images up to 896 x 896 resolution
> Capable of document understanding, object detection, visual question answering, captioning and more
> In addition to general-purpose checkpoints, they also release specialised models: diagram understanding, science question answering, COCO captions, etc.
> Models on the Hub & integrated with Transformers! 🤗
> Overall, 160 checkpoints across JAX and PyTorch (still being released)
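Since the checkpoints are integrated with Transformers, here is a minimal sketch of prompting one and reading back a detection result. It rests on several assumptions that are not stated in the post: the checkpoint ID `google/paligemma-3b-mix-224`; the task-prefix prompt style ("caption en", "answer en ...", "detect ..."); and detection outputs written as four `<locNNNN>` tokens per box in (y_min, x_min, y_max, x_max) order on a 1024-bin grid, followed by a label. The helper names `build_prompt`, `parse_detections` and `run_paligemma` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed checkpoint ID (hypothetical choice among the released checkpoints).
MODEL_ID = "google/paligemma-3b-mix-224"

# Assumed output format: four <locNNNN> tokens (y_min, x_min, y_max, x_max),
# each an integer bin on a 1024-step grid, then a label; boxes separated by ";".
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC = re.compile(r"<loc(\d{4})>")

Box = Tuple[float, float, float, float]


def build_prompt(task: str, arg: str = "", lang: str = "en") -> str:
    """Assemble a task-prefix prompt (prefix style is an assumption)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "answer":
        return f"answer {lang} {arg}"
    if task == "detect":
        return f"detect {arg}"
    raise ValueError(f"unsupported task: {task}")


def parse_detections(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Map '<loc..>x4 label ; ...' to (label, (x_min, y_min, x_max, y_max)) in pixels."""
    results = []
    for locs, label in _DETECTION.findall(text):
        # Convert each bin index to a fraction of the image side.
        y0, x0, y1, x1 = (int(v) / 1024 for v in _LOC.findall(locs))
        box = (x0 * width, y0 * height, x1 * width, y1 * height)
        results.append((label.strip(), box))
    return results


def run_paligemma(image_path: str, prompt: str, model_id: str = MODEL_ID) -> str:
    """Generate text for one image; needs torch, Pillow, transformers and Hub access."""
    # Heavy imports are deferred so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    input_len = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    # Keep only the newly generated tokens, not the echoed prompt.
    return processor.decode(output[0][input_len:], skip_special_tokens=True)
```

For example, `run_paligemma("cats.jpg", build_prompt("detect", "cat"))` (with `cats.jpg` a hypothetical local file) returns the raw generated string, which `parse_detections` can then map to pixel boxes, assuming the decoded text still contains the `<locNNNN>` tokens.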
Good day for GPU Poors! 🔥 - Thank you Google and Big Vision group!