Pinned Tweet
Prithiv Sakthi
7.4K posts

Prithiv Sakthi
@prithivMLmods
Computer Vision • Multimodal AI • @huggingface Fellow ML🤗 • Computational Intelligence • Diffusion-Driven Adapters • https://t.co/CZfzd6KVRA
India · Joined October 2022
765 Following · 524 Followers

Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with @Gradio and integrated with @rerundotio , it performs multi-image and video-based 3D reconstruction, depth, normal map, and interactive measurements.
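The "interactive measurements" the demo offers follow directly from metric depth: once depth is in real-world units, pixels can be unprojected into a 3D point cloud and distances measured between them. A minimal sketch of that unprojection, assuming a simple pinhole camera model (the intrinsics and array shapes here are illustrative, not taken from Map-Anything):

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Unproject a metric depth map (H, W) into an (H, W, 3) point cloud
    with a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Toy example: a flat wall 2 m away, tiny 4x4 depth map, 100 px focal length
depth = np.full((4, 4), 2.0)
pts = unproject_depth(depth, fx=100, fy=100, cx=2.0, cy=2.0)

# Because depth is metric, distances between reconstructed points are too
d = np.linalg.norm(pts[0, 0] - pts[0, 3])  # → 0.06 (meters)
```

In the real demo the depth map comes from the model and the intrinsics are either estimated or supplied, but the measurement step reduces to exactly this kind of unprojection.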
Prithiv Sakthi retweeted

@mervenoyann They also have a demo for that.
huggingface.co/spaces/allenai…

@mervenoyann I’m testing a general-use SOTA for video point tracking.

AI2 released a new family of vision LMs for pointing (SOTA!) 🔥
> MolmoPoint-8B (general use)
> MolmoPoint-GUI-8B (graphical computer use)
> MolmoPoint-Vid-4B (counting/tracking in videos)
also with their datasets 🥵
Ai2 @allen_ai
Grounding lets vision-language models do more than describe—they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵
Prithiv Sakthi retweeted

VLMs already have visual tokens. Letting them point by selecting those tokens turns out to be simpler, faster, & better.
🤖 Models: huggingface.co/collections/al…
📦 Data: huggingface.co/collections/al…
💻 Code: github.com/allenai/molmo2
📖 Blog: allenai.org/blog/molmopoint
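The "pointing by selecting visual tokens" idea can be illustrated with a small sketch: instead of decoding coordinates as text, the model scores its own visual tokens and the chosen token's patch position becomes the point. The grid size, image size, and random scores below are illustrative stand-ins, not MolmoPoint's actual configuration:

```python
import numpy as np

# Assumed toy setup: a 24x24 grid of visual tokens over a 336x336 image,
# so each token covers a 14x14 pixel patch.
GRID, IMAGE = 24, 336
PATCH = IMAGE // GRID

def token_to_point(token_idx):
    """Map a flat visual-token index back to the pixel center of its patch."""
    row, col = divmod(token_idx, GRID)
    return ((col + 0.5) * PATCH, (row + 0.5) * PATCH)  # (x, y) in pixels

# Stand-in for the model's per-token pointing scores (real ones come from
# a head over the existing visual tokens; random here for illustration).
rng = np.random.default_rng(0)
scores = rng.standard_normal(GRID * GRID)

# "Pointing" is then just an argmax over tokens plus a coordinate lookup
x, y = token_to_point(int(np.argmax(scores)))
```

Selecting among tokens the model already computes avoids generating coordinate strings autoregressively, which is one plausible reading of the "simpler, faster, & better" claim above.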
Prithiv Sakthi retweeted

new 1T+ parameter model from @XiaomiMiMo, supporting 1M context length thanks to 7:1 hybrid sliding window attention!!
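A 7:1 hybrid means seven sliding-window attention layers for every one full-attention layer, which keeps most layers cheap at long context while periodically mixing global information. A minimal sketch of the layer schedule and the two mask types (the exact schedule, window size, and mask details of the Xiaomi model are assumptions here):

```python
import numpy as np

def attention_mask(seq_len, window=None):
    """Boolean causal attention mask; if `window` is set, each query may
    also only attend to the most recent `window` keys (sliding window)."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    mask = j <= i                    # causal: no attending to the future
    if window is not None:
        mask &= j > i - window       # restrict to the local window
    return mask

def layer_schedule(n_layers, ratio=7):
    """7:1 hybrid: `ratio` sliding-window layers per full-attention layer."""
    return ["full" if (l + 1) % (ratio + 1) == 0 else "sliding"
            for l in range(n_layers)]

sched = layer_schedule(16)  # sliding x7, full, sliding x7, full
```

With a fixed window, the sliding layers cost O(n·w) instead of O(n²) in sequence length, which is what makes 1M-token contexts tractable.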

Prithiv Sakthi retweeted

Introducing the Paper Pages skill!
Simply paste this SKILL.md, so your coding agent knows how to work with @huggingface papers
Ask it to summarize papers, search papers, or list linked models or datasets