harpreet
6.6K posts

harpreet
@DataScienceHarp
Hacker-in-residence @voxel51 | VLMs, VLAs, robotics, multimodal datasets
I ship daily
Joined April 2020
1.6K Following, 7.4K Followers

@mervenoyann just when i thought it was safe to take the afternoon off to get a haircut...

AI2 released new family of vision LMs for pointing (SOTA!) 🔥
> MolmoPoint-8B (general use)
> MolmoPoint-GUI-8B (graphical computer use)
> MolmoPoint-Vid-4B (counting/tracking in videos)
also with their datasets 🥵
Ai2 @allen_ai
Grounding lets vision-language models do more than describe—they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵
harpreet reposted

Roadmaps vs. Splinter Efforts: The founder's greatest tension. 🛣️💥
*My take:* If your team isn't challenging your product plan, you've hired the wrong team. At @voxel51, a "splinter" effort by @dan_gural became our Physical AI Workbench, evolving our identity to meet the 3D reality of our customers.
The TAF Series Part 9 is live: medium.com/@jasoncorso/physical-ai-splinter-3f9195b3cfeb


I parsed this into FiftyOne format a while back, for ease of exploration and evaluation:
huggingface.co/datasets/Voxel…
merve @mervenoyann
ScreenSpot-Pro, the GUI computer use benchmark, is now on @huggingface 🏆
just added Qwen3.5, it takes 5th place, with the specialist Holo2 family taking the top ranks
whoever builds the next GUI model based on Qwen3.5 can top the leaderboard? 🔥

everyone is hype about gtc, but the 3d vision conference is this week and i'm much hype for that
i made a repo and dataset to explore the papers
check out the repo here: github.com/harpreetsahota…
here's a dataset that you can use to explore the papers: huggingface.co/datasets/Voxel…
harpreet reposted

Integrate Google #Gemini Vision directly into FiftyOne to run OCR, generate synthetic data, detect edge cases, and improve dataset quality - hubs.ly/Q045qjyP0
* Run OCR on image datasets
* Generate synthetic data to balance classes
* Detect edge cases automatically
* Create and edit images to fill dataset gaps
Perfect for autonomous driving, healthcare, and any vision project where dataset quality matters.
#ComputerVision #GoogleGemini #Gemini #FiftyOne #AIML #MLOps #AI #artificialintelligence

@JustinLin610 A heartfelt thank you and best wishes for the next move!
harpreet reposted

Not onboarding your agent is on you.
@richmondalake, Director of AI Developer Experience at @Oracle, joins @sjmaple to make the case that most agent failures come down to one thing: memory. Not the model, not the infrastructure. Memory.
On the docket:
• why skills are just SOPs your organisation already has written down
• the job title that is replacing prompt engineers
• file systems vs databases for agent memory (and why one gets you hacked)
• the memory trick that makes agents feel actually intelligent
• why the agent loop and training loop are about to become one
The developers who figure this out first are going to be very hard to compete with.
(00:00) Trailer
(01:03) Introduction and guest welcome
(04:02) Defining agent memory
(05:14) From prompt to context engineering
(09:18) Skills as SOPs for agents
(13:25) What agents need to succeed
(16:06) Getting agents to the right data
(20:18) Security and data privacy
(26:10) Context lifecycle and forgetting
(33:13) Multi-team context sharing
(35:25) File system vs database storage
(38:33) Future of context engineering

@prithivMLmods @huggingface hey just curious: is this a fine-tuned version of the model for this task?

Qwen3-VL-Video-Grounding Demo. Perform point tracking, text-guided detection, and video question answering, all powered by the Qwen3-VL-4B vision-language model with real-time bounding box detection and cross-frame object matching. 🤗 @huggingface Demo in 🧵

@llm_wizard Hmmmmm feels like some ulterior motives behind it all

PR at Claude woke up this morning and chose "based".
claude.com/contact-sales/…
harpreet reposted

Join @DataScienceHarp on Feb 26 for a virtual workshop to learn how to use @Facebook's Action100M dataset and FiftyOne to build an end-to-end workflow - hubs.ly/Q044KWqn0

I'm planning to treat myself to a DGX Spark on my birthday in a few months, are we convinced it's good for its price?
cc @pcuenq @multimodalart
English
harpreet reposted

We have 5 papers accepted at @CVPR 2026 across my teams at @UMich @UMRobotics @UMichECE @michigan_AI and @Voxel51. Coupling these with a handful of workshops I'm participating in, descending on the great city of Denver in June is going to be great fun!
These papers range from 4D Vector indexing and Mistake Attribution to Spline-controlled 3D character generation. The list is below. Over the coming weeks I'll be posting more details about them as camera-readies, project pages, and open-source code become available!
1️⃣ R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space; @SohnTin73632 Tin Stribor Sohn, Maximilian Dillitzer, Jason J Corso, Eric Sax
2️⃣ Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos; @YayuanLi Yayuan Li, Aadit Jain, Filippos Bellos, Jason J Corso
3️⃣ BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation; Miaowei (Michael) Wang, Qingxuan Yan, Zhi Cao, Yayuan Li, @oisinmchugh Oisin Mac Aodha, Jason J Corso, @amirvaxman_dgp Amir Vaxman
4️⃣ Bridging Facial Understanding and Animation via Language Models; Luchuan Song, Pinxin Liu, Haiyang Liu, Zhenchao Jin, Yolo Yunlong Tang, Zichong Xu, Susan Liang, Jing Bi, Jason J Corso, @ChenliangXu Chenliang Xu
5️⃣ When to Think and When to Look: Uncertainty-Guided Lookback; Jing Bi, Filippos Bellos, JunJia Guo, Yayuan Li, Chao Huang, Yolo Yunlong Tang, Luchuan Song, Susan Liang, Zhongfei Zhang, Jason J Corso, Chenliang Xu
🧲 Follow me here on X/Twitter to stay up to date on these coming posts!

harpreet reposted

Join @DataScienceHarp on Feb 26 for a virtual workshop to learn how to use @Facebook's Action100M dataset and FiftyOne to build an end-to-end workflow - hubs.ly/Q0440yBt0
#computervision #ai #artificialintelligence #machinevision #machinelearning #datascience #opensource

