Jacob Yeung

19 posts


@JacobYeung

Ph.D. Student @mldcmu, @cmuneurosci | Previously undergrad @berkeley_ai.

Joined May 2013
170 Following · 48 Followers
Pinned Tweet
Jacob Yeung @JacobYeung
📅 Where to catch BrainNRDS at @CVPR
🗣️ Oral — June 13, 13:15-13:30, Session 2B: Human Motion, ExHall A2
🖼️ Poster #220 — June 13, 16:00-18:00, ExHall D
Stop by to chat about decoding motion from fMRI and video generation! 🧠🎥 #CVPR2025
Jacob Yeung @JacobYeung

1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and use it to generate realistic reconstructions of videos people watched, outperforming strong existing baselines like MindVideo and Stable Video Diffusion.🧠🎥

0 replies · 2 reposts · 5 likes · 717 views
Jacob Yeung retweeted
Zhiqiu Lin @ZhiqiuLin
Before AI can generate professional videos, it needs to see like a professional. We spent a year with 100+ content creators teaching AI to describe video like a filmmaker would. Introducing CHAI: Critique-based Human-AI Oversight for Building a Precise Video Language [CVPR'26 Highlight, Top 3%].

Try prompting a video generator for a dolly zoom, dutch angle, point of view, or camera roll. Most fall back to the same bland defaults: a push-in, a level shot, a third-person view. Why? These techniques require a language of cinema that current models rarely speak. We built that language:
1️⃣ Precise specification: 5-aspect structured captions co-designed with professional cinematographers, covering subject, scene, motion, spatial, and camera dynamics
2️⃣ Scalable oversight: LLMs draft captions, humans critique what's wrong and how to fix it
3️⃣ Post-training recipes: Qwen3-VL-8B surpasses Gemini-3.1 and GPT-5
4️⃣ Video generation: fine-tuned Wan follows 400-word cinematic prompts with precise control

Here's how each works 🧵 Work led by CMU and Harvard with @chancharikm, @du_yilun, and @RamananDeva.
📄 Paper: huggingface.co/papers/2604.21…
🌐 Site: linzhiqiu.github.io/papers/chai/
25 replies · 60 reposts · 364 likes · 31.9K views
Jacob Yeung retweeted
Manu Gaur @gaur_manu
Pretrained ViTs like DINOv2 or CLIP are great, but they produce fixed, generic representations that encode the most salient visual concepts (e.g., "cat"). In human vision, priming with language changes how people parse an image. We believe visual encoders should do the same. 🚨 Introducing Steerable Visual Representations, a new family of visual features you can steer with text towards specific visual concepts.
13 replies · 136 reposts · 900 likes · 147.4K views
Jacob Yeung retweeted
Gabriel Sarch @GabrielSarch
Introducing Vero, the strongest fully open RL recipe for training next-generation visual reasoners. From charts to spatial to open-ended tasks, Vero sets a new bar.
• SOTA 8B VLM across 30 benchmarks
• +4.4 avg over four base models (30 evals)
• beats prior RL datasets
🧵👇
3 replies · 59 reposts · 300 likes · 60.8K views
Jacob Yeung retweeted
Khurram Yamin @KhurramYam
In an exciting collaboration with MSR, we ask a simple question: do LLMs actually behave like rational agents? We test whether an LLM’s elicited beliefs can be treated as its true beliefs by checking decision-theoretic coherence. arxiv.org/abs/2602.06286
1 reply · 5 reposts · 21 likes · 4.5K views
Jacob Yeung retweeted
Jay Karhade @JayKarhade
Introducing Any4D, a unified transformer for fully feed-forward, dense, metric-scale 4D reconstruction from flexible inputs! Any4D regresses per-pixel motion + geometry across frames in one pass — 15× faster, 2–3× more accurate reconstructions ⚡📈
Details + code below 👇
Exciting collab with @Nik__V__ @YuchenZhan54250 Tanisha Gupta @akashshrm02 @smash0190 @RamananDeva
6 replies · 45 reposts · 207 likes · 47.5K views
Jacob Yeung retweeted
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀

One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯

We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
30 replies · 132 reposts · 741 likes · 121.4K views
Jacob Yeung retweeted
Jennifer Hsia @jen_hsia
1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance, even when relevant ones are retrieved. We introduce RAGGED, a framework to measure and diagnose when retrieval helps and when it hurts.
1 reply · 23 reposts · 104 likes · 10.2K views
Jacob Yeung retweeted
Elliott / Shangzhe Wu @elliottszwu
This was a really fun and exciting workshop #CVPR2025! Huge thanks to all the speakers, organizers and reviewers @CVPR! We hope to be able to release the video recordings soon!
Elliott / Shangzhe Wu @elliottszwu

Join us for the 4D Vision Workshop @CVPR on June 11 starting at 9:20am! We'll have an incredible lineup of speakers discussing the frontier of 3D computer vision techniques for dynamic world modeling across spatial AI, robotics, astrophysics, and more. 4dvisionworkshop.github.io

0 replies · 4 reposts · 47 likes · 8K views
Jacob Yeung @JacobYeung
5/6 We also show that dynamic information isn’t just useful for generation. It is key to understanding brain activity as well. We observe that video models predict brain responses to dynamic scenes better than image models, especially in visual and somatosensory cortices.
1 reply · 2 reposts · 4 likes · 245 views
Jacob Yeung @JacobYeung
1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and use it to generate realistic reconstructions of videos people watched, outperforming strong existing baselines like MindVideo and Stable Video Diffusion.🧠🎥
2 replies · 12 reposts · 36 likes · 6.8K views
Jacob Yeung retweeted
Gabriel Sarch @GabrielSarch
How can we get VLMs to move their eyes—and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵
12 replies · 58 reposts · 457 likes · 76.5K views