Jacob Yeung

19 posts


@JacobYeung

Ph.D. Student @mldcmu, @cmuneurosci | Previously undergrad @berkeley_ai.

Joined May 2013
170 Following · 48 Followers
Pinned Tweet
Jacob Yeung @JacobYeung
📅 Where to catch BrainNRDS at @CVPR
🗣️ Oral — June 13, 13:15-13:30, Session 2B: Human Motion, ExHall A2
🖼️ Poster #220 — June 13, 16:00-18:00, ExHall D
Stop by to chat about decoding motion from fMRI and video generation! 🧠🎥 #CVPR2025
Jacob Yeung @JacobYeung

1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and use it to generate realistic reconstructions of videos people watched, outperforming strong existing baselines like MindVideo and Stable Video Diffusion.🧠🎥

0 replies · 2 reposts · 5 likes · 717 views
Jacob Yeung retweeted
Zhiqiu Lin @ZhiqiuLin
Before AI can generate professional videos, it needs to see like a professional. We spent a year with 100+ content creators teaching AI to describe video like a filmmaker would. Introducing CHAI: Critique-based Human-AI Oversight for Building a Precise Video Language [CVPR'26 Highlight, Top 3%].

Try prompting a video generator for a dolly zoom, dutch angle, point of view, or camera roll. Most fall back to the same bland defaults: a push-in, a level shot, a third-person view. Why? These techniques require a language of cinema that current models rarely speak. We built that language:
1️⃣ Precise specification: 5-aspect structured captions co-designed with professional cinematographers, covering subject, scene, motion, spatial, and camera dynamics
2️⃣ Scalable oversight: LLMs draft captions, humans critique what's wrong and how to fix it
3️⃣ Post-training recipes: Qwen3-VL-8B surpasses Gemini-3.1 and GPT-5
4️⃣ Video generation: fine-tuned Wan follows 400-word cinematic prompts with precise control

Here's how each works 🧵 Work led by CMU and Harvard with @chancharikm, @du_yilun, and @RamananDeva.
📄 Paper: huggingface.co/papers/2604.21…
🌐 Site: linzhiqiu.github.io/papers/chai/
25 replies · 60 reposts · 364 likes · 31.9K views
Jacob Yeung retweeted
Manu Gaur @gaur_manu
Pretrained ViTs like DINOv2 or CLIP are great, but they produce fixed, generic representations that encode the most salient visual concepts (e.g., "cat"). In human vision, priming with language changes how people parse an image. We believe visual encoders should do the same. 🚨 Introducing Steerable Visual Representations, a new family of visual features you can steer with text towards specific visual concepts.
13 replies · 136 reposts · 900 likes · 147.4K views
Jacob Yeung retweeted
Gabriel Sarch @GabrielSarch
Introducing Vero, the strongest fully open RL recipe for training next-generation visual reasoners. From charts to spatial to open-ended tasks, Vero sets a new bar.
• SOTA 8B VLM across 30 benchmarks
• +4.4 avg over four base models (30 evals)
• beats prior RL datasets
🧵👇
3 replies · 59 reposts · 300 likes · 60.8K views
Jacob Yeung retweeted
Khurram Yamin @KhurramYam
In an exciting collaboration with MSR, we ask a simple question: do LLMs actually behave like rational agents? We test whether an LLM’s elicited beliefs can be treated as its true beliefs by checking decision-theoretic coherence. arxiv.org/abs/2602.06286
1 reply · 5 reposts · 21 likes · 4.5K views
Jacob Yeung retweeted
Jay Karhade @JayKarhade
Introducing Any4D, a unified transformer for fully feed-forward, dense, metric-scale 4D reconstruction from flexible inputs! Any4D regresses per-pixel motion + geometry across frames in one pass — 15× faster, 2–3× more accurate reconstructions ⚡📈
Details + code below 👇
Exciting collab with @Nik__V__ @YuchenZhan54250 Tanisha Gupta @akashshrm02 @smash0190 @RamananDeva
6 replies · 45 reposts · 207 likes · 47.5K views
Jacob Yeung retweeted
Nikhil Keetha @Nik__V__
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art results 🚀

One universal model enables SoTA for:
🔥 Mono Depth Estimation
🔥 Multi-View SfM
🔥 Multi-View Stereo
🔥 Depth Completion
🔥 Registration
… and many more possibilities! – plus everything is metric 🎯

We release code for data processing, training, benchmarking & ablations – everything Apache 2.0! Details & Links 👇
30 replies · 132 reposts · 741 likes · 121.4K views
Jacob Yeung retweeted
Jennifer Hsia @jen_hsia
1/6 Retrieval is supposed to improve generation in RAG systems. But in practice, adding more documents can hurt performance, even when relevant ones are retrieved. We introduce RAGGED, a framework to measure and diagnose when retrieval helps and when it hurts.
1 reply · 23 reposts · 104 likes · 10.2K views
Jacob Yeung retweeted
Elliott / Shangzhe Wu @elliottszwu
This was a really fun and exciting workshop #CVPR2025! Huge thanks to all the speakers, organizers and reviewers @CVPR! We hope to be able to release the video recordings soon!
Elliott / Shangzhe Wu @elliottszwu

Join us for the 4D Vision Workshop @CVPR on June 11 starting at 9:20am! We'll have an incredible lineup of speakers discussing the frontier of 3D computer vision techniques for dynamic world modeling across spatial AI, robotics, astrophysics, and more. 4dvisionworkshop.github.io

0 replies · 4 reposts · 47 likes · 8K views
Jacob Yeung @JacobYeung
5/6 We also show that dynamic information isn’t just useful for generation. It is key to understanding brain activity as well. We observe that video models predict brain responses to dynamic scenes better than image models, especially in visual and somatosensory cortices.
1 reply · 2 reposts · 4 likes · 245 views
Jacob Yeung @JacobYeung
1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and use it to generate realistic reconstructions of videos people watched, outperforming strong existing baselines like MindVideo and Stable Video Diffusion.🧠🎥
2 replies · 12 reposts · 36 likes · 6.8K views
Jacob Yeung retweeted
Gabriel Sarch @GabrielSarch
How can we get VLMs to move their eyes—and reason step-by-step in visually grounded ways? 👀 We introduce ViGoRL, an RL method that anchors reasoning to image regions. 🎯 It outperforms vanilla GRPO and SFT across grounding, spatial tasks, and visual search (86.4% on V*). 👇🧵
12 replies · 58 reposts · 457 likes · 76.5K views