Junyi Zhang (@junyi42) - Twitter Profile

Pinned Tweet

𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹. We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with 𝗹𝗶𝗻𝗲𝗮𝗿-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 in sequence length, 𝗳𝘂𝗹𝗹𝘆 𝗳𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 inference, and 𝗻𝗼 𝗽𝗼𝘀𝘁-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI

63

449

3.4K

547K

Junyi Zhang retweeted

Haocheng Xi@HaochengXiUCB·3d

𝗞-𝗺𝗲𝗮𝗻𝘀 𝗶𝘀 𝘀𝗶𝗺𝗽𝗹𝗲. 𝗠𝗮𝗸𝗶𝗻𝗴 𝗶𝘁 𝗳𝗮𝘀𝘁 𝗼𝗻 𝗚𝗣𝗨𝘀 𝗶𝘀𝗻’𝘁. That’s why we built Flash-KMeans — an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves 30x speedup over cuML and 200x speedup over FAISS — with the same exact algorithm, just engineered for today’s hardware. At the million-scale, Flash-KMeans can complete a k-means iteration in milliseconds. A classic algorithm — redesigned for modern GPUs. Paper: arxiv.org/abs/2603.09229 Code: github.com/svg-project/fl…
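
For reference, "exact k-means" is usually understood as the standard assign-then-update loop (Lloyd's algorithm). The sketch below shows one such iteration in plain Python. It is not the Flash-KMeans GPU implementation, and the function and variable names are illustrative assumptions. The assignment step compares every point with every centroid, which is why data movement tends to dominate at scale.

```python
import math


def kmeans_step(points, centroids):
    """One exact k-means (Lloyd) iteration: assign, then update."""
    # Assignment step: index of the nearest centroid for each point.
    labels = [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]
    # Update step: each centroid becomes the mean of its assigned points.
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return labels, new_centroids


# Toy data: two well-separated pairs of 2D points.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels, centroids = kmeans_step(points, [(0, 0), (10, 0)])
```

Calling `kmeans_step` repeatedly until the labels stop changing gives the full algorithm.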

35

196

1.7K

277.6K

Junyi Zhang retweeted

Charles Herrmann@CharlesHerrman8·9 Mar

We're very excited to present a new hybrid memory version of feed-forward geometric reconstruction! The core intuition is that our architectures should be designed with the type of training data we have available in mind. The result is very long (kilometer-scale) reconstruction!!

Junyi Zhang@junyi42

𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹. We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with 𝗹𝗶𝗻𝗲𝗮𝗿-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 in sequence length, 𝗳𝘂𝗹𝗹𝘆 𝗳𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 inference, and 𝗻𝗼 𝗽𝗼𝘀𝘁-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI

1

5

108

16.1K

Junyi Zhang@junyi42·9 Mar

Check out the project page for more details! 🌐 Webpage: loger-project.github.io 📄 Paper: arxiv.org/abs/2603.03269 Yet another wonderful collaboration with this amazing team: @CharlesHerrman8* @JunhwaHur* @jesu9 @MingHsuanYang @forrestercole2 @trevordarrell @DeqingSun+

4

8

131

10.5K

Junyi Zhang@junyi42·9 Mar

LoGeR breaks both walls with 𝗰𝗵𝘂𝗻𝗸-𝘄𝗶𝘀𝗲 𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 + 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆: 🔹 Local Memory (SWA): non-parametric, lossless sliding-window attention preserves high-fidelity adjacent alignment. 🔹 Global Memory (TTT): compressed fast weights propagate long-range structure and stabilize scale over kilometer-scale trajectories.
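
The local-memory half can be illustrated with a generic causal sliding-window attention mask. This is a minimal sketch of the general technique only: the window size and function name are illustrative assumptions, and LoGeR's actual chunking and attention layout may differ.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j."""
    # Causal sliding window: each position sees itself and the
    # previous (window - 1) positions, nothing older and nothing ahead.
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=6, window=3)
attended = [sum(row) for row in mask]  # number of keys visible to each query
```

Each row has at most `window` visible keys, so for a fixed window the attention cost grows linearly with sequence length instead of quadratically.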

1

0

54

10.8K

Junyi Zhang@junyi42·9 Mar

𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹. We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with 𝗹𝗶𝗻𝗲𝗮𝗿-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 in sequence length, 𝗳𝘂𝗹𝗹𝘆 𝗳𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 inference, and 𝗻𝗼 𝗽𝗼𝘀𝘁-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI

63

449

3.4K

547K

Junyi Zhang@junyi42·22 Feb

Solid work by @denghilbert and a very impressive FFW 3DGS result! Key idea: Feature alignment is a core building block in classical 3D pipelines - it turns out to be what FFW models need, too. By explicitly enforcing feature alignment, we achieve SOTA for FFW 3DGS prediction.

youming.deng@denghilbert

We present the SOTA feed-forward 3DGS pipeline Selfi, which was accepted by #CVPR2026 Project Page: denghilbert.github.io/selfi

1

8

100

12K

Junyi Zhang@junyi42·26 Jan

Very excited to share our year-long work with an amazing team @zwcolin @aomaru_21490 and all! Everything is open-sourced (code, benchmark, trajectories, datasets, and models): VisGym.github.io We hope this is a step towards developing general-purpose VLM agents.

1

0

5

551

Junyi Zhang@junyi42·26 Jan

Luckily, the same environment we designed for evaluation can also be used for training, and is a diverse, customizable, scalable source. The paper includes early exploration of how to generate more effective SFT data, and we believe more potential lies ahead (e.g., with RL) 🧵

1

0

3

505

Junyi Zhang@junyi42·26 Jan

We rigorously test VLM agents across diverse domains: symbolic, 2D, 3D, embodied. Even frontier models show key gaps as general VLM agents: 1. Memory: more history context hurts performance. 2. Perception: still a major bottleneck. 3. Partial observation is very hard. 🧵

Zirui "Colin" Wang@zwcolin

🎮 We release VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents (w/ @junyi42 @aomaru_21490) 🌐 With 17 environments across multiple domains, we show systematically the brittleness of VLMs in visual interaction, and what training leads to. 🧵[1/8]

1

4

37

3.3K

Junyi Zhang@junyi42·23 Jan

Reconstruction not in the form of point clouds, but agentically via Blender code. Very interesting work!

Haiwen (Haven) Feng@HavenFeng

✨Thinking with Blender~ Meet VIGA: a multimodal agent that autonomously codes 3D/4D blender scenes from any image, with no human, no training! @berkeley_ai #LLMs #Blender #Agent 🧵1/6

0

1

23

2.8K

Junyi Zhang retweeted

Yao Tang@tyao923·17 Jan

𝗧𝗵𝗶𝗻𝗸 𝘄𝗶𝗱𝗲𝗿. 𝗧𝗵𝗶𝗻𝗸 𝘀𝗵𝗼𝗿𝘁𝗲𝗿. 🚀 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗠𝘂𝗹𝘁𝗶𝗽𝗹𝗲𝘅 𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴: token-wise branch-and-merge reasoning for LLMs. 💸 Discrete CoT is costly. 🎛️ Existing continuous tokens often clash with 𝗼𝗻-𝗽𝗼𝗹𝗶𝗰𝘆 𝗥𝗟 𝗲𝘅𝗽𝗹𝗼𝗿𝗮𝘁𝗶𝗼𝗻. 🎥 𝗠𝘂𝗹𝘁𝗶𝗽𝗹𝗲𝘅 𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴, a sampling-based continuous reasoning paradigm:

25

111

811

150.2K

Junyi Zhang@junyi42·16 Jan

interesting findings!

Lisa Dunlap@lisabdunlap

🌟NEW PAPER🌟 Do you know that changing a visual marker from red to blue can completely reorder VLM leaderboards? In our most recent work, we explore the fragility of visually prompted benchmarks. lisadunlap.github.io/vpbench/

0

6

580

Junyi Zhang retweeted

1X@1x_tech·12 Jan

NEO’s Starting to Learn on Its Own

298

416

3.1K

6.2M

Junyi Zhang retweeted

Lisa Dunlap@lisabdunlap·15 Dec

🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com

3

37

91

27K

Junyi Zhang retweeted

Qianqian Wang@QianqianWang5·3 Dec

I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.

25

153

930

174.1K

Junyi Zhang retweeted

Long Lian@LongTonyLian·1 Dec

LLMs are getting crazily good at reasoning — but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time.⏳ More GPUs ≠ faster answers Our ThreadWeaver🧵⚡asks: “Why not make LLMs think in parallel?” 🧵1/N👇

5

34

137

62.3K

Junyi Zhang retweeted

tyler bonnen@tylerraye·26 Nov

starting fall 2026 i'll be an assistant professor at @Penn 🥳 my lab will develop scalable models/theories of human behavior, focused on memory and perception currently recruiting PhD students in psychology, neuroscience, & computer science! reach out if you're interested😊

35

59

439

52.3K
