Junyi Zhang

197 posts

@junyi42

CS Ph.D. Student @Berkeley_AI. B.Eng. @SJTU1896 CS. Working with @GoogleDeepMind, previously @MSFTResearch. Vision, generative models, representation learning.

Joined July 2022
532 Following · 2.7K Followers
Pinned Tweet
Junyi Zhang @junyi42
One memory can't rule them all. We present LoGeR, a new hybrid-memory architecture for long-context geometric reconstruction. LoGeR enables stable reconstruction over up to 10k frames / kilometer scale, with linear-time scaling in sequence length, fully feedforward inference, and no post-optimization. Yet it matches or surpasses strong optimization-based pipelines. (1/5) @GoogleDeepMind @Berkeley_AI
Junyi Zhang reposted
Haocheng Xi @HaochengXiUCB
K-means is simple. Making it fast on GPUs isn't. That's why we built Flash-KMeans, an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves a 30x speedup over cuML and a 200x speedup over FAISS, with the same exact algorithm, just engineered for today's hardware. At the million scale, Flash-KMeans can complete a k-means iteration in milliseconds. A classic algorithm, redesigned for modern GPUs. Paper: arxiv.org/abs/2603.09229 Code: github.com/svg-project/fl…
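The speedups above come from treating distance computation as a memory-bound matmul problem. As a rough illustration only (a numpy sketch under assumed details; the real Flash-KMeans is a CUDA-level kernel whose IO-aware tiling numpy cannot express, and every function name here is mine, not from the repo), the assignment step can expand ||x - c||^2 so the dominant cost is one matmul per chunk of points:

```python
import numpy as np

def kmeans_assign_chunked(X, C, chunk=1024):
    """Assign each point in X (n, d) to its nearest centroid in C (k, d).

    Uses ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2 so the inner loop is a
    matmul, and processes points chunk-by-chunk so the (chunk, k) distance
    tile stays small: a toy analogue of keeping the working set in fast
    GPU memory.
    """
    c_sq = (C * C).sum(axis=1)               # (k,) centroid norms, computed once
    labels = np.empty(len(X), dtype=np.int64)
    for start in range(0, len(X), chunk):
        xb = X[start:start + chunk]
        # ||x||^2 is constant per row and doesn't affect the argmin, so skip it.
        d = c_sq[None, :] - 2.0 * xb @ C.T    # (chunk, k) partial squared distances
        labels[start:start + chunk] = d.argmin(axis=1)
    return labels

def kmeans_update(X, labels, k):
    """Recompute centroids as the mean of their assigned points."""
    C = np.zeros((k, X.shape[1]))
    counts = np.bincount(labels, minlength=k)
    np.add.at(C, labels, X)
    nonempty = counts > 0
    C[nonempty] /= counts[nonempty, None]
    return C
```

Dropping the per-point norm from the argmin and precomputing centroid norms are standard tricks; the part numpy cannot show is fusing these steps so distance tiles never round-trip through slow memory, which is where the IO-aware engineering lives.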
Junyi Zhang reposted
Junyi Zhang @junyi42
LoGeR breaks both walls with chunk-wise processing + hybrid memory:
🔹 Local Memory (SWA): non-parametric, lossless sliding-window attention preserves high-fidelity adjacent alignment.
🔹 Global Memory (TTT): compressed fast weights propagate long-range structure and stabilize scale over kilometer-scale trajectories.
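The two memories can be caricatured in a few lines. This is a toy sketch, not LoGeR's implementation: the window size, the delta-rule fast-weight update, and all names below are illustrative assumptions. The point it demonstrates is structural: local memory is exact attention over a bounded window, while global memory is a fixed-size matrix updated online, so per-step cost is independent of sequence length.

```python
import numpy as np

def sliding_window_attention(q, K, V, window):
    """Exact (lossless) attention of one query over only the last `window` keys."""
    Kw, Vw = K[-window:], V[-window:]
    scores = Kw @ q / np.sqrt(len(q))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ Vw

class FastWeightMemory:
    """Fixed-size global memory: a d x d matrix updated with outer products
    (a linear-attention / fast-weights caricature of a TTT-style memory)."""
    def __init__(self, d):
        self.W = np.zeros((d, d))
    def write(self, k, v, lr=0.1):
        # Delta rule: nudge W's prediction for key k toward value v.
        err = v - self.W @ k
        self.W += lr * np.outer(err, k)
    def read(self, q):
        return self.W @ q

def hybrid_read(q, K, V, mem, window=4):
    """Combine lossless local context with compressed global memory."""
    return sliding_window_attention(q, K, V, window) + mem.read(q)
```

Because the window and the fast-weight matrix are both fixed-size, a rollout over N frames touches each frame a constant number of times, which is the linear-time scaling claimed in the thread.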
Junyi Zhang @junyi42
Very excited to share our year-long work with an amazing team, @zwcolin, @aomaru_21490, and all! Everything is open-sourced: the code, benchmark, trajectories, datasets, and models: VisGym.github.io We hope this can be a step toward developing general-purpose VLM agents.
Junyi Zhang @junyi42
Luckily, the same environment we designed for evaluation can also be used for training, and it is a diverse, customizable, scalable data source. The paper includes early exploration of how to generate more effective SFT data, and we believe more potential lies ahead (e.g., with RL) 🧵
Junyi Zhang @junyi42
We rigorously test VLM agents across diverse domains: symbolic, 2D, 3D, embodied. Even frontier models face key gaps as general VLM agents:
1. Memory: more history context hurts performance.
2. Perception: still a major bottleneck.
3. Partial observation is very hard. 🧵
Zirui "Colin" Wang@zwcolin

🎮 We release VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents (w/ @junyi42 @aomaru_21490) 🌐 With 17 environments across multiple domains, we show systematically the brittleness of VLMs in visual interaction, and what training leads to. 🧵[1/8]

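The evaluation setup in the thread follows a standard interactive reset/step loop. VisGym's actual interface is not shown here, so everything below (EchoEnv, ScriptedAgent, the reset/step signature, the history_window knob) is a hypothetical stand-in for illustration, not the benchmark's API:

```python
from dataclasses import dataclass, field

@dataclass
class EchoEnv:
    """Trivial environment: reward 1 when the agent repeats the latest observation."""
    steps: int = 3
    t: int = 0
    def reset(self):
        self.t = 0
        return f"obs-{self.t}"
    def step(self, action):
        reward = 1.0 if action == f"obs-{self.t}" else 0.0
        self.t += 1
        done = self.t >= self.steps
        return f"obs-{self.t}", reward, done

@dataclass
class ScriptedAgent:
    """Stands in for a VLM agent: acts on a truncated window of past
    observations, mirroring the finding that unbounded history can hurt."""
    history_window: int = 2
    history: list = field(default_factory=list)
    def act(self, obs):
        self.history = (self.history + [obs])[-self.history_window:]
        return self.history[-1]  # echo the latest observation

def rollout(env, agent):
    """Run one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        total += reward
    return total
```

The history_window knob is where the memory finding bites: for a real VLM agent, growing the window means longer multimodal prompts, and the thread reports that more history context can actively degrade performance rather than help.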
Junyi Zhang reposted
Yao Tang @tyao923
Think wider. Think shorter. 🚀 Introducing Multiplex Thinking: token-wise branch-and-merge reasoning for LLMs. 💸 Discrete CoT is costly. 🎛️ Existing continuous tokens often clash with on-policy RL exploration. 🎥 Multiplex Thinking, a sampling-based continuous reasoning paradigm:
Junyi Zhang reposted
1X @1x_tech
NEO’s Starting to Learn on Its Own
Junyi Zhang reposted
Lisa Dunlap @lisabdunlap
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com
Junyi Zhang reposted
Qianqian Wang @QianqianWang5
I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.
Junyi Zhang reposted
Long Lian @LongTonyLian
LLMs are getting crazily good at reasoning — but also crazily slow. Hard problems can make them think for hours. Why? Even with tons of GPUs, they still decode one. token. at. a. time.⏳ More GPUs ≠ faster answers Our ThreadWeaver🧵⚡asks: “Why not make LLMs think in parallel?” 🧵1/N👇
Junyi Zhang reposted
tyler bonnen @tylerraye
starting fall 2026 i'll be an assistant professor at @Penn 🥳 my lab will develop scalable models/theories of human behavior, focused on memory and perception currently recruiting PhD students in psychology, neuroscience, & computer science! reach out if you're interested😊