Jiaming Tang

32 posts

@jmtang42

Ph.D. student @MIT. I am interested in MLSys & Algo.

Cambridge, MA · Joined January 2023
365 Following · 537 Followers
Jiaming Tang retweeted
Zhijian Liu@zhijianliu_·
ParoQuant just got a big upgrade 🚀
✅ Supports the new Qwen3.5 models
⚡ Now runs on MLX (fast local inference on Apple Silicon)
🧠 Preserves reasoning quality with 4-bit quantization
We also built an agent demo running locally on my 4-year-old M2 Max. Can't wait to upgrade to an M5 Max and see what kind of magic we can do. ✨
Zhijian Liu@zhijianliu_

Reasoning LLMs generate very long chains-of-thought, so even small quantization errors add up. With AWQ, Qwen3-4B drops 71.0 → 68.2 on MMLU-Pro (~4% relative loss). 😬
ParoQuant fixes this! It keeps only the critical rotation pairs and fuses everything into a single kernel. Recovers most of the lost reasoning accuracy with minimal overhead, so 4-bit models stay strong at reasoning. 💪💪
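A rough sketch of the rotation-pair idea in pure Python (all names here are hypothetical, not the actual ParoQuant API; the real system selects which channel pairs to rotate and fuses the inverse rotation into the dequantization kernel):

```python
import math

def quantize_4bit(xs):
    # symmetric 4-bit quantization of one channel (toy version)
    scale = max(abs(x) for x in xs) / 7 or 1.0
    return [max(-8, min(7, round(x / scale))) * scale for x in xs]

def paired_rotation_quant(a, b, theta):
    # rotate the channel pair to balance magnitudes / suppress outliers
    c, s = math.cos(theta), math.sin(theta)
    ra = [c * x + s * y for x, y in zip(a, b)]
    rb = [-s * x + c * y for x, y in zip(a, b)]
    # quantize in the rotated space, where the error is smaller
    qa, qb = quantize_4bit(ra), quantize_4bit(rb)
    # undo the rotation (fused into the matmul kernel in the real system)
    out_a = [c * x - s * y for x, y in zip(qa, qb)]
    out_b = [s * x + c * y for x, y in zip(qa, qb)]
    return out_a, out_b
```

With theta = 0 this degenerates to plain per-channel 4-bit quantization; choosing a good angle per pair is where the accuracy recovery comes from.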

Jiaming Tang retweeted
Xialin He@Xialin_He·
Real-world loco-manipulation demands more than replaying fixed reference motions. We argue that true autonomy requires two capabilities:
1️⃣ flexibly leveraging whatever signals are available: dense references, partial cues, state estimates, or egocentric perception
2️⃣ remaining capable when any of these signals are missing or unreliable
We introduce ULTRA, an all-in-one controller for unified humanoid loco-manipulation 🤖 It supports:
• general reference tracking
• sparse goal following
• execution with motion capture
• execution with egocentric perception
🔗 Project page: ultra-humanoid.github.io
Jiaming Tang retweeted
Physical Intelligence@physical_int·
We’ve developed a memory system for our models that provides both short-term visual memory and long-term semantic memory. Our approach allows us to train robots to perform long and complex tasks, like cleaning up a kitchen or preparing a grilled cheese sandwich from scratch 👇
Jiaming Tang retweeted
Jyo Pari@jyo_pari·
As context windows grow 📈, continual learning matters more! @tianyuanzhang99 will present how to scale test-time training for effectively infinite context ♾ 🗓️ Feb 19, 3pm ET @scaleml
AK@_akhaliq·
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
Jiaming Tang@jmtang42·
On the RTX 5090, VLASH reduces control latency from ~530 ms to ~30 ms, up to a 17× reduction compared to synchronous inference. On the RTX 4090 and RTX 5070, we achieve ~15× and ~9× latency reductions, respectively. This low-latency control is essential for highly dynamic tasks and high-frequency corrections on the robot.
Jiaming Tang@jmtang42·
We also add a simple trick to make robots move even faster: “quantize” robot actions for speed. VLAs are trained on very fine-grained teleop data, so they output tiny action steps that are often more precise than necessary. VLASH groups every q fine-grained actions into one coarser action, so the robot takes fewer, larger steps that follow almost the same trajectory, but much faster.
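As a toy sketch (assuming delta-style actions, where each action is a small displacement; `group_actions` is a hypothetical name, not the actual VLASH API), the grouping is just:

```python
def group_actions(actions, q):
    # merge every q consecutive fine-grained deltas into one coarse step
    # by summing them, so the trajectory's waypoints are preserved
    coarse = []
    for i in range(0, len(actions), q):
        chunk = actions[i:i + q]
        coarse.append([sum(dim) for dim in zip(*chunk)])
    return coarse
```

Summing deltas (rather than averaging) keeps the endpoints of each group on the original trajectory; the robot just reaches them in one step instead of q.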
Jiaming Tang@jmtang42·
Even large VLAs can play ping-pong in real time! 🏓⚡️
In practice, VLAs struggle with fast, dynamic tasks:
• slow reactions, jittery actions.
• demos often shown at 5-10× speed to look "smooth".
We introduce VLASH:
• future-state-aware asynchronous inference with >30Hz inference frequency for PI0.5
• drop-in to existing VLAs with no extra overhead
• enables PI0.5 / PI0 to play ping-pong and other highly dynamic tasks in real time
📄 Paper: arxiv.org/abs/2512.01031
🔧 Code: github.com/mit-han-lab/vl…
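A 1-D toy illustrating "future-state-aware" (all names hypothetical; this is a crude caricature, not the paper's method): with asynchronous inference, the action computed now only takes effect `latency` steps later, so the policy should plan against the predicted future state rather than the stale current one.

```python
def run_control(target, x0, v, latency, future_aware):
    # state drifts by v per step; the policy's correction toward `target`
    # only lands after `latency` steps of inference delay
    x = x0
    planned = x + v * latency if future_aware else x
    correction = target - planned        # what the policy outputs
    return x + v * latency + correction  # state once the action lands
```

Planning against the predicted state x + v*latency cancels the drift exactly; planning against the stale state misses by v*latency, which is precisely the jitter that grows with inference latency.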
Haotian Tang@haotiant1998·
Personal update: I am excited to share that I will join @GoogleDeepMind next week, after defending my PhD thesis @MITEECS last month. I will be working on generative models that simulate the physical world. Looking forward to the new journey ahead in 2025!
Jiaming Tang retweeted
Guangxuan Xiao@Guangxuan_Xiao·
Introducing DuoAttention: our new framework slashes both memory and latency for long-context LLMs without sacrificing performance! By applying the full KV cache only to critical heads, we achieve:
⚡ 2.55× memory reduction
⚡ 2.18× decoding speedup
⚡ 3.3M tokens on a single A100 GPU
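A minimal sketch of the per-head cache policy (`DuoKVCache` is a hypothetical class, not the released API; in the paper the retrieval-head assignment is learned offline, and non-critical "streaming" heads keep a few attention-sink tokens plus a recent window):

```python
from collections import deque

class DuoKVCache:
    def __init__(self, is_retrieval_head, sink=4, window=256):
        # retrieval heads keep everything; streaming heads keep only
        # the first `sink` tokens plus a sliding recent window
        self.full = [[] if r else None for r in is_retrieval_head]
        self.sinks = [None if r else [] for r in is_retrieval_head]
        self.recent = [None if r else deque(maxlen=window)
                       for r in is_retrieval_head]
        self.sink = sink

    def append(self, head, kv):
        if self.full[head] is not None:
            self.full[head].append(kv)
        elif len(self.sinks[head]) < self.sink:
            self.sinks[head].append(kv)
        else:
            self.recent[head].append(kv)

    def cached(self, head):
        if self.full[head] is not None:
            return list(self.full[head])
        return self.sinks[head] + list(self.recent[head])
```

Streaming heads' memory is constant in sequence length, so overall KV memory grows only with the (small) number of retrieval heads.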
Jiaming Tang@jmtang42·
🚀 Excited to introduce Quest: an efficient long-context LLM inference framework, accepted to ICML 2024! 🌟
⚡️ Quest leverages query-aware sparsity to achieve up to 2.23× end-to-end speedup for long-context LLM inference.
📄 Paper: arxiv.org/abs/2406.10774
💻 Code: github.com/mit-han-lab
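Quest's query-aware criticality estimate can be sketched as follows (hypothetical function names, not the released API; `key_min`/`key_max` are the elementwise min/max of the keys stored in each KV page):

```python
def page_score(query, key_min, key_max):
    # upper bound on q·k for any key in the page: per channel,
    # take the larger of q*k_min and q*k_max, then sum
    return sum(max(q * lo, q * hi)
               for q, lo, hi in zip(query, key_min, key_max))

def select_pages(query, pages, top_k):
    # attend only to the top_k pages with the highest bound;
    # pages is a list of (key_min, key_max) summaries
    order = sorted(range(len(pages)),
                   key=lambda i: page_score(query, *pages[i]),
                   reverse=True)
    return sorted(order[:top_k])
```

Because the bound depends on the current query, different queries select different pages, which is what distinguishes this from static KV eviction.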