Ryan Ng

24 posts

@aftermultiply

Reasoning @xAI | ex-@OpenAI | TL at Ray / Anyscale | K8s | DynamoDB

Joined February 2026
21 Following · 33 Followers
Ryan Ng (@aftermultiply)
@yunta_tsai Not many places offer such an opportunity; I feel lucky to have been in a few of them.
0
0
0
76
Yun-Ta Tsai (@yunta_tsai)
Pursuit of happiness, builder edition:
- Find a problem you love spending your life solving.
- Spend your life solving it.
28
29
362
16.3K
Niloofar (@niloofar_mire)
I’ve been feeling a bit burnt out, so a few days ago I decided at the very last minute to fly to Scandinavia and detach a bit. As I was chilling in a random park in Copenhagen and taking this photo, I overheard the couple next to me talking about world models and grounded video gen LOL
[photo]
20
1
227
12.6K
Boyuan Zheng (@boyuan__zheng)
Excited to see people try Grok Build for web dev. Our team has put a lot of effort into improving its aesthetics and functionality, and more exciting features are expected from the recursive self-improvement loop. It’s still early beta, and feedback is very welcome. Please try it out and let us know where we can improve.

Quoted post from Kilo (@kilocode):
Grok Build 0.1 might be one of the most underestimated AI models right now. We tested it in Kilo Code by asking it to build 5 websites from scratch. Here are the results:
244
113
1.5K
31.4M
Joanne Jang (@joannejang)
learned this quote from 2023 is making rounds -- i actually don't think this is true anymore in 2026! The model should be invisible. i expect us to flip back to ux in the form of agent behavior + continual learning loops; and the alpha is in making models feel natural and as invisible as possible.
[image]
26
21
410
123.2K
Ryan Ng (@aftermultiply)
@CoreAutoAI “The discipline to focus on what matters before the training run, especially things like data quality and systems readiness”
0
0
0
78
Core Automation (@CoreAutoAI)
What is pretraining? Asking for a friend
29
3
119
14.1K
Ryan Ng (@aftermultiply)
Never understood the magic of @Tailscale until I started using it for myself
0
0
0
43
Ryan Ng (@aftermultiply)
@MillionInt Agents make that much more tractable
0
0
0
114
Jerry Tworek (@MillionInt)
Best productivity hack I know is organizing your work so that you enjoy it the most
24
24
514
25.5K
Ryan Ng (@aftermultiply)
Agents coming online
[image]
0
0
2
52
Ryan Ng reposted
Zhuohan Li (@zhuohan123)
Try out deepseek v4 on vLLM!
Quoted post from vLLM (@vllm_project):

🎉 Day-0 support for @deepseek_ai V4 Pro and Flash on vLLM: a new generation of DeepSeek model, purpose-built for tasks up to 1M tokens.

Alongside the release, we're publishing a first-principles walkthrough of the new long-context attention and how we implemented it in vLLM.

The new attention mechanism, in four moves:
• Shared K/V + inverse RoPE → 2× memory savings
• c4a / c128a KV compression → 4×–128× savings
• DeepSeek Sparse Attention over compressed tokens
• Short sliding window for locality across compression boundaries

At 1M context, per-layer KV state is ~8.7× smaller than a DeepSeek V3.2-style 61-layer stack (9.62 GiB vs 83.9 GiB, bf16). fp8 attention cache + fp4 indexer cache shrink it further.

vLLM side:
• Unified hybrid KV cache: single logical block size (256 native positions) across all compression rates; compressor state folded into the SWA KV cache spec so prefix caching, disagg prefill, CUDA graphs and MTP reuse the same abstraction
• Three page-size buckets for the full 5-way cache stack → no cross-kind fragmentation
• Fused kernels: compressor + RMSNorm + RoPE + cache insert (1.4–3×), inverse RoPE + fp8 quant (2–3×), Q-norm + KV RoPE + K insert (10–20×)
• Multi-stream overlap of indexer vs main-KV compression vs SWA insertion

Disaggregated serving is supported out of the box and strongly recommended for best performance. Follow our recipes site for verified commands for @nvidia Blackwell (B200, B300, GB200, GB300) and Hopper (H100/H200/H20) systems.

Thanks to the @deepseek_ai team for open-sourcing DeepSeek V4, and to @inferact for landing day-0 support 🤝

📝 Blog: vllm.ai/blog/deepseek-…
📖 Recipes: recipes.vllm.ai/deepseek-ai/De…
🤗 huggingface.co/deepseek-ai/De…
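Editor's sketch: the memory figures quoted in the vLLM post can be sanity-checked with a few lines of arithmetic. The two GiB values come straight from the post. The helper names, the choice of 2**20 for "1M tokens", and the reading of c4a / c128a as "one KV entry per 4 or 128 tokens" are assumptions made here for illustration and are not taken from vLLM or DeepSeek code.

```python
# Figures quoted in the post (bf16, 1M-token context).
V4_KV_GIB = 9.62
V32_STYLE_KV_GIB = 83.9

# Assumption: "1M tokens" is taken as 2**20 positions for round numbers.
CONTEXT_TOKENS = 2**20


def reduction_factor(baseline_gib: float, new_gib: float) -> float:
    """Return how many times smaller new_gib is than baseline_gib."""
    return baseline_gib / new_gib


def compressed_entries(tokens: int, rate: int) -> int:
    """Hypothetical helper: KV entries left if every `rate` tokens share one entry."""
    return tokens // rate


ratio = reduction_factor(V32_STYLE_KV_GIB, V4_KV_GIB)
print(f"KV state reduction: {ratio:.1f}x")
for rate in (4, 128):
    print(f"1/{rate} compression: {compressed_entries(CONTEXT_TOKENS, rate)} entries")
```

The computed ratio rounds to the ~8.7× stated in the post, so the two GiB figures and the headline factor are at least mutually consistent.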

1
3
33
2.6K
Ryan Ng reposted
mimic (@mimicrobotics)
With mimic-video, we were among the very first to propose Video-Action Models for robotics. Today, we are open-sourcing the recipe.
[image]
3
32
275
41.2K
Arshdeep Singh (@arshdeep)
After an unforgettable ride, this was my last week at @xai.

When I joined in 2024, xAI was a small yet extraordinary team. I got an amazing opportunity to build our core ML Platforms from scratch and form an ambitious, amazing team. Together we built the core systems that power frontier-model research, evaluations, human data collection, agent training, and daily productivity for every member of technical staff at xAI. It was a true privilege.

I’m deeply grateful to @elonmusk and every single person I’ve had the chance to work with across research, engineering, compute, infra, and beyond. The pace, collaboration, and shared mission to understand the universe and be truth-seeking have been unmatched. Being able to contribute to data, training, and infra behind Grok-2-1212, Grok 3, Grok 4, 4.2, Aurora, Imagine and many other efforts has been the highlight of my career so far.

Special thank you to my incredible team and all the brilliant people who made this ride so special. I leave more inspired than ever. Excited for what’s next. Ad astra 🚀
[2 images]
89
16
857
83.2K
Regina Lin (@reggitales)
Introducing Dex: the self-driving workspace for operators.

Dex is the first agent system with full operational context and a self-updating knowledge base. Every datapoint from your workspace is ingested, synced, and structured into compounding context for agents to take action.

Comment "DEX" or tag @dexbythirdlayer for access. First 1,000 sign-ups get 7 days free. After, join our rolling waitlist. Sign up at joindex [dot] com for a fun surprise.

How dex works (threads)
93
27
279
64.8K
Ryan Ng (@aftermultiply)
@yzeng58 Does the flushed KV introduce gaps in the position embeddings?
1
0
1
102
Yuchen Zeng (@yzeng58)
Reasoning models think hard — but all that thinking fills up your KV cache fast. Memento fixes this: the model compresses its own chain-of-thought mid-generation, flushing old KV entries after each block. 2-3× less peak KV cache, ~2× throughput — accuracy largely preserved. The cool part: deciding what to remember and what to forget is a capability the model acquires through training — not something you bolt on. Excited about where this goes — especially for agents.
Quoted post from Dimitris Papailiopoulos (@DimitrisPapail):
x.com/i/article/2041…
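Editor's sketch: the effect of flushing KV entries after each reasoning block can be illustrated with a toy counter. This is not Memento's actual algorithm, which the post describes only at a high level. The function name, the fixed block and summary lengths, and the rule "drop the block's entries, keep its summary" are all hypothetical simplifications.

```python
from typing import Sequence


def peak_kv_entries(block_lens: Sequence[int], summary_len: int, flush: bool) -> int:
    """Toy model: track live KV entries while generating blocks of reasoning.

    Each block adds its reasoning tokens plus a short summary to the cache.
    With flush=True, the block's reasoning entries are dropped once the
    block is finished and only the summary entries stay resident.
    """
    live = 0
    peak = 0
    for n in block_lens:
        live += n + summary_len  # reasoning tokens plus the block's summary
        peak = max(peak, live)
        if flush:
            live -= n  # discard the block's reasoning entries, keep the summary
    return peak


blocks = [1000, 1000, 1000, 1000]  # hypothetical block sizes in tokens
no_flush = peak_kv_entries(blocks, summary_len=100, flush=False)
with_flush = peak_kv_entries(blocks, summary_len=100, flush=True)
print(f"peak without flush: {no_flush}, with flush: {with_flush}")
```

In this toy setting the peak is bounded by one block plus the accumulated summaries instead of growing with the full chain of thought. The numbers are illustrative only and say nothing about the 2-3× figure reported in the post.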

4
15
105
13.9K
Ryan Ng (@aftermultiply)
@peteflorence Congrats, looking forward to what’s next!
1
0
1
1.2K