Xinyang (Young) Geng

68 posts

@younggeng

Research scientist at Google DeepMind. Opinions are my own.

Joined February 2014
526 Following · 1.1K Followers
Pinned Tweet
Xinyang (Young) Geng @younggeng
Are you interested in training large models in JAX but put off by the complicated partition specs and sharding configurations required to scale up? I've recently created scalax, a small library to help developers easily scale up JAX models. github.com/young-geng/sca…
4 replies · 39 reposts · 234 likes · 29.4K views
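As an illustration of the boilerplate such a library abstracts away, here is a minimal data-parallel sharding sketch in plain `jax.sharding` (this is not scalax's actual API; the shapes and names are made up for the example):

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 1D mesh over all available devices; on a CPU-only host this is one device.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Shard the batch dimension across the "data" mesh axis (data parallelism).
batch = jnp.ones((8, 128))
batch = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def step(x):
    # jit compiles for the input sharding; each device computes its own shard.
    return x * 2.0

out = step(batch)
print(out.shape)  # (8, 128)
```

Writing the `Mesh`/`PartitionSpec` pair for every parameter and activation is exactly the part that gets tedious at scale, which is the pain point the tweet describes.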
Hieu Pham @hyhieu226
I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people.

At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental health deterioration that I used to scoff at is real, miserable, scary, and dangerous.

I am going to take a break from frontier AI labs, and will take my family to my home country, Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.
1.1K replies · 418 reposts · 14K likes · 1.2M views
Xinyang (Young) Geng retweeted
Jacob Austin @jacobaustin132
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
37 replies · 517 reposts · 3.5K likes · 402.4K views
Xinyang (Young) Geng retweeted
Jack Rae @jack_w_rae
2.5 Flash is out! You can now specify thinking budgets, or disable thinking entirely for lower latency. Strong code & reasoning capabilities, cost effective, fast. It's a great workhorse model for enterprise and developers, excited to hear your feedback.
Quoting Arena.ai @arena:
Gemini 2.5 Flash is described as being optimized for speed and scalability. Despite its lighter design, the community voted for its impressive performance on Hard Prompts, Coding, and Long Queries, matching the strength of its older sibling, Gemini 2.5 Pro, at #1 in these categories.
4 replies · 11 reposts · 202 likes · 18K views
Xinyang (Young) Geng retweeted
rdyro @rdyro128523
Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
2 replies · 14 reposts · 134 likes · 11.5K views
Xinyang (Young) Geng retweeted
Jack Rae @jack_w_rae
Today we are launching 2.5 Pro! I think it's the best model in the world. State-of-the-art reasoning and great vibes (+39 Elo gap on lmsys!). 2.5 Pro improves in coding, STEM, multimodal, instruction following, and lots more. Available in AI Studio & the Gemini App!
7 replies · 39 reposts · 461 likes · 43.3K views
William Fedus @LiamFedus
This is what I sent to my colleagues at OpenAI:

Hi all, I made the difficult decision to leave OpenAI as an employee, but I’m looking to work closely together as a partner going forward. Contributing to the mission of OpenAI and working with world-class teams to create and improve ChatGPT has been an experience of a lifetime. But I’ve gotten really excited about AI for science. My undergrad was in physics and I’m keen to apply this technology there.

Because AI for science is one of the most strategically important areas to OpenAI and achieving ASI, OpenAI is planning to invest in and partner with my new company. So I’ll see you all around!

Thanks to all the leadership who believed in me early on, especially Sam, Greg, and Mark. Thank you everyone on post-training and to all of our collaborators across research and product. I’ll miss working with so many of you, but will be cheering you on! Post-training has an amazing roster of talent and leaders who will continue to drive its success.
124 replies · 54 reposts · 1.8K likes · 508.5K views
Xinyang (Young) Geng retweeted
rdyro @rdyro128523
Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in-progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
8 replies · 46 reposts · 295 likes · 47.9K views
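As a sketch of what the int8 quantization mentioned here involves (a generic illustration, not rdyro's actual implementation; function names and shapes are invented), per-channel symmetric weight quantization in JAX looks roughly like:

```python
import jax.numpy as jnp

def quantize_int8(w):
    # Per-output-channel symmetric scale so each column uses the full int8 range.
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0
    q = jnp.round(w / scale).astype(jnp.int8)
    return q, scale

def int8_matmul(x, q, scale):
    # Matmul against the int8 weights, then undo the scaling on the output.
    return (x @ q.astype(jnp.float32)) * scale

w = jnp.linspace(-1.0, 1.0, 12).reshape(4, 3)  # toy weight matrix
q, scale = quantize_int8(w)
x = jnp.ones((2, 4))
y = int8_matmul(x, q, scale)  # close to x @ w, at quarter the weight memory
```

Storing `q` instead of `w` roughly quarters weight memory versus float32 (halves it versus bfloat16), which is what makes large-model inference on a fixed device count feasible.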
Xinyang (Young) Geng retweeted
Jacob Austin @jacobaustin132
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
26 replies · 388 reposts · 1.9K likes · 463.1K views
Xinyang (Young) Geng retweeted
Hieu Pham @hyhieu226
Despite many complaints about JAX being hard to use, it has a crucial advantage over PyTorch: for distributed jobs, XLA is sufficiently good at auto-scheduling parallelism strategies, e.g., sharding and overlapping compute and comms. If PyTorch becomes good at that, it's checkmate.
9 replies · 11 reposts · 171 likes · 14.2K views
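A small illustration of the point (shapes and names are made up; on a CPU-only host the mesh degenerates to one device): you annotate input shardings, and XLA's SPMD partitioner schedules the parallelism, with no manual collectives written by the user.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1D mesh over all devices; on a multi-chip host this gives tensor parallelism.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard the weight's output dimension across the "model" axis.
w = jax.device_put(jnp.ones((64, 64)), NamedSharding(mesh, P(None, "model")))
x = jnp.ones((4, 64))

@jax.jit
def forward(x, w):
    # No manual collectives here: XLA's SPMD partitioner picks the parallel
    # schedule and inserts (and overlaps) any needed communication itself.
    return jnp.tanh(x @ w)

y = forward(x, w)
```

This is the contrast with PyTorch the tweet is drawing: there, the equivalent sharding and comm/compute overlap is typically expressed explicitly by the user or a framework on top.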
Xinyang (Young) Geng retweeted
Andrej Karpathy @karpathy
For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators.
316 replies · 823 reposts · 8.4K likes · 1.2M views
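To make the "gym of sorts" idea concrete, here is a hypothetical minimal environment with a gym-style reset/step interface (the class, task, and reward scheme are all invented for illustration; real LLM environments would wrap far richer tasks like tool use, games, or coding):

```python
import random

class ParityEnv:
    """Toy text-task environment with a minimal gym-style reset/step API."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.n = None

    def reset(self) -> str:
        # Sample a fresh task instance and return the prompt (observation).
        self.n = self.rng.randrange(100)
        return f"Is {self.n} even or odd?"

    def step(self, action: str):
        # Score the model's text answer; single-turn, so the episode ends.
        answer = "even" if self.n % 2 == 0 else "odd"
        reward = 1.0 if action.strip().lower() == answer else 0.0
        return reward, True  # (reward, episode_done)

env = ParityEnv(seed=42)
prompt = env.reset()
reward, done = env.step("even")
```

The leverage Karpathy describes comes from the shared interface: once thousands of community-contributed environments expose the same reset/step contract, a single RL training loop can sample across all of them.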
Xinyang (Young) Geng retweeted
Jim Fan @DrJimFan
Whether you like it or not, the future of AI will not be canned genies controlled by a "safety panel". The future of AI is democratization. Every internet rando will run not just o1, but o8, o9 on their toaster laptop. It's the tide of history that we should surf on, not swim against. Might as well start preparing now.

DeepSeek just topped Chatbot Arena, my go-to vibe checker in the wild, and two other independent benchmarks that couldn't be hacked in advance (Artificial-Analysis, HLE).

Last year, there were serious discussions about limiting OSS models by some compute threshold. Turns out it was nothing but our Silicon Valley hubris. It's a humbling wake-up call to us all that open science has no boundary. We need to embrace it, one way or another.

Many tech folks are panicking about how much DeepSeek is able to show with so little compute budget. I see it differently - with a huge smile on my face. Why are we not happy to see *improvements* in the scaling law? DeepSeek is unequivocal proof that one can produce unit intelligence gain at 10x less cost, which means we shall get 10x more powerful AI with the compute we have today and are building tomorrow. Simple math! The AI timeline just got compressed.

Here's my 2025 New Year resolution for the community: No more AGI/ASI urban myth spreading. No more fearmongering. Put our heads down and grind on code. Open source, as much as you can. Acceleration is the only way forward.
215 replies · 635 reposts · 3.1K likes · 464.7K views
Xinyang (Young) Geng retweeted
Andrej Karpathy @karpathy
It’s done because it’s much easier to 1) collect, 2) evaluate, and 3) beat and make progress on. We’re going to see every task that is served neatly packaged on a platter like this improved (including those that need PhD-grade expertise). But jobs (even intern-level) that need long, multimodal, coherent, error-correcting sequences of tasks glued together for problem solving will take longer. They are unintuitively hard, in a Moravec’s Paradox sense. Fwiw I’m ok and happy to see harder “task” evals. Calling it humanity’s last exam is a bit much, and misleading.
Quoting Niels Rogge @NielsRogge:
Unpopular opinion: benchmarks like these are moving the field in the wrong direction. No, I don't want an AI to be able to memorize (useless?) questions like "How many paired tendons are supported by a sesamoid bone?" in its weights. I want the "intern", as @karpathy is suggesting.
81 replies · 235 reposts · 2.5K likes · 423.9K views
Xinyang (Young) Geng retweeted
Jerry Tworek @MillionInt
Simplify. Scale. Resolve bottlenecks. Repeat.
4 replies · 8 reposts · 132 likes · 9.1K views
Xinyang (Young) Geng retweeted
Jack Rae @jack_w_rae
Appreciate @aidan_mclau looking into the thinking model results. Originally scores looked weak as the response was plucked from the thought content versus output. We are looking into ways of making thinking output less confusing for people running evals. This is why we 🚢, to collect feedback and iterate!
Quoting Aidan McLaughlin @aidan_mclau:
two aidanbench updates:
> gemini-2.0-flash-thinking is now #2 (explanation for score change below)
> deepseek v3 is #22 (thoughts below)
5 replies · 8 reposts · 102 likes · 19.5K views
Xinyang (Young) Geng retweeted
Jerry Tworek @MillionInt
People completely misunderstand the data wall. It's the data slop wall. Most of the data is so bad it's a waste of a good gpu to backprop it.
13 replies · 9 reposts · 231 likes · 24.3K views
Xinyang (Young) Geng retweeted
Charlie Snell @sea_snell
Can we predict emergent capabilities in GPT-N+1🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning”🧵
14 replies · 68 reposts · 570 likes · 156.3K views
Xinyang (Young) Geng retweeted
Cristian Garcia @cgarciae88
People learning JAX, feel free to reach out if the learning feels too steep; hopefully we can flatten it out. Also, check out the JAX LLM Discord for help from the community: discord.gg/m9NDrmENe2
Quoting xjdr @_xjdr:
This has been and will continue to be my recommendation for anyone in this position. Learn JAX and sign up for sites.research.google/trc/about/. It's one of the best things Google has ever done. You can do meaningful research for free, but the learning curve is steep. Strap in.
6 replies · 19 reposts · 281 likes · 32.8K views
Xinyang (Young) Geng retweeted
Ayaka Mikazuki (Keep4o) @ayaka14732
We finally have an official `nvidia-smi` for TPU 🎉 Simply install it with `pip install tpu-info`
14 replies · 98 reposts · 865 likes · 81.5K views