Matthew Johnson

2K posts

@SingularMattrix

Researcher at Google Brain. I work on JAX (https://t.co/UGa5tGfinF).

Joined July 2010

3.3K Following · 13.6K Followers
Matthew Johnson retweeted
Greg Brockman@gdb·
GPT-5.2 derived a novel result in theoretical physics, showing that a type of particle interaction many physicists expected would not occur can in fact arise under specific conditions. There is great promise in the potential of AI to benefit people by accelerating science.
[image]
OpenAI@OpenAI

GPT-5.2 derived a new result in theoretical physics. We’re releasing the result in a preprint with researchers from @the_IAS, @VanderbiltU, @Cambridge_Uni, and @Harvard. It shows that a gluon interaction many physicists expected would not occur can arise under specific conditions. openai.com/index/new-resu…

Dan Roy@roydanroy·
In mid-January, I’ll join Google DeepMind’s Science unit as a Visiting Research Scientist, on leave from the University of Toronto. I'm excited to be joining Google DeepMind's efforts to accelerate mathematical research with AI.
Dan Roy@roydanroy·
Big announcement time... Today is my last day as Research Director at the Vector Institute. It has been my incredible privilege over the past 2.5 years to serve the Vector community and help build an institution that supports world-class ML research and real-world impact.
Matthew Johnson retweeted
Ashish Vaswani@ashVaswani·
We frictionlessly trained on AMD GPUs and TPUs with a unified JAX framework. Our goodput for flagship runs went past 90%. @YashVanjani @mjcOhio @alokpathy @pcmonk painstakingly removed obstacles to maximize experimental velocity.
Matthew Johnson@SingularMattrix·
@Tgale96 @ezyang You can also skip work and transfers using lax control flow, like conds and/or while_loops, but those won't be as efficient as a Pallas kernel (even on TPU).
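For context, the lax control-flow pattern Matt describes might look like the following minimal sketch. The function name `process_block`, the `is_padding` flag, and both branch bodies are illustrative, not from the thread; the point is that `jax.lax.cond` executes only one branch at runtime, so the expensive branch can be skipped for padded blocks.

```python
import jax
import jax.numpy as jnp

def process_block(x, is_padding):
    # lax.cond traces both branches at compile time, but only one runs
    # at execution time, so padded blocks can skip the real work.
    return jax.lax.cond(
        is_padding,
        lambda v: jnp.zeros_like(v),   # cheap branch for padding
        lambda v: jnp.sin(v) * 2.0,    # illustrative "real" pointwise work
        x,
    )
```

As the tweet notes, this saves work only at the granularity of whole conditionals and is less efficient than a hand-written Pallas kernel.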
Edward Z. Yang@ezyang·
TPU question: suppose I want to do a point wise operation on a buffer that is 90% padding, but the padding boundary is only known on device. How do I avoid wasting compute cycles for the padded regions?
Matthew Johnson@SingularMattrix·
@vir_bhadeshiya @ezyang AIUI some power can be saved, but I don't think XLA will be so smart as to save on memory transfers and compute just from a lax.select. In principle it could be, but the expression is too unstructured. A Pallas kernel is the way to be sure.
Viral Bhadeshiya@vir_bhadeshiya·
Actually no, you're not just skipping memory transfers. You're skipping actual compute cycles too. jnp.where creates a device-side boolean mask (no host→device copy needed) and lax.select(mask, value, 0) gets fused by XLA into a single predicated/masked vector instruction on TPU. On TPU systolic arrays, the hardware literally gates off the multiply-add units for padded positions; those ALUs stay idle, power drops, cycles are saved.
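The jnp.where pattern being discussed might be sketched as below. The function name `masked_pointwise` and the `valid_len` parameter are illustrative; note that whether masked lanes actually save compute cycles is exactly what is disputed in the replies above, so the code only demonstrates the device-side masking, not any power or cycle claim.

```python
import jax.numpy as jnp

def masked_pointwise(x, valid_len):
    # Mask is built on device from a device-side length, so no
    # host→device copy of the padding boundary is needed.
    mask = jnp.arange(x.shape[0]) < valid_len
    # XLA may fuse this select with the pointwise op; whether the
    # hardware gates off compute for masked lanes is compiler- and
    # hardware-dependent.
    return jnp.where(mask, jnp.exp(x), 0.0)
```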
Matthew Johnson retweeted
Jon Barron@jon_barron·
Nano Banana Pro: "Generate a diagram of a two-layer neural network in the style of Stephen Biesty"
[image]
Matthew Johnson retweeted
Doris Tsao@doristsao·
Unbelievable: the famed Berkeley Math Circle is being forced to shut down due to a bureaucratic requirement that a guest lecturer giving an hour-long lesson must be officially fingerprinted. How is fingerprinting even still a thing in the 21st century? Chancellor Lyons @richlyons: can you see the absurdity of the situation and figure out a solution? dailycal.org/news/campus/ge…
Matthew Johnson retweeted
Percy Liang@percyliang·
⛵Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it’s even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
Percy Liang tweet media
Matthew Johnson retweeted
Adam Paszke@apaszke·
Want to improve GPU compute/comms overlap? We just published a new short tutorial for you! A few small changes to the Pallas:MGPU matmul kernel is all it takes to turn it into an all-gather collective matmul that overlaps NVLINK comms with local compute: docs.jax.dev/en/latest/pall…
Matthew Johnson retweeted
Adam Paszke@apaszke·
Curious how to write SOTA performance Blackwell matmul kernels using MGPU? We just published a short step-by-step tutorial: docs.jax.dev/en/latest/pall… At each step, we show exactly what (small) changes are necessary to refine the kernel and the final kernel is just under 150 lines.
Matthew Johnson retweeted
Adam Paszke@apaszke·
@jeremyphoward Luckily we have alternatives :) github.com/jax-ml/jax/blo… Just 100 lines without leaving Python and SOTA performance
Matthew Johnson@SingularMattrix·
@profkuang Congratulations! The US is richer having you as a citizen.
Kuang Xu@ProfKuang·
Today, I became a US citizen.
Matthew Johnson retweeted
Jacob Austin@jacobaustin132·
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
[image]
Matthew Johnson@SingularMattrix·
@ezyang Added a comment to the gist with a single comm solution!
Edward Z. Yang@ezyang·
@SingularMattrix Haha, no reason, it's just how the tutorial code gets data into sharded state. You use two all-gathers; can you do it in one comm?!