
Mark Handley
1.2K posts

Mark Handley
@MarkJHandley
Professor of Networked Systems at UCL and Networks at OpenAI







We’ve partnered with @AMD, @Broadcom, @Intel, @Microsoft, and @NVIDIA, to release Multipath Reliable Connection (MRC), a new open networking protocol that helps large AI training clusters run faster and more reliably, with less wasted GPU time. openai.com/index/mrc-supe…

Today we shared MRC (openai.com/index/mrc-supe…), a networking protocol developed with @Microsoft, @nvidia, @AMD, @Broadcom, and @intel to improve how large AI training systems move data and recover from failures. This innovation has come full circle for me personally, it was initiated by @OpenAI with my team at @intel then when I was leading the networking business there and it's great to see it come to life at scale! As training clusters scale, networking becomes a critical part of overall compute efficiency. It is not enough to add more capacity. You also need systems that keep jobs running reliably, use bandwidth well, and reduce wasted GPU time. MRC is one example of the kind of infrastructure work required to make frontier model training more efficient and more resilient. It reflects a broader view we have at OpenAI: progress in AI depends not just on better models, but on better compute systems across the stack.











Vulcan during the "observation" seen during ULA's Cert-2 mission this morning. Watch how the rocket moves to adjust after the flash. 📸 - @NASASpaceflight 📺 - youtube.com/watch?v=ZPztD5…
















