Sachin Katti@sk7037
Today we shared MRC (openai.com/index/mrc-supe…), a networking protocol developed with @Microsoft, @nvidia, @AMD, @Broadcom, and @intel to improve how large AI training systems move data and recover from failures. This innovation has come full circle for me personally: it was initiated by @OpenAI with my team at @intel back when I was leading the networking business there, and it's great to see it come to life at scale!
As training clusters scale, networking becomes a critical part of overall compute efficiency. It is not enough to add more capacity. You also need systems that keep jobs running reliably, use bandwidth well, and reduce wasted GPU time.
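To see why recovery speed matters so much, here's a rough back-of-envelope model of effective GPU utilization under failures. The numbers below are hypothetical for illustration, not MRC measurements:

```python
# Illustrative model (hypothetical numbers, not MRC data): how failure
# recovery time eats into effective GPU utilization on a large cluster.

def effective_utilization(mtbf_hours: float, recovery_minutes: float,
                          lost_work_minutes: float) -> float:
    """Fraction of wall-clock time spent on useful training, assuming a
    failure every `mtbf_hours` costs `recovery_minutes` of downtime plus
    `lost_work_minutes` of recomputed work since the last checkpoint."""
    wasted_hours = (recovery_minutes + lost_work_minutes) / 60.0
    return mtbf_hours / (mtbf_hours + wasted_hours)

# Hypothetical: a failure every 3 hours, 10 min to detect and recover,
# 15 min of recomputed work -> ~12% of GPU time is wasted.
u_slow = effective_utilization(3.0, 10.0, 15.0)
# Same failure rate, but recovery cut to 1 min -> most of it comes back.
u_fast = effective_utilization(3.0, 1.0, 15.0)
print(f"{u_slow:.3f} {u_fast:.3f}")  # -> 0.878 0.918
```

The takeaway: at scale, shaving minutes off failure recovery is worth real GPU capacity, which is why this kind of networking work pays off.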
MRC is one example of the kind of infrastructure work required to make frontier model training more efficient and more resilient. It reflects a broader view we have at OpenAI: progress in AI depends not just on better models, but on better compute systems across the stack.