Bert Maher

433 posts

@tensorbert

I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)

Washington, DC · Joined December 2022
403 Following · 2.8K Followers
Bert Maher
Bert Maher@tensorbert·
@headinthebox It seems like in theory reorgs could solve problems (centralizing decision making in one person, reducing communication and consensus costs), but in practice it doesn’t seem like they ever do
Erik Meijer
Erik Meijer@headinthebox·
Textbook old school MSFT thinking: one more reorg will solve all your problems.
Mustafa Suleyman@mustafasuleyman

Technology and the future of our industry will be defined by two things: frontier models, and the products through which they are experienced. For some time, I’ve been thinking about how we best tackle these huge challenges, and today I’m excited to be evolving our structure at Microsoft AI, ensuring we’re positioned to succeed in both.

I came to Microsoft with an overriding mission: to create Superintelligence that delivers a transformative, positive impact for millions of people. This requires us to build frontier models, at scale, pushing the boundaries of what’s possible. Everything else follows from this. It's the foundation for our future as a company.

With our ambitious, long-term frontier scale compute roadmap locked, we now have everything we need to build truly SOTA models. The next phase of this plan is to restructure our organization to enable me to focus all my energy on our Superintelligence efforts and be able to deliver world class models for Microsoft over the next 5 years. These models will enable us to build enterprise tuned lineages that help improve all our products across the company. They’ll also enable us to deliver the COGS efficiencies necessary to be able to serve AI workloads at the immense scale required in the coming years.

Achieving all this will be a huge challenge, and I’m committing everything we have – and I have personally – to make it happen. To that end, I’ve been working hard with other leaders in the background for a while now to define a strategy to unify Copilot by bringing together the Consumer and Commercial efforts as one. We all know this makes sense. Every user – whether at home or at work – will be able to enjoy the full benefit of what we are all building. Today, we’re combining these organizations into a single, unified Copilot org.
@JacobAndreou has demonstrated himself to be an outstanding leader for the product experience and clearly has the product instincts, the operational range, and the conviction to make Copilot a great success. Jacob will retain a dotted line to me, and I’ll stay directly involved in much of the day-to-day operation of MAI and supporting Jacob to drive all areas of product strategy.

To ensure that the models we build and the products we ship are mutually reinforcing, we are establishing a Copilot Leadership Team that includes me, Jacob, Charles Lamanna, Perry Clarke, and Ryan Roslansky. This will enable us to focus our brand strategy, our product roadmap, our models and our core infrastructure as one to deliver the best experiences possible for all our users.

Thank you to the team for everything you’ve done over the last few years. I know how hard everyone has been pushing to help the company adapt to this new era. We really do have an incredible opportunity to redefine Microsoft for this agentic revolution. Let’s keep driving hard in this next chapter! blogs.microsoft.com/blog/2026/03/1…

difficultyang
difficultyang@difficultyang·
claude gets so excited when analyzing pytorch profiler traces
Lucas Beyer (bl16)
Lucas Beyer (bl16)@giffmana·
Imagine they release Gemini 2.75 instead lol
Bert Maher retweeted
Paul Graham
Paul Graham@paulg·
@fermatslibrary One of the most common flaws of math textbooks is that they present only the logic, without the intuition. They give you the later, cleaned up version of the idea, which hides the way it was discovered.
Bert Maher
Bert Maher@tensorbert·
This is all true, but Soumith is also one of the most brilliant strategic thinkers in the world. Some of us just fail a lot, dust ourselves off, and keep hacking the next day ☺️
Deedy@deedydas

If you feel like giving up, you must read this never-before-shared story of the creator of PyTorch and ex-VP at Meta, Soumith Chintala.
> from hyderabad public school, but bad at math
> goes to a "tier 2" college in India, VIT in Vellore
> rejected from all 12 universities for US masters despite 1420 on the GRE
> fuckit.jpg
> goes to the US anyway on a J-1 visa to CMU with no plan
> applies for masters (again) to 15 universities
> rejected from all except USC and with late admissions, NYU in 2010
> finds this guy called Yann LeCun (before he was famous)
> starts getting into open source
> rejected from all jobs including DeepMind
> only job is Amazon as test engineer
> his PhD mentor helps him get a job at a small startup (MuseAmi)
> rejected from DeepMind
> couldn't get H-1B because of J-1 home return issue; gets waiver through months of approval with USCIS and US State Dept
> very low on confidence
> In 2011/12 builds one of the fastest AI inference engines on phones
> rejected from DeepMind
> emailed Yann again and joins FAIR because of Torch7 open-source work
> scrapes through bootcamp at Facebook, struggling on an HBase task
> L8/L9 engineers at Facebook struggle to get ImageNet working
> figures out numerics / hyperparam issue as an L4
> first big win!
> FAIR goes well, runs 3 person torch7 team and co-creates PyTorch
> because of politics, management wants to shut down PyTorch
> cries-at-bar.jpg, literally
> eventually some people save PyTorch and it launches in 2017
> gets an EB-1 green card!
> the rest is history...

Think about that. He went to a tier 2 college. Was rejected from all Masters programs 2x. Rejected from every single job except Amazon test engineering. Rejected from DeepMind 3x. Nearly had his baby project shut down. Struggled with visa issues. After 12 years of failures (2005-17), he eventually rose to become a VP at Meta and one of the most influential people in AI!
Soumith's story is one of resilience and he's living proof that no matter how down in the dumps you are, there's always hope.

Bert Maher
Bert Maher@tensorbert·
This got me thinking that both int and FP math are “emulated” via a pretty complex set of transistors. I wonder how many gates/transistors it takes to implement an int8 fma versus an fp8 (e4m3) fma
Elon Musk@elonmusk

@CernBasher As the number of bits drops, the difference between floating point and integer decreases until they are the same thing at 1 bit. “Floating point” is not real. It is emulated with 2 integers and a lot of complexity.
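The "floating point is emulated with integers" point is easy to make concrete: decoding an fp8 value is nothing but integer bit manipulation. A minimal Python sketch, assuming the OCP e4m3fn convention (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities); the function name is mine:

```python
def decode_e4m3(byte: int) -> float:
    """Decode one e4m3fn byte (1 sign, 4 exponent, 3 mantissa bits) to a float."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF   # biased exponent field
    man = byte & 0x7          # 3-bit mantissa field
    if exp == 0xF and man == 0x7:
        return float("nan")   # e4m3fn has no infinities, only this NaN pattern
    if exp == 0:
        return sign * man * 2.0 ** -9  # subnormal: no implicit leading 1
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)  # normal number

# 0x38 = 0b0_0111_000 -> exponent 7 (unbiased 0), mantissa 0 -> 1.0
# 0x7E = 0b0_1111_110 -> 1.75 * 2**8 = 448.0, the e4m3fn maximum
```

An fma in this format is a handful of small integer shifts, adds, and compares (align, multiply mantissas, add, renormalize, round), which is why the gate-count comparison against an int8 fma is a genuinely interesting question rather than an obvious one.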

Bert Maher
Bert Maher@tensorbert·
@tqchenml Fair point! My first guess would be “this path has regressed” but it’s also true that expectations are high, the hw is fast, and 10us can actually be substantial (depending on the work). If it’s the latter that’s rough, triton.jit is decently fast (need c++ launch maybe)
Tianqi Chen
Tianqi Chen@tqchenml·
Two things worth noting: GPUs are likely getting faster, and it is also good to ask about the upper bound possible. Recently we got DSL host API overhead (excluding driver launch) down to 0.4us, and around 1-2us including driver launch, via tvm-ffi github.com/apache/tvm-ffi/
Bert Maher@tensorbert

I’ve heard this complaint from a couple people recently, and I’m surprised because we optimized the launch path like a year ago and got it down to ~10us. There’s a now closed GitHub issue I filed with a microbenchmark - someone should run it, profile, and bring it down

Bert Maher
Bert Maher@tensorbert·
I’ve heard this complaint from a couple people recently, and I’m surprised because we optimized the launch path like a year ago and got it down to ~10us. There’s a now closed GitHub issue I filed with a microbenchmark - someone should run it, profile, and bring it down
maharshi@maharshii

why is triton’s kernel launch cpu overhead so freaking high? the actual kernel takes 10x less execution time than to launch it and i can’t use cuda graphs because the shapes are dynamic.
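The general shape of the microbenchmark Bert mentions is simple: call the launch path many times and divide. A hedged sketch (this is not the actual benchmark from the GitHub issue; `launch_overhead_us` and the iteration counts are my own). For a real Triton kernel you would pass a closure that launches the kernel on fixed tensors; since kernel launches are asynchronous, timing the host-side call without synchronizing is exactly what measures launch overhead rather than kernel runtime:

```python
import time

def launch_overhead_us(launch, iters=100_000):
    """Average host-side cost of calling `launch` once, in microseconds."""
    for _ in range(1_000):  # warm up: exclude one-time JIT/caching costs
        launch()
    start = time.perf_counter()
    for _ in range(iters):
        launch()
    return (time.perf_counter() - start) / iters * 1e6

# e.g. launch_overhead_us(lambda: kernel[grid](x, y, out, n)) would measure
# the ~10us Triton launch path under discussion (kernel/args hypothetical).
```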

Bert Maher
Bert Maher@tensorbert·
@soumithchintala ❤️ It was great to have the chance to work with you, Soumith. I can’t wait to see what you do next
Soumith Chintala
Soumith Chintala@soumithchintala·
Leaving Meta and PyTorch

I'm stepping down from PyTorch and leaving Meta on November 17th.

tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me.

Eleven years at Meta. Nearly all my professional life. Making many friends for life. Almost eight years leading PyTorch, taking it from nothing to 90%+ adoption in AI. Walking away from this was one of the hardest things I've ever done. But I'm leaving with a full heart.

PyTorch handles exascale training now. It powers foundation models that are redefining intelligence. It's in production at virtually every major AI company. It's taught in classrooms from MIT to rural India. The tools I dreamed about making accessible? They are. The barrier to entry I wanted to lower? It's almost gone.

To be clear, there’s so much more to do. As long as AI evolves at a breakneck pace, PyTorch will continue to play catch up. Obsessing over the yet-to-come sometimes makes us forget how much we’ve already done.

To everyone who built this with me—who believed research should be joyful, that tools should be elegant, that open source changes everything—thank you. This wasn't my journey. It was ours.

What's next for me? Something small. Something new. Something I don't fully understand yet. Something uncomfortable. I could have moved to something else inside Meta. But I needed to know what's out there. I needed to do something small again. I couldn't live with the counterfactual regret of never trying something outside Meta.

It's very hard to leave. I probably have one of the AI industry’s most leveraged seats: I lead the software layer that powers the entire AI industry. Every major AI company and hardware vendor is on speed dial. This kind of power is really hard to give up. But curiosity ultimately won out in my head.

Keep making AI delicious and accessible. I'll be watching. Probably filing issues.
Definitely staying involved.

Is PyTorch going to be okay?

I don't want to be doing PyTorch forever. I don't want to be like Guido or Linus—bound to a single thing for decades. Last November, coinciding with the birth of my daughter, I started planning my exit with Aparna. My goal was to leave PyTorch in a good and stable place.

By this August, during the second half of my parental leave, I knew: Edward, Suo, Alban, Greg, John, Joe and Jana were ready. The team faced hard people, product, technical and organizational problems and didn’t feel the need to lean back on me to solve these for them (unlike in the past). The product story they crafted for the PyTorch Conference was coherent—really coherent. The things I'd flagged red were turning healthy. The project didn't need me anymore.

Unlike 2020-2022 (when I stepped down to go do robotics and came back when Lin, Dima and Dwarak left), I have strong confidence that this time PyTorch is truly resilient. The most aligned culture carriers of PyTorch – Greg, Alban, Ed, Jason and Joe – are at the decision table now, and people with strong value alignment – Suo, John and Jana – have joined them at the table. And there’s a long list of equally value-aligned people willing to sit at the table should any of these people leave.

There are many little things that make up my confidence in the people: John worked on Julia and open source for a very long time (in fact we hacked a Torch.jl in 2015), Suo has been the strongest systems builder and strategic partner I’ve had for the past two years, and Jana worked on resilient core systems for a very long time; I’ve had long technical and organizational discussions with her over the past few months that give me confidence. And the product lineup and execution in 2025 should be sufficient evidence for any remaining doubt.

I’m confident that this band of PyTorchers is going to do exceptionally well.
PyTorch might change in flavor because I no longer impose my own taste from the top, but I’m confident that the values are going to stay intact and the product is going to be awesome.

My time at Meta

The early years of FAIR were absolutely magical. I was part of a small family of absolutely brilliant people building state-of-the-art AI out in the open. From working on GANs with Emily Denton, Rob Fergus, Leon Bottou, Martin Arjovsky and the (now legendary) Alec Radford, to building Starcraft bots with Gabriel Synnaeve, to building the first FAIR Cluster with Howard Mansell, to working on object detection with Adam Lerer and Piotr Dollar, to building PyTorch. It was more fun than I can describe in words. 2015 and 2016 were probably the most productive and professionally enjoyable years of my life. I’ll probably romanticize this period of my life forever.

When I joined FAIR, I had massive impostor syndrome, and the first 3 months were very very difficult. I can’t credit Andrew Tulloch enough for being the most thoughtful, kind and welcoming mentor, without whom I wouldn’t have made it. I’m so damn bullish for Meta just from the fact that he’s back.

---

My time on PyTorch was special. I loved every part of building it—designing it, managing it, being the PM, TL, comms lead, doc engineer, release engineer, squashing bugs, growth hacking, turning it into a coherent product with hundreds of people, transitioning it to industry stakeholdership – the whole nine yards.

To the core PyTorch team at Meta: the engineers, researchers, open-source maintainers, docs writers, CI infrastructure folks, hardware partners, the community builders. To the hundreds more inside and outside Meta—thank you. You turned a library into a movement.
There are too many people to credit and thank, but I can't not mention Adam Paszke, Sam Gross, Greg Chanan, Joe Spisak, Alban Desmaison, Edward Yang, Richard Zou, Tongzhou Wang, Francisco Massa, Luca Antiga, Andreas Köpf, Zach DeVito, Zeming Lin, Adam Lerer, Howard Mansell and Natalia Gimelshein. And Schrep. They made the launch happen.

And so many more people became centrally important later: Lu Fang, Xiaodong Wang, Junjie Bai, Nikita Shulga, Horace He, Mark Saroufim, Jason Ansel, Dmytro Dzhulgakov, Yangqing Jia, Geeta Chauhan, Will Constable, Brian Hirsh, Jane Xu, Mario Lezcano, Piotr Bialecki, Yinghai Lu, Less Wright, Andrew Tulloch, Bruce Lin, Woo Kim, Helen Suk, Chris Gottbrath, Peng Wu, Joe Isaacson, Eli Uriegas, Tristan Rice, Yanan Cao, Elias Ellison, Animesh Jain, Pieter Noordhuis, Tianyu Liu, Yifu Wang, Lin Qiao and hundreds more. It’s criminal of me to not take the space to list out everyone else I should be mentioning here. PyTorch is nothing without its people ❤️.

The most joyful moments of building PyTorch were meeting users eager to share their happiness, love and feedback. I remember a grad student coming to me at NeurIPS 2017; in a slurring, emotional voice he said he’d been trying to make progress on his research for 3 years, but within 3 months of using PyTorch he made so much progress that he was ready to graduate. That moment made it tangible that what we do matters, a lot, to a lot of people, even if you don't constantly hear from them.

I do miss the intimacy of the PyTorch community, with a 300 person conference that felt like an extended family gathering, but I feel that’s a small price to pay considering the scale of impact PyTorch is truly having today – yes, the Conference is now 3,000 people where market-moving deals get brokered, but it’s helping orders of magnitude more people to do their best AI work. I miss the intimacy, but I'm proud of that growth.
---

To Mark Zuckerberg and Mike Schroepfer, who believed that open-sourcing is fundamentally important and is a sound business strategy. This is so hard for most people to understand in the course of business, but we’ve run lock-step on this strategy without ever having to discuss it. Without you two, neither FAIR nor PyTorch would’ve happened. And those mean so much to me.

To Yann LeCun and Rob Fergus, for building the magical early FAIR that I so revere.

To Aparna Ramani, a leader I find so rare at Meta: able to hold a really high bar for the org, technically brilliant with the span to discuss deep infra systems and industry strategy within the same conversation, and an absolute execution machine! I’ve learned so much from you.

To Santosh, Kaushik, Delia, Oldham and Ben for being so welcoming to Infra. For someone coming over from FAIR with a wildly different culture, you all made me feel at home and made me part of the family, and thank you for that.

To all my managers who've championed me through the PSC video game – Serkan, Howard, Jerome, Abhijit, Yoram, Joelle, Aparna and Damien – I owe you a lifetime of drinks.

---

Signing off for now.

—Soumith
Bert Maher
Bert Maher@tensorbert·
@headinthebox Might it be more tractable to verify that two implementations match, than to come up with an optimized implementation of a simpler one?
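One cheap instance of "verify that two implementations match" (well short of the formal verification Erik and Ilya are discussing) is randomized differential testing of an optimized implementation against a simple spec. A small sketch with made-up example functions; this yields evidence, not a proof:

```python
import random

def spec_sum_squares(n: int) -> int:
    """The obviously-correct 'specification' version: a straight loop."""
    return sum(i * i for i in range(n + 1))

def fast_sum_squares(n: int) -> int:
    """The optimized version we want to trust: closed form n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

def implementations_match(trials: int = 1_000, bound: int = 10_000) -> bool:
    """Spot-check agreement on random inputs."""
    return all(
        spec_sum_squares(n) == fast_sum_squares(n)
        for n in (random.randrange(bound) for _ in range(trials))
    )
```

The formal version of the same idea replaces the random loop with a proof obligation `forall n, spec n = fast n`, which is exactly the equivalence framing Bert is asking about.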
Erik Meijer
Erik Meijer@headinthebox·
Perfect example of why I got disappointed with program verification. Too often the post-condition is just a pure functional version of the imperative implementation. Why bother even writing the imperative code, assuming a decent compiler will optimize the former into the latter under the covers for you?
Ilya Sergey@ilyasergey

Spent the last couple of days porting my program verification class from Dafny to Lean via Loom/Velvet, and it just works! Whenever the SMT solver can’t fully prove a program correct, Lean’s aesop and grind take care of the remaining goals.

Bert Maher
Bert Maher@tensorbert·
It would be kind of cool if torch.compile could be used as a context manager, like:
```
some_custom_kernels()
with torch.compile():
    # do a bunch of easy pointwise stuff
more_custom_kernels()
```
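One wrinkle with the context-manager form: a Python `with` block cannot see or rewrite its own suite, so such an API would have to trace execution dynamically rather than transform source. The working idiom today is to factor the region into a function and compile that. A sketch with a pass-through stand-in for torch.compile (`fake_compile` is hypothetical, not a real PyTorch API):

```python
def fake_compile(fn):
    """Stand-in for torch.compile: the real thing returns an optimized
    callable; this pass-through just tags and returns the function."""
    fn.compiled = True
    return fn

@fake_compile
def easy_pointwise(xs):
    # The "bunch of easy pointwise stuff" factored into its own function,
    # i.e. the region the proposed `with torch.compile():` would cover.
    return [2.0 * x + 1.0 for x in xs]
```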
Bert Maher
Bert Maher@tensorbert·
@ScottWolchok @marksaroufim Sometimes I think the rows-vs-columns framing is kind of unhelpful. I sometimes think about matmul with the rhs transposed, so you have an [m,k] matrix and an [n,k] matrix, and you end up with an [m,n] matrix of all the dot products over k. (Which is kind of what nn.Linear does)
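The rhs-transposed view is easy to write down: with A of shape [m,k] and Bt of shape [n,k], every output element is one dot product over k. A pure-Python sketch (the function name is mine):

```python
def matmul_rhs_transposed(A, Bt):
    """C[i][j] = dot(A[i], Bt[j]); equivalent to A @ Bt.T in the usual
    rows-times-columns formulation, and to what nn.Linear computes."""
    return [
        [sum(a * b for a, b in zip(row_a, row_b)) for row_b in Bt]
        for row_a in A
    ]

# Each C[i][j] sums along the k axis of the m-by-n-by-k "cube of products"
# A[i][k_] * Bt[j][k_].
```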
Scott Wolchok
Scott Wolchok@ScottWolchok·
@marksaroufim @tensorbert "there is a cube filled with products, yes we sum those along exactly one dimension of the cube, no we are not just summing all the things" is a lot more helpful than the high-school "OK you go across the rows of the one and down the columns of the other, who knows why"
Bert Maher retweeted
Claude
Claude@claudeai·
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.
Bert Maher
Bert Maher@tensorbert·
@matt_dz @davorVDR Oh man can’t believe I forgot Helion in the list. And it compiles to triton (or at least did last I looked) so it’s turtles all the way down
Bert Maher
Bert Maher@tensorbert·
lol, there is quite the explosion of kernel DSLs lately (triton, tilelang, gluon, TLX, cuteDSL, cuTile, …). And honestly, as much as I love TLX and want it to succeed, I think the next big kernel programming language might be… natural, human language
SzymonOzog@SzymonOzog_

Just one more DSL bro. I promise bro just one more DSL and we'll fix hardware adoption. It's just a better DSL bro. Please just one more. One more DSL and we'll port all the kernels. I just need one more DSL

Bert Maher
Bert Maher@tensorbert·
@vinodg Indeed - but I think that language can increasingly become more intuitive and flexible