Smerity
@Smerity
13.2K posts

ML x society. Founding Member of Technical Staff at Project Prometheus. Prev @midjourney, @SFResearch, @CommonCrawl. @Harvard '14, @Sydney_Uni '11. 🇦🇺 in SF.

San Francisco, CA · Joined July 2008
2.6K Following · 32.2K Followers

Pinned Tweet
Smerity @Smerity:
Introducing the SHA-RNN :)
- Read alternative history as a research genre
- Learn of the terrifying tokenization attack that leaves language models perplexed
- Get near SotA results on enwik8 in hours on a lone GPU
No Sesame Street or Transformers allowed. arxiv.org/abs/1911.11423
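Since the pinned paper is the technical anchor of this feed, a minimal PyTorch sketch of the SHA-RNN's core block may help: an LSTM backbone, a single attention head, and the paper's "Boom" feed-forward. The dimensions, layer names, and the omission of causal masking are all simplifications of mine, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    """One attention head over the model's own history (no multi-head)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries; keys/values reuse memory as-is
        self.scale = dim ** -0.5

    def forward(self, x, memory):
        # x: (batch, len, dim); memory: (batch, len, dim)
        attn = torch.softmax(self.q(x) @ memory.transpose(1, 2) * self.scale, dim=-1)
        return attn @ memory           # NOTE: causal masking omitted for brevity

class ShaRnnBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.attn = SingleHeadAttention(dim)
        # "Boom" feed-forward: expand to 4x width, then squash back down
        self.boom = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                  nn.Linear(4 * dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        h, _ = self.rnn(x)
        h = h + self.attn(self.norm(h), self.norm(h))  # attend over own history
        return h + self.boom(self.norm(h))

x = torch.randn(2, 16, 64)
print(ShaRnnBlock(64)(x).shape)        # torch.Size([2, 16, 64])
```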
Smerity @Smerity:
@awnihannun You and your team definitely made MLX a surprising and powerful entrant onto the scene! Persistently impressed with all the work you've done with it :)
Awni Hannun @awnihannun:
Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon. Apple makes the best consumer hardware on the planet. There's so much potential for it to be the leading platform for AI. And I'm confident MLX will continue to have a big role in that. To the future: MLX remains in the exceptionally capable hands of our team including @angeloskath, @zcbenz, @DiganiJagrit, @NasFilippova, @trebolloc (and others not on X). Follow them or @shshnkp for future updates.
Smerity @Smerity:
@__ReJ__ @ID_AA_Carmack @mithro Reminds me of Kamil Rocki's FPGA RL env work:
- FPGA gets a 400 MHz 8080 CPU => 24k FPS
- Mid-tier FPGA fits 100 CPUs => 2.4M FPS
Real silicon via @mithro's Wafer Space is doubly cool and opens entirely new possibilities. Test at speed, compile to real hw 🔥 medium.com/data-science/a…
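The quoted numbers reduce to simple arithmetic; a back-of-envelope sketch (all figures taken from the tweets, not measured here):

```python
# Back-of-envelope check of the quoted FPGA throughput figures.
clock_hz = 400e6                     # one FPGA-hosted 8080 core at 400 MHz
fps_per_core = 24_000                # quoted frames per second per core
cycles_per_frame = clock_hz / fps_per_core
print(f"{cycles_per_frame:,.0f} cycles per frame")   # ~16,667

cores = 100                          # mid-tier FPGA fitting 100 cores
print(f"{cores * fps_per_core:,} FPS total")         # 2,400,000 FPS
```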
ReJ 𓀨 Renaldas Zioma @__ReJ__:
@ID_AA_Carmack Just realised, it could be useful for RL! Could put 50..80 copies on a die, running 50 MHz each. ROMs are tiny - would embed on the same MPW die. 3Gsamples per second for $100. Maybe 30 of them on a 180 nm with @mithro's Wafer Space. Should be around 1Gsample per second for $10.
John Carmack @ID_AA_Carmack:
It would be an interesting demo-scene thing to make the modern equivalent of an Atari 2600 on an FPGA — no frame buffer, you have to race the HDMI scan out with just a single line buffer and a trivial amount of ram and rom. I got a new Atari 2600+ recently, and while it is neat that it can play original carts, seeing it boot up a whole OS to load Stella feels kind of tragic.
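For readers unfamiliar with "racing the beam", here is a toy Python illustration of the idea (a hypothetical sketch of mine, nothing like real FPGA HDL): each scanline is computed into a single line buffer just before it is "scanned out", so no frame buffer ever exists.

```python
# Toy "racing the beam": synthesize each scanline on the fly into one line
# buffer. Resolution and the pixel function are made-up demo values.
WIDTH, HEIGHT = 160, 8

def pixel(x, y):
    # Procedural pattern computed per-pixel, Atari-2600 style.
    return "#" if (x ^ y) % 7 < 3 else " "

line_buffer = [" "] * WIDTH          # the only pixel storage we get
for y in range(HEIGHT):              # one pass per scanline
    for x in range(WIDTH):
        line_buffer[x] = pixel(x, y) # must finish before the "beam" arrives
    print("".join(line_buffer))      # scan out, then reuse the same buffer
```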
Azalia Mirhoseini @Azaliamirh:
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips.

Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it.

Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop.

We sat down with WSJ's @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence @RicursiveAI:

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Smerity reposted
Fran @furafuku:
What the world needs is more morphological closing: one of the best tricks for organically smooth unions with clean, tightly-connected boundaries. Not only does it avoid the need for isosurface extraction, but it also works with arbitrary geometries, not just circles.
Samuel Timbó @io_sammt:

Is the world ready for Metaballs?
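For the curious, a minimal sketch of what morphological closing does to two nearby blobs, using SciPy's `ndimage.binary_closing` (the grid size, radii, and structuring element are arbitrary demo values of mine, not whatever produced the GIF above):

```python
# Morphological closing (dilation then erosion) smoothly fuses two nearby
# blobs, as in the metaballs trick.
import numpy as np
from scipy import ndimage

yy, xx = np.mgrid[0:128, 0:128]
blobs = ((xx - 50) ** 2 + (yy - 64) ** 2 < 18 ** 2) | \
        ((xx - 78) ** 2 + (yy - 64) ** 2 < 18 ** 2)   # two overlapping circles

# Closing with a disk-shaped structuring element bridges the pinch between
# the blobs, yielding a smooth, tightly-connected union.
r = 9
disk = np.hypot(*np.mgrid[-r:r + 1, -r:r + 1]) <= r
closed = ndimage.binary_closing(blobs, structure=disk)

print(blobs.sum(), "->", closed.sum(), "pixels set after closing")
```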

Smerity reposted
Fran @furafuku:
Many of you asked what software I used for that metaballs GIF. It's a custom UI made with @py5coding, a version of Processing for Python 3.9+. It’s incredibly handy for building visual experiments while taking full advantage of the Python ecosystem.
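To give a flavor of the tool, a minimal py5 sketch in module mode (a generic hello-world of mine, not Fran's custom UI; assumes `pip install py5` and a working Java runtime):

```python
# Minimal py5 sketch: py5 discovers the setup()/draw() functions defined in
# this module when run_sketch() is called.
import py5

def setup():
    py5.size(400, 400)

def draw():
    py5.background(240)
    # A trivially "organic" cluster: circles jittered around the mouse.
    for _ in range(8):
        py5.circle(py5.mouse_x + py5.random(-20, 20),
                   py5.mouse_y + py5.random(-20, 20), 40)

py5.run_sketch()
```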
Smerity @Smerity:
@soumithchintala Deep thanks for all of the work you've done thus far and excited to see what's next :)
Soumith Chintala @soumithchintala:
Leaving Meta and PyTorch

I'm stepping down from PyTorch and leaving Meta on November 17th.

tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me.

Eleven years at Meta. Nearly all my professional life. Making many friends for life. Almost eight years leading PyTorch, taking it from nothing to 90%+ adoption in AI. Walking away from this was one of the hardest things I've ever done. But I'm leaving with a full heart.

PyTorch handles exascale training now. It powers foundation models that are redefining intelligence. It's in production at virtually every major AI company. It's taught in classrooms from MIT to rural India. The tools I dreamed about making accessible? They are. The barrier to entry I wanted to lower? It's almost gone.

To be clear, there's so much more to do. As long as AI evolves at a breakneck pace, PyTorch will continue to play catch up. Obsessing over the yet-to-come sometimes makes us forget how much we've already done.

To everyone who built this with me—who believed research should be joyful, that tools should be elegant, that open source changes everything—thank you. This wasn't my journey. It was ours.

What's next for me? Something small. Something new. Something I don't fully understand yet. Something uncomfortable. I could have moved to something else inside Meta. But I needed to know what's out there. I needed to do something small again. I couldn't live with the counterfactual regret of never trying something outside Meta.

It's very hard to leave. I probably have one of the AI industry's most leveraged seats; I lead the software layer that powers the entire AI industry. Every major AI company and hardware vendor is on speed dial. This kind of power is really hard to give up. But curiosity ultimately won out in my head.

Keep making AI delicious and accessible. I'll be watching. Probably filing issues. Definitely staying involved.

Is PyTorch going to be okay?

I don't want to be doing PyTorch forever. I don't want to be like Guido or Linus—bound to a single thing for decades. Last November, coinciding with the birth of my daughter, I started planning my exit with Aparna. My goal was to leave PyTorch in a good and stable place.

By this August, during the second half of my parental leave, I knew: Edward, Suo, Alban, Greg, John, Joe and Jana were ready. The team faced hard people, product, technical and organizational problems and didn't feel the need to lean back on me to solve these for them (unlike in the past). The product story they crafted for the PyTorch Conference was coherent—really coherent. The things I'd flagged red were turning healthy. The project didn't need me anymore.

Unlike 2020-2022 (when I stepped down to go do robotics and came back when Lin, Dima and Dwarak left), I have strong confidence that this time PyTorch is truly resilient. The most aligned culture carriers of PyTorch – Greg, Alban, Ed, Jason and Joe – are at the decision table now, and people with strong value alignment – Suo, John and Jana – have joined them at the table. And there's a long list of equally value-aligned people willing to sit at the table should any of these people leave.
There are many little things that make up my confidence in the people – John worked on Julia and open source for a very long time (in fact we hacked a Torch.jl in 2015), Suo has been the strongest systems builder and strategic partner I've had for the past two years, and Jana worked on resilient core systems for a very long time; I've had long technical and organizational discussions with her over the past few months that give me confidence. And the product lineup and execution in 2025 should be sufficient evidence for any remaining doubt. I'm confident that this band of PyTorchers is going to do exceptionally well. PyTorch might change in flavor because I no longer impose my own taste from the top, but I'm confident that the values are going to stay intact and the product is going to be awesome.

My time at Meta

The early years of FAIR were absolutely magical. I was part of a small family of absolutely brilliant people building state-of-the-art AI out in the open. From working on GANs with Emily Denton, Rob Fergus, Leon Bottou, Martin Arjovsky and the (now legendary) Alec Radford, to building StarCraft bots with Gabriel Synnaeve, to building the first FAIR cluster with Howard Mansell, to working on object detection with Adam Lerer and Piotr Dollar, to building PyTorch. It was more fun than I can describe in words. 2015 and 2016 were probably the most productive and professionally enjoyable years of my life. I'll probably romanticize this period of my life forever.

When I joined FAIR, I had massive impostor syndrome, and the first 3 months were very, very difficult. I can't credit Andrew Tulloch enough for being the most thoughtful, kind and welcoming mentor, without whom I wouldn't have made it. I'm so damn bullish for Meta just from the fact that he's back.

---

My time on PyTorch was special. I loved every part of building it—designing it, managing it, being the PM, TL, comms lead, doc engineer, release engineer, squashing bugs, growth hacking, turning it into a coherent product with hundreds of people, transitioning it to industry stakeholdership – the whole nine yards.

To the core PyTorch team at Meta: the engineers, researchers, open-source maintainers, docs writers, CI infrastructure folks, hardware partners, the community builders. To the hundreds more inside and outside Meta—thank you. You turned a library into a movement.

There are too many people to credit and thank, but I can't not mention Adam Paszke, Sam Gross, Greg Chanan, Joe Spisak, Alban Desmaison, Edward Yang, Richard Zou, Tongzhou Wang, Francisco Massa, Luca Antiga, Andreas Köpf, Zach DeVito, Zeming Lin, Adam Lerer, Howard Mansell and Natalia Gimelshein. And Schrep. They made the launch happen.

And so many more people became centrally important later: Lu Fang, Xiaodong Wang, Junjie Bai, Nikita Shulga, Horace He, Mark Saroufim, Jason Ansel, Dmytro Dzhulgakov, Yangqing Jia, Geeta Chauhan, Will Constable, Brian Hirsh, Jane Xu, Mario Lezcano, Piotr Bialecki, Yinghai Lu, Less Wright, Andrew Tulloch, Bruce Lin, Woo Kim, Helen Suk, Chris Gottbrath, Peng Wu, Joe Isaacson, Eli Uriegas, Tristan Rice, Yanan Cao, Elias Ellison, Animesh Jain, Pieter Noordhuis, Tianyu Liu, Yifu Wang, Lin Qiao and hundreds more. It's criminal of me to not take the space to list out everyone else I should be mentioning here. PyTorch is nothing without its people ❤️.

The most joyful moments of building PyTorch were meeting users eager to share their happiness, love and feedback.
I remember a grad student coming to me at NeurIPS 2017; in a slurring, emotional voice he said he'd been trying to make progress on his research for 3 years, but within 3 months of using PyTorch he'd made so much progress that he was ready to graduate. That moment made it tangible that what we do matters, a lot, to a lot of people, even if you don't constantly hear from them.

I do miss the intimacy of the PyTorch community, with a 300-person conference that felt like an extended family gathering, but I feel that's a small price to pay considering the scale of impact PyTorch is truly having today – yes, the Conference is now 3,000 people where market-moving deals get brokered, but it's helping orders of magnitude more people do their best AI work. I miss the intimacy, but I'm proud of that growth.

---

To Mark Zuckerberg and Mike Schroepfer, who believed that open-sourcing is fundamentally important and is a sound business strategy. This is so hard to understand for most people within the course of business, but we've run lock-step on this strategy without ever having to discuss it. Without you two, neither FAIR nor PyTorch would've happened. And those mean so much to me.

To Yann LeCun and Rob Fergus, for building the magical early FAIR that I so revere.

To Aparna Ramani, a leader that I find so rare at Meta in her ability to hold a really high bar for the org, technically brilliant with the span to discuss deep infra systems and industry strategy within the same conversation, and for being an absolute execution machine! I've learned so much from you.

To Santosh, Kaushik, Delia, Oldham and Ben for being so welcoming to Infra. For someone coming over from FAIR with a wildly different culture, you all made me feel at home and made me part of the family, and thank you for that.

To all my managers who've championed me through the PSC video game – Serkan, Howard, Jerome, Abhijit, Yoram, Joelle, Aparna and Damien – I owe you a lifetime of drinks.

---

Signing off for now.

—Soumith
Smerity reposted
Poe Zhao @poezhao0605:
This week, two U.S. coding assistants—Cursor and Windsurf—were caught running on Chinese foundation models. Cursor's "Composer" speaks Chinese when it thinks. Windsurf's "SWE-1.5" traces back to Zhipu AI's GLM.

The real story here isn't deception. Training foundation models from scratch costs tens of millions. Fine-tuning open-source models is the rational path. And Chinese models are now the best option. Qwen leads global downloads on Hugging Face. Chinese models dominate trending charts. Third-party benchmarks show they match or beat Western alternatives on reasoning and speed.

Silicon Valley has spent years worrying about China "catching up" in AI. That framing is obsolete. Chinese open-source models aren't just competitive—they're infrastructure. Western developers build on them because they work, they're free, and they're good enough. The global AI stack is converging. Right now, much of it runs on code from Beijing.
Smerity reposted
Common Crawl Foundation @CommonCrawl:
Common Crawl Foundation would like to thank Stanford HAI for the opportunity to present this week: "Preserving Humanity's Knowledge and Making it Accessible". We appreciate Patrick Hynes and Professor Diyi Yang for hosting us! (link to follow-up post and PDF slides in replies)
Smerity @Smerity:
@unixpickle My baseline questions would be: how many of the world's bathrooms have been updated within the last N years, which stakeholders would need to be reached to do this (and how disconnected are they from cleaners), etc. The friction may be too much even for a good idea?
Alex Nichol @unixpickle:
If the whole "put a fake bug in the urinal" thing works, why aren't there fake bugs in every urinal now that it's years later? Seems to me like nudge theory is either BS or impractical for various reasons?
Smerity @Smerity:
Brainstorming with an LLM that's glazing you a tad is helpful when you're under-glazed by default 🤔
Smerity @Smerity:
There's interplay between this and Gall's Law too, made all the more interesting by the fact that these patterns both arise naturally and are utilized biologically. "A complex system that works is invariably found to have evolved from a simple system that worked." en.wikipedia.org/wiki/Turing_pa…
Smerity @Smerity:
Reaction-diffusion systems are so simple in their construction yet so mesmerizing in their results. I'd have loved to have seen Alan Turing's next work after "The Chemical Basis of Morphogenesis" (1952). More fun with two dozen lines of Python plus a GPU than was possible pre-calculator too!
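In that spirit, a minimal Gray-Scott reaction-diffusion sketch in NumPy, roughly the "two dozen lines" (the feed/kill rates are demo values I chose; a GPU version would swap NumPy for CuPy or PyTorch):

```python
# Gray-Scott reaction-diffusion: two chemicals U and V diffuse and react,
# self-organizing into Turing-style patterns.
import numpy as np

n, Du, Dv, F, k = 128, 0.16, 0.08, 0.035, 0.060
U = np.ones((n, n)); V = np.zeros((n, n))
U[54:74, 54:74], V[54:74, 54:74] = 0.5, 0.25   # seed a square perturbation

def lap(Z):  # 5-point Laplacian with wraparound boundaries
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

for _ in range(5000):
    uvv = U * V * V
    U += Du * lap(U) - uvv + F * (1 - U)
    V += Dv * lap(V) + uvv - (F + k) * V

print(f"V range: {V.min():.2f}..{V.max():.2f}")  # plot V with matplotlib to see the spots
```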
Smerity reposted
Tianqi Chen @tqchenml:
The new semester is here at CMU, excited to co-teach with @Tim_Dettmers , to offer our fun course again on "Build Your Mini-PyTorch (needle) from scratch, then build neural networks on top". (Deep Learning Systems) Check out dlsyscourse.org to learn more
Smerity reposted
Jeff Dean @JeffDean:
AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an average TV for ~nine seconds), and consumes 0.26 milliliters of water (about five drops) — figures that are substantially lower than many public estimates.

At the same time, our AI systems are becoming more efficient through research innovations and software and hardware efficiency improvements. From May 2024 to May 2025, the energy footprint of the median Gemini Apps text prompt dropped by 33x, and the total carbon footprint dropped by 44x, through a combination of model efficiency improvements, machine utilization improvements and additional clean energy procurement, all while delivering higher quality responses.

See the blog or technical paper for more about our methodology and ongoing efforts.
Blog: cloud.google.com/blog/products/…
Link to detailed paper: services.google.com/fh/files/misc/…
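The quoted equivalences are easy to sanity-check; the TV wattage and drop volume below are my assumptions, not Google's:

```python
# Sanity-checking the tweet's equivalences, assuming ~100 W for an "average
# TV" and ~0.05 mL per water drop.
prompt_wh = 0.24                      # median Gemini Apps text prompt, quoted
tv_watts = 100
seconds_of_tv = prompt_wh / tv_watts * 3600
print(f"~{seconds_of_tv:.1f} s of TV")        # ~8.6 s, i.e. "~nine seconds"

prompt_ml = 0.26                      # quoted water use per prompt
drop_ml = 0.05
print(f"~{prompt_ml / drop_ml:.0f} drops")    # ~5 drops
```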
Smerity reposted
Awni Hannun @awnihannun:
The new Kimi K2 1T model (4-bit quant) runs on 2 512GB M3 Ultras with mlx-lm and mx.distributed. 1 trillion params, at a speed that's actually quite usable:
Kimi.ai @Kimi_Moonshot:

🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build!
🔌 API is here: platform.moonshot.ai
- $0.15 / million input tokens (cache hit)
- $0.60 / million input tokens (cache miss)
- $2.50 / million output tokens
🔗 Tech blog: moonshotai.github.io/Kimi-K2/
🔗 Weights & code: huggingface.co/moonshotai
🔗 Github: github.com/MoonshotAI/Kim…
Try it now at Kimi.ai or via API!
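Why two 512 GB machines suffice is simple arithmetic (the runtime-overhead factor below is a rough assumption of mine):

```python
# Back-of-envelope memory for a 1T-parameter model at 4-bit quantization.
params = 1e12
bytes_per_param = 4 / 8                          # 4-bit quant
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.0f} GiB of weights")       # ~466 GiB

overhead = 1.15                                  # KV cache, activations, runtime (assumed)
print(f"~{weights_gib * overhead:.0f} GiB needed vs {2 * 512} GB across two M3 Ultras")
```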

Smerity reposted
Albert Gu @_albertgu:
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Sukjun (June) Hwang @sukjun_hwang:

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
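As a toy illustration of dynamic chunking (my own sketch, far simpler than H-Net's learned, end-to-end routing): split a byte stream wherever a boundary score between neighbours spikes; H-Net learns such scores from the data itself rather than hand-coding them.

```python
# Toy dynamic chunking: segment bytes at hand-crafted boundary scores.
text = b"tokenization is just a special case of chunking"

def boundary_score(a, b):
    # Stand-in for a learned similarity: transitions into/out of spaces
    # look like natural unit boundaries. A real model learns this signal.
    return 1.0 if (a == 0x20) != (b == 0x20) else 0.0

chunks, start = [], 0
for i in range(1, len(text)):
    if boundary_score(text[i - 1], text[i]) > 0.5:
        chunks.append(text[start:i]); start = i
chunks.append(text[start:])
print([c.decode() for c in chunks])  # word-ish units (plus space chunks)
```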

Smerity @Smerity:
I'm at AMD's #AdvancingAI, hoping to run into new faces and familiar ones I've not caught up with :) Feel free to ping!
Smerity reposted
Han Guo @HanGuo97:
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with:
- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
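One way to picture "between linear and quadratic" (a toy of mine, not the paper's algorithm or its Triton kernels): keep O(log T) summary states over exponentially growing spans of history, merged binary-counter style, instead of one state (linear attention) or T states (full attention).

```python
# Toy log-structured state: after T tokens there are popcount(T) summaries,
# i.e. at most log2(T)+1, each covering a power-of-two span of history.
import numpy as np

def log_linear_summaries(tokens):
    levels = []  # list of (span_size, summary) pairs; summary = span mean
    for x in tokens:
        carry, size = x, 1
        # Merge equal-sized spans like binary-counter increments; this is
        # what keeps the number of summaries logarithmic in T.
        while levels and levels[-1][0] == size:
            prev_size, prev = levels.pop()
            carry = (prev * prev_size + carry * size) / (prev_size + size)
            size += prev_size
        levels.append((size, carry))
    return levels

T, d = 1000, 4
states = log_linear_summaries(list(np.random.randn(T, d)))
print(len(states), "summary states for", T, "tokens")  # 6 = popcount(1000)
```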