Smerity
@Smerity
13.2K posts

ML x society. Founding Member of Technical Staff at Project Prometheus. Prev @midjourney, @SFResearch, @CommonCrawl. @Harvard '14, @Sydney_Uni '11. 🇦🇺 in SF.

San Francisco, CA · Joined July 2008
2.6K Following · 32.2K Followers

Pinned Tweet
Smerity @Smerity:
Introducing the SHA-RNN :)
- Read alternative history as a research genre
- Learn of the terrifying tokenization attack that leaves language models perplexed
- Get near SotA results on enwik8 in hours on a lone GPU
No Sesame Street or Transformers allowed. arxiv.org/abs/1911.11423
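Since the pinned paper is the technical anchor of this feed, a minimal PyTorch sketch of the SHA-RNN's core block may help: an LSTM backbone, a single attention head, and the paper's "Boom" feed-forward. The dimensions, layer names, and the omission of causal masking are all simplifications of mine, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    """One attention head over the model's own history (no multi-head)."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries; keys/values reuse memory as-is
        self.scale = dim ** -0.5

    def forward(self, x, memory):
        # x: (batch, len, dim); memory: (batch, len, dim)
        attn = torch.softmax(self.q(x) @ memory.transpose(1, 2) * self.scale, dim=-1)
        return attn @ memory           # NOTE: causal masking omitted for brevity

class ShaRnnBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.attn = SingleHeadAttention(dim)
        # "Boom" feed-forward: expand to 4x width, then squash back down
        self.boom = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                  nn.Linear(4 * dim, dim))
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        h, _ = self.rnn(x)
        h = h + self.attn(self.norm(h), self.norm(h))  # attend over own history
        return h + self.boom(self.norm(h))

x = torch.randn(2, 16, 64)
print(ShaRnnBlock(64)(x).shape)        # torch.Size([2, 16, 64])
```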
Smerity @Smerity:
@awnihannun You and your team definitely made MLX a surprising and powerful entrant onto the scene! Persistently impressed with all the work you've done with it :)
Awni Hannun @awnihannun:
Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon. Apple makes the best consumer hardware on the planet. There's so much potential for it to be the leading platform for AI. And I'm confident MLX will continue to have a big role in that. To the future: MLX remains in the exceptionally capable hands of our team including @angeloskath, @zcbenz, @DiganiJagrit, @NasFilippova, @trebolloc (and others not on X). Follow them or @shshnkp for future updates.
Smerity @Smerity:
@__ReJ__ @ID_AA_Carmack @mithro Reminds me of Kamil Rocki's FPGA RL env work:
- FPGA gets a 400 MHz 8080 CPU => 24k FPS
- Mid-tier FPGA fits 100 CPUs => 2.4M FPS
Real silicon via @mithro's Wafer Space is doubly cool and opens entirely new possibilities. Test at speed, compile to real hw 🔥 medium.com/data-science/a…
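The quoted numbers reduce to simple arithmetic; a back-of-envelope sketch (all figures taken from the tweets, not measured here):

```python
# Back-of-envelope check of the quoted FPGA throughput figures.
clock_hz = 400e6                     # one FPGA-hosted 8080 core at 400 MHz
fps_per_core = 24_000                # quoted frames per second per core
cycles_per_frame = clock_hz / fps_per_core
print(f"{cycles_per_frame:,.0f} cycles per frame")   # ~16,667

cores = 100                          # mid-tier FPGA fitting 100 cores
print(f"{cores * fps_per_core:,} FPS total")         # 2,400,000 FPS
```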
ReJ 𓀨 Renaldas Zioma @__ReJ__:
@ID_AA_Carmack Just realised, it could be useful for RL! Could put 50..80 copies on a die, running 50 MHz each. ROMs are tiny - would embed on the same MPW die. 3Gsamples per second for $100. Maybe 30 of them on a 180 nm with @mithro's Wafer Space. Should be around 1Gsample per second for $10.
John Carmack @ID_AA_Carmack:
It would be an interesting demo-scene thing to make the modern equivalent of an Atari 2600 on an FPGA — no frame buffer, you have to race the HDMI scan out with just a single line buffer and a trivial amount of ram and rom. I got a new Atari 2600+ recently, and while it is neat that it can play original carts, seeing it boot up a whole OS to load Stella feels kind of tragic.
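For readers unfamiliar with "racing the beam", here is a toy Python illustration of the idea (a hypothetical sketch of mine, nothing like real FPGA HDL): each scanline is computed into a single line buffer just before it is "scanned out", so no frame buffer ever exists.

```python
# Toy "racing the beam": synthesize each scanline on the fly into one line
# buffer. Resolution and the pixel function are made-up demo values.
WIDTH, HEIGHT = 160, 8

def pixel(x, y):
    # Procedural pattern computed per-pixel, Atari-2600 style.
    return "#" if (x ^ y) % 7 < 3 else " "

line_buffer = [" "] * WIDTH          # the only pixel storage we get
for y in range(HEIGHT):              # one pass per scanline
    for x in range(WIDTH):
        line_buffer[x] = pixel(x, y) # must finish before the "beam" arrives
    print("".join(line_buffer))      # scan out, then reuse the same buffer
```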
Azalia Mirhoseini @Azaliamirh:
Thrilled to share that @annadgoldie and I are launching @RicursiveAI, a frontier lab enabling recursive self-improvement through AIs that design their own chips.

Our vision for transforming chip design began with AlphaChip, an AI for layout optimization used to design four generations of TPUs, data center CPUs, and smartphones. AlphaChip offered a glimpse into a future where AI designs the silicon that fuels it.

Ricursive extends this vision to the entire chip stack, building AI that architects, verifies, and implements silicon, enabling models and chips to co-evolve in a tight loop.

We sat down with WSJ's @berber_jin1 to discuss Ricursive: wsj.com/tech/this-ai-s…
Ricursive Intelligence @RicursiveAI:

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com

Smerity reposted
Fran @furafuku:
What the world needs is more morphological closing: one of the best tricks for organically smooth unions with clean, tightly-connected boundaries. Not only does it avoid the need for isosurface extraction, but it also works with arbitrary geometries, not just circles.
Samuel Timbó @io_sammt:

Is the world ready for Metaballs?
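For the curious, a minimal sketch of what morphological closing does to two nearby blobs, using SciPy's `ndimage.binary_closing` (the grid size, radii, and structuring element are arbitrary demo values of mine, not whatever produced the GIF above):

```python
# Morphological closing (dilation then erosion) smoothly fuses two nearby
# blobs, as in the metaballs trick.
import numpy as np
from scipy import ndimage

yy, xx = np.mgrid[0:128, 0:128]
blobs = ((xx - 50) ** 2 + (yy - 64) ** 2 < 18 ** 2) | \
        ((xx - 78) ** 2 + (yy - 64) ** 2 < 18 ** 2)   # two overlapping circles

# Closing with a disk-shaped structuring element bridges the pinch between
# the blobs, yielding a smooth, tightly-connected union.
r = 9
disk = np.hypot(*np.mgrid[-r:r + 1, -r:r + 1]) <= r
closed = ndimage.binary_closing(blobs, structure=disk)

print(blobs.sum(), "->", closed.sum(), "pixels set after closing")
```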

Smerity reposted
Fran @furafuku:
Many of you asked what software I used for that metaballs GIF. It's a custom UI made with @py5coding, a version of Processing for Python 3.9+. It’s incredibly handy for building visual experiments while taking full advantage of the Python ecosystem.
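To give a flavor of the tool, a minimal py5 sketch in module mode (a generic hello-world of mine, not Fran's custom UI; assumes `pip install py5` and a working Java runtime):

```python
# Minimal py5 sketch: py5 discovers the setup()/draw() functions defined in
# this module when run_sketch() is called.
import py5

def setup():
    py5.size(400, 400)

def draw():
    py5.background(240)
    # A trivially "organic" cluster: circles jittered around the mouse.
    for _ in range(8):
        py5.circle(py5.mouse_x + py5.random(-20, 20),
                   py5.mouse_y + py5.random(-20, 20), 40)

py5.run_sketch()
```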
Smerity @Smerity:
@soumithchintala Deep thanks for all of the work you've done thus far and excited to see what's next :)
Soumith Chintala @soumithchintala:
Leaving Meta and PyTorch

I'm stepping down from PyTorch and leaving Meta on November 17th.

tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me.

Eleven years at Meta. Nearly all my professional life. Making many friends for life. Almost eight years leading PyTorch, taking it from nothing to 90%+ adoption in AI. Walking away from this was one of the hardest things I've ever done. But I'm leaving with a full heart.

PyTorch handles exascale training now. It powers foundation models that are redefining intelligence. It's in production at virtually every major AI company. It's taught in classrooms from MIT to rural India. The tools I dreamed about making accessible? They are. The barrier to entry I wanted to lower? It's almost gone.

To be clear, there's so much more to do. As long as AI evolves at a breakneck pace, PyTorch will continue to play catch up. Obsessing over the yet-to-come sometimes makes us forget how much we've already done.

To everyone who built this with me—who believed research should be joyful, that tools should be elegant, that open source changes everything—thank you. This wasn't my journey. It was ours.

What's next for me? Something small. Something new. Something I don't fully understand yet. Something uncomfortable. I could have moved to something else inside Meta. But I needed to know what's out there. I needed to do something small again. I couldn't live with the counterfactual regret of never trying something outside Meta.

It's very hard to leave. I probably have one of the AI industry's most leveraged seats; I lead the software layer that powers the entire AI industry. Every major AI company and hardware vendor is on speed dial. This kind of power is really hard to give up. But curiosity ultimately won out in my head.

Keep making AI delicious and accessible. I'll be watching. Probably filing issues. Definitely staying involved.

Is PyTorch going to be okay?

I don't want to be doing PyTorch forever. I don't want to be like Guido or Linus—bound to a single thing for decades. Last November, coinciding with the birth of my daughter, I started planning my exit with Aparna. My goal was to leave PyTorch in a good and stable place.

By this August, during the second half of my parental leave, I knew: Edward, Suo, Alban, Greg, John, Joe and Jana were ready. The team faced hard people, product, technical and organizational problems and didn't feel the need to lean back on me to solve these for them (unlike in the past). The product story they crafted for the PyTorch Conference was coherent—really coherent. The things I'd flagged red were turning healthy. The project didn't need me anymore.

Unlike 2020-2022 (when I stepped down to go do robotics and came back when Lin, Dima and Dwarak left), I have strong confidence that this time PyTorch is truly resilient. The most aligned culture carriers of PyTorch – Greg, Alban, Ed, Jason and Joe – are at the decision table now, and people with strong value alignment – Suo, John and Jana – have joined them at the table. And there's a long list of equally value-aligned people willing to sit at the table should any of these people leave.
There are many little things that make up my confidence in the people – John worked on Julia and open source for a very long time (in fact we hacked a Torch.jl in 2015), Suo has been the strongest systems builder and strategic partner I've had for the past two years, and Jana worked on resilient core systems for a very long time; I've had long technical and organizational discussions with her over the past few months that give me confidence. And the product lineup and execution in 2025 should be sufficient evidence for any remaining doubt. I'm confident that this band of PyTorchers is going to do exceptionally well. PyTorch might change in flavor because I no longer impose my own taste from the top, but I'm confident that the values are going to stay intact and the product is going to be awesome.

My time at Meta

The early years of FAIR were absolutely magical. I was part of a small family of absolutely brilliant people building state-of-the-art AI out in the open. From working on GANs with Emily Denton, Rob Fergus, Leon Bottou, Martin Arjovsky and the (now legendary) Alec Radford, to building StarCraft bots with Gabriel Synnaeve, to building the first FAIR cluster with Howard Mansell, to working on object detection with Adam Lerer and Piotr Dollar, to building PyTorch. It was more fun than I can describe in words. 2015 and 2016 were probably the most productive and professionally enjoyable years of my life. I'll probably romanticize this period of my life forever.

When I joined FAIR, I had massive impostor syndrome, and the first 3 months were very, very difficult. I can't credit Andrew Tulloch enough for being the most thoughtful, kind and welcoming mentor, without whom I wouldn't have made it. I'm so damn bullish for Meta just from the fact that he's back.

---

My time on PyTorch was special. I loved every part of building it—designing it, managing it, being the PM, TL, comms lead, doc engineer, release engineer, squashing bugs, growth hacking, turning it into a coherent product with hundreds of people, transitioning it to industry stakeholdership – the whole nine yards.

To the core PyTorch team at Meta: the engineers, researchers, open-source maintainers, docs writers, CI infrastructure folks, hardware partners, the community builders. To the hundreds more inside and outside Meta—thank you. You turned a library into a movement.

There are too many people to credit and thank, but I can't not mention Adam Paszke, Sam Gross, Greg Chanan, Joe Spisak, Alban Desmaison, Edward Yang, Richard Zou, Tongzhou Wang, Francisco Massa, Luca Antiga, Andreas Köpf, Zach DeVito, Zeming Lin, Adam Lerer, Howard Mansell and Natalia Gimelshein. And Schrep. They made the launch happen.

And so many more people became centrally important later: Lu Fang, Xiaodong Wang, Junjie Bai, Nikita Shulga, Horace He, Mark Saroufim, Jason Ansel, Dmytro Dzhulgakov, Yangqing Jia, Geeta Chauhan, Will Constable, Brian Hirsh, Jane Xu, Mario Lezcano, Piotr Bialecki, Yinghai Lu, Less Wright, Andrew Tulloch, Bruce Lin, Woo Kim, Helen Suk, Chris Gottbrath, Peng Wu, Joe Isaacson, Eli Uriegas, Tristan Rice, Yanan Cao, Elias Ellison, Animesh Jain, Pieter Noordhuis, Tianyu Liu, Yifu Wang, Lin Qiao and hundreds more. It's criminal of me to not take the space to list out everyone else I should be mentioning here. PyTorch is nothing without its people ❤️.

The most joyful moments of building PyTorch were meeting users eager to share their happiness, love and feedback.
I remember a grad student coming to me at NeurIPS 2017; in a slurring, emotional voice he said he'd been trying to make progress on his research for 3 years, but within 3 months of using PyTorch he'd made so much progress that he was ready to graduate. That moment made it tangible that what we do matters, a lot, to a lot of people, even if you don't constantly hear from them.

I do miss the intimacy of the PyTorch community, with a 300-person conference that felt like an extended family gathering, but I feel that's a small price to pay considering the scale of impact PyTorch is truly having today – yes, the Conference is now 3,000 people where market-moving deals get brokered, but it's helping orders of magnitude more people do their best AI work. I miss the intimacy, but I'm proud of that growth.

---

To Mark Zuckerberg and Mike Schroepfer, who believed that open-sourcing is fundamentally important and is a sound business strategy. This is so hard to understand for most people within the course of business, but we've run lock-step on this strategy without ever having to discuss it. Without you two, neither FAIR nor PyTorch would've happened. And those mean so much to me.

To Yann LeCun and Rob Fergus, for building the magical early FAIR that I so revere.

To Aparna Ramani, a leader that I find so rare at Meta in her ability to hold a really high bar for the org, technically brilliant with the span to discuss deep infra systems and industry strategy within the same conversation, and for being an absolute execution machine! I've learned so much from you.

To Santosh, Kaushik, Delia, Oldham and Ben for being so welcoming to Infra. For someone coming over from FAIR with a wildly different culture, you all made me feel at home and made me part of the family, and thank you for that.

To all my managers who've championed me through the PSC video game – Serkan, Howard, Jerome, Abhijit, Yoram, Joelle, Aparna and Damien – I owe you a lifetime of drinks.

---

Signing off for now.

—Soumith
Smerity reposted
Poe Zhao @poezhao0605:
This week, two U.S. coding assistants—Cursor and Windsurf—were caught running on Chinese foundation models. Cursor's "Composer" speaks Chinese when it thinks. Windsurf's "SWE-1.5" traces back to Zhipu AI's GLM.

The real story here isn't deception. Training foundation models from scratch costs tens of millions. Fine-tuning open-source models is the rational path. And Chinese models are now the best option. Qwen leads global downloads on Hugging Face. Chinese models dominate trending charts. Third-party benchmarks show they match or beat Western alternatives on reasoning and speed.

Silicon Valley has spent years worrying about China "catching up" in AI. That framing is obsolete. Chinese open-source models aren't just competitive—they're infrastructure. Western developers build on them because they work, they're free, and they're good enough. The global AI stack is converging. Right now, much of it runs on code from Beijing.
Smerity reposted
Common Crawl Foundation @CommonCrawl:
Common Crawl Foundation would like to thank Stanford HAI for the opportunity to present this week: "Preserving Humanity's Knowledge and Making it Accessible". We appreciate Patrick Hynes and Professor Diyi Yang for hosting us! (link to follow-up post and PDF slides in replies)
Smerity @Smerity:
@unixpickle My baseline questions would be: how many of the world's bathrooms have been updated within the last N years, which stakeholders would need to be reached to do this (and how disconnected are they from cleaners), etc. The friction may be too much even for a good idea?
Alex Nichol @unixpickle:
If the whole "put a fake bug in the urinal" thing works, why aren't there fake bugs in every urinal now that it's years later? Seems to me like nudge theory is either BS or impractical for various reasons?
Smerity @Smerity:
Brainstorming with an LLM that's glazing you a tad is helpful when you're under-glazed by default 🤔
Smerity @Smerity:
There's interplay between this and Gall's Law too, made all the more interesting by the fact that these patterns both arise naturally and are utilized biologically. "A complex system that works is invariably found to have evolved from a simple system that worked." en.wikipedia.org/wiki/Turing_pa…
Smerity @Smerity:
Reaction-diffusion systems are so simple in their construction yet so mesmerizing in their results. I'd have loved to have seen Alan Turing's next work after "The Chemical Basis of Morphogenesis" (1952). More fun with two dozen lines of Python plus a GPU than was possible pre-calculator too!
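In that spirit, a minimal Gray-Scott reaction-diffusion sketch in NumPy, roughly the "two dozen lines" (the feed/kill rates are demo values I chose; a GPU version would swap NumPy for CuPy or PyTorch):

```python
# Gray-Scott reaction-diffusion: two chemicals U and V diffuse and react,
# self-organizing into Turing-style patterns.
import numpy as np

n, Du, Dv, F, k = 128, 0.16, 0.08, 0.035, 0.060
U = np.ones((n, n)); V = np.zeros((n, n))
U[54:74, 54:74], V[54:74, 54:74] = 0.5, 0.25   # seed a square perturbation

def lap(Z):  # 5-point Laplacian with wraparound boundaries
    return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
            np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

for _ in range(5000):
    uvv = U * V * V
    U += Du * lap(U) - uvv + F * (1 - U)
    V += Dv * lap(V) + uvv - (F + k) * V

print(f"V range: {V.min():.2f}..{V.max():.2f}")  # plot V with matplotlib to see the spots
```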
Smerity reposted
Tianqi Chen @tqchenml:
The new semester is here at CMU, excited to co-teach with @Tim_Dettmers , to offer our fun course again on "Build Your Mini-PyTorch (needle) from scratch, then build neural networks on top". (Deep Learning Systems) Check out dlsyscourse.org to learn more
Smerity reposted
Jeff Dean @JeffDean:
AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an average TV for ~nine seconds), and consumes 0.26 milliliters of water (about five drops) — figures that are substantially lower than many public estimates.

At the same time, our AI systems are becoming more efficient through research innovations and software and hardware efficiency improvements. From May 2024 to May 2025, the energy footprint of the median Gemini Apps text prompt dropped by 33x, and the total carbon footprint dropped by 44x, through a combination of model efficiency improvements, machine utilization improvements and additional clean energy procurement, all while delivering higher quality responses.

See the blog or technical paper for more about our methodology and ongoing efforts.
Blog: cloud.google.com/blog/products/…
Link to detailed paper: services.google.com/fh/files/misc/…
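The quoted equivalences are easy to sanity-check; the TV wattage and drop volume below are my assumptions, not Google's:

```python
# Sanity-checking the tweet's equivalences, assuming ~100 W for an "average
# TV" and ~0.05 mL per water drop.
prompt_wh = 0.24                      # median Gemini Apps text prompt, quoted
tv_watts = 100
seconds_of_tv = prompt_wh / tv_watts * 3600
print(f"~{seconds_of_tv:.1f} s of TV")        # ~8.6 s, i.e. "~nine seconds"

prompt_ml = 0.26                      # quoted water use per prompt
drop_ml = 0.05
print(f"~{prompt_ml / drop_ml:.0f} drops")    # ~5 drops
```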
Smerity reposted
Awni Hannun @awnihannun:
The new Kimi K2 1T model (4-bit quant) runs on 2 512GB M3 Ultras with mlx-lm and mx.distributed. 1 trillion params, at a speed that's actually quite usable:
Kimi.ai @Kimi_Moonshot:

🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now
With Kimi K2, advanced agentic intelligence is more open and accessible than ever. We can't wait to see what you build!
🔌 API is here: platform.moonshot.ai
- $0.15 / million input tokens (cache hit)
- $0.60 / million input tokens (cache miss)
- $2.50 / million output tokens
🔗 Tech blog: moonshotai.github.io/Kimi-K2/
🔗 Weights & code: huggingface.co/moonshotai
🔗 Github: github.com/MoonshotAI/Kim…
Try it now at Kimi.ai or via API!
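Why two 512 GB machines suffice is simple arithmetic (the runtime-overhead factor below is a rough assumption of mine):

```python
# Back-of-envelope memory for a 1T-parameter model at 4-bit quantization.
params = 1e12
bytes_per_param = 4 / 8                          # 4-bit quant
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.0f} GiB of weights")       # ~466 GiB

overhead = 1.15                                  # KV cache, activations, runtime (assumed)
print(f"~{weights_gib * overhead:.0f} GiB needed vs {2 * 512} GB across two M3 Ultras")
```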

Smerity reposted
Albert Gu @_albertgu:
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but simply scales better.
Sukjun (June) Hwang @sukjun_hwang:

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
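As a toy illustration of dynamic chunking (my own sketch, far simpler than H-Net's learned, end-to-end routing): split a byte stream wherever a boundary score between neighbours spikes; H-Net learns such scores from the data itself rather than hand-coding them.

```python
# Toy dynamic chunking: segment bytes at hand-crafted boundary scores.
text = b"tokenization is just a special case of chunking"

def boundary_score(a, b):
    # Stand-in for a learned similarity: transitions into/out of spaces
    # look like natural unit boundaries. A real model learns this signal.
    return 1.0 if (a == 0x20) != (b == 0x20) else 0.0

chunks, start = [], 0
for i in range(1, len(text)):
    if boundary_score(text[i - 1], text[i]) > 0.5:
        chunks.append(text[start:i]); start = i
chunks.append(text[start:])
print([c.decode() for c in chunks])  # word-ish units (plus space chunks)
```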

Smerity @Smerity:
I'm at AMD's #AdvancingAI, hoping to run into new faces and familiar ones I've not caught up with :) Feel free to ping!
Smerity reposted
Han Guo @HanGuo97:
We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with:
- Log-linear time training
- Log-time inference (in both time and memory)
- Hardware-efficient Triton kernels
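One way to picture "between linear and quadratic" (a toy of mine, not the paper's algorithm or its Triton kernels): keep O(log T) summary states over exponentially growing spans of history, merged binary-counter style, instead of one state (linear attention) or T states (full attention).

```python
# Toy log-structured state: after T tokens there are popcount(T) summaries,
# i.e. at most log2(T)+1, each covering a power-of-two span of history.
import numpy as np

def log_linear_summaries(tokens):
    levels = []  # list of (span_size, summary) pairs; summary = span mean
    for x in tokens:
        carry, size = x, 1
        # Merge equal-sized spans like binary-counter increments; this is
        # what keeps the number of summaries logarithmic in T.
        while levels and levels[-1][0] == size:
            prev_size, prev = levels.pop()
            carry = (prev * prev_size + carry * size) / (prev_size + size)
            size += prev_size
        levels.append((size, carry))
    return levels

T, d = 1000, 4
states = log_linear_summaries(list(np.random.randn(T, d)))
print(len(states), "summary states for", T, "tokens")  # 6 = popcount(1000)
```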