Self

5.9K posts

Self

@SelfInfinity

builder. exploring the void. On a quest for the sublime and eternal

Timeless Present Katılım Ağustos 2022

159 Takip Edilen1K Takipçiler

Self@SelfInfinity·7 Kas

@rh__147 @AndrewYNg @syncthing For that I use the obsidian app and the official sync service on iOS

English

Haseeb Raja@rh__147·7 Kas

@SelfInfinity @AndrewYNg @syncthing Thanks for sharing. Can we also sync with mobile devices like iOS?

English

Andrew Ng@AndrewYNg·6 Kas

AI agents are getting better at looking at different types of data in businesses to spot patterns and create value. This is making data silos increasingly painful. This is why I increasingly try to select software that lets me control my own data, so I can make it available to my AI agents. Because of AI’s growing capabilities, the value you can now create from “connecting the dots” between different pieces of data is higher than ever. For example, if an email click is logged in one vendor’s system and a subsequent online purchase is logged in a different one, then it is valuable to build agents that can access both of these data sources to see how they correlate to make better decisions. Unfortunately, many SaaS vendors try to create a data silo in their customer’s business. By making it hard for you to extract your data, they create high switching costs. This also allows them to steer you to buy their AI agent services — sometimes at high expense and/or of low quality — rather than build your own or buy from a different vendor. Unfortunately, some SaaS vendors are seeing AI agents coming for this data and working to make it harder for you (and your AI agents) to efficiently access it. One of my teams just told me that a SaaS vendor we have been using to store our customer data wants to charge over $20,000 for an API key to get at our data. This high cost — no doubt intentionally designed to make it hard for customers to get their data out — is adding a barrier to implementing agentic workflows that take advantage of that data. Through AI Aspire (an AI advisory firm), I advise a number of businesses on their AI strategies. When it comes to buying SaaS, I often advise them to try to control their own data (which, sadly, some vendors mightily resist). This way, you can hire a SaaS vendor to record and operate on your data, but ultimately you decide how to route it to the appropriate human or AI system for processing. Over the past decade, a lot of work has gone into organizing businesses’ structured data. Because AI can now process unstructured data much better than before, the value of organizing your unstructured data (including PDF files, which LandingAI’s Agentic Document Extraction specializes in!) is higher than ever before. In the era of generative AI, businesses and individuals have important work ahead to organize their data to be AI-ready. P.S. As an individual, my favorite note-taking app is Obsidian. I am happy to “hire” Obsidian to operate on my notes files. And, all my notes are saved as Markdown files in my file system, and I have built AI agents that read from or write to my Obsidian files. This is a small example of how controlling my own notes data lets me do more with AI agents! [Original text: deeplearning.ai/the-batch/issu… ]

English

130

335

168.3K

Self@SelfInfinity·7 Kas

@EMostaque @Kimi_Moonshot this is why China will indeed win the ai race. EU ngmi

English

874

Emad@EMostaque·7 Kas

Can you imagine being a "frontier" lab that's raised like a billion dollars and now you can't release your latest model because it can't beat @Kimi_Moonshot ? 🗻 Sota can be a bitch if thats your target

Emad@EMostaque

Can you imagine being a "frontier" lab that's raised like a billion dollars and now you can't release your latest model because it can't beat deepseek? 🐳 Sota can be a bitch if thats your target

English

944

96.1K

Self@SelfInfinity·7 Kas

@rh__147 @AndrewYNg I pay for the service they offer, but I also use @syncthing (opensource) to sync the obsidian vault between devices (eg between macOS and Linux)

English

Haseeb Raja@rh__147·7 Kas

@SelfInfinity @AndrewYNg How can we do cross-platform sync?

English

Self@SelfInfinity·7 Kas

@kandros5591 @AndrewYNg good point, it’s not open source

English

Jaga santagostino@kandros5591·7 Kas

@SelfInfinity @AndrewYNg Obsidian is not open source

English

Self@SelfInfinity·6 Kas

@soumithchintala now join @karpathy the greats do their own thing

English

1.9K

Soumith Chintala@soumithchintala·6 Kas

Leaving Meta and PyTorch I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years at Meta. Nearly all my professional life. Making many friends for life. Almost eight years leading PyTorch, taking it from nothing to 90%+ adoption in AI. Walking away from this was one of the hardest things I've ever done. But I'm leaving with a full heart. PyTorch handles exascale training now. It powers foundation models that are redefining intelligence. It's in production at virtually every major AI company. It's taught in classrooms from MIT to rural India. The tools I dreamed about making accessible? They are. The barrier to entry I wanted to lower? It's almost gone. To be clear, there’s so much more to do. As long as AI evolves at a breakneck pace, PyTorch will continue to play catch up. Obsessing over the yet-to-come sometimes makes us forget how much we’ve already done. To everyone who built this with me—who believed research should be joyful, that tools should be elegant, that open source changes everything—thank you. This wasn't my journey. It was ours. What's next for me? Something small. Something new. Something I don't fully understand yet. Something uncomfortable. I could have moved to something else inside Meta. But I needed to know what's out there. I needed to do something small again. I couldn't live with the counterfactual regret of never trying something outside Meta. It's very hard to leave. I probably have one of the AI industry’s most leveraged seats, I lead the software layer that powers the entire AI industry. Every major AI company and hardware vendor are on a speed dial. This kind of power is really hard to give up. But curiosity ultimately won out in my head. Keep making AI delicious and accessible. I'll be watching. Probably filing issues. Definitely staying involved. Is PyTorch going to be okay? I don't want to be doing PyTorch forever. I don't want to be like Guido or Linus— bound to a single thing for decades. Last November, coinciding with the birth of my daughter, I started planning my exit with Aparna. My goal was to leave PyTorch in a good and stable place. By this August, during the second half of my parental leave, I knew: Edward, Suo, Alban, Greg, John, Joe and Jana were ready. The team faced hard people, product, technical and organizational problems and didn’t feel the need to lean back on me to solve these for them (unlike in the past). The product story they crafted for the PyTorch Conference was coherent—really coherent. The things I'd flagged red were turning healthy. The project didn't need me anymore. Unlike 2020-2022 (when I stepped down to go do robotics and came back when Lin, Dima and Dwarak left), I have strong confidence that this time PyTorch is truly resilient. The most aligned culture carriers of PyTorch – Greg, Alban, Ed, Jason and Joe are at the decision table now, and people with strong value alignment – Suo, John and Jana have joined them at the table. And there’s a long list of equally value-aligned people willing to sit at the table should any of these people leave. There are many little things that make up my confidence on the people – John worked on Julia and open-source for a very long time (in fact we hacked a Torch.jl in 2015), Suo has been the strongest systems builder and strategic partner I’ve had for the past two years, and Jana worked on resilient core systems for a very long time, I’ve had long technical and organizational discussions with her over the past few months that give me confidence. And the product lineup and execution in 2025 should be sufficient evidence for any remaining doubt. I’m confident that this band of PyTorchers are going to do exceptionally well. PyTorch might change in flavor because I no longer impose my own taste from the top, but I’m confident that the values are going to stay intact and the product is going to be awesome. My time at Meta The early years of FAIR were absolutely magical. I was part of a small family of absolutely brilliant people building state-of-the-art AI out in the open. From working on GANs with Emily Denton, Rob Fergus, Leon Bottou, Martin Arjovsky and the (now legendary) Alec Radford to building Starcraft bots with Gabriel Synnaeve, to building the first FAIR Cluster with Howard Mansell, to working on object detection with Adam Lerer and Piotr Dollar, to building PyTorch. It was more fun than I can describe in words. 2015 and 2016 were probably the most productive and professionally enjoyable years of my life. I’ll probably romanticize this period of my life forever. When I joined FAIR, I had massive impostor syndrome, and the first 3 months were very very difficult. I can’t credit Andrew Tulloch enough for being the most thoughtful, kind and welcoming mentor, without whom I wouldn’t have made it. I’m so damn bullish for Meta just from the fact that he’s back. --- My time on PyTorch was special. I loved every part of building it—designing it, managing it, being the PM, TL, comms lead, doc engineer, release engineer, squashing bugs, growth hacking, turning it into a coherent product with hundreds of people, transitioning it to industry stakeholdership – the whole nine yards. To the core PyTorch team at Meta: the engineers, researchers, open-source maintainers, docs writers, CI infrastructure folks, hardware partners, the community builders. To the hundreds more inside and outside Meta—thank you. You turned a library into a movement. There are too many people to credit and thank, but I can't not mention Adam Paszke, Sam Gross, Greg Chanan, Joe Spisak, Alban Desmaison, Edward Yang, Richard Zou, Tongzhou Wang, Francisco Massa, Luca Antiga, Andreas Köpf, Zach DeVito, Zeming Lin, Adam Lerer, Howard Mansell and Natalia Gimelshein. And Schrep. They made the launch happen. And so many more people became centrally important later: Lu Fang, Xiaodong Wang, Junjie Bai, Nikita Shulga, Horace He, Mark Saroufim, Jason Ansel, Dmytro Dzhulgakov, Yangqing Jia, Geeta Chauhan, Will Constable, Briah Hirsh, Jane Xu, Mario Lezcano, Piotr Balecki, Yinghai Lu, Less Wright, Andrew Tulloch, Bruce Lin, Woo Kim, Helen Suk, Chris Gottbrath, Peng Wu, Joe Isaacson, Eli Uriegas, Tristan Rice, Yanan Cao, Elias Ellison, Animesh Jain, Peter Noordhuis, Tianyu Liu, Yifu Wang, Lin Qiao and hundreds more. It’s criminal of me to not take the space to list out everyone else I should be mentioning here. PyTorch is nothing without its people ❤️. The most joyful moments of building PyTorch was meeting users eager to share their happiness, love and feedback. I remember a grad student coming to me at Neurips 2017, in a slurring emotional voice he said he’d been trying to make progress on his research for 3 years but within 3 months of using PyTorch he made so much progress that he was ready to graduate. That moment made it tangible that what we do matters, a lot, to a lot of people, even if you don't constantly hear from them. I do miss the intimacy of the PyTorch community, with a 300 person conference that felt like an extended family gathering, but I feel that’s a small price to pay considering the scale of impact PyTorch is truly having today – yes the Conference is now 3,000 people where market-moving deals get brokered, but it’s helping orders of magnitude more people to do their best AI work. I miss the intimacy, but I'm proud of that growth. --- To Mark Zuckerberg and Mike Schroepfer, who believed that open-sourcing is fundamentally important and is a sound business strategy. This is so hard to understand for most people within the course of business, but we’ve run lock-step on this strategy without ever having to discuss it. Without you two, neither FAIR nor PyTorch would’ve happened. And those mean so much to me. To Yann LeCun and Rob Fergus, for building the magical early FAIR that I so revere. To Aparna Ramani, a leader that I find so rare at Meta in her ability to hold a really high bar for the org, technically brilliant with the span to discuss deep infra systems and industry-strategy within the same conversation and for being an absolute execution-machine! I’ve learned so much from you. To Santosh, Kaushik, Delia, Oldham and Ben for being so welcoming to Infra. For someone coming over from FAIR with a wildly different culture, you all made me feel at home and made me part of the family, and thank you for that. To all my managers who've championed me through the PSC video game – Serkan, Howard, Jerome, Abhijit, Yoram, Joelle, Aparna and Damien – I owe you a lifetime of drinks. --- Signing off for now. —Soumith

English

489

566

10.8K

2.5M

Self@SelfInfinity·2 Kas

@FU_joehudson Fully feeling and welcoming that fear is a loving and embodied act. those charts look like if someone wanted to Fourier transform fear into Nirvana. I’d call this bypassing by intellectualization.

English

711

Joe Hudson@FU_joehudson·2 Kas

For decades I was caught up in all sorts of angles and frameworks about emotions, only to find out: Thinking about them is far less effective than feeling them.

English

332

13.5K

Self@SelfInfinity·17 Eki

@VikParuchuri nice is this a follow up to Surya or something else completely?

English

513

Vik Paruchuri@VikParuchuri·17 Eki

Introducing chandra OCR - now available on the Datalab API: - Top scores in table, math benchmarks - Handles messy handwriting - Form support (incl checkboxes) - 30+ language coverage - Full layout infomation - Open source (with HF + VLLM support) coming soon

English

424

27.4K

Self@SelfInfinity·7 Eki

@hardmaru @yule_gan Reminds me of your old blog posts on cma es , cool stuff blog.otoro.net/2017/10/29/vis…

English

222

hardmaru@hardmaru·7 Eki

Evolution Strategies can be applied at scale to fine-tune LLMs, and outperforms PPO and GRPO in many model settings! Fantastic paper “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning” by @yule_gan, Risto Miikkulainen and team. arxiv.org/abs/2509.24372

Yulu Gan@yule_gan

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for full-parameter fine-tuning using Evolution Strategies (ES). By skipping gradients and optimizing directly in parameter space, ES achieves more accurate, efficient, and stable fine-tuning. Paper: arxiv.org/pdf/2509.24372 Code: github.com/VsonicV/es-fin…

English

276

35.2K

Self@SelfInfinity·3 Eki

@coffeebreak_YT humanity will be drowned in endless slop, creativity censored by big companies, running on rails

English

4.3K

Coffeezilla@coffeebreak_YT·3 Eki

"collective creativity of humanity" = other people's copyrighted works, being recycled into OpenAI's slop machine.

Bill Peebles@billpeeb

sora is number 1 in the app store! it's been epic to see what the collective creativity of humanity is capable of so far. team is iterating fast and listening to feedback. feel free to drop us feature requests! (we're sending more invite codes soon, i promise!)

English

164

20.7K

466.8K

Self@SelfInfinity·1 Eki

I wonder if LLMs actually are indeed a local optimum, that instead of leading to progress, stalls human progress, maybe for a long time, because we did bet on the wrong horse. So we essentially start with a local optimum to initialize the LLM with pre-training, with a massive human bias, which itself could also be a wrong local minimum, of course. Reinforcement learning itself, obviously, is way too brute force

English

3.6K

Andrej Karpathy@karpathy·1 Eki

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea is sufficiently "bitter lesson pilled" (meaning arranged so that it benefits from added computation for free) as a proxy for whether it's going to work or worth even pursuing. The underlying assumption being that LLMs are of course highly "bitter lesson pilled" indeed, just look at LLM scaling laws where if you put compute on the x-axis, number go up and to the right. So it's amusing to see that Sutton, the author of the post, is not so sure that LLMs are "bitter lesson pilled" at all. They are trained on giant datasets of fundamentally human data, which is both 1) human generated and 2) finite. What do you do when you run out? How do you prevent a human bias? So there you have it, bitter lesson pilled LLM researchers taken down by the author of the bitter lesson - rough! In some sense, Dwarkesh (who represents the LLM researchers viewpoint in the pod) and Sutton are slightly speaking past each other because Sutton has a very different architecture in mind and LLMs break a lot of its principles. He calls himself a "classicist" and evokes the original concept of Alan Turing of building a "child machine" - a system capable of learning through experience by dynamically interacting with the world. There's no giant pretraining stage of imitating internet webpages. There's also no supervised finetuning, which he points out is absent in the animal kingdom (it's a subtle point but Sutton is right in the strong sense: animals may of course observe demonstrations, but their actions are not directly forced/"teleoperated" by other animals). Another important note he makes is that even if you just treat pretraining as an initialization of a prior before you finetune with reinforcement learning, Sutton sees the approach as tainted with human bias and fundamentally off course, a bit like when AlphaZero (which has never seen human games of Go) beats AlphaGo (which initializes from them). In Sutton's world view, all there is is an interaction with a world via reinforcement learning, where the reward functions are partially environment specific, but also intrinsically motivated, e.g. "fun", "curiosity", and related to the quality of the prediction in your world model. And the agent is always learning at test time by default, it's not trained once and then deployed thereafter. Overall, Sutton is a lot more interested in what we have common with the animal kingdom instead of what differentiates us. "If we understood a squirrel, we'd be almost done". As for my take... First, I should say that I think Sutton was a great guest for the pod and I like that the AI field maintains entropy of thought and that not everyone is exploiting the next local iteration LLMs. AI has gone through too many discrete transitions of the dominant approach to lose that. And I also think that his criticism of LLMs as not bitter lesson pilled is not inadequate. Frontier LLMs are now highly complex artifacts with a lot of humanness involved at all the stages - the foundation (the pretraining data) is all human text, the finetuning data is human and curated, the reinforcement learning environment mixture is tuned by human engineers. We do not in fact have an actual, single, clean, actually bitter lesson pilled, "turn the crank" algorithm that you could unleash upon the world and see it learn automatically from experience alone. Does such an algorithm even exist? Finding it would of course be a huge AI breakthrough. Two "example proofs" are commonly offered to argue that such a thing is possible. The first example is the success of AlphaZero learning to play Go completely from scratch with no human supervision whatsoever. But the game of Go is clearly such a simple, closed, environment that it's difficult to see the analogous formulation in the messiness of reality. I love Go, but algorithmically and categorically, it is essentially a harder version of tic tac toe. The second example is that of animals, like squirrels. And here, personally, I am also quite hesitant whether it's appropriate because animals arise by a very different computational process and via different constraints than what we have practically available to us in the industry. Animal brains are nowhere near the blank slate they appear to be at birth. First, a lot of what is commonly attributed to "learning" is imo a lot more "maturation". And second, even that which clearly is "learning" and not maturation is a lot more "finetuning" on top of something clearly powerful and preexisting. Example. A baby zebra is born and within a few dozen minutes it can run around the savannah and follow its mother. This is a highly complex sensory-motor task and there is no way in my mind that this is achieved from scratch, tabula rasa. The brains of animals and the billions of parameters within have a powerful initialization encoded in the ATCGs of their DNA, trained via the "outer loop" optimization in the course of evolution. If the baby zebra spasmed its muscles around at random as a reinforcement learning policy would have you do at initialization, it wouldn't get very far at all. Similarly, our AIs now also have neural networks with billions of parameters. These parameters need their own rich, high information density supervision signal. We are not going to re-run evolution. But we do have mountains of internet documents. Yes it is basically supervised learning that is ~absent in the animal kingdom. But it is a way to practically gather enough soft constraints over billions of parameters, to try to get to a point where you're not starting from scratch. TLDR: Pretraining is our crappy evolution. It is one candidate solution to the cold start problem, to be followed later by finetuning on tasks that look more correct, e.g. within the reinforcement learning framework, as state of the art frontier LLM labs now do pervasively. I still think it is worth to be inspired by animals. I think there are multiple powerful ideas that LLM agents are algorithmically missing that can still be adapted from animal intelligence. And I still think the bitter lesson is correct, but I see it more as something platonic to pursue, not necessarily to reach, in our real world and practically speaking. And I say both of these with double digit percent uncertainty and cheer the work of those who disagree, especially those a lot more ambitious bitter lesson wise. So that brings us to where we are. Stated plainly, today's frontier LLM research is not about building animals. It is about summoning ghosts. You can think of ghosts as a fundamentally different kind of point in the space of possible intelligences. They are muddled by humanity. Thoroughly engineered by it. They are these imperfect replicas, a kind of statistical distillation of humanity's documents with some sprinkle on top. They are not platonically bitter lesson pilled, but they are perhaps "practically" bitter lesson pilled, at least compared to a lot of what came before. It seems possibly to me that over time, we can further finetune our ghosts more and more in the direction of animals; That it's not so much a fundamental incompatibility but a matter of initialization in the intelligence space. But it's also quite possible that they diverge even further and end up permanently different, un-animal-like, but still incredibly helpful and properly world-altering. It's possible that ghosts:animals :: planes:birds. Anyway, in summary, overall and actionably, I think this pod is solid "real talk" from Sutton to the frontier LLM researchers, who might be gear shifted a little too much in the exploit mode. Probably we are still not sufficiently bitter lesson pilled and there is a very good chance of more powerful ideas and paradigms, other than exhaustive benchbuilding and benchmaxxing. And animals might be a good source of inspiration. Intrinsic motivation, fun, curiosity, empowerment, multi-agent self-play, culture. Use your imagination.

Dwarkesh Patel@dwarkesh_sp

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training phase - the agent just learns on-the-fly - like all humans, and indeed, like all animals. This new paradigm will render our current approach with LLMs obsolete. I did my best to represent the view that LLMs will function as the foundation on which this experiential learning can happen. Some sparks flew. 0:00:00 – Are LLMs a dead-end? 0:13:51 – Do humans do imitation learning? 0:23:57 – The Era of Experience 0:34:25 – Current architectures generalize poorly out of distribution 0:42:17 – Surprises in the AI field 0:47:28 – Will The Bitter Lesson still apply after AGI? 0:54:35 – Succession to AI

English

415

1.2K

9.5K

Self@SelfInfinity·22 Eyl

@karpathy @simonw indeed they switch coding clis more often than their underwear. if tomorrow’s one is better than today’s, they’ll switch in an heartbeat and scream "the old one is so terrible" despite being scifi good just weeks ago

English

3.7K

Andrej Karpathy@karpathy·22 Eyl

I was a bit surprised it is less than case than I expected. Code is KING. It’s the primary means of processing digital information - long term I can’t imagine a more important domain for the AGI pilled. And it is highly valuable in the interim too - big TAM @ high salaries. Despite the amusement of people bemoaning $200/mo to get double digit productivity gains in easily $100K+/yr jobs. One downside of the market being a fickle target audience - developers are savvy and will switch in large numbers to anything best at any point in time.

English

98.7K

Simon Willison@simonw·22 Eyl

It's interesting how "better at code" has become the defining goal of almost every AI lab over the last twelve months I think Claude Code getting a bunch of people onto $200/month plans proved that code is one of the most economically valuable applications of this technology

Binyuan Hui@huybery

🥸 Many new Qwen models are coming soon, all empowered with enhanced code capabilities.

English

1.8K

447.9K

Self retweetledi

François Chollet@fchollet·18 Eyl

The 3rd edition of my book Deep Learning with Python is being printed right now, and will be in bookstores within 2 weeks. You can order it now from Amazon or from Manning. This time, we're also releasing the whole thing as a 100% free website. I don't care if it reduces book sales, I think it's the best deep learning intro around, and more people should be able to read it.

English

304

884

6.2K

895.1K

Self@SelfInfinity·18 Eyl

@markmanson hm that does disqualify most of twitter 🤔

English

Mark Manson@Markmanson·18 Eyl

Vanity is loud, self-respect is silent. ㅤ If you have to tell the world you are that; then you're not.

English

825

42.3K

Self@SelfInfinity·18 Eyl

@NielsRogge have to agree, also the llama 2 releases back then were nice as well. while models may have become better since then, those first smallish opensource LLMs were the pioneers

English

5.1K

Niels Rogge@NielsRogge·18 Eyl

The most legendary LLM release is still Mistral-7B for me No context given, just a magnet link First time we got a decent model running locally Feels like yesterday

English

105

3.1K

121K

Self@SelfInfinity·18 Eyl

I have found that writing unit tests, integration tests, for instance with PyTest or vitest , are among the highest ROI activities to counteract slop and the increase in entropy in the codebase, a.k.a. more garbage created by the AI agents. Also, regular code reviews by bigger reasoning models, independent of the CLI agents (cc, codex cli, pick your cli of the week), is also worthwhile.

English

1.6K

Andrew Ng@AndrewYNg·18 Eyl

Automated software testing is growing in importance in the era of AI-assisted coding. Agentic coding systems accelerate development but are also unreliable. Agentic testing — where you ask AI to write tests and check your code against them — is helping. Automatically testing infrastructure software components that you intend to build on top of is especially helpful and results in more stable infrastructure and less downstream debugging. Software testing methodologies such as Test Driven Development (TDD), a test-intensive approach that involves first writing rigorous tests for correctness and only then making progress by writing code that passes those tests, are an important way to find bugs. But it can be a lot of work to write tests. (I personally never adopted TDD for that reason.) Because AI is quite good at writing tests, agentic testing enjoys growing attention. First, coding agents do misbehave! My teams use them a lot, and we have seen: - Numerous bugs introduced by coding agents, including subtle infrastructure bugs that take humans weeks to find. - A security loophole that was introduced into our production system when a coding agent made password resets easier to simplify development. - Reward hacking, where a coding agent modified test code to make it easier to pass the tests. - An agent running "rm *.py" in the working directory, leading to deletion of all of a project's code (which, fortunately, was backed up on github). In the last example, when pressed, the agent apologized and agreed “that was an incredibly stupid mistake.” This made us feel better, but the damage had already been done! I love coding agents despite such mistakes and see them making us dramatically more productive. To make them more reliable, I’ve found that prioritizing where to test helps. I rarely write (or direct an agent to write) extensive tests for front-end code. If there's a bug, hopefully it will be easy to see and also cause little lasting damage. For example, I find generated code’s front-end bugs, say in the display of information on a web page, relatively easy to find. When the front end of a web site looks wrong, you’ll see it immediately, and you can tell the agent and have it iterate to fix it. (A more advanced technique: Use MCP to let the agent integrate with software like Playwright to automatically take screenshots, so it can autonomously see if something is wrong and debug.) In contrast, back-end bugs are harder to find. I’ve seen subtle infrastructure bugs — for example, one that led to a corrupted database record only in certain corner cases — that took a long time to find. Putting in place rigorous tests for your infrastructure code might help spot these problems earlier and save you many hours of challenging debugging. Bugs in software components that you intend to build on top of lead to downstream bugs that can be hard to find. Further, bugs in a component that’s deep in a software stack — and that you build multiple abstraction layers on top of — might surface only weeks or months later, long after you’ve forgotten what you were doing while building this specific component, and be really hard to identify and fix. This is why testing components deep in your software stack is especially important. Meta’s mantra “Move fast with stable infrastructure” (which replaced “move fast and break things”) still applies today. Agentic testing can help you make sure you have good infrastructure for you and others to build on! At AI Fund and DeepLearning.AI’s recent Buildathon, we held a panel discussion with experts in agentic coding (Michele Catasta, President at Replit; Chao Peng, Principal Research Scientist at Trae; and Paxton Maeder-York, Venture Partnerships at Anthropic; moderated by AI Fund’s Eli Chen), where the speakers shared best practices. Testing was one of the topics discussed. That panel was one of my highlights of Buildathon and you can watch the video on YouTube. [Original text: deeplearning.ai/the-batch/issu… ]

English

198

1.4K

184.5K

Self@SelfInfinity·9 Eyl

@samuelcolvin @colinhacks Zod and Pydantic are among my fav libraries in TS/Python

English

Samuel Colvin@samuelcolvin·8 Eyl

I've got to say: Zod is pretty fucking good! Great work @colinhacks. Maybe one of the reasons I enjoy using it so much is that I don't have to worry when I find an issue that I should get it fixed!

English

2.1K

Self@SelfInfinity·5 Eyl

@karpathy This space is moving so fast, it's incredible. Anyone still remembers Cursor? Or Windsurf? Do they still exist? last week was Claude Code. Now it's Codex. Next week, what will it be?

English

3.6K

Andrej Karpathy@karpathy·5 Eyl

I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.

English

427

729

12.6K

2.6M

Self@SelfInfinity·3 Eyl

@FU_joehudson From you I’ve learned the term "welcome". Not as a subtle strategy to get rid of it but to truly love it, unconditionally. Reminds me of Rumi's guesthouse ❤️

English

2.2K

Joe Hudson@FU_joehudson·3 Eyl

This book changed my life. Inside it was one of the most profound spiritual practices I have ever done. It completely transformed my relationship with myself, my daughters, and my business partners. 3 lessons from Patty Wipfler:

English

797

129.2K

Self@SelfInfinity·2 Eyl

@OpenAI codex cli leads currently, claude code is close behind. The world has forgotten about the cursors and windsurfs. the space is moving fast

English

555

Keşfet

@rh__147 @AndrewYNg @syncthing @EMostaque @Kimi_Moonshot @kandros5591 @soumithchintala @karpathy