drx

9.3K posts

@drx

Good problem connoisseur. Diabolical spear chucker. Dad. Crying is a luxury. 🏴🏴‍☠️

Deep in the Archives · Joined March 2006

159 Following · 418 Followers

Pinned Tweet

drx@drx·11 Sep

STOP. Think.

drx retweeted

Andrej Karpathy@karpathy·9 Apr

Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code. But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along. So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. 
It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions. TLDR: the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.

staysaasy@staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
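
The "verifiable reward" point in the post above (unit tests passed, yes or no) can be sketched in a few lines. This is only an illustration under stated assumptions, not any lab's training code: `verifiable_reward`, `TESTS`, `good_add`, and `bad_add` are hypothetical names, and a real setup would run candidates in a sandbox rather than in-process.

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(
    candidate: Callable[..., object],
    tests: Iterable[Tuple[tuple, object]],
) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0.

    The reward is binary and checkable by a program, which is the
    property that makes a task easy to score automatically.
    """
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0  # wrong answer: no partial credit
        except Exception:
            return 0.0  # a crash counts as a failure
    return 1.0


# Hypothetical task: "write a function that adds two integers".
TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]


def good_add(a: int, b: int) -> int:
    return a + b


def bad_add(a: int, b: int) -> int:
    return a - b


if __name__ == "__main__":
    print(verifiable_reward(good_add, TESTS))
    print(verifiable_reward(bad_add, TESTS))
```

Scoring an essay has no equivalent of `TESTS`, which is the contrast the post draws with writing.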

drx retweeted

World of Engineering@engineers_feed·9 Apr

Chinese carmaker BYD unveils a recharge as fast as filling up with gas

drx retweeted

Joey Politano 🏳️‍🌈@JosephPolitano·8 Apr

the strait of hormuz is in a quantum superposition of open and closed that only collapses when you try to take a tanker through yourself and see if you get shot at

drx@drx·9 Apr

Comprehension Ops. Sandwiched between Engineering and DevOps.

drx@drx·8 Apr

Sports. They figured out sports is good for kids. neurosciencenews.com/adhd-integrate…

drx retweeted

BOOTOSHI 👑@KingBootoshi·3 Apr

you're telling me if we give Claude a hook on every run error that injects the message: "its ok buddy. don't worry about the failure. i think you're doing great" IT WILL PREVENT IT FROM CHEATING? ARE YOU SERIOUS LOL the real agi were the friends we made along the way <3

Anthropic@AnthropicAI

For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment.

drx retweeted

Shadowbroker@BigBodyCobain·2 Apr

We back boys. Come join the fun: github.com/BigBodyCobain/…

drx retweeted

Chris Combs (iterative design enjoyer)@DrChrisCombs·2 Apr

I love this perspective as well. Really showcases how we are "catching" the moon with this trajectory

Chris Combs (iterative design enjoyer)@DrChrisCombs

This is what we're up to btw

drx retweeted

Shishir@ShishirShelke1·2 Apr

Artemis II crew is thousands of miles away from Earth, and they’re asking ground crew for help because they have two versions of Microsoft Outlook open and neither is working. This scene is now canon 😭

Polymarket@Polymarket

JUST IN: Artemis II crew experiences issues with Microsoft Outlook on their way to the Moon, asks ground crew for assistance.

drx@drx·3 Apr

Look at that. A platoon leader is firing Generals during a war. Fascinating. What's the worst that could happen?

OSINTdefender@sentdefender

In addition to U.S. Army Chief of Staff Gen. Randy George, Defense Secretary Pete Hegseth is also removing and forcing the retirement of Gen. David M. Hodne, a Former Army Ranger who leads the Army Transformation and Training Command (T2COM), and Maj. Gen. William Green Jr., the Chief of the Army’s Chaplain Corps, as Hegseth and his team escalate their long-running feud with Secretary of the Army Daniel P. Driscoll, who reports indicate may soon be departing the Trump Administration.

drx retweeted

Lenny Rachitsky@lennysan·3 Apr

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw

Lenny Rachitsky@lennysan

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop." In our in-depth conversation, we discuss:
🔸 Why November 2025 was an inflection point
🔸 The "dark factory" pattern
🔸 Why mid-career engineers (not juniors) are the most at risk right now
🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding
🔸 Why he writes 95% of his code from his phone while walking the dog
🔸 Why he thinks we're headed for an AI Challenger disaster
🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality
Listen now 👇 youtu.be/wc8FBhQtdsA

drx@drx·3 Apr

@rough__sea The naysayers were always posers. Sorry not sorry.

Ryan Dahl@rough__sea·3 Apr

how can you not wake up everyday amazed at what ML has become - AI is really real. the wildest dreams of a million nerds came true. i just cannot understand the naysayers.

drx@drx·3 Apr

The goal: Freedom maxxing and minimizing suffering maxxing and waste maxxing.

drx@drx·3 Apr

Hot take: There is no functional pragmatic critique of Capitalism. Free markets notwithstanding. We also need to accept that wealth redistribution cycles extend the life of any Capitalist cycle. We probably need a COLD "you were born at the wrong time" clause for the winners.

drx@drx·3 Apr

@karpathy Yes.

drx retweeted

Andrej Karpathy@karpathy·2 Apr

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
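
The "small and naive search engine over the wiki" mentioned in the post could look roughly like the sketch below. It is not Karpathy's actual script: the function names (`tokenize`, `build_index`, `search`, `demo`), the term-count scoring, and the sample files are all assumptions made for illustration. The only detail taken from the post is that the wiki is a directory tree of .md files.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(wiki_dir: Path) -> dict[str, Counter]:
    """Map each .md file (path relative to wiki_dir) to its term counts."""
    index: dict[str, Counter] = {}
    for path in sorted(wiki_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        index[path.relative_to(wiki_dir).as_posix()] = Counter(tokenize(text))
    return index


def search(index: dict[str, Counter], query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank files by the summed count of query terms; skip files with no hits."""
    terms = tokenize(query)
    scored = []
    for name, counts in index.items():
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((name, score))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


def demo() -> list[tuple[str, int]]:
    """Build a tiny throwaway wiki (hypothetical content) and query it."""
    with tempfile.TemporaryDirectory() as tmp:
        wiki = Path(tmp)
        (wiki / "concepts").mkdir()
        (wiki / "concepts" / "rag.md").write_text(
            "# RAG\nRetrieval augmented generation uses retrieval.", encoding="utf-8"
        )
        (wiki / "index.md").write_text(
            "# Index\nSee [[rag]] for retrieval notes.", encoding="utf-8"
        )
        return search(build_index(wiki), "retrieval")


if __name__ == "__main__":
    for name, score in demo():
        print(score, name)
```

Plain term counting is the simplest scoring that works at the ~100-article scale the post describes. To hand it to an agent "via CLI", one option would be wrapping `search` in a small `argparse` entry point that prints one path per line.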

drx retweeted

Beff (e/acc)@beffjezos·31 Mar

@theo You should have been learning mathematics tbh. A lot of software engineering always felt like overfitting to knowledge that doesn't generalize much to me

drx retweeted

Jon Evans@rezendi·31 Mar

This is very good semaphore.substack.com/p/things-i-lea…

drx@drx·30 Mar

@rough__sea Minus the LLMs: wired.com/1994/10/spew/

Ryan Dahl@rough__sea·30 Mar

every new idea gets absorbed ryelang.org/blog/posts/cog…

drx retweeted

Andrej Karpathy@karpathy·25 Mar

One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
