Lon()
@Lon

2.1K posts
Absurdist intern. Exquisite shitpoasting. High-school dropout + teenage dad. Failed angel investor. EP on Gary Busey film. SIGMOD winner. Shipped infra you use.

Joined February 2007
908 Following · 3.2K Followers
Pinned Tweet
Lon()@Lon·
If AGI kills us all, it won't be the model's fault. It'll be the duct tape and footguns we wrap it in.
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/ and backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI) but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
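A toy sketch of the ingest-then-compile loop described above. In the actual workflow an LLM writes the wiki, so the shell loop below is only a hypothetical stand-in for that step, and the filenames are invented:

```shell
set -eu
cd "$(mktemp -d)"

# Hypothetical raw/ contents standing in for clipped articles.
mkdir -p raw wiki
printf '# Attention Notes\nTransformer reading notes.\n' > raw/attention.md
printf '# ZFS Notes\nCopy-on-write internals.\n' > raw/zfs.md

# "Compile" step: emit a wiki index that titles and backlinks every
# document in raw/ (the LLM's job in the actual workflow).
{
  echo '# Wiki Index'
  for f in raw/*.md; do
    title=$(head -n 1 "$f" | sed 's/^# *//')
    echo "- [[$f]] - $title"
  done
} > wiki/index.md

cat wiki/index.md
```

The point of the sketch is only the shape of the pipeline: raw/ is append-only source material, wiki/ is derived and regenerable, and the index is what makes later Q&A over the corpus cheap.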
Lon()@Lon·
@jyoti_mann1 Why not at least offer a non-paywalled link to the story if you are going to publicize it like this? I get that The Information is a paid resource, but is there not some minimal etiquette around this? How many conversions does The Information get by advertising a bare headline?
Jyoti Mann@jyoti_mann1·
Exclusive: Meta employees are “tokenmaxxing” and competing on an internal leaderboard called “Claudeonomics” for status as a token legend. Over a recent 30-day period, total usage on the dashboard topped 60 trillion tokens.
Lon()@Lon·
It's easy to get into these loops when "reasoning" is loosely used and people carry around different definitions of it in their heads. LLMs are clearly v. strong at inductive reasoning; it's like their bread and butter. They are reasonably strong at deductive reasoning, but in a rigid and brittle way that breaks down under long multi-step chains and complexity. The major weakness is abduction. They mimic abduction but lack the intuitive jump that comes with an "a ha" moment, because of a lack of common sense and true understanding. Whether that is even fixable is something I'm somewhat negative on at this point. I, of course, completely disagree with the author's first point.
Jeffrey Emanuel@doodlestein·
If you work in technology, these are very dangerous (and false) beliefs to have. As in, dangerous to your employment prospects and future income. The more you double down on them and insist they’re true, the more shell-shocked and devastated you’ll be when you finally realize it.
wanye@xwanyex

Nobody cares or needs to hear this from me, but I’m just registering my opinion that: 1) LLMs are a totally ordinary technology. But so were cars. Ordinary technologies can have big impacts. 2) They are *very obviously* not reasoning and the way that smart people specifically trick themselves on this point is critical to understanding many things about the world.

Lon()@Lon·
It was widely plastered across Twitter in real time while it was happening. It would take some digging, but it's all there: 2 a.m. pressure campaigns with messaging and phone calls to round up signatures, and implications that if OAI imploded and you didn't sign, you wouldn't have a spot at MSFT when everything moved over.
Sam@Discoplomacy·
@Lon @krishnanrohit I didn’t forget, I didn’t know that. I’m still learning! Do you have some favourite resources on this?
rohit@krishnanrohit·
Something I find missing from these discussions: sure, yes, they make it sound like everyone thought he was untrustworthy. So why did like 99% of the OpenAI team threaten to quit after he was fired and agitate for him to come back? Seems like an important piece of evidence.
Ryan@ohryansbelt

The New Yorker just dropped a massive investigation into Sam Altman, based on over 100 interviews, the previously undisclosed "Ilya Memos," and Dario Amodei's 200+ pages of private notes. It's the most detailed account yet of the pattern of behavior that led to Sam's firing and rapid reinstatement at OpenAI. Here's the breakdown:

> Ilya compiled ~70 pages of Slack messages, HR documents, and photos taken on personal phones to avoid detection on company devices. He sent them to board members as disappearing messages. The first memo begins with a list headed "Sam exhibits a consistent pattern of . . ." The first item is "Lying."
> Dario kept detailed private notes for years under the heading "My Experience with OpenAI" (subheading: "Private: Do Not Share"), totaling 200+ pages. His conclusion: "The problem with OpenAI is Sam himself."
> Sam reportedly told Mira his allies were "going all out" and "finding bad things" to damage her reputation after the firing. Thrive put its planned $86B investment on hold and implied it would only close if Sam returned, giving employees financial incentive to back him.
> Sam texted Satya Nadella directly to propose the new board composition: "bret, larry summers, adam as the board and me as ceo and then bret handles the investigation." The two new members selected to oversee an independent inquiry into Sam were chosen after close conversations with Sam himself.
> Before OpenAI, senior employees at Loopt asked the board to fire Sam as CEO on two separate occasions over concerns about leadership and transparency. At Y Combinator, partners complained to Paul Graham about Sam's behavior, and Graham privately told colleagues "Sam had been lying to us all the time."
> OpenAI's superalignment team was promised 20% of the company's compute. Four people who worked on or with the team said actual resources were 1-2%, mostly on the oldest cluster with the worst chips. The team was dissolved without completing its mission.
> Sam told the board that safety features in GPT-4 had been approved by a safety panel. Helen Toner requested documentation and found the most controversial features had not been approved. Sam also never mentioned to the board that Microsoft released an early ChatGPT version in India without completing a required safety review.
> Sam made a secret pact with Greg and Ilya where he agreed to resign if they both deemed it necessary, essentially appointing his own shadow board. The actual board was alarmed when they learned about it.
> Sam struck a deal with Greg to become CEO while simultaneously telling researchers that Greg's authority would be diminished, and telling Greg something different.
> A board member described Sam as having "two traits almost never seen in the same person: a strong desire to please people in any given interaction, and almost a sociopathic lack of concern for the consequences of deceiving someone." Multiple sources independently used the word "sociopathic."
> OpenAI is reportedly preparing for an IPO at a potential $1 trillion valuation while securing government contracts spanning immigration enforcement, domestic surveillance, and autonomous weaponry in war zones.

Lon()@Lon·
@krishnanrohit You've seen lots of reasons, but the core error is underestimating Machiavellianism. If it were a simple case of black and white, he would have been bounced out of YC and Loopt. But he's clearly able to burrow himself in deeply and ingratiate himself with the right parties at the right time.
Lon()@Lon·
@Dan_Jeffries1 Run subagents in a container using a cloned worktree that can't even merge back into the repo without a human approval step at the end of the subagent run.
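A minimal sketch of that isolation pattern, assuming git; the directory layout, branch name, and the empty commits standing in for the subagent's work are all hypothetical:

```shell
set -eu
cd "$(mktemp -d)"

# Demo repository standing in for the real monorepo (hypothetical setup).
git init -q repo
git -C repo -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# 1. Clone into a sandbox the subagent's container can be pointed at.
git clone -q repo sandbox

# 2. Give the agent a dedicated worktree on its own branch.
git -C sandbox worktree add -q wt -b agent/task-1

# ... the subagent edits sandbox/wt and commits on agent/task-1 ...
git -C sandbox/wt -c user.email=a@b -c user.name=a \
  commit -q --allow-empty -m "agent change"

# 3. Nothing reaches repo/ until a human explicitly fetches and merges:
#    git -C repo fetch ../sandbox agent/task-1
#    git -C repo merge --no-ff FETCH_HEAD   # the approval step
git -C sandbox worktree list
```

The key property is that the agent's branch lives only in the sandbox clone; the real repository never sees it until the human runs the fetch/merge in step 3.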
Daniel Jeffries@Dan_Jeffries1·
I honestly don't understand how anyone runs agents in parallel on serious code work without babysitting every one of them. Caught GPT 5.4 about to rip Bun out of my entire monorepo because it hit an Ink compatibility issue during a smoke test. The agent's "fix" was to swap the runtime for the entire monorepo. If I hadn't been watching that terminal, it would have.

These little magic machines we call LLMs are wonderful. But then I read these crazy-ass policy statements: superintelligence is so close and we aren't ready, let's change the whole social contract of countries, let's tax robot labor and go full UBI, all with zero fucking evidence that anything is actually happening here in real life that requires this kind of societal-level surgery. I don't know what the hell people are smoking, because these things still make cascading stupid decisions that compound every single day.
Lon()@Lon·
@radmadvlad v cool. A simple mdadm mirror w/ XFS is superior with only 2 HDDs and no ECC RAM. You are taking a CoW perf hit while, without ECC, losing many of the guarantees the checksumming protections provide, and giving up RAM to ARC. With XFS you could even carve 1 TB from your NVMe to run bcache acceleration.
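A sketch of the suggested alternative as a setup fragment, with hypothetical device names; these commands are destructive and are shown only to illustrate how the mdadm + bcache + XFS pieces fit together:

```shell
# Two-disk mirror with mdadm (device names are placeholders):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Optionally front the mirror with a ~1 TB NVMe partition as a
# bcache caching device (-C = cache device, -B = backing device):
make-bcache -C /dev/nvme0n1p3 -B /dev/md0

# XFS then goes on the resulting cached device
# (or directly on /dev/md0 if skipping bcache):
mkfs.xfs /dev/bcache0
```

The trade being argued: mdadm + XFS skips ZFS's copy-on-write overhead and ARC memory pressure, at the cost of ZFS's data checksumming, while bcache recovers some NVMe-class read latency in front of the spinning mirror.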
vlad@radmadvlad·
Whole homelab is in this one machine now. 12900KS, 96 GB DDR5-6000, RTX 3090, 4 TB NVMe, 2×14 TB HDD running ZFS. Migrated my Unraid fully, with Dockers and VMs running in Arch now.
[image]
Philippe Tremblay@ptremblay·
I can hardly imagine someone who doesn't understand code contributing to his project. I'm an experienced programmer and I would need quite a while to be able to contribute, because it's not my specialty and you can't expect LLMs to outright produce code that's in line with the minimalist philosophy of the project.
Lon()@Lon·
@DylanDoubly I usually have people scream at me as loud as they can while I drift off to sleep. Very soothing.
Dylan@DylanDoubly·
I would empty my 401k to buy this place
Lon()@Lon·
@1a1n1d1y I love the theater of there being perms/auth for actions and then just watching the model do whatever tf it wants anyway.
andy@1a1n1d1y·
presented without comment
[image]
bubble boi@bubbleboi·
This guy has the physiognomy of a steroid dealer mixed with a club bouncer. But he's actually the top quant at Millennium. When I was there he pulled in a bonus so big he bought half of New Jersey. There is a lesson in this: the people who don't look like they "fit in" but are still in are absolute killers. Goes with any profession tbh. This is why I don't invest in founders who are handsome.
[image]
Lon()@Lon·
@weswinder It's not us, it's you. It's my favorite new breakup line.
Wes Winder@weswinder·
this is kinda insane i thought people were exaggerating a bit ngl but i just said "hi" once to sonnet and once to opus ... it used 3% of my 5 hour limit on the pro plan from these TWO "hi" messages during off peak
[image]
Lydia Hallie ✨@lydiahallie

Peak-hour limits are tighter and 1M-context sessions got bigger, that's most of what you're feeling. We fixed a few bugs along the way, but none were over-charging you. We also rolled out efficiency fixes and added popups in-product to help avoid large prompt cache misses

Lon()@Lon·
@lennysan ADHD, you need to have ADHD. That is the missing skill.
Lenny Rachitsky@lennysan·
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw
Lenny Rachitsky@lennysan

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop." In our in-depth conversation, we discuss: 🔸 Why November 2025 was an inflection point 🔸 The "dark factory" pattern 🔸 Why mid-career engineers (not juniors) are the most at risk right now 🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding 🔸 Why he writes 95% of his code from his phone while walking the dog 🔸 Why he thinks we're headed for an AI Challenger disaster 🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality Listen now 👇 youtu.be/wc8FBhQtdsA

Michael Hla@hla_michael·
I trained an LLM from scratch on pre-1900 text to see if it could come up with quantum mechanics and relativity. While the model is too small to do meaningful reasoning, it has glimpses of intuition. When given observations from past landmark experiments, the model can declare that “light is made up of definite quantities of energy” and even suggest that gravity and acceleration are locally equivalent. I’m releasing the dataset + models and leave this as an open problem to the research community. I also include what this project has taught me about intelligence in a mini essay linked below. 🧵(1/n)
Lon()@Lon·
@doodlestein And this isn't starting to sound like it rhymes to you? x.com/Lon/status/203…
Lon()@Lon

I hear you. I have all of the same things going on: max plans, openrouter usage, local infra. I understand the qualitative difference in the models. But I still stand behind my point. This isn't the first time Anthropic has pulled this. They've silently served quantized models at peak periods without disclosing it and have been caught red-handed. OpenAI had a similar quality degradation that I know you remember, and never had a proper root cause. It's gambling to put all the eggs into their baskets. 1/ if you trust their top-line pricing as a proxy for capacity/cost, you are hoping they achieve 2 OOMs of perf improvements and reflect them in the rate limits on these plans. 2/ that this won't be a never-ending cycle of needing to chase the qualitative benefits of using the latest model, which also won't yet have those perf benefits/liberal rate limits. As an observer, one thing I'd like to point out is that I watch all of the interesting work you are doing and the products you are releasing to improve agentic harnessing and orchestration. You are clearly doing a great service while offering a lot of it freely to others. But the one area I don't see you pointing all of your token generation at is building the tooling that will help lessen your dependency on these models. I think a bit of your focus can probably go there to hedge against your future assumptions and needs.

Jeffrey Emanuel@doodlestein·
This is like watching that Tibetan monk self-immolate, except its user trust and loyalty that they’re torching in real-time. They really don’t have the kind of moat you’d need to have in order to get away with this kind of stuff anymore, but they don’t seem to realize that yet.
Lydia Hallie ✨@lydiahallie

Peak-hour limits are tighter and 1M-context sessions got bigger, that's most of what you're feeling. We fixed a few bugs along the way, but none were over-charging you. We also rolled out efficiency fixes and added popups in-product to help avoid large prompt cache misses
