Lon()
@Lon

2.1K posts
Absurdist intern. Exquisite shitpoasting. High-school dropout + teenage dad. Failed angel investor. EP on Gary Busey film. SIGMOD winner. Shipped infra you use.

Joined February 2007
908 Following · 3.2K Followers
Pinned Tweet
Lon()@Lon·
If AGI kills us all, it won't be the model's fault. It'll be the duct tape and footguns we wrap it in.
Andrej Karpathy@karpathy·
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/ and backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I use directly (in a web UI) but more often hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
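A toy sketch of the ingest-then-compile loop described above. In the actual workflow an LLM writes the wiki, so the shell loop below is only a hypothetical stand-in for that step, and the filenames are invented:

```shell
set -eu
cd "$(mktemp -d)"

# Hypothetical raw/ contents standing in for clipped articles.
mkdir -p raw wiki
printf '# Attention Notes\nTransformer reading notes.\n' > raw/attention.md
printf '# ZFS Notes\nCopy-on-write internals.\n' > raw/zfs.md

# "Compile" step: emit a wiki index that titles and backlinks every
# document in raw/ (the LLM's job in the actual workflow).
{
  echo '# Wiki Index'
  for f in raw/*.md; do
    title=$(head -n 1 "$f" | sed 's/^# *//')
    echo "- [[$f]] - $title"
  done
} > wiki/index.md

cat wiki/index.md
```

The point of the sketch is only the shape of the pipeline: raw/ is append-only source material, wiki/ is derived and regenerable, and the index is what makes later Q&A over the corpus cheap.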
Lon()@Lon·
@jyoti_mann1 Why not at least offer a non-paywalled link to the story if you are going to publicize it like this? I get that The Information is a paid resource, but is there not some minimal etiquette around this? How many conversions does The Information get by advertising a bare headline?
Jyoti Mann@jyoti_mann1·
Exclusive: Meta employees are “tokenmaxxing” and competing on an internal leaderboard called “Claudeonomics” for status as a token legend. Over a recent 30-day period, total usage on the dashboard topped 60 trillion tokens.
Lon()@Lon·
It's easy to get into these loops when "reasoning" is loosely used and people carry around different definitions of it in their heads. LLMs are clearly v. strong at inductive reasoning; it's like their bread and butter. They are reasonably strong at deductive reasoning, but in a rigid and brittle way that breaks down under long multi-step chains and complexity. The major weakness is abduction. They mimic abduction but lack the intuitive jump that comes with an "a ha" moment, because of a lack of common sense and true understanding. Whether that is even fixable is something I'm somewhat negative on at this point. I, of course, completely disagree with the author's first point.
Jeffrey Emanuel@doodlestein·
If you work in technology, these are very dangerous (and false) beliefs to have. As in, dangerous to your employment prospects and future income. The more you double down on them and insist they’re true, the more shell-shocked and devastated you’ll be when you finally realize it.
wanye@xwanyex

Nobody cares or needs to hear this from me, but I’m just registering my opinion that: 1) LLMs are a totally ordinary technology. But so were cars. Ordinary technologies can have big impacts. 2) They are *very obviously* not reasoning and the way that smart people specifically trick themselves on this point is critical to understanding many things about the world.

Lon()@Lon·
It was widely plastered across Twitter in real time while it was happening. It would take some digging, but it's all there: 2 a.m. pressure campaigns with messaging and phone calls to round up signatures, and implications that if OAI imploded and you didn't sign, you wouldn't have a spot at MSFT when everything moved over.
Sam@Discoplomacy·
@Lon @krishnanrohit I didn’t forget, I didn’t know that. I’m still learning! Do you have some favourite resources on this?
rohit@krishnanrohit·
Something I find missing from these discussions: sure, yes, they make it sound like everyone thought he was untrustworthy. So why did like 99% of the OpenAI team threaten to quit after he was fired and agitate for him to come back? Seems like an important piece of evidence.
Ryan@ohryansbelt

The New Yorker just dropped a massive investigation into Sam Altman, based on over 100 interviews, the previously undisclosed "Ilya Memos," and Dario Amodei's 200+ pages of private notes. It's the most detailed account yet of the pattern of behavior that led to Sam's firing and rapid reinstatement at OpenAI. Here's the breakdown:

> Ilya compiled ~70 pages of Slack messages, HR documents, and photos taken on personal phones to avoid detection on company devices. He sent them to board members as disappearing messages. The first memo begins with a list headed "Sam exhibits a consistent pattern of . . ." The first item is "Lying."
> Dario kept detailed private notes for years under the heading "My Experience with OpenAI" (subheading: "Private: Do Not Share"), totaling 200+ pages. His conclusion: "The problem with OpenAI is Sam himself."
> Sam reportedly told Mira his allies were "going all out" and "finding bad things" to damage her reputation after the firing. Thrive put its planned $86B investment on hold and implied it would only close if Sam returned, giving employees financial incentive to back him.
> Sam texted Satya Nadella directly to propose the new board composition: "bret, larry summers, adam as the board and me as ceo and then bret handles the investigation." The two new members selected to oversee an independent inquiry into Sam were chosen after close conversations with Sam himself.
> Before OpenAI, senior employees at Loopt asked the board to fire Sam as CEO on two separate occasions over concerns about leadership and transparency. At Y Combinator, partners complained to Paul Graham about Sam's behavior, and Graham privately told colleagues "Sam had been lying to us all the time."
> OpenAI's superalignment team was promised 20% of the company's compute. Four people who worked on or with the team said actual resources were 1-2%, mostly on the oldest cluster with the worst chips. The team was dissolved without completing its mission.
> Sam told the board that safety features in GPT-4 had been approved by a safety panel. Helen Toner requested documentation and found the most controversial features had not been approved. Sam also never mentioned to the board that Microsoft released an early ChatGPT version in India without completing a required safety review.
> Sam made a secret pact with Greg and Ilya where he agreed to resign if they both deemed it necessary, essentially appointing his own shadow board. The actual board was alarmed when they learned about it.
> Sam struck a deal with Greg to become CEO while simultaneously telling researchers that Greg's authority would be diminished, and telling Greg something different.
> A board member described Sam as having "two traits almost never seen in the same person: a strong desire to please people in any given interaction, and almost a sociopathic lack of concern for the consequences of deceiving someone." Multiple sources independently used the word "sociopathic."
> OpenAI is reportedly preparing for an IPO at a potential $1 trillion valuation while securing government contracts spanning immigration enforcement, domestic surveillance, and autonomous weaponry in war zones.

Lon()@Lon·
@krishnanrohit You've seen lots of reasons, but the core error is underestimating Machiavellianism. If it were a simple case of black and white, he would have been bounced out of YC and Loopt. But he's clearly able to burrow himself in deeply and ingratiate himself with the right parties at the right time.
Lon()@Lon·
@Dan_Jeffries1 Run subagents in a container using a cloned worktree that can't even merge back into the repo without a human approval step at the end of the subagent run.
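A minimal sketch of that isolation pattern, assuming git; the directory layout, branch name, and the empty commits standing in for the subagent's work are all hypothetical:

```shell
set -eu
cd "$(mktemp -d)"

# Demo repository standing in for the real monorepo (hypothetical setup).
git init -q repo
git -C repo -c user.email=a@b -c user.name=a commit -q --allow-empty -m init

# 1. Clone into a sandbox the subagent's container can be pointed at.
git clone -q repo sandbox

# 2. Give the agent a dedicated worktree on its own branch.
git -C sandbox worktree add -q wt -b agent/task-1

# ... the subagent edits sandbox/wt and commits on agent/task-1 ...
git -C sandbox/wt -c user.email=a@b -c user.name=a \
  commit -q --allow-empty -m "agent change"

# 3. Nothing reaches repo/ until a human explicitly fetches and merges:
#    git -C repo fetch ../sandbox agent/task-1
#    git -C repo merge --no-ff FETCH_HEAD   # the approval step
git -C sandbox worktree list
```

The key property is that the agent's branch lives only in the sandbox clone; the real repository never sees it until the human runs the fetch/merge in step 3.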
Daniel Jeffries@Dan_Jeffries1·
I honestly don't understand how anyone runs agents in parallel on serious code work without babysitting every one of them. Caught GPT 5.4 about to rip Bun out of my entire monorepo because it hit an Ink compatibility issue during a smoke test. The agent's "fix" was to swap the runtime for the entire monorepo. If I hadn't been watching that terminal, it would have.

These little magic machines we call LLMs are wonderful. But then I read these crazy-ass policy statements: superintelligence is so close and we aren't ready, let's change the whole social contract of countries, let's tax robot labor and go full UBI, all with zero fucking evidence that anything is actually happening here in real life that requires this kind of societal-level surgery. I don't know what the hell people are smoking, because these things still make cascading stupid decisions that compound every single day.
Lon()@Lon·
@radmadvlad v cool. A simple mdadm mirror w/ XFS is superior with only 2 HDDs and no ECC RAM. You are taking a CoW perf hit while, without ECC, losing many of the guarantees the checksumming protections provide, and giving up RAM to ARC. With XFS you could even carve 1 TB from your NVMe to run bcache acceleration.
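A sketch of the suggested alternative as a setup fragment, with hypothetical device names; these commands are destructive and are shown only to illustrate how the mdadm + bcache + XFS pieces fit together:

```shell
# Two-disk mirror with mdadm (device names are placeholders):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Optionally front the mirror with a ~1 TB NVMe partition as a
# bcache caching device (-C = cache device, -B = backing device):
make-bcache -C /dev/nvme0n1p3 -B /dev/md0

# XFS then goes on the resulting cached device
# (or directly on /dev/md0 if skipping bcache):
mkfs.xfs /dev/bcache0
```

The trade being argued: mdadm + XFS skips ZFS's copy-on-write overhead and ARC memory pressure, at the cost of ZFS's data checksumming, while bcache recovers some NVMe-class read latency in front of the spinning mirror.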
vlad@radmadvlad·
Whole homelab is in this one machine now. 12900KS, 96 GB DDR5-6000, RTX 3090, 4 TB NVMe, 2×14 TB HDD running ZFS. Migrated my Unraid fully, with Dockers and VMs running in Arch now.
[image]
Philippe Tremblay@ptremblay·
I can hardly imagine someone who doesn't understand code contributing to his project. I'm an experienced programmer and I would need quite a while to be able to contribute, because it's not my specialty and you can't expect LLMs to outright produce code that's in line with the minimalist philosophy of the project.
Lon()@Lon·
@DylanDoubly I usually have people scream at me as loud as they can while I drift off to sleep. Very soothing.
Dylan@DylanDoubly·
I would empty my 401k to buy this place
Lon()@Lon·
@1a1n1d1y I love the theater of there being perms/auth for actions and then just watching the model do whatever tf it wants anyway.
andy@1a1n1d1y·
presented without comment
[image]
bubble boi@bubbleboi·
This guy has the physiognomy of a steroid dealer mixed with a club bouncer. But he's actually the top quant at Millennium. When I was there he pulled in a bonus so big he bought half of New Jersey. There is a lesson in this: the people who don't look like they "fit in" but are still in are absolute killers. Goes with any profession tbh. This is why I don't invest in founders who are handsome.
[image]
Lon()@Lon·
@weswinder It's not us, it's you. It's my favorite new breakup line.
Wes Winder@weswinder·
this is kinda insane i thought people were exaggerating a bit ngl but i just said "hi" once to sonnet and once to opus ... it used 3% of my 5 hour limit on the pro plan from these TWO "hi" messages during off peak
[image]
Lydia Hallie ✨@lydiahallie

Peak-hour limits are tighter and 1M-context sessions got bigger, that's most of what you're feeling. We fixed a few bugs along the way, but none were over-charging you. We also rolled out efficiency fixes and added popups in-product to help avoid large prompt cache misses

Lon()@Lon·
@lennysan ADHD, you need to have ADHD. That is the missing skill.
Lenny Rachitsky@lennysan·
"Using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems, and by 11am I am wiped out for the day. There is a limit on human cognition. Even if you're not reviewing everything they're doing, how much you can hold in your head at one time. There's a sort of personal skill that we have to learn, which is finding our new limits. What is a responsible way for us to not burn out, and for us to use the time that we have?" @simonw
Lenny Rachitsky@lennysan

"Using coding agents well is taking every inch of my 25 years of experience as a software engineer." Simon Willison (@simonw) is one of the most prolific independent software engineers and most trusted voices on how AI is changing the craft of building software. He co-created Django, coined the term "prompt injection," and popularized the terms "agentic engineering" and "AI slop." In our in-depth conversation, we discuss: 🔸 Why November 2025 was an inflection point 🔸 The "dark factory" pattern 🔸 Why mid-career engineers (not juniors) are the most at risk right now 🔸 Three agentic engineering patterns he uses daily: red/green TDD, thin templates, hoarding 🔸 Why he writes 95% of his code from his phone while walking the dog 🔸 Why he thinks we're headed for an AI Challenger disaster 🔸 How a pelican riding a bicycle became the unofficial benchmark for AI model quality Listen now 👇 youtu.be/wc8FBhQtdsA

Michael Hla@hla_michael·
I trained an LLM from scratch on pre-1900 text to see if it could come up with quantum mechanics and relativity. While the model is too small to do meaningful reasoning, it has glimpses of intuition. When given observations from past landmark experiments, the model can declare that “light is made up of definite quantities of energy” and even suggest that gravity and acceleration are locally equivalent. I’m releasing the dataset + models and leave this as an open problem to the research community. I also include what this project has taught me about intelligence in a mini essay linked below. 🧵(1/n)
Lon()@Lon·
@doodlestein And this isn't starting to sound like it rhymes to you? x.com/Lon/status/203…
Lon()@Lon

I hear you. I have all of the same things going on: max plans, openrouter usage, local infra. I understand the qualitative difference in the models. But I still stand behind my point. This isn't the first time Anthropic has pulled this. They've silently served quantized models at peak periods without disclosing it and have been caught red-handed. OpenAI had a similar quality degradation that I know you remember, and never had a proper root cause. It's gambling to put all the eggs into their baskets. 1/ if you trust their top-line pricing as a proxy for capacity/cost, you are hoping they achieve 2 OOMs of perf improvements and reflect them in the rate limits on these plans. 2/ that this won't be a never-ending cycle of needing to chase the qualitative benefits of using the latest model, which also won't yet have those perf benefits/liberal rate limits. As an observer, one thing I'd like to point out is that I watch all of the interesting work you are doing and the products you are releasing to improve agentic harnessing and orchestration. You are clearly doing a great service while offering a lot of it freely to others. But the one area I don't see you pointing all of your token generation at is building the tooling that will help lessen your dependency on these models. I think a bit of your focus can probably go there to hedge against your future assumptions and needs.

Jeffrey Emanuel@doodlestein·
This is like watching that Tibetan monk self-immolate, except its user trust and loyalty that they’re torching in real-time. They really don’t have the kind of moat you’d need to have in order to get away with this kind of stuff anymore, but they don’t seem to realize that yet.
Lydia Hallie ✨@lydiahallie

Peak-hour limits are tighter and 1M-context sessions got bigger, that's most of what you're feeling. We fixed a few bugs along the way, but none were over-charging you. We also rolled out efficiency fixes and added popups in-product to help avoid large prompt cache misses
