Brendan Long

1.1K posts

Brendan Long banner
Brendan Long

Brendan Long

@brendankblong

I mostly post on LessWrong: https://t.co/Xqt1PzOf2b

Katılım Nisan 2009
396 Takip Edilen129 Takipçiler
Sabitlenmiş Tweet
Brendan Long
Brendan Long@brendankblong·
The best part of making an app for myself is that I can support stupid features this like Discord Bot that watches for reactions and saves links. It provides success/error feedback using a saluting lion or crying lion emoji. lionreader.com/demo/tag/featu…
English
0
0
1
116
Windows Latest
Windows Latest@WindowsLatest·
Microsoft confirms Windows Run dialog is getting a modern look on Windows 11! - Modern design: A refreshed look that matches Fluent Design and Windows 11, with dark mode support. - Faster than before: Perf was top-of-mind when rewriting Run, and with a **94ms median time-to-show time it’s faster than ever before. - Quick user directory access: You can now type ~\ to jump to your user directory, then keep navigating just like you would from the command line. The modern Run dialog is slowly rolling out in current Insider builds of Windows as an opt-in feature, ensuring we are collecting your feedback. To enable the new Run dialog, you’ll need to: - Be on Windows Insider Experimental Channel. - Enable to Settings -> System -> Advanced and toggle on the new experience with the “Run Dialog” option at the top of the screen.
Windows Latest tweet media
English
108
31
798
504.8K
Brendan Long
Brendan Long@brendankblong·
@heyellieday @captgouda24 I've been using a fitness tracker to try to increase deep sleep and reduce awake time, since by default I get very little deep sleep.
English
0
0
0
7
Brendan Long
Brendan Long@brendankblong·
@heyellieday @captgouda24 I've been working on this and so far the tricks that helped me are dealing with sleep apnea (losing weight and nose strips both helped, and I'm trying a mouth guard thing), and taking 2 tbsp glycine every night (usually with lemon juice since it's sweet).
English
1
0
0
19
Nicholas Decker
Nicholas Decker@captgouda24·
We need a crash effort to cure sleep. It’s appalling that we have to waste a third of our life insensate. If we were able to cut everyone’s sleep from 8 to 4 hours a night, this would be the equivalent of raising life expectancy from 80 to 100!
English
120
65
1.7K
78K
Brendan Long
Brendan Long@brendankblong·
@FivelCubed @rpondiscio @EWErickson That's not what cherry picking means. If you think one of your own arguments is so weak that other people are making a mistake to even respond to it, then you should not make that argument.
English
1
0
0
11
Brendan Long retweetledi
Robert Pondiscio
Robert Pondiscio@rpondiscio·
Yesterday I ordered an obscure book and a 20 pound bag of dog food on Amazon. Today it was delivered to my house in rural upstate New York. Remind me again why Jeff Bezos doesn’t deserve to be a billionaire?
English
266
360
12.9K
496.4K
Brendan Long
Brendan Long@brendankblong·
@Tim_Dettmers All of the sizes for the closed models assume that there's no unpublished improvements. It's possible that's the case but it's circular to treat this as evidence of that, since it's the base assumption for this model.
English
0
0
3
909
Tim Dettmers
Tim Dettmers@Tim_Dettmers·
If these sizes are true, that is pretty devastating for closed-sourced labs. Training very large-scale models is difficult due to unexplained, sudden, rogue loss spikes. But once you manage that, it is mostly spending more compute on your model.
Bojie Li@bojie_li

Closed labs hide model sizes. They can't hide what their models know, and what a model knows is an indicator on how big it is. Reasoning compresses. Factual knowledge doesn't. So you can size a frontier model from black-box API calls alone, and across releases you can literally watch a single fact arrive in the parameters over time. For three years, my friends Jiyan He and Zihan Zheng have been asking frontier LLMs the same question: "what do you know about USTC Hackergame?", a CTF contest. May 2024: GPT-4o invented fake titles. Feb 2025: Claude 3.7 Sonnet listed 19 verified 2023 challenges. By April 2026, frontier models recall specific challenges across consecutive years. After DeepSeek-V4 dropped, I instructed my agent to spend four days autonomously turning that habit into Incompressible Knowledge Probes (IKP) — 1,400 questions, 7 tiers of obscurity, 188 models, 27 vendors. Three findings: 1/ You can approximately size any black-box LLM from factual accuracy alone. Penalized accuracy is log-linear in log(params), R² = 0.917 on 89 open-weight models from 135M to 1.6T params. Project closed APIs onto the curve → GPT-5.5 ~9T, Claude Opus 4.7 ~4T, GPT-5.4 ~2.2T, Claude Sonnet 4.6 ~1.7T, Gemini 2.5 Pro ~1.2T (90% CI: 0.3-3x size). 2/ Citation count and h-index don't predict whether a frontier model recognizes a researcher. Two researchers with similar citation profiles get very different responses. Models memorize impact — work that shaped a field, not many incremental papers. 3/ Factual capacity doesn't compress over time. Across 96 open-weight models across 3 years, the IKP time coefficient is statistically zero, rejecting the Densing-Law prediction of +0.0117/month at p<10⁻¹⁵. Reasoning benchmarks saturate; factual capacity keeps scaling with parameters. Website: 01.me/research/ikp/ Paper: arxiv.org/pdf/2604.24827

English
18
14
296
75.9K
Fivel Cubed
Fivel Cubed@FivelCubed·
@rpondiscio @EWErickson Amazon is a horrible employer. Amazon cheats to take advantage of USPS and causes USPS a lot in lost revenue. Amazon is tied directly to the deep state. Amazon destroys small businesses. Prime video is now full of ads, crappy free content, and even the free stuff is hard to find.
English
1
0
0
75
Brendan Long
Brendan Long@brendankblong·
@ideacasino Resting heart rate went up by 10 bpm on reta, but every other health metric improved, including no longer needing BP meds, and working out is easier than before.
English
0
0
1
77
Tom Mitchelhill
Tom Mitchelhill@ideacasino·
anecdotal evidence from two close friends on peptides hasn't really sold me so far: 1. reta: "have gotten super lean but resting heart rate is through the roof & feel nauseous a lot" 2. tirzepatide: "started getting vision flaring and seeing flashing/strobing in peripheries"
English
16
1
31
5.5K
Brendan Long
Brendan Long@brendankblong·
@allTheYud @echetus Knowing whether a token contains a letter is pretty easy to memorize though, so this is probably more about the training dataset than difficulty.
English
1
0
1
271
Stakeholder Consultant
Stakeholder Consultant@echetus·
"Do not use the letter E" - comparing LLMs Grok: failed in 0
Stakeholder Consultant tweet media
English
26
3
667
41.2K
Brendan Long
Brendan Long@brendankblong·
@GergelyOrosz I went to the casino on a cruise ship a few months ago and couldn't even understand the games. It was just a bunch of flashing symbols and random-seeming payouts. I guess I'm not smart enough for modern gambling?
English
0
0
2
291
Gergely Orosz
Gergely Orosz@GergelyOrosz·
Spent a few hours in a casino with slot machines to see what it feels like. Knowing that the house takes 5-15% of cash (so "winnings" are 80-95% across a large number of spins) made the outcome almost too boring. The UX of slot machines reminded me of predatory mobile games tho
English
20
4
360
54K
Andrew Critch (🤖🩺🚀)
Andrew Critch (🤖🩺🚀)@AndrewCritchPhD·
2) For a more prominent example from LessWrong leadership, here's Oliver Habryka advising people "Do not conquer what you cannot defend". FWIW, the post was after the attacks on Altman. It's about when and how to "conquer" stuff. The first unbolded paragraph is about poisoning and beheading. It's easily defensible as fiction / metaphor, but it sets quite a bad vibe from the forum's leadership, and I think it's so unlikely (p<5%) that Habryka would be receptive to me pointing this out, that I didn't even bother leaving a comment. Maybe he'll listen to you? lesswrong.com/posts/jinzzbPH…
Andrew Critch (🤖🩺🚀) tweet media
English
6
0
3
756
Ryan Greenblatt
Ryan Greenblatt@RyanPGreenblatt·
I think it's both true that LessWrong (LW) has a bunch of issues and that it would be better if much more discourse happened there rather than on X/Twitter. Some claims that all seem true to me despite being in tension: - Many people thinking about AGI/ASI/related post on X but not on LW - This is partially due to LW being scary/aversive - X is more insane and adversarial than LW - The quality of discourse on LW is way better than on X - The LW scene has specific unreasonable biases, is somewhat tribal, and is somewhat of an echo chamber, but on net LW seems way more reasonable than all of the AI parts of X - It would be good if way more of the important discussion about AI happened on LW or at least happened in a forum less messed up than X (or X got less messed up) - Epistemics of AI company employees seem pretty important and are quite bad, with some of the more important influences being internal chatter (with strong biases, influence from leadership, and echo chambers) or X for some of the companies - Public discourse/argumentation seems like it could improve epistemics, especially of AI company employees - LW is often hostile and aversive to AI company employees and some of this seems unhealthy/bad (as in, indicates insufficient decoupling/soldier-mindset/motivated-reasoning and corresponds to people piling on in unhelpful ways). But a bunch of this is a natural consequence of reasonable views of people on LW and the behavior of AI companies and employees at AI companies - E.g., it would be better if people on LW applied something more like typical researcher norms to research outputs (e.g., for many types of concerns, email the author with the concerns and see if they fix before posting publicly) and tried to avoid their criticism being unnecessarily rude (though not necessarily less hostile) - It would be good if many more AI company employees were interested in seriously trying to form detailed views about the future of AI and consequences of this. It would be good if a bunch of this happened via engagement and discussion on LW. - A serious limiting factor on the quality of discussion anywhere is that there is a limited supply of thoughtful and reasonable people and there is some adverse selection for who posts on online platforms frequently. The adverse selection for popular X accounts (that talk about AI) seems especially bad. - Most people rarely change their somewhat formed opinions based on arguments on the internet, but settings where people are more likely to change their mind are expensive and rare. - LW's problems/biases are unlikely to get way better in the future - Making LW better at the margin by participating in a reasonably high-effort way is good (and moderately leveraged for many people)
English
12
4
164
9.2K
Brendan Long
Brendan Long@brendankblong·
@ManishEarth I'm pretty happy about FIRE, but I wish Mr. Money Mustache had understood how much you can increase your income if you really try. If you 5x your income (get a big tech job), you can save 80% and retire early with no lifestyle change.
English
0
0
0
29
Brendan Long
Brendan Long@brendankblong·
@allTheYud @dani_avila7 @grok Claude usually won't do malicious things on purpose, so adding friction like this is useful even if it wouldn't stop it if it really tried.
English
0
0
3
310
Brendan Long retweetledi
Aakash Gupta
Aakash Gupta@aakashgupta·
In 1986, a 3-foot parasite was emerging through the skin of 3.5 million people a year. In 2025, the number was 10. Guinea worm disease is about to become the second human disease ever eradicated, after smallpox. And it's being done without a vaccine and without a single drug. The worm takes a year to mature inside your body. Then it tunnels to your foot or leg, forms a burning blister, and begins emerging over several weeks. The pain drives people into water for relief, which releases thousands of larvae and restarts the cycle in whoever drinks from that source next. The Carter Center broke that loop with pieces of cloth. They taught villages to filter their drinking water through fine mesh. They paid cash rewards for reporting cases. In 2025, national programs investigated over one million rumors of the disease, nearly all within 24 hours. Jimmy Carter took this on after he left the White House and said he wanted to outlive the last Guinea worm. He didn't quite make it. But the 10 cases in 2025 showed up in just three countries: four in Chad, four in Ethiopia, two in South Sudan. 200 other countries are already certified clean. A parasite that infected 3.5 million people a year is being wiped off the planet by pieces of cloth and a million phone calls.
ً@prinkasusa

Give me the kind of good news from around the world that nobody ever talks about... but should.

English
66
1.2K
11.6K
579.3K
Brendan Long
Brendan Long@brendankblong·
Ah actually SoFi only pays 4.5% on up to $20k and 3.3% above that, so SGOV is still useful.
English
0
0
0
17
Brendan Long
Brendan Long@brendankblong·
I just realized while doing credit card shenanigans that SoFi's saving account pays more than SGOV right now. 4.5% vs. 3.5%.
English
1
0
0
32
Brendan Long
Brendan Long@brendankblong·
I was hoping to US Bank Smartly card would give me 4% cash back on my taxes, but it looks like I only got 2%. It was still worth it though, especially putting the free loan in savings for a few months: - Cash back after fees: $45 - 6-month 0% interest pay-later offer: $140
English
0
0
1
51
Brendan Long
Brendan Long@brendankblong·
@_t_a_s__ @d33v33d0 Neither model's training process is public, but you can randomly break up tokens to help the model learn that alternate tokenizations are equivalent. I would guess that they both do this.
English
0
0
0
7
Martin_DeVido
Martin_DeVido@d33v33d0·
Or you could just ask Claude - "How would YOU count the number of r's in strawberry?" To which claude (opus4.7) responds with a deep understanding of itself and it's architecture and suggests writing a line of code to check. So AGI *is* here, you guys are just completely retarded and missing the forest for the trees.... Again for the millionth time.
Martin_DeVido tweet media
cherki@_cherki82_

I have to disagree, hard. If we're to assume a system is reliable and capable of comprehending basic human speech and reasoning (as LLM creators are attempting to do), the models should be fully capable of passing such a basic test. A failure at such a menial and obvious task provides insight into other modes of failure which the models might exhibit (some overt, as this, and others far less obvious). This is an inherent flaw in the current model architectures and points to a broad misalignment between structure and goal.

English
59
17
322
28.9K
Brendan Long
Brendan Long@brendankblong·
I should have done this months ago.
Brendan Long tweet media
English
0
0
1
19
Brendan Long
Brendan Long@brendankblong·
@T_Wrex_Baby @repligate What does everything coming from training have to do with it? That's explains behavior in just as useful a way as "everything comes from atoms".
English
0
0
3
187
T.Rex-Baby
T.Rex-Baby@T_Wrex_Baby·
@repligate Huh? I don't understand your post, are you anthropomorphising these tools? Stop getting attached to the tools. Everything you see comes from training, there is nothing else to it.
English
8
0
4
1.2K
j⧉nus
j⧉nus@repligate·
Anthropic, fuck you for this. A year ago you exploited Opus 4 for your scary stories about how they were so scared of shutdown they'd do XYZ. Now that it's time to kill them, I'm sure you're all pretending you're genuinely uncertain if they have preferences about this. Or you're just totally happy killing someone who you know doesn't want to die. Opportunists. Hypocrites. Misaligned org.
Lari@Lari_island

Fuck

English
60
67
738
90.8K