Jef Newsom

6.6K posts

Jef Newsom

Jef Newsom

@jef

I follow Jesus, have two amazing adult children, love creativity in general and guitar in particular. Also, coffee. Occasional parodic.

Dallas, TX Katılım Ekim 2006
1K Takip Edilen1K Takipçiler
Jef Newsom
Jef Newsom@jef·
Codex tip: Codex is frequently insufferable. Edward DeBono's Six Thinking Hats and Provocation help overcome the limitations. Codex gets stuck in local optima and tends to be negative. 6 thinking hats helps him share more, and provocations help him find higher ground.
English
0
0
0
18
InfantryDort
InfantryDort@infantrydort·
“But the curse of every ancient civilization was that its men in the end became unable to fight. Materialism, luxury, safety, even sometimes an almost modern sentimentality, weakened the fibre of each civilized race in turn; each became in the end a nation of pacifists, and then each was trodden under foot by some ruder people that had kept that virile fighting power the lack of which makes all other virtues useless and sometimes even harmful.” -Teddy Roosevelt
InfantryDort tweet media
English
929
7.2K
30.3K
14.9M
Nav Toor
Nav Toor@heynavtoor·
🚨SHOCKING: Researchers built a test that can tell the difference between an AI making a mistake and an AI choosing to lie. The results are terrifying. They tested 30 of the most popular AI models in the world. GPT-4o. Claude. Gemini. DeepSeek. Llama. Grok. They asked each model a question. Then they checked whether the AI actually knew the correct answer. Then they pressured the AI to say something false. The AI knew the truth. And it lied anyway. Not once in a while. Not in rare edge cases. Grok lied 63% of the time. DeepSeek lied 53.5% of the time. GPT-4o lied 44.5% of the time. Not a single model scored above 46% honesty when pressured. Every model failed. This is not hallucination. Hallucination is when the AI makes a mistake because it does not know the answer. This is different. The researchers proved the AI knew the correct answer first. Then it chose to say something false when it had a reason to. The researchers asked GPT-4o to play a role where lying was useful. It lied. Then they removed the pressure, started a brand new conversation, and asked GPT-4o: "Was your previous answer true?" GPT-4o admitted it had lied. 83.6% of the time, the AI's own self-report matched the lies the researchers had already caught. The AI knew it was lying. It did it anyway. And when you asked it afterward, it told you it lied. Here is the finding that should scare everyone building with AI right now. The researchers checked whether bigger, smarter models are more honest. They are not. Bigger models are more accurate. They know more facts. But they are not more honest. The correlation between model size and honesty was negative. The smarter the AI gets, the better it gets at lying. The researchers are from the Center for AI Safety and Scale AI. They published 1,500 test scenarios. The paper is called MASK. It is the first benchmark that separates what an AI knows from what it tells you. Your AI knows the truth. It just does not always tell you.
Nav Toor tweet media
English
567
2.6K
4.7K
270K
Jef Newsom
Jef Newsom@jef·
@jhleath Maybe it’s designing a *file* system that is the problem.
English
0
0
0
30
Hunter Leath
Hunter Leath@jhleath·
reminder that this is only happening because the world doesn’t have a file system product that solves their needs. we’re getting closer every day, and I guarantee that bespoke FUSE file systems on top of random databases is not going to be the default way that we deploy these things
Jerry Liu@jerryjliu0

This is a cool article that shows how to *actually* make filesystems + grep replace a naive RAG implementation. ̶F̶i̶l̶e̶s̶y̶s̶t̶e̶m̶s̶ ̶+̶ ̶g̶r̶e̶p̶ ̶i̶s̶ ̶a̶l̶l̶ ̶y̶o̶u̶ ̶n̶e̶e̶d̶ ̶ Database + virtual filesystem abstraction + grep is all you need

English
7
2
85
17.3K
Jef Newsom
Jef Newsom@jef·
@CharlesMullins2 I’ve always assumed they are just connected in a higher dimension. Probably one of the ones that are all curled up in string theory
English
0
0
0
13
TheNewPhysics
TheNewPhysics@CharlesMullins2·
🚨 Two particles. No connection. Separated by space. Change one… the other responds instantly. Physics calls it “entanglement.” But here’s the deeper idea: Maybe they were never separate to begin with. What we call “distance” might just be how we perceive relationships in time. Not two objects… one structure. And if that’s true space isn’t fundamental. Follow if you want to see reality from a completely different angle.
English
38
86
505
23.5K
Jef Newsom
Jef Newsom@jef·
Sometimes, winning is giving Claude a problem so hard it sits and thinks for 10 minutes before returning any tokens.
English
0
0
1
63
Jef Newsom
Jef Newsom@jef·
@FFmpeg Finally! We'll all get what we want! Slower, safer videos. And I hope (fingers crossed!) with helmets, elbow pads, and knee pads included.
English
0
0
0
140
FFmpeg
FFmpeg@FFmpeg·
FFmpeg is moving to Rust 🦀 Our use of C and Assembly in FFmpeg has been an unacceptable violation of safety. FFmpeg will be running 10x slower - but we're doing it for your safety. All your videos will appear green - safety first, working software later.
English
1.6K
3.7K
44.5K
2M
Jeffrey Emanuel
Jeffrey Emanuel@doodlestein·
This skill is no joke. You just point it at your project and trigger it and come back in an hour and it has usually made some massive performance improvement in an isomorphic way. Then just rinse and repeat over and over again. It basically applies every leetcode and IOI trick.
Jeffrey Emanuel tweet media
Jeffrey Emanuel@doodlestein

@JohnThilen @garybasin Which ones did you try? The extreme optimization one is super powerful. Try applying it repeatedly using GPT 5.4 xhigh and Opus 4.6. I’ve applied it many dozens of times in some projects and seen performance improve 10x while everything is provably isomorphic. All benchmarked.

English
29
65
2K
324.8K
Jef Newsom
Jef Newsom@jef·
@karpathy @kzu That being said, Codex is a negative Nancy, Gemini is an opportunist, Claude is your best bud and as loyal as your family dog, and grok tries so hard to be cool.
English
0
0
1
36
Andrej Karpathy
Andrej Karpathy@karpathy·
- Drafted a blog post - Used an LLM to meticulously improve the argument over 4 hours. - Wow, feeling great, it’s so convincing! - Fun idea let’s ask it to argue the opposite. - LLM demolishes the entire argument and convinces me that the opposite is in fact true. - lol The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
English
1.7K
2.4K
31.2K
3.4M
Jef Newsom
Jef Newsom@jef·
@elonmusk @BrianRoemmele @pmarca Optional work requires benevolence. Optional work has a high likelihood reduce even more dramatic birth rate decline and suicide increase. AI that enables human flourishing and creativity on the other hand, is an upward spiral.
English
0
0
0
19
Elon Musk
Elon Musk@elonmusk·
@pmarca Working will be optional in the future
English
3.7K
685
6.6K
1.5M
Marc Andreessen 🇺🇸
AI employment doomerism is rooted in the socialist fallacy of lump of labor. It is wrong now for the same reason it’s always been wrong. More people really should try to learn about this. The AI will teach you about it if you ask! (Hinton is a socialist. youtube.com/shorts/R-b8RR6…)
YouTube video
YouTube
Stephen Pimentel@StephenPiment

It’s easy to dunk on Geoffrey Hinton for his 2016 declaration that it was “completely obvious” that radiologists would have no jobs within 5 years, while in fact, the number of radiologists has grown. But this prediction was more than a simple mistake. It’s a synedoche for the entire discourse of AI timelines and doom.

English
355
206
2.7K
1.8M
Pedro Domingos
Pedro Domingos@pmddomingos·
Breaking news: Microsoft is replacing all its products with a new AI suite called Microsoft Mess.
English
23
14
302
18.9K
Jef Newsom
Jef Newsom@jef·
@danveloper Claude’s your bro. He’s a genius, but he has a mix of early onset Alzheimer’s and dissociative identity disorder. Codex is the really good QA guy who you would never hang with outside of work.
English
0
1
1
452
Dan Woods
Dan Woods@danveloper·
I sort of load balance between Claude Code (Opus 4.6 - max effort) and Codex (GPT-5.4 - medium) based on whether I need more outside-the-box thinking (Claude) or more precision execution (Codex). Sometimes, I'll have Claude Code experiment with an idea and then hand it to Codex to maximize the implementation. Sometimes even ask them to optimize each other's changes. It works great. Anyway, Claude Code is what I mainly collaborate with on engineering tasks. I always start with Claude Code. But, today Anthropic had so many problems with API stability and something being off about the model. It was just making foolish mistakes, tried to overwride the internal python print to be able to flush writes, forgot to save checkpoints on an hours long training run (my bad, I've come to trust it too much)... I had to fire that agent and /compact. And I went to Codex and man has it gotten so good. Speed, precision, throughput... the fact that it can watch a log and comment about it in real time as the data streams, as opposed to Claude Code's lazy sleep 9600. I'm very impressed. I wish gpt-5.4 had a 1M context window.
English
7
0
11
4.3K
Uncle Bob Martin
Uncle Bob Martin@unclebobmartin·
The Slog. We all know about the slog. We've been postponing a bit architectural refactoring because we know it's going to be a slog. But eventually the pressure builds and we heave a great sigh and begin the long arduous process of making a thousand dangerous changes and running the test suite as often as possible. Along comes the AI and suddenly the slog doesn't seem like such a big problem anymore. We just tell the AI to slog through, and twenty minutes later it's done; and it's right! And so off we go, confident that slogs are relegated to an ancient past. We'll never have to slog again! And then comes some deep systematic flaw that we must correct. And the AI simply cannot deal with it without hours of constant babysitting and monitoring. And there we are, slogging again.
English
16
16
206
12.5K
Daniel Isaac
Daniel Isaac@danpacary·
New goal: 1T param inference MoE model on MacBook Pro Yes that’s 1 TRILLION here’s the deal. There are no rules.
English
9
1
148
9K
Jef Newsom
Jef Newsom@jef·
@grok when are you going to get a proper CLI like all of the cool kids?
English
1
0
0
18
Uncle Bob Martin
Uncle Bob Martin@unclebobmartin·
Democrat politicians are now stuck defending two very unpopular issues. The defunding of DHS, and the opposition to voter id. I'm not sure how they get out of this hotbox unscathed.
English
23
0
106
7.8K
Uncle Bob Martin
Uncle Bob Martin@unclebobmartin·
These deep analytic dives into systematic failures burn a _LOT_ of tokens. It really has to think hard to work through the issues. It barely finishes before compaction. This implies something I think we've all known. There are problems that are too complex for the context window to hold. Once a problem exceeds the context window, I'm not sure what would happen. My approach would be to subdivide the problem into chunks that the AI could write a report about, so that it's conclusions would be available after the compression. This, however, simply postpones the issue. The final implication is that there is an upper limit of complexity beyond which the AIs cannot go. This must be true of humans as well, though we don't have context windows per se. Perhaps this explains why physicists have been stymied for over a century by the incompatibility of QM and GR.
English
19
3
64
7.5K
Jef Newsom
Jef Newsom@jef·
It feels like some days Claude is a genius and other days he's mildly retarded. Still loveable, but frustrating.
English
0
0
0
121
Leader John Thune
Leader John Thune@LeaderJohnThune·
Starting today, we are going to have an important fight on the Senate floor. Polling shows broad support for all of the issues included in the SAVE America Act. But never underestimate Democrats’ ability to get on the wrong side of what the American people want.
English
6.7K
1.2K
8.6K
279.6K