Jeff Preshing

2.6K posts


@preshing

Canadian game developer

Toque weather · Joined July 2008

530 Following · 4.3K Followers

Jeff Preshing reposted
Matt Pocock
Matt Pocock@mattpocockuk·
Doing some experiments today with Opus 4.6's 1M context window. Trying to push coding sessions deep into what I would consider the 'dumb zone' of SOTA models: >100K tokens. The drop-off in quality is really noticeable. Dumber decisions, worse code, worse instruction-following. Don't treat 1M context window any differently. It's still 100K of smart, and 900K of dumb.
Jeff Preshing
Jeff Preshing@preshing·
@jackclarkSF @deredleritt3r I see AI systems as inference engines that require humans to design, operate and direct them, so I'm not confused at all. If a theory/insight is consistent with existing knowledge, then it should be possible to build an AI system to generate it, but that's a human endeavor.
Jack Clark
Jack Clark@jackclarkSF·
Just to operationalize this, using the MOLG definition:
- "Smarter than a Nobel prize winner" - huge error bars? 20%?***
- Has same interfaces as a human working virtually - 100%; this is true today given plugins/composite systems.
- Tasks that take hours/days/weeks - 80%, if the METR fast trend holds.
- Can control existing tools - I feel uncalibrated here and don't think it's especially important. More depends on things having APIs, and many things do.
- Resources can be repurposed to run many copies - 100%; this is true today.

***This is the biggest variable imo, and it's really confusing. In the past couple of months there is now evidence of AI helping humans co-develop solutions at the frontier of science in bio, math, physics, etc. And many experts are increasingly impressed/surprised by capabilities. HOWEVER I think no AI system has yet had a simple and rebellious insight on par with stuff like coming up with general relativity, CRISPR, etc. This may just be a function of AIs not being given enough opportunities to do open-ended agent-led experimentation. Or it could be something deeper - maybe these systems lack some quality that allows for the outrageous and inspiring creativity of humans that change the paradigm.
prinz
prinz@deredleritt3r·
Jack Clark continues to believe that "powerful AI" is achievable *this year* and "running many copies" in 2027.
Jack Clark@jackclarkSF

@chatgpt21 yes

Jeff Preshing
Jeff Preshing@preshing·
@emollick How do we stop people from doing bad things? With laws. How do we deal with bugs caused by using LLMs? Sandboxing, testing, observability and support — same as every other software component. What further justification are you looking for?
Ethan Mollick
Ethan Mollick@emollick·
I get the enthusiasm for open weights models, but what is the justification from an alignment perspective? And here I mean both narrow alignment (models being used improperly to do bad things) and bigger alignment (agents or even AGI that are not aligned with humans overall).
Jeff Preshing
Jeff Preshing@preshing·
"It's more of an observational stress test than a hard validator — it deliberately creates messy allocation patterns and logs what happens so you can watch fragmentation behavior.

It has a list of target totals (in KB): 400 → 100 → 2000 → 400 → 5000 … → 0. For each target it grows or shrinks the total allocated bytes irregularly:
- Grow phase: alloc → free random → alloc (repeat until target).
- Shrink phase: free random → alloc small → free random.

Allocation sizes are randomized:
- 10% chance: large (100–400 KB)
- 25% chance: medium (5–15 KB)
- 65% chance: small (10–509 bytes)

After every operation it logs: operation count, current user-allocated bytes, and Heap::getStats().totalSystemMemoryUsed. No validate(), no coalescing assertions, no pass/fail — you just watch the logs. If system memory keeps growing even when user-allocated bytes drop, you have external fragmentation."
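The scheme described above can be sketched as a standalone program. This is a hypothetical reconstruction, not the repo's actual test: it substitutes plain malloc/free for the project's Heap API, and the Heap::getStats().totalSystemMemoryUsed logging is only indicated in a comment since that API lives in the repo.

```cpp
#include <cstdio>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Pick a randomized allocation size: 10% large, 25% medium, 65% small.
size_t randomSize(std::mt19937& rng) {
    int p = std::uniform_int_distribution<int>(0, 99)(rng);
    if (p < 10)
        return std::uniform_int_distribution<size_t>(100 * 1024, 400 * 1024)(rng);
    if (p < 35)
        return std::uniform_int_distribution<size_t>(5 * 1024, 15 * 1024)(rng);
    return std::uniform_int_distribution<size_t>(10, 509)(rng);
}

// Walk the target totals, growing and shrinking the live set irregularly.
// Returns the user-allocated byte count after the final target (0).
size_t runFragmentationTest(unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::pair<void*, size_t>> live;
    size_t userBytes = 0;
    long opCount = 0;

    auto freeRandom = [&] {
        if (live.empty())
            return;
        size_t i = std::uniform_int_distribution<size_t>(0, live.size() - 1)(rng);
        userBytes -= live[i].second;
        std::free(live[i].first);
        live[i] = live.back();
        live.pop_back();
        ++opCount;
    };

    const size_t targetsKB[] = {400, 100, 2000, 400, 5000, 0};
    for (size_t targetKB : targetsKB) {
        size_t target = targetKB * 1024;
        while (userBytes < target) { // grow phase: alloc, occasionally free
            size_t n = randomSize(rng);
            live.push_back({std::malloc(n), n});
            userBytes += n;
            ++opCount;
            if (std::uniform_int_distribution<int>(0, 99)(rng) < 25)
                freeRandom();
        }
        while (userBytes > target) // shrink phase: free random blocks
            freeRandom();
        // The real test would also log Heap::getStats().totalSystemMemoryUsed
        // here, to compare system memory against user-allocated bytes.
        std::printf("target=%zuKB ops=%ld userBytes=%zu\n",
                    targetKB, opCount, userBytes);
    }
    return userBytes;
}
```

Since the final target is 0, every surviving block is freed by the last shrink phase; in the real test, system memory staying high at that point is the fragmentation signal.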
Jeff Preshing
Jeff Preshing@preshing·
The agent ran the fragmentation test several times and corrected its own mistakes. At one point it tried to use gdb but couldn't, so it modified PLY_ASSERT to log the file/line instead, which I have to admit is a better default behavior. All can be seen in the transcript. Hey @grok - any chance you could explain how `apps/fragmentation-test` works in the linked repo?
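A minimal sketch of the kind of change described, assuming a simplified PLY_ASSERT; the real macro in the plywood repo differs, and g_assertFailures is invented here purely to make the behavior observable.

```cpp
#include <cstdio>

// Hypothetical sketch: instead of trapping into a debugger, a failed
// PLY_ASSERT logs the condition with its file and line, then lets the
// program continue. The counter is invented for illustration.
static int g_assertFailures = 0;

#define PLY_ASSERT(cond) \
    do { \
        if (!(cond)) { \
            ++g_assertFailures; \
            std::fprintf(stderr, "Assertion failed: %s at %s:%d\n", \
                         #cond, __FILE__, __LINE__); \
        } \
    } while (0)
```

Logging instead of trapping trades immediate diagnosis for an uninterrupted run, which suits an agent that can't attach gdb but can read the log afterwards.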
Jeff Preshing
Jeff Preshing@preshing·
😲 Wow! Codex 5.3 wrote a complete, general-purpose C++ memory allocator for me in just 30 minutes. Bada bing bada boom.

The code is clearly written, well-documented, efficient, handles fragmentation well and stands up to a battery of tests. I was able to submit the AI-generated work directly to the main branch with no additional changes on my part.

Of course, if you want the full story, I should also mention that I spent several days preparing the workspace, designing the fragmentation test, customizing the AGENTS file and iterating on the prompt in addition to those 30 minutes. But I still find it very cool.

Using a powerful LLM is like using a fax machine to get the answer back from a parallel universe where the remaining work has already been completed. 📠

For anyone interested, the prompt used can be seen in the commit description: github.com/preshing/plywo…
Jeff Preshing
Jeff Preshing@preshing·
Another fun follow-up: I ran the same prompt entirely on local hardware using a 4-bit quant of Qwen3.5:122b, another open weight model released two days ago.

It worked for an hour and 20 minutes, producing code and documentation that were remarkably close to what I asked for but ignoring the project's coding style. Then it declared success even though the test suite actually crashed. It told me that this was "due to a pre-existing issue," but it was actually due to a huge memory leak in the code it wrote.

Still pretty impressive considering it ran locally. I wonder if some coaching would help it iron the bugs out. Transcript available here: github.com/preshing/plywo…
Jeff Preshing
Jeff Preshing@preshing·
As an experiment, I ran the same prompt using Minimax-M2.5, a popular open weight model. It worked for 40 minutes, wrote a 300-line allocator that only implemented small bins, then gave up saying, "A full-featured allocator would require significantly more work to handle all edge cases correctly." So I ran the same prompt again and it added some code to support tree bins. The code looks reasonable, but now it's spiraling trying to fix build errors.
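"Small bins" and "tree bins" here echo dlmalloc-style allocator designs: exact-fit free lists for small requests, range-keyed trees for large ones. A minimal, hypothetical sketch of small-bin size-class indexing follows; none of this is the model-generated code, and the constants and names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// dlmalloc-style small bins: fixed-size free lists in 8-byte steps.
// Requests up to 256 bytes map to one of 32 exact-fit bins; anything
// larger would go to a "tree bin" keyed by size range (not shown).
constexpr size_t kSmallBinStep = 8;
constexpr size_t kSmallBinCount = 32;
constexpr size_t kMaxSmallSize = kSmallBinStep * kSmallBinCount; // 256

bool isSmallRequest(size_t size) { return size <= kMaxSmallSize; }

// Round up to the bin step, then index: bin i holds blocks of
// exactly (i + 1) * kSmallBinStep bytes.
size_t smallBinIndex(size_t size) {
    assert(size > 0 && isSmallRequest(size));
    return (size + kSmallBinStep - 1) / kSmallBinStep - 1;
}
```

Exact-fit small bins make the common case an O(1) list pop, which is why a model can get a "small bins only" allocator working long before the tree-bin machinery exists.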
Máté Béres
Máté Béres@mateberes_·
@preshing I stopped reading at the global mutex in the allocator.
Jeff Preshing
Jeff Preshing@preshing·
Git finally has an easy-to-use command line interface: pi -p "Rename remote origin to local"
Jeff Preshing
Jeff Preshing@preshing·
Having fun using Codex 5.3 ($20 plan) to improve my C++ Markdown parser. From 277 to 402 passing test cases since yesterday. AI really shines at this kind of work! I'm directing the effort and playing code janitor, but it even helps with those things too. github.com/preshing/plywo…
Jeff Preshing
Jeff Preshing@preshing·
"AI systems amplify well-structured knowledge while punishing undocumented systems." 🎯

This sums up well how software development is changing, I think. Well-organized projects with clear documentation and legible code will have a big advantage. It isn't obvious how granular the documentation should be, or how to best feed information to agents at the right time, but success will favor teams who get it right. I hope to see more discussion about best practices going forward.

Thanks @clattner_llvm for sharing your perspective and expertise.
Chris Lattner@clattner_llvm

The Claude C Compiler is the first AI-generated compiler that builds complex C code, built by @AnthropicAI. Reactions ranged from dismissal as "AI nonsense" to "SW is over": both takes miss the point. As a compiler🐉 expert and experienced SW leader, I see a lot to learn: 👇

Jeff Preshing
Jeff Preshing@preshing·
This is the one I'm currently using in my open source project: github.com/preshing/plywo… It's enough to make the frontier models use my API pretty well, but weaker models like qwen3-coder-next still keep pulling standard C functions in. I'm wondering if being more explicit will help, like "If you need to do X, use Y."
Jarkko Lempiäinen
Jarkko Lempiäinen@JarkkoPFC·
@preshing For devs who don't care about performance, memory usage, maintainability and other pesky NFRs, AI coding is working perfectly 😉 Anyway, I'll be testing how teaching an AI agent via an agents file works, and whether I can significantly improve the code quality AI produces.
Jeff Preshing
Jeff Preshing@preshing·
I'm convinced that if you're building a large piece of commercial software, the only good way to use AI coding agents is as a power tool, without forsaking your ability to understand the code. On the other hand, if you're building disposable software — for personal use, for a demo, or just experimenting — then abandoning understanding of the code is totally fine. If the project is simple, the AI-generated code will be easy to understand anyway.

I see many takes claiming the opposite. People are saying that soon, understanding the code won't even matter. The logic goes: AI is improving all the time, therefore, it will eventually just do everything without human intervention. Some people are already trying to live in this imagined future: running Ralph loops, getting agents to supervise other agents, cranking out hundreds of commits per day. More power to them, but I don't plan to work that way.

Of course AI will keep improving. In the best case, those improvements will lead us towards greater simplicity. They'll help untangle spaghetti code, suggest better ways to organize the project, and generate clear, up-to-date documentation. Projects will become easier to understand and onboarding will become easier for both humans and AI alike. This is the only trend that makes sense to me.

I can imagine that in 20 years or so, software engineering will no longer be a glamour job. Development tools will be more intuitive. The opportunities to get rich overnight will fade. We'll still need programmers and system maintainers, but not as many. But I think it will take time.