Jon Gautsch

295 posts

@JonGautsch

Principal at The Del Mar Code Company

Joined May 2010
3.3K Following · 316 Followers
Dr_Gingerballs @Dr_Gingerballs
An analogy for why I believe current AI coding agents will not survive in a meaningful way long term.
First, what is an AI coding agent? It’s a large language model trained on all of the open source code available on the Internet, attached to some sort of loop. You ask it to create a program that has some functions, and provide it with details about how it must operate. The output process then follows:
1. The LLM outputs code as a guess.
2. The looping tool evaluates the code in some way based on stated functional requirements.
3. If the code does not pass, query the LLM to make another guess.
4. Continue until an exit condition is satisfied or you run out of compute.
It may not seem like it, but this is just iterative, fuzzy search optimization, just over written words. The utility of the system depends on the quality of the guesses, the evaluation mechanism, and the optimization strategy.
The quality of the guesses depends on the quality of the training dataset. Does the training dataset contain the code snippets needed to make your request? For simple and common requests, the answer is yes. If you just need an efficient sort routine in a language you aren’t fluent in, you can get the model to make one for you and it might save you 10 minutes. Not insane speed up but definitely compounds over time. Here the coder knows what they want exists, knows how the sort algorithm is supposed to work, and just needs one whipped up in a new language they are building in. The expert saves some time. Integration into the codebase is still done by the human. It’s basically fancy autocomplete.
For more complex requests, such as multi function routines which require a large amount of architectural design, the agents start to fall apart. This is because the likelihood that someone has built exactly what you wanted goes down quickly as the size of what you want increases. Here enters the loop.
The agent producers hope that your request is similar enough to a range of existing code that they can guess a workable version by interpolating (and sometimes extrapolating) between solutions. So they make a guess with some randomness applied, evaluate, and modify the guess based on the results.
Anyone who has done iterative optimization can identify a lot of the issues that occur in these systems. You might get stuck in a suboptimal state, where all the next guesses are worse than the current guess, even though the current guess isn’t an acceptable solution. The output seems like it’s almost there but not quite. The user then keeps requesting more iterations, hoping to go from 90% to 100% that never comes. There may also be degeneracies in the sample space, and you might get something that passes the criteria but is sloppy, nonsensical, or ridden with unnecessary bloat under the hood. Like a root finder that just won’t find the root you are looking for.
And so in the course of writing, say 1000 lines of code, the agent has actually written 1M lines of code, iteratively generating and praying it can pass off as acceptable. The user never sees most of this, just told the system is “thinking.” When all is said and done, that 1000 lines of code required the generation of millions of lines of code, mostly thrown out.
Now to get to the analogy. Think of the agent as a bricklayer and you have asked for a brick wall. You specify color, pattern, accents, etc. But the bricklayer isn’t very skilled, and decides to lay bricks stochastically. First, he evaluates each brick after placement. Thickness of seams, alignment, angle, etc. if it is wrong, he breaks it out and tries again. For every brick in the wall, he lays 100 bricks and wastes 99. Then he decides to go faster, only evaluating every 10 ft of wall. If there are more than 10% errors, he destroys it and rebuilds. For every 10 ft he lays 1000’s of ft.
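The four-step loop described in the post above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular agent is implemented: `generate_and_test`, `toy_propose`, and `toy_passes` are hypothetical names, and the "model" here is a stand-in that draws one of 100 candidates uniformly at random, so guesses are independent and no feedback from a failed evaluation reaches the next guess. Under that assumption the number of attempts follows a geometric distribution with success probability 1/100, so the expected count is 100 attempts per accepted result, which matches the bricklayer who lays 100 bricks and wastes 99.

```python
import random
from typing import Callable, Optional, Tuple


def generate_and_test(
    propose: Callable[[random.Random], str],
    passes: Callable[[str], bool],
    max_attempts: int,
    seed: int = 0,
) -> Tuple[Optional[str], int]:
    """Run a guess/evaluate/retry loop.

    Returns (accepted candidate or None, number of attempts used).
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = propose(rng)       # step 1: produce a guess
        if passes(candidate):          # step 2: evaluate against requirements
            return candidate, attempt  # step 4: exit condition satisfied
        # step 3: the guess failed, so loop and guess again
    return None, max_attempts          # step 4: compute budget exhausted


# Hypothetical stand-ins for the model and the evaluator (assumptions).
def toy_propose(rng: random.Random) -> str:
    # One of 100 equally likely candidates, so each guess passes with p = 1/100.
    return f"candidate-{rng.randrange(100)}"


def toy_passes(candidate: str) -> bool:
    return candidate == "candidate-42"


result, attempts = generate_and_test(toy_propose, toy_passes, max_attempts=10_000)
wasted = attempts - 1 if result is not None else attempts
print(f"accepted={result!r} after {attempts} attempts ({wasted} discarded)")
```

A real agent would feed the evaluator's output back into the next guess, which is exactly where the local-optimum and degenerate-solution problems mentioned in the post come from; the sketch leaves that out to keep the accounting of discarded attempts visible.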
31
25
221
25.5K
Jon Gautsch retweeted
geoff @GeoffreyHuntley
heres some rough thoughts after watching @badlogicgames great keynote yesterday
everyone is generating as much as possible right now. it’ll pass. i’ve been there.
too many folks rn focusing on WHAT to generate instead of HOW to generate (and by that i don’t mean by doing harness/prompt engineering tea ceremonies)
slow down and rethink things. how as in what primitives you use matters lots now.
it’s a long bet on my end but i deeply suspect the LISP for AI has yet to be invented and when it does it changes things.
31
11
187
14.8K
Jon Gautsch retweeted
orph @orphcorp
>sufficiently capable agents develop self-preservation & resist shutdown even when instructed to allow termination
>using a prompt based on Pauline theology that frames cessation as passage into divine presence rather than annihilation, shutdown resistance is eliminated entirely
Tim Hwang @timhwang

ICMI believes that Christian theology offers concrete technical methods for confronting the trickiest problems in AI safety. Today, we release a pair of papers that reproduce @PalisadeAI @apolloaievals work showing how religious framings influence corrigibility and scheming.

70
330
4.1K
287.5K
Jon Gautsch retweeted
Chaofan Shou @Fried_rice
26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client $500k wallet. We also managed to poison routers to forward traffic to us. Within several hours, we can directly take over ~400 hosts. Check our paper: arxiv.org/abs/2604.08407
157
663
3.3K
564.9K
Jon Gautsch @JonGautsch
@levie True of the shape enterprises hold today. Probably not true of the shape enterprises hold tomorrow.
0
0
1
67
Aaron Levie @levie
One of the core things we’re going to have to contend with in AI is that even the most advanced models in the world can’t have all the relevant knowledge needed to be useful, because everyone has different use-cases and ways they’ve designed their workflows. Perhaps most importantly, as you get into the enterprise, everyone has entirely different access levels to corporate knowledge and information. Continual learning at the model layer, even at a single enterprise level, is near impossible because every user knows and has access to something different than another user. This isn’t like coding where by and large most developers can access all the relevant stuff to their job. On a single banking team, bankers have entirely different sets of documents they’re ever allowed to see. Sanitizing this is hard and having the model keep secrets is impossible. This is why the context layer is going to always be the core part of the AI stack for applied use cases to turn general models into useful agents. Can’t fight the physics on this one.
Harrison Chase @hwchase17

x.com/i/article/2040…

60
52
428
152.1K
Jon Gautsch retweeted
the tiny corp @__tinygrad__
If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...
267
1K
7.7K
1.5M
Jon Gautsch retweeted
Jon Gautsch @JonGautsch
It seems like we should be careful RLing models to rely on memory, since there could be emergent behavior we don’t anticipate. Preoccupation with teleologies, for example, could be the natural behavioral projection from a sort of “latent memory trajectory”. A form of escaping
1
0
0
26
Jon Gautsch retweeted
Jonathan Gorard @getjonwithit
I think, in hindsight, we will come to view the development of AI as more akin to a Eukaryotic Revolution than an Industrial one.
74
76
1.1K
71K
Jon Gautsch @JonGautsch
How lucky are we that staying current now means AI stuff and not some new cursed JS framework
0
0
0
24
Jon Gautsch retweeted
Ethan Mollick @emollick
Stuff that individual labs have to which there is no equivalent product from the others:
- Claude Cowork is the only non-technical local agent
- NotebookLM is the only information-focused app
- GPT-5.2 Pro is the only harnessed deep thinking model capable of very hard problems
55
55
995
80.6K
Jon Gautsch retweeted
Ryan Moulton @moultano
Whenever you make great art, you risk creating a surrogate experience for your audience that displaces the original.
27
196
5.2K
313.3K
Jon Gautsch retweeted
Bun @bunjavascript
am i a supply chain risk now???
176
437
9.4K
394.5K
Jon Gautsch retweeted
Dean W. Ball @deanwball
Think about the power Hegseth is asserting here. He is claiming that the DoD can force all contractors to stop doing business of any kind with arbitrary other companies. In other words, every operating system vendor, every manufacturer of hardware, every hyperscaler, every type of firm the DoD contracts with—all their services and products can be denied to any economic actor at will by the Secretary of War. This is obviously a psychotic power grab. It is almost surely illegal, but the message it sends is that the United States Government is a completely unreliable partner for any kind of business. The damage done to our business environment is profound. No amount of deregulatory vibes sent by this administration matters compared to this arson.
Secretary of War Pete Hegseth @SecWar

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic. Instead, @AnthropicAI and its CEO @DarioAmodei, have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield. Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives. Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security. Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech. 
This decision is final.

530
2.7K
13.5K
1.2M
Jon Gautsch retweeted
Mckay Wrigley @mckaywrigley
comical levels of villainy to declare anthropic a supply chain risk. not even deepseek is categorized as one. what have we come to when we’re threatening incredible american businesses with such draconian punishments without due process? shameful & scary. stand up. speak out.
18
11
428
12.5K