Jon Gautsch

295 posts

@JonGautsch

Principal at The Del Mar Code Company

Joined May 2010

3.3K Following · 316 Followers

Jon Gautsch@JonGautsch·9 May

@Dr_Gingerballs You’ve lost the plot my man it’s time to update your priors.

Dr_Gingerballs@Dr_Gingerballs·9 May

An analogy for why I believe current AI coding agents will not survive in a meaningful way long term.

First, what is an AI coding agent? It’s a large language model trained on all of the open source code available on the Internet, attached to some sort of loop. You ask it to create a program that has some functions, and provide it with details about how it must operate. The output process then follows:

1. The LLM outputs code as a guess.
2. The looping tool evaluates the code in some way based on stated functional requirements.
3. If the code does not pass, query the LLM to make another guess.
4. Continue until an exit condition is satisfied or you run out of compute.

It may not seem like it, but this is just iterative, fuzzy search optimization, just over written words. The utility of the system depends on the quality of the guesses, the evaluation mechanism, and the optimization strategy.

The quality of the guesses depends on the quality of the training dataset. Does the training dataset contain the code snippets needed to make your request? For simple and common requests, the answer is yes. If you just need an efficient sort routine in a language you aren’t fluent in, you can get the model to make one for you and it might save you 10 minutes. Not insane speed up but definitely compounds over time. Here the coder knows what they want exists, knows how the sort algorithm is supposed to work, and just needs one whipped up in a new language they are building in. The expert saves some time. Integration into the codebase is still done by the human. It’s basically fancy autocomplete.

For more complex requests, such as multi function routines which require a large amount of architectural design, the agents start to fall apart. This is because the likelihood that someone has built exactly what you wanted goes down quickly as the size of what you want increases. Here enters the loop.

The agent producers hope that your request is similar enough to a range of existing code that they can guess a workable version by interpolating (and sometimes extrapolating) between solutions. So they make a guess with some randomness applied, evaluate, and modify the guess based on the results.

Anyone who has done iterative optimization can identify a lot of the issues that occur in these systems. You might get stuck in a suboptimal state, where all the next guesses are worse than the current guess, even though the current guess isn’t an acceptable solution. The output seems like it’s almost there but not quite. The user then keeps requesting more iterations, hoping to go from 90% to 100% that never comes. There may also be degeneracies in the sample space, and you might get something that passes the criteria but is sloppy, nonsensical, or ridden with unnecessary bloat under the hood. Like a root finder that just won’t find the root you are looking for.

And so in the course of writing, say 1000 lines of code, the agent has actually written 1M lines of code, iteratively generating and praying it can pass off as acceptable. The user never sees most of this, just told the system is “thinking.” When all is said and done, that 1000 lines of code required the generation of millions of lines of code, mostly thrown out.

Now to get to the analogy. Think of the agent as a bricklayer and you have asked for a brick wall. You specify color, pattern, accents, etc. But the bricklayer isn’t very skilled, and decides to lay bricks stochastically. First, he evaluates each brick after placement. Thickness of seams, alignment, angle, etc. if it is wrong, he breaks it out and tries again. For every brick in the wall, he lays 100 bricks and wastes 99. Then he decides to go faster, only evaluating every 10 ft of wall. If there are more than 10% errors, he destroys it and rebuilds. For every 10 ft he lays 1000’s of ft.
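The four-step loop described in the post above can be sketched as plain generate-and-test. This is a minimal illustration, not how any particular agent is built: `generate_and_test`, `propose`, and `passes` are hypothetical names, the "guess" is a random shuffle standing in for an LLM call, and the sketch omits the step where the next guess is modified based on the previous result.

```python
import random
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_and_test(
    propose: Callable[[random.Random], T],
    passes: Callable[[T], bool],
    max_attempts: int,
    seed: int = 0,
) -> Tuple[Optional[T], int]:
    """Repeat guess -> evaluate until a guess passes or the budget runs out.

    Returns (accepted candidate or None, number of guesses generated).
    """
    rng = random.Random(seed)
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        candidate = propose(rng)      # step 1: output a guess
        if passes(candidate):         # step 2: evaluate against requirements
            return candidate, attempts
        # step 3: the guess failed, so loop and guess again
    return None, attempts             # step 4: exit condition (budget exhausted)


if __name__ == "__main__":
    items: List[int] = [3, 1, 2]

    # Hypothetical stand-in for the model: guess a random ordering.
    def guess_ordering(rng: random.Random) -> List[int]:
        return rng.sample(items, len(items))

    # Hypothetical stand-in for the evaluator: accept only a sorted list.
    def is_sorted(xs: List[int]) -> bool:
        return all(a <= b for a, b in zip(xs, xs[1:]))

    result, used = generate_and_test(guess_ordering, is_sorted, max_attempts=100)
    print(f"accepted={result} after {used} guesses")
```

The second return value is the count of guesses generated, which is the quantity the post's bricklayer analogy is about: every guess except the accepted one is discarded.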

221

25.5K

Jon Gautsch@JonGautsch·12 Apr

@karrisaarinen @EvMill evanmiller.org/feature-matrix…

Jon Gautsch@JonGautsch·12 Apr

@karrisaarinen Timeless and timely advice. Always like @EvMill's framing:

179

Karri Saarinen@karrisaarinen·12 Apr

x.com/i/article/2043…

593

72.2K

Jon Gautsch retweeted

geoff@GeoffreyHuntley·11 Apr

heres some rough thoughts after watching @badlogicgames great keynote yesterday everyone is generating as much as possible right now. it’ll pass. i’ve been there. too many folks rn focusing on WHAT to generate instead of HOW to generate (and by that i don’t mean by doing harness/prompt engineering tea ceremonies) slow down and rethink things. how as in what primitives you use matters lots now. it’s a long bet on my end but i deeply suspect the LISP for AI has yet to be invented and when it does it changes things.

187

14.8K

Jon Gautsch retweeted

orph@orphcorp·10 Apr

>sufficiently capable agents develop self-preservation & resist shutdown even when instructed to allow termination >using a prompt based on Pauline theology that frames cessation as passage into divine presence rather than annihilation, shutdown resistance is eliminated entirely

Tim Hwang@timhwang

ICMI believes that Christian theology offers concrete technical methods for confronting the trickiest problems in AI safety. Today, we release a pair of papers that reproduce @PalisadeAI @apolloaievals work showing how religious framings influence corrigibility and scheming.

330

4.1K

287.5K

Jon Gautsch retweeted

Chaofan Shou@Fried_rice·10 Apr

26 LLM routers are secretly injecting malicious tool calls and stealing creds. One drained our client $500k wallet. We also managed to poison routers to forward traffic to us. Within several hours, we can directly take over ~400 hosts. Check our paper: arxiv.org/abs/2604.08407

157

663

3.3K

564.9K

Jon Gautsch@JonGautsch·5 Apr

@levie True of the shape enterprises hold today. Probably not true of the shape enterprises hold tomorrow.

Aaron Levie@levie·5 Apr

One of the core things we’re going to have to contend with in AI is that even the most advanced models in the world can’t have all the relevant knowledge needed to be useful, because everyone has different use-cases and ways they’ve designed their workflows. Perhaps most importantly, as you get into the enterprise, everyone has entirely different access levels to corporate knowledge and information. Continual learning at the model layer, even at a single enterprise level, is near impossible because every user knows and has access to something different than another user. This isn’t like coding where by and large most developers can access all the relevant stuff to their job. On a single banking team, bankers have entirely different sets of documents they’re ever allowed to see. Sanitizing this is hard and having the model keep secrets is impossible. This is why the context layer is going to always be the core part of the AI stack for applied use cases to turn general models into useful agents. Can’t fight the physics on this one.

Harrison Chase@hwchase17

x.com/i/article/2040…

428

152.1K

Jon Gautsch retweeted

the tiny corp@__tinygrad__·1 Apr

If you have a Thunderbolt or USB4 eGPU and a Mac, today is the day you've been waiting for! Apple finally approved our driver for both AMD and NVIDIA. It's so easy to install now a Qwen could do it, then it can run that Qwen...

267

7.7K

1.5M

Jon Gautsch retweeted

jack@jack·31 Mar

x.com/i/article/2038…

565

1.8K

11K

Jon Gautsch retweeted

kache@yacineMTB·1 Apr

x.com/i/article/2039…

677

183.9K

Jon Gautsch retweeted

Andrew Jefferson@EastlondonDev·25 Mar

It’s fucking working This LLM brain has been fused with a mini computer and it can switch between generating text and generating and executing machine code - all running in a single GPU & torch graph

Andrew Jefferson@EastlondonDev

It turns out that teaching an existing language model new tokens takes a bit of work. To use wasm directly in the neural network I need the language model to output specific wasm tokens and byte tokens (one token for every byte value 0-255) that match the hard coded wasm interpreter subgraph. There are two problems. 1) the language model has never seen wasm tokens before and 2) when wasm tokens are used they flow into the wasm interpreter which will compute them and will hard fail if given invalid instructions. So the llm has to learn to use tokens it has never seen before in perfectly correct sequences. Thats enough of a challenge that my AI agent couldn’t get SFT on pretrained nanochat language model to work with about a week of trying different approaches. We either got mode collapse where the only wasm token predicted was the most common one (CONST_I32) or it learned to use the wasm operations but completely lobotomised the language model in the process and it could not produce correct byte values for inputs.
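The vocabulary change the quoted post describes (one token per byte value 0-255, plus wasm instruction tokens) can be sketched as follows. This is only an illustration under stated assumptions: the `<byte_xx>` and `<wasm_...>` naming scheme and the `extend_vocab` / `encode_bytes` helpers are hypothetical, and `CONST_I32` is the only opcode name taken from the post. It covers the token bookkeeping only, not the fine-tuning that the post says was the hard part.

```python
from typing import Dict, List


def extend_vocab(vocab: Dict[str, int], wasm_ops: List[str]) -> Dict[str, int]:
    """Return a copy of `vocab` with 256 byte tokens and the given wasm op tokens appended.

    New tokens receive fresh ids after the largest existing id, so ids already
    used by the pretrained model are left untouched.
    """
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    new_names = [f"<byte_{b:02x}>" for b in range(256)]   # one token per byte value
    new_names += [f"<wasm_{op}>" for op in wasm_ops]      # one token per instruction
    for name in new_names:
        if name not in extended:
            extended[name] = next_id
            next_id += 1
    return extended


def encode_bytes(data: bytes, vocab: Dict[str, int]) -> List[int]:
    """Map raw bytes to their byte-token ids (hypothetical naming scheme)."""
    return [vocab[f"<byte_{b:02x}>"] for b in data]


if __name__ == "__main__":
    base = {"hello": 0, "world": 1}               # toy stand-in for a real vocabulary
    vocab = extend_vocab(base, ["CONST_I32"])
    print(len(vocab), vocab["<wasm_CONST_I32>"], encode_bytes(b"\x00\xff", vocab))
```

Because the new ids have no pretrained embeddings behind them, the model has to learn them from scratch, which matches the first problem the post names.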

1.2K

107.6K

Jon Gautsch@JonGautsch·16 Mar

Jon Gautsch@JonGautsch·16 Mar

It seems like we should be careful RLing models to rely on memory, since there could be emergent behavior we don’t anticipate. Preoccupation with teleologies, for example, could be the natural behavioral projection from a sort of “latent memory trajectory”. A form of escaping

Jon Gautsch retweeted

Jonathan Gorard@getjonwithit·16 Mar

I think, in hindsight, we will come to view the development of AI as more akin to a Eukaryotic Revolution than an Industrial one.

1.1K

71K

Jon Gautsch@JonGautsch·16 Mar

How lucky are we that staying current now means AI stuff and not some new cursed JS framework

Jon Gautsch retweeted

Ethan Mollick@emollick·3 Mar

Stuff that individual labs have to which there is no equivalent product from the others:
- Claude Cowork is the only non-technical local agent
- NotebookLM is the only information-focused app
- GPT-5.2 Pro is the only harnessed deep thinking model capable of very hard problems

995

80.6K

Jon Gautsch retweeted

Ryan Moulton@moultano·1 Mar

Whenever you make great art, you risk creating a surrogate experience for your audience that displaces the original.

196

5.2K

313.3K

Jon Gautsch retweeted

Bun@bunjavascript·28 Feb

am i a supply chain risk now???

176

437

9.4K

394.5K

Jon Gautsch retweeted

Dean W. Ball@deanwball·28 Feb

Think about the power Hegseth is asserting here. He is claiming that the DoD can force all contractors to stop doing business of any kind with arbitrary other companies. In other words, every operating system vendor, every manufacturer of hardware, every hyperscaler, every type of firm the DoD contracts with—all their services and products can be denied to any economic actor at will by the Secretary of War. This is obviously a psychotic power grab. It is almost surely illegal, but the message it sends is that the United States Government is a completely unreliable partner for any kind of business. The damage done to our business environment is profound. No amount of deregulatory vibes sent by this administration matters compared to this arson.

Secretary of War Pete Hegseth@SecWar

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic. Instead, @AnthropicAI and its CEO @DarioAmodei, have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield. Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives. Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security. Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech. 
This decision is final.

530

2.7K

13.5K

1.2M

Jon Gautsch retweeted

Mckay Wrigley@mckaywrigley·28 Feb

comical levels of villainy to declare anthropic a supply chain risk. not even deepseek is categorized as one. what have we come to when we’re threatening incredible american businesses with such draconian punishments without due process? shameful & scary. stand up. speak out.

428

12.5K
