Will Chen
@stablechen

16.5K posts

AI R&D @IdyllicLabs; prev: dev rels @terra_money, head of R&D for wasm devx @cosmology_tech & @terran_one; @ucberkeley ('19 dropout)

San Francisco · Joined October 2015
820 Following · 63.2K Followers

Will Chen@stablechen·
"im not going to pretend" is codex equivalent of claude code's "you're absolutely right"
Will Chen reposted

Zhengyao Jiang@zhengyaojiang·
Autoresearch has been out for 2 weeks. The community is trying to apply it to everything with a measurable metric, here are some successful attempts: 🧵 (1/6)
Will Chen@stablechen·
at this point just throw your whole bucket list at claude code and let it build agentic systems to get it all done
Will Chen@stablechen·
intelligence feels more like water finding cracks than directed intention
Will Chen@stablechen·
Kind of crazy you can do this with one prompt in 3 hours: the improvement in model capability and reliability is truly exponential.

We took @rivet_dev's new secure-exec library, which runs untrusted JavaScript in isolated V8 sandboxes with 16ms cold starts, and ran 185 experiments across three escalating rounds, all built and executed by parallel AI agents.

Round 1 was 100 experiments probing every edge of the sandbox: memory limits (found a real security gap where ArrayBuffer bypasses memoryLimit), CPU kill switches, prototype pollution containment, escape attempts, and benchmarks (246 ops/sec, 15ms cold start). We built 50 toy projects, including a competitive coding judge, a FaaS platform, a bytecode VM running inside a V8 isolate, and 450-invocation agent arenas.

Round 2 went deeper with 65 experiments on object persistence, CRDTs converging across 3 separate sandboxes, an object-capability operating system, live object migration at 52.5/sec, and agent ecosystems where Lotka-Volterra population oscillations spontaneously emerged.

Round 3 is where it got unhinged: 20 deeply novel experiments that reframed sandboxes as a computation primitive, not just isolation but a building block for cognition and society. We built an adversarial debate system where a verifier fact-checks claims by *actually running the algorithms*, a biological immune system for code with memory cells that detect threats 20,330x faster on re-exposure, an economy where Sybil attacks turned out to be undetectable (a real security finding), agents that spontaneously invented governance to reverse a tragedy of the commons, causal inference via 40 "parallel universe" interventions, and code that evolved itself from bubble sort to introsort for a 44.6x speedup. Every experiment was ELO-ranked in tournaments of tens of thousands of matches.

The punchline: sandboxes aren't jails. They're five things at once: cognitive building blocks (each sandbox is a thought), safe mutation substrates (disposable universes for code evolution), mechanism design infrastructure (perfect information isolation), possible-worlds semantics (counterfactual realities for reasoning), and emergence substrates (genuine isolation produces genuine culture, language, and institutions).
Will Chen@stablechen·
@pmarca would you call regular honest reflection and personal auditing / cognitive debugging “introspection” ?
Marc Andreessen 🇺🇸
My big conclusion from this week: Introspection causes emotional disorders.
Will Chen@stablechen·
I think the crypto definition of "intent" is actually super apt for AI agents

Intent = description of desired end state, not some mystical abstract essence of free will

Describe your intents, spawn agents to explore the solution space, accept a solution by weighing it against cost
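The pattern in the tweet above can be sketched in a few lines. Everything here (the `intent` shape, `solve`, the toy solvers) is illustrative naming, not a real API: an intent is just a predicate over end states, solvers explore independently, and acceptance weighs satisfying candidates by cost.

```javascript
// Hedged sketch of intent-based solving: describe the desired end
// state, spawn solvers, accept the cheapest satisfying result.
function solve(intent, solvers) {
  const candidates = solvers
    .map((s) => s())                             // explore solution space
    .filter((c) => intent.satisfied(c.state));   // reject wrong end states
  if (candidates.length === 0) return null;      // no acceptable solution
  return candidates.reduce((a, b) => (b.cost < a.cost ? b : a));
}

// Intent: "array sorted ascending" -- says nothing about *how*.
const intent = {
  satisfied: (xs) => xs.every((x, i) => i === 0 || xs[i - 1] <= x),
};

const input = [3, 1, 2];
const solvers = [
  () => ({ state: [...input].sort((a, b) => a - b), cost: 5 }), // correct, pricier
  () => ({ state: [...input].reverse(), cost: 1 }),             // cheap, wrong end state
];

console.log(solve(intent, solvers)); // accepts the sorted candidate
```

The cheap solver is rejected despite its lower cost because its end state fails the intent, which is the whole point: the intent constrains outcomes, not methods.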
Will Chen@stablechen·
@japarjam Feel u bro, just downed 6 burgers, not feeling too well
Just Jeff@japarjam·
@stablechen Only place open at 4 am in the airport. Made me sick AF
Will Chen@stablechen·
McDonald’s is no longer a place you go when you’re broke and hungry. It’s a place you go to treat yourself to childhood nostalgia
Will Chen@stablechen·
conspiracy theorists get the mechanism wrong but the outcome right. debunkers get the mechanism right but the outcome wrong.

the correct take is usually: ‘I don’t know who did it or how, but the incentive structure made this almost inevitable’
Will Chen@stablechen·
i vibe coded an entire personal AI operating system this week, and the key insight was simple: don't optimize when things run. optimize the rhythm.

morning review loads my finances, orders food, assigns my workout, recites my mantra, and figures out what i need to do today. basically a guided conversation that takes around 2 hours, but if i do it, it makes sure everything else in my life is taken care of. evening review processes the day and auto-delegates overnight agent tasks.

it was hard to get agents to stay productive for 8 hours autonomously, and i'm still scaling it up, but the key is taking lots of logs throughout the day and automatically coming up with ideas on how to be useful.

i use computational metaphors as a generative process. for example, if there's too much complexity in one area of my life and i'm curious how to organize things, i have the agents autoresearch different indexing strategies. it's kind of really hard to exhaust a codex plan lol

i used to beat myself up for forgetting lessons. now i convert them into systems and forcing functions. the cost of replanning is near-zero, so i tear down and rebuild daily with fresh context. a predictable daily schedule is actually OP: it lets you iterate and compound learnings really fast with your symbiotic AI system.

MORNING = sync with AI to know what to work on
DAY = work with AI while logging everything
EVENING = review and automatically convert open threads into autonomous agents
NIGHT = AI works autonomously while you sleep

tear down your plans daily, reassess with fresh context. ralph loops aren't just good for agents, they're good for humans too.

also: use Unix metaphors + biology metaphors. start by building groups of functionality and compose them into organ systems, etc.
Karun Kaushik@karunkaushik_·
when you step into the office and your founding AI engineer is on his 3rd all nighter this team never stops shipping
[attached media]
Will Chen@stablechen·
the goal is to create systems that so reliably produce good output that you can reduce your concerns to a single scaling variable: time / money / compute etc

autonomous agents are getting so good that the only question now is "how much compute do you wanna throw at it"
Will Chen@stablechen·
fully autonomous algorithms “designed to search for a repeatable and scalable business model” (aka startups) will be successfully autoresearched within 18 months
Will Chen@stablechen·
atomic habits -> agentic habits

not "agents do your habits for you", that's just automation

when execution gets delegated, the bottleneck shifts. the habits that matter become:

- conscious interrupt: noticing when something feels off
- system debugging: is this actually working?
- taste curation: what deserves my attention?
- architecture: designing the systems themselves

execution habits get cheaper every day. metacognitive habits become the only ones that compound
Will Chen@stablechen·
many devs are still designing for 2024 sonnet 3.5 levels of unreliability, obsessing over context engineering. it's important to re-evaluate and surface assumptions regularly

modern models actually get surprisingly far with just raw data, an interface, a desired outcome, and instructions