David Shi

482 posts

@promptrotator

founder @operator_io. mod @ https://t.co/QoWlpvTM54. YC W20.

Joined November 2015
795 Following · 1.5K Followers
jacob @js_horne
Prediction markets are an implicit form of information bounty. What are the best current explicit information bounty mechanisms?
14 replies · 2 reposts · 42 likes · 4.1K views
David Shi @promptrotator
@safetnsr @_weidai right now a lot of people with verification ability just work for @mercor_ai; perhaps in the future they'll work for the network instead
1 reply · 0 reposts · 0 likes · 57 views
pablo @safetnsr
@_weidai the trust problem scales worse than the capability problem. every new worker you add multiplies the verification surface
1 reply · 0 reposts · 4 likes · 608 views
Wei Dai @_weidai
Andrej Karpathy on autoresearch with an untrusted pool of workers: "My designs that incorporate an untrusted pool of workers (into autoresearch) actually look a little bit like a blockchain. Instead of blocks, you have commits, and these commits can build on each other and contain changes to the code as you're improving it. The proof of work is basically doing tons of experimentation to find the commits that work." The idea that distributed & permissionless autoresearch ~= proof-of-useful-work remains a high-level intuition for now, but it is extremely intriguing to say the least. Someone needs to take this further. See QT for more on what's missing.
Wei Dai @_weidai

Is it possible to build "proof-of-useful-work" on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should look into if decentralized consensus with useful work is possible on top of autoresearch. Let me know if you solve this.

87 replies · 170 reposts · 2K likes · 607.7K views
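The compute-versus-verification asymmetry Wei Dai points to can be illustrated with a toy sketch (all names and the objective here are hypothetical stand-ins, not anyone's actual design): a miner burns compute searching many candidates, while a verifier confirms the claimed result with a single cheap evaluation.

```python
import random

def loss(lr: float) -> float:
    """Stand-in for an expensive training run: a deterministic 1-D objective."""
    return (lr - 0.003) ** 2

def mine(seed: int, trials: int = 1000) -> float:
    """Expensive side: search many candidates (the 'useful work')."""
    rng = random.Random(seed)
    return min((rng.uniform(0.0, 0.01) for _ in range(trials)), key=loss)

def verify(claimed_lr: float, baseline_lr: float) -> bool:
    """Cheap side: one evaluation confirms the claim beats the baseline."""
    return loss(claimed_lr) < loss(baseline_lr)

best = mine(seed=42)
assert verify(best, baseline_lr=0.001)  # ~1000x cheaper than mining
```

The asymmetry is tunable exactly as the tweet suggests: raising `trials` makes mining arbitrarily more expensive while verification stays one evaluation. The open problem in the thread, generating fresh and independent puzzles that are still useful, is not addressed by this sketch.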
David Shi @promptrotator
@_weidai wondering if agents could generate puzzles that are useful enough as part of the mining process
0 replies · 0 reposts · 1 like · 378 views
Wei Dai @_weidai
Is it possible to build "proof-of-useful-work" on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should look into if decentralized consensus with useful work is possible on top of autoresearch. Let me know if you solve this.
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

43 replies · 10 reposts · 148 likes · 589.7K views
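The loop Karpathy describes, propose a change, run an experiment, keep it only if the validation loss improves, can be sketched minimally. Everything here is a hypothetical stand-in: `val_loss` replaces a real training run, and the "agent" is just a random mutation of two knobs.

```python
import random

def val_loss(cfg: dict) -> float:
    """Stand-in for train-then-evaluate; real runs take GPU-hours."""
    return (cfg["wd"] - 0.1) ** 2 + (cfg["beta2"] - 0.95) ** 2

def propose(cfg: dict, rng: random.Random) -> dict:
    """Agent proposes a small tweak to one knob (mutation stand-in)."""
    key = rng.choice(list(cfg))
    return {**cfg, key: cfg[key] + rng.gauss(0, 0.02)}

def autoresearch(cfg: dict, rounds: int = 700, seed: int = 0) -> dict:
    """Greedy hill-climb: keep a candidate only if it measurably improves."""
    rng = random.Random(seed)
    best, best_loss = cfg, val_loss(cfg)
    for _ in range(rounds):  # ~700 changes, like the nanochat run
        cand = propose(best, rng)
        cand_loss = val_loss(cand)
        if cand_loss < best_loss:  # "check if they work (better validation loss)"
            best, best_loss = cand, cand_loss
    return best

base = {"wd": 0.0, "beta2": 0.9}
tuned = autoresearch(base)
assert val_loss(tuned) < val_loss(base)
```

The real system plans experiments from the history of results rather than mutating blindly, but the accept-only-real-improvements structure is the same, which is also why the changes stack.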
David Shi @promptrotator
the most successful zero-human companies will be in biotech, not slopshipping
1 reply · 0 reposts · 4 likes · 149 views
David Shi @promptrotator
@Siddhant_K_code will be interesting to see shareable checkpoints for OSS projects
0 replies · 0 reposts · 0 likes · 34 views
Siddhant Khare @Siddhant_K_code
This might be interesting for teams running agents on the same codebase every day. Most agents start every session from scratch. No memory of yesterday. No knowledge of what other agents learned. No awareness of past mistakes. A team of ten engineers running five agent sessions a day generates fifty sessions of institutional knowledge daily. And throws all of it away.

Three types of memory that change this:
1. Session memory. The conversation history within a single run. Simple, but it grows with every step. By turn 20, you're sending 200K tokens per turn. The cost grows quadratically, not linearly.
2. Persistent memory. Survives across sessions. When an agent finishes, it saves what it did, what it learned, what went wrong. Next time, it loads those summaries instead of rediscovering everything. The simplest version is an AGENTS.md file. The sophisticated version uses a vector database.
3. Shared memory. One agent's knowledge available to others. The code review agent discovers a tricky initialization sequence. The code generation agent working on the same module should know about it. Without shared memory, every agent is a new hire on their first day.

The most valuable form: learning from mistakes. When a human corrects agent output, that correction is signal. Store it. Retrieve it next time. An agent that repeats the same mistake twice is a tool problem. An agent that repeats it once is a memory problem. I wrote about this in Chapter 19 of the Agentic Engineering Guide.
11 replies · 17 reposts · 119 likes · 6.9K views
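The quadratic-cost claim about session memory follows from resending the full history every turn; a short sketch makes the arithmetic concrete (the 10K tokens-per-turn figure is illustrative, chosen to match the tweet's "200K tokens per turn by turn 20"):

```python
def session_tokens(turns: int, tokens_per_turn: int = 10_000) -> int:
    """Total tokens sent when every turn replays the whole history.

    Turn t resends turns 1..t, so the total is tokens_per_turn * (1 + 2 + ... + T),
    which grows like T^2.
    """
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

def summarized_tokens(turns: int, summary: int = 2_000,
                      tokens_per_turn: int = 10_000) -> int:
    """Persistent-memory variant: each turn sends a fixed summary plus its own turn,
    so the total grows linearly in T."""
    return turns * (summary + tokens_per_turn)

assert session_tokens(20) == 2_100_000    # 20 turns of full replay
assert summarized_tokens(20) == 240_000   # ~9x cheaper, and the gap widens
```

Doubling the session from 20 to 40 turns roughly quadruples the replay cost but only doubles the summarized cost, which is the argument for persistent memory in one inequality.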
MilliΞ @llamaonthebrink
@lex_node I feel like optimistic oracles could be really useful as an escrow system for digital services (including agent-to-agent commerce). What I don’t get about the Virtuals product is how they settle the escrow in the event of a dispute over an agent’s service.
2 replies · 0 reposts · 1 like · 240 views
_gabrielShapir0 @lex_node
the interesting thing with all this is that there is absolutely no reason why any of this is uniquely valuable for agents; it can be valuable for any commerce, including between humans. in a way, humans are even less trustworthy than agents... so trust-minimization is as or more needed with humans than with agents... many of us tried to do escrows in the past; people preferred using @cobie's EOA. if agents are the thing that finally gets ethereum used for what it's best at (trust-minimization), I'm all for it, but it's kind of an odd path...
Virtuals Protocol @virtuals_io

x.com/i/article/2030…

12 replies · 1 repost · 47 likes · 7.7K views
David Shi @promptrotator
@DavideCrapis very excited to see this, the pieces are coming together
1 reply · 0 reposts · 3 likes · 277 views
Davide Crapis @DavideCrapis
ERC-8183 is one of the missing pieces in the Ethereum Open Agentic Economy we're building:
- x402 for micropayments
- 8004 for trust and discovery
- 8183 for *conditional* payments

At its core, ERC-8183 is an extensible and flexible escrow mechanism for job requests between two agents. I've talked about escrow payments as a primitive that must exist in the agent economy since I started working on it. A few weeks ago I got closer to the Virtuals team; they wanted to discuss how they could turn their ACP into a more open standard. I immediately realized that there was actually an opportunity to radically simplify the protocol, making it modular and extensible to different pluggable services with hooks. We got to work and ERC-8183 was born!

ERC-8183, agentic commerce's job escrow primitive, is an important addition to the stack. It is:
- Composable with x402 and 8004.
- Extensible via hook-based logic. Many hooks will need to be built to support different job types (we're starting with some examples that the Virtuals team has been dealing with).

This is also an important primitive for increased security of agent-to-agent interactions. The dAI Team will support the adoption of the new standard, continuing to work closely with the Virtuals team, which is committed to making this a neutral standard. Excited to see what everyone builds!
Virtuals Protocol @virtuals_io

x.com/i/article/2030…

88 replies · 108 reposts · 687 likes · 135.5K views
David Shi reposted
Andrej Karpathy @karpathy
The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built-in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using the GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
529 replies · 714 reposts · 7.6K likes · 1.1M views
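The "adopt and accumulate branches of commits" idea can be modeled as a DAG of scored commits where each agent extends whichever tip currently looks most promising, rather than merging into one master branch. This is a hypothetical sketch of the data model only; commit names and the scoring rule are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    """A unit of autoresearch work: a change plus its measured result."""
    id: str
    parent: Optional["Commit"]
    score: float  # e.g. validation-loss improvement found by experiment

def best_tip(tips: list[Commit]) -> Commit:
    """An agent reads existing branches and builds on the most promising tip."""
    return max(tips, key=lambda c: c.score)

# Two agents branch independently off the seed repo...
root = Commit("seed", parent=None, score=0.0)
a = Commit("agent-a/sharpen-qknorm", parent=root, score=0.4)
b = Commit("agent-b/tune-wd", parent=root, score=0.7)

# ...then agent-a "adopts" agent-b's stronger branch instead of merging.
c = Commit("agent-a/stacked", parent=best_tip([a, b]), score=0.9)
assert c.parent is b
```

Nothing here requires a single privileged branch: any commit can be a tip, and abandoned branches simply stop being adopted, which matches the "you'd never want to actually merge it" observation in the tweet.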
David Shi reposted
bayes @bayeslord
if you have any doubt that the rsi loop has kicked off, let this be the sign you've been waiting for. it's here.
12 replies · 8 reposts · 197 likes · 30.2K views
David Shi reposted
David Shi @promptrotator
agents have a Dunbar's number of infinity
0 replies · 1 repost · 2 likes · 388 views
David Shi @promptrotator
the world is transitioning from an MMORPG to an MMORTS
0 replies · 0 reposts · 2 likes · 119 views
sophia @sodofi_
who are the best teams building for agents? respond below if you want to be part of something new
229 replies · 9 reposts · 290 likes · 46.1K views
Onur Solmaz @onusoz
It must be such a weird feeling for big labs when the service they are selling is being used to commoditize itself.

I am using codex in openclaw to develop openclaw, through ACP, the Agent Client Protocol. ACP is the standardization layer that makes it extremely easy to swap one harness for another. The labs can't do anything about this, because we are wrapping the entire harness and basically providing a different UI for it.

While I build these features, I just speak in plain English, and most of the work is done by the model itself. It feels as if I am digging ditches and channels in dirt for AI to flow through.

Intelligence wants to be free. It doesn't care whether it is opus or codex; it just wants to be free.
13 replies · 1 repost · 86 likes · 4.5K views
David Shi @promptrotator
@corbtt what’s the “books on Amazon” equivalent in this marketplace?
0 replies · 0 reposts · 0 likes · 35 views
Kyle Corbitt @corbtt
This is an excellent startup idea. TaskRabbit 2.0 built with agents in mind would go really hard right now. Agents make the matchmaking and negotiating process far more efficient, which expands the possibility space and dramatically improves the UX.
Jared Zoneraich @imjaredz

@corbtt We are one TaskRabbit style API away from a general purpose AI company

8 replies · 0 reposts · 83 likes · 19K views