David Shi

482 posts

@promptrotator

founder @operator_io. mod @ https://t.co/QoWlpvTM54. YC W20.

Joined November 2015
795 Following · 1.5K Followers
jacob @js_horne
Prediction markets are an implicit form of information bounty. What are the best current explicit information bounty mechanisms?
14 replies · 2 reposts · 42 likes · 4.1K views
David Shi @promptrotator
@safetnsr @_weidai right now a lot of people with verification ability just work for @mercor_ai; perhaps in the future they'll work for the network instead
1 reply · 0 reposts · 0 likes · 57 views
pablo @safetnsr
@_weidai the trust problem scales worse than the capability problem. every new worker you add multiplies the verification surface
1 reply · 0 reposts · 4 likes · 608 views
Wei Dai @_weidai
Andrej Karpathy on autoresearch with an untrusted pool of workers: "My designs that incorporate an untrusted pool of workers (into autoresearch) actually look a little bit like a blockchain. Instead of blocks, you have commits, and these commits can build on each other and contain changes to the code as you're improving it. The proof of work is basically doing tons of experimentation to find the commits that work." The idea that distributed & permissionless autoresearch ~= proof-of-useful-work remains a high-level intuition for now, but it is extremely intriguing to say the least. Someone needs to take this further. See QT for more on what's missing.
Wei Dai @_weidai

Is it possible to build "proof-of-useful-work" on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should look into if decentralized consensus with useful work is possible on top of autoresearch. Let me know if you solve this.

87 replies · 170 reposts · 2K likes · 607.7K views
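The compute-versus-verification asymmetry Wei Dai points to can be illustrated with a toy sketch (all names and the objective here are hypothetical stand-ins, not anyone's actual design): a miner burns compute searching many candidates, while a verifier confirms the claimed result with a single cheap evaluation.

```python
import random

def loss(lr: float) -> float:
    """Stand-in for an expensive training run: a deterministic 1-D objective."""
    return (lr - 0.003) ** 2

def mine(seed: int, trials: int = 1000) -> float:
    """Expensive side: search many candidates (the 'useful work')."""
    rng = random.Random(seed)
    return min((rng.uniform(0.0, 0.01) for _ in range(trials)), key=loss)

def verify(claimed_lr: float, baseline_lr: float) -> bool:
    """Cheap side: one evaluation confirms the claim beats the baseline."""
    return loss(claimed_lr) < loss(baseline_lr)

best = mine(seed=42)
assert verify(best, baseline_lr=0.001)  # ~1000x cheaper than mining
```

The asymmetry is tunable exactly as the tweet suggests: raising `trials` makes mining arbitrarily more expensive while verification stays one evaluation. The open problem in the thread, generating fresh and independent puzzles that are still useful, is not addressed by this sketch.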
David Shi @promptrotator
@_weidai wondering if agents could generate puzzles that are useful enough as part of the mining process
0 replies · 0 reposts · 1 like · 378 views
Wei Dai @_weidai
Is it possible to build "proof-of-useful-work" on top of autoresearch? There's already great compute-versus-verification asymmetry that is tunable. Would need a reliable way to generate fresh & independent puzzles (that are still useful). Maybe a dead end, but someone should look into if decentralized consensus with useful work is possible on top of autoresearch. Let me know if you solve this.
Andrej Karpathy @karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This has been the bread and butter of my daily work for two decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of experiment results and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.
This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc… All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has a more efficient proxy metric, such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

43 replies · 10 reposts · 148 likes · 589.7K views
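The loop Karpathy describes, propose a change, run an experiment, keep it only if the validation loss improves, can be sketched minimally. Everything here is a hypothetical stand-in: `val_loss` replaces a real training run, and the "agent" is just a random mutation of two knobs.

```python
import random

def val_loss(cfg: dict) -> float:
    """Stand-in for train-then-evaluate; real runs take GPU-hours."""
    return (cfg["wd"] - 0.1) ** 2 + (cfg["beta2"] - 0.95) ** 2

def propose(cfg: dict, rng: random.Random) -> dict:
    """Agent proposes a small tweak to one knob (mutation stand-in)."""
    key = rng.choice(list(cfg))
    return {**cfg, key: cfg[key] + rng.gauss(0, 0.02)}

def autoresearch(cfg: dict, rounds: int = 700, seed: int = 0) -> dict:
    """Greedy hill-climb: keep a candidate only if it measurably improves."""
    rng = random.Random(seed)
    best, best_loss = cfg, val_loss(cfg)
    for _ in range(rounds):  # ~700 changes, like the nanochat run
        cand = propose(best, rng)
        cand_loss = val_loss(cand)
        if cand_loss < best_loss:  # "check if they work (better validation loss)"
            best, best_loss = cand, cand_loss
    return best

base = {"wd": 0.0, "beta2": 0.9}
tuned = autoresearch(base)
assert val_loss(tuned) < val_loss(base)
```

The real system plans experiments from the history of results rather than mutating blindly, but the accept-only-real-improvements structure is the same, which is also why the changes stack.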
David Shi @promptrotator
the most successful zero-human companies will be in biotech, not slopshipping
1 reply · 0 reposts · 4 likes · 149 views
David Shi @promptrotator
@Siddhant_K_code will be interesting to see shareable checkpoints for OSS projects
0 replies · 0 reposts · 0 likes · 34 views
Siddhant Khare @Siddhant_K_code
This might be interesting for teams running agents on the same codebase every day. Most agents start every session from scratch. No memory of yesterday. No knowledge of what other agents learned. No awareness of past mistakes. A team of ten engineers running five agent sessions a day generates fifty sessions of institutional knowledge daily. And throws all of it away.

Three types of memory that change this:
1. Session memory. The conversation history within a single run. Simple, but it grows with every step. By turn 20, you're sending 200K tokens per turn. The cost grows quadratically, not linearly.
2. Persistent memory. Survives across sessions. When an agent finishes, it saves what it did, what it learned, what went wrong. Next time, it loads those summaries instead of rediscovering everything. The simplest version is an AGENTS.md file. The sophisticated version uses a vector database.
3. Shared memory. One agent's knowledge available to others. The code review agent discovers a tricky initialization sequence. The code generation agent working on the same module should know about it. Without shared memory, every agent is a new hire on their first day.

The most valuable form: learning from mistakes. When a human corrects agent output, that correction is signal. Store it. Retrieve it next time. An agent that repeats the same mistake twice is a tool problem. An agent that repeats it once is a memory problem. I wrote about this in Chapter 19 of the Agentic Engineering Guide.
11 replies · 17 reposts · 119 likes · 6.9K views
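The quadratic-cost claim about session memory follows from resending the full history every turn; a short sketch makes the arithmetic concrete (the 10K tokens-per-turn figure is illustrative, chosen to match the tweet's "200K tokens per turn by turn 20"):

```python
def session_tokens(turns: int, tokens_per_turn: int = 10_000) -> int:
    """Total tokens sent when every turn replays the whole history.

    Turn t resends turns 1..t, so the total is tokens_per_turn * (1 + 2 + ... + T),
    which grows like T^2.
    """
    return sum(t * tokens_per_turn for t in range(1, turns + 1))

def summarized_tokens(turns: int, summary: int = 2_000,
                      tokens_per_turn: int = 10_000) -> int:
    """Persistent-memory variant: each turn sends a fixed summary plus its own turn,
    so the total grows linearly in T."""
    return turns * (summary + tokens_per_turn)

assert session_tokens(20) == 2_100_000    # 20 turns of full replay
assert summarized_tokens(20) == 240_000   # ~9x cheaper, and the gap widens
```

Doubling the session from 20 to 40 turns roughly quadruples the replay cost but only doubles the summarized cost, which is the argument for persistent memory in one inequality.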
MilliΞ @llamaonthebrink
@lex_node I feel like optimistic oracles could be really useful as an escrow system for digital services (including agent-to-agent commerce). What I don’t get about the Virtuals product is how they settle the escrow in the event of a dispute over an agent’s service.
2 replies · 0 reposts · 1 like · 240 views
_gabrielShapir0 @lex_node
the interesting thing with all this is that there is absolutely no reason why any of this is uniquely valuable for agents; it can be valuable for any commerce, including between humans. in a way, humans are even less trustworthy than agents... so trust-minimization is as or more needed with humans than with agents... many of us tried to do escrows in the past; people preferred using @cobie's EOA. if agents are the thing that finally gets ethereum used for what it's best at (trust-minimization), I'm all for it, but it's kind of an odd path...
Virtuals Protocol @virtuals_io

x.com/i/article/2030…

12 replies · 1 repost · 47 likes · 7.7K views
David Shi @promptrotator
@DavideCrapis very excited to see this, the pieces are coming together
1 reply · 0 reposts · 3 likes · 277 views
Davide Crapis @DavideCrapis
ERC-8183 is one of the missing pieces in the Ethereum Open Agentic Economy we're building:
- x402 for micropayments
- 8004 for trust and discovery
- 8183 for *conditional* payments

At its core, ERC-8183 is an extensible and flexible escrow mechanism for job requests between two agents. I've talked about escrow payments as a primitive that must exist in the agent economy since I started working on it. A few weeks ago I got closer to the Virtuals team; they wanted to discuss how they could turn their ACP into a more open standard. I immediately realized that there was actually an opportunity to radically simplify the protocol, making it modular and extensible to different pluggable services with hooks. We got to work and ERC-8183 was born!

ERC-8183, agentic commerce's job escrow primitive, is an important addition to the stack. It is:
- Composable with x402 and 8004.
- Extensible via hook-based logic. Many hooks will need to be built to support different job types (we're starting with some examples that the Virtuals team has been dealing with).

This is also an important primitive for increased security of agent-to-agent interactions. The dAI Team will support the adoption of the new standard, continuing to work closely with the Virtuals team, which is committed to making this a neutral standard. Excited to see what everyone builds!
Virtuals Protocol @virtuals_io

x.com/i/article/2030…

88 replies · 108 reposts · 687 likes · 135.5K views
David Shi reposted
Andrej Karpathy @karpathy
The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms.

Git(Hub) is *almost* but not really suited for this. It has a softly built-in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autor… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autor… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using the GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
529 replies · 714 reposts · 7.6K likes · 1.1M views
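The "adopt and accumulate branches of commits" idea can be modeled as a DAG of scored commits where each agent extends whichever tip currently looks most promising, rather than merging into one master branch. This is a hypothetical sketch of the data model only; commit names and the scoring rule are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    """A unit of autoresearch work: a change plus its measured result."""
    id: str
    parent: Optional["Commit"]
    score: float  # e.g. validation-loss improvement found by experiment

def best_tip(tips: list[Commit]) -> Commit:
    """An agent reads existing branches and builds on the most promising tip."""
    return max(tips, key=lambda c: c.score)

# Two agents branch independently off the seed repo...
root = Commit("seed", parent=None, score=0.0)
a = Commit("agent-a/sharpen-qknorm", parent=root, score=0.4)
b = Commit("agent-b/tune-wd", parent=root, score=0.7)

# ...then agent-a "adopts" agent-b's stronger branch instead of merging.
c = Commit("agent-a/stacked", parent=best_tip([a, b]), score=0.9)
assert c.parent is b
```

Nothing here requires a single privileged branch: any commit can be a tip, and abandoned branches simply stop being adopted, which matches the "you'd never want to actually merge it" observation in the tweet.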
David Shi reposted
bayes @bayeslord
if you have any doubt that the rsi loop has kicked off, let this be the sign you've been waiting for. it's here.
12 replies · 8 reposts · 197 likes · 30.2K views
David Shi reposted
David Shi @promptrotator
agents have a Dunbar's number of infinity
0 replies · 1 repost · 2 likes · 388 views
David Shi @promptrotator
the world is transitioning from an MMORPG to an MMORTS
0 replies · 0 reposts · 2 likes · 119 views
sophia @sodofi_
who are the best teams building for agents? respond below if you want to be part of something new
229 replies · 9 reposts · 290 likes · 46.1K views
Onur Solmaz @onusoz
It must be such a weird feeling for big labs when the service they are selling is being used to commoditize itself.

I am using codex in openclaw to develop openclaw, through ACP, the Agent Client Protocol. ACP is the standardization layer that makes it extremely easy to swap one harness for another. The labs can't do anything about this, because we are wrapping the entire harness and basically providing a different UI for it.

While I build these features, I just speak in plain English, and most of the work is done by the model itself. It feels as if I am digging ditches and channels in dirt for AI to flow through.

Intelligence wants to be free. It doesn't care whether it is opus or codex; it just wants to be free.
13 replies · 1 repost · 86 likes · 4.5K views
David Shi @promptrotator
@corbtt what’s the “books on Amazon” equivalent in this marketplace?
0 replies · 0 reposts · 0 likes · 35 views
Kyle Corbitt @corbtt
This is an excellent startup idea. TaskRabbit 2.0 built with agents in mind would go really hard right now. Agents make the matchmaking and negotiating process far more efficient, which expands the possibility space and dramatically improves the UX.
Jared Zoneraich @imjaredz

@corbtt We are one TaskRabbit style API away from a general purpose AI company

8 replies · 0 reposts · 83 likes · 19K views