🐚xtotem
@0xtotem
9 years of AI / ML / deep learning. Formerly @immunefi, @code4rena. DMs are open.

After careful consideration, we’ve made the decision to wind down @code4rena. This community has meant a great deal to everyone who has been part of building it, and sharing this news is not easy.

The benchmarks they're putting out are unreliable. There are recipes out there to tune a model to hill-climb any public benchmark. A lot of models will regurgitate solutions if you ask for anything remotely close to one of the benchmark problems. The best way to evaluate is to keep internal problem sets that are not public and that you know deeply, and see how different models perform on them. The #1 model on the coding benchmark hasn't been the best coding model for quite a while. Good luck figuring it out!

You can’t have a lasting competitive advantage if LLMs are in your critical cognitive path. Whatever you do will at best be limited to the same capabilities and constraints as everyone else. You’ll find the same bugs as everyone else, build the nth clone of the same app, trade using the same strategy. Using an LLM while keeping it out of the critical path isn’t easy.

If your bug leads are LLM-generated, you will be constrained to the bugs LLMs can find. No amount of prompting or composition will give you more than a small margin over your competitors, and that margin tends to collapse to zero. If all your code is LLM-generated, you are constraining your product to the universe of code LLMs can write, which is the same code your competitors can easily write.

Even adding human input and review doesn’t necessarily change the picture. No amount of context, human-generated hints, or iteration can force a model to produce something that diverges too much from its training data. The cost of doing something through an LLM scales exponentially as you move further away from that data. That’s not something the next model will fix. **It’s a mathematical necessity.**

Past a certain level of novelty, making the model do the work requires more human mental work than just doing it yourself. But it takes a long time to notice, because the LLM nudges you back toward what it’s good at. For coding, it just keeps writing something that isn’t exactly what you asked for, and you keep telling yourself it’s close enough. For bug finding, the cost shows up as bugs you never find, because the model keeps looking for typical vulnerabilities. For a bug hunter, that means spending compute without revenue. For a protocol, it can mean getting rekt.

You have a brain evolved over millions of years, with priors that match the real world in a way a statistical model of text can’t. Your brain can legitimately extrapolate beyond what others have thought in the past, in a way that is impossible **even for an LLM with arbitrarily large compute power**. Use it.