
Arseniy Zarechnev


@8teAPi It was magical when I got my first one. Spent over an hour crafting the prompt, then sent Opus off for overnight research (small transformer training). It was still running 10 hours later, with research milestones documented in a journal, a code snapshot at every step, and real progress made. Insane.

@GregorySchier They are silently optimizing and quantizing. Last week Opus was brilliant, this week all I hear from it is that I'm absolutely right

@bcherny please give us back smart Opus. I've been absolutely right for a week now and it's driving me nuts.


@camsoft2000 funny how I get the completely opposite experience: been maxing out my Claude sub and underutilizing Codex. Smooth sailing with Opus, but Codex (5.3/5.4) overengineers and misses the mark.

My Codex rate limit ran out last week and I've got another 24 hours to wait, so I've been using Opus 4.6. Goodness me, it really wants to hand off its work even when it's not fully proven. It keeps telling me it's working and the pipeline is proven, even though there were issues in testing in some cases. I kept telling it that it wasn't always working, and it kept telling me that the one time it worked proves it's all good, just trust me bro.

you have to live through your AI psychosis and emerge on the other side. i was in a similar state for months, finally said fuck it, and am now happily creating stuff i'd never bothered with or been able to do before
Mo@atmoio
I was a 10x engineer. Now I'm useless.

@phuctm97 Recreate it from scratch.
1. Opus: research this messy code, write a paper on the details of the implementation, discard this, keep that
2. Opus: take the spec and implement it
Worked for me to refactor a 20k+ LOC messy research project into 2k lines of prod-ready code.

AI is not that good (yet) at refactoring a messy codebase written by itself.
I was trying to refactor a pretty small codebase (4K+ lines) written entirely by AI, because it had started failing to add new features and was instead adding too many bugs.
I thought it would be easy-peasy, but even with Opus 4.6 (high effort), it keeps failing to refactor without breaking at least 50% of the features. 😅
We're getting there tho, just a reminder that we're still early!

@TomBukic For me the carry-mix helped a lot (sometimes .3, sometimes .8), and when I tried to abandon the digit curriculum (1-3, then 1-6, then 1-10), models were stuck at ~0.2 random-level tok_acc for 50k+ steps. Somehow the curriculum helps.
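Roughly what I mean by carry-mix and the digit curriculum, as a quick Python sketch (reading "carry-mix" as a per-digit forced-carry probability, plus the step thresholds and data format, are illustrative assumptions, not the exact setup):

import random

def sample_addition(max_digits, carry_prob=0.3):
    # carry_prob is the assumed "carry-mix": the chance that a digit pair
    # is forced to produce a carry (values like .3 or .8 in the post).
    n = random.randint(1, max_digits)
    a_digits, b_digits = [], []
    for _ in range(n):
        if random.random() < carry_prob:
            da = random.randint(1, 9)
            db = random.randint(10 - da, 9)   # force da + db >= 10
        else:
            da = random.randint(0, 9)
            db = random.randint(0, 9 - da)    # keep da + db <= 9
        a_digits.append(da)
        b_digits.append(db)
    # leading zeros simply yield shorter numbers, which is fine for a sketch
    a = int("".join(map(str, a_digits)))
    b = int("".join(map(str, b_digits)))
    return f"{a}+{b}={a + b}"

# digit curriculum: 1-3 digits, then 1-6, then 1-10
# (the switch-over steps are placeholders)
CURRICULUM = [(0, 3), (50_000, 6), (100_000, 10)]

def max_digits_for_step(step):
    md = CURRICULUM[0][1]
    for start, digits in CURRICULUM:
        if step >= start:
            md = digits
    return md

print(sample_addition(max_digits_for_step(60_000), carry_prob=0.8))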

For sure! I think below 50 is doable, and I was doing param searches down to 40 (no good results, at this moment).
Yesterday I was systematizing my training pipeline and pushing a few sweeps below 60 that were infinitely slow. You made me sweep through tying params in more non-obvious combinations; the next thing is optimizing the training pipeline. For me the curriculum wasn't great, but I haven't cycled back to it since getting the pipeline working; I'm confident it will allow better results. I bet it will let those <55 runs converge as well.
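To make "tying params" concrete, a toy sketch of how shared weights shrink the parameter count (this toy model and the tying combinations are purely illustrative, not the actual AdderBoard architecture):

import torch
import torch.nn as nn

class TinyAdder(nn.Module):
    # toy single-block attention model with optional weight tying
    def __init__(self, vocab=14, d=4, tie_unembed=True, tie_qk=False):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.q = nn.Linear(d, d, bias=False)
        self.k = self.q if tie_qk else nn.Linear(d, d, bias=False)
        self.v = nn.Linear(d, d, bias=False)
        self.unembed = nn.Linear(d, vocab, bias=False)
        if tie_unembed:
            self.unembed.weight = self.embed.weight  # share one matrix

    def forward(self, x):
        h = self.embed(x)
        att = torch.softmax(self.q(h) @ self.k(h).transpose(-2, -1) / h.shape[-1] ** 0.5, dim=-1)
        h = h + att @ self.v(h)
        return self.unembed(h)

# parameters() deduplicates shared tensors, so tied weights are counted once
for tie_unembed in (False, True):
    for tie_qk in (False, True):
        m = TinyAdder(tie_unembed=tie_unembed, tie_qk=tie_qk)
        print(tie_unembed, tie_qk, sum(p.numel() for p in m.parameters()))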

🏆️
Dimitris Papailiopoulos@DimitrisPapail
AdderBoard: 40 submissions in. Smallest transformer that adds two 10-digit numbers at 99%+ accuracy:
🏆 Hand-coded: 10 params (lokimorty @memphismillano)
🏆 Trained: 62 params (@TomBukic)
Started at 6K → 10. A 600x compression... Let's see if we can squeeze even more!

@TomBukic With 57p it's so small that we were able to enumerate all remaining changes; none of them worked, so Claude is now running huge seed sweeps with instrumented training, trying to figure out why the model needs those params that converge to near zero.

@TomBukic Thanks! Will you try to push your architecture further? We found good insights via "structural analysis": Claude writes a bunch of Python to analyze the trained weights and find similarities, regularities, or potential zeros/reductions.
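Roughly the kind of analysis script that means (the checkpoint path, thresholds, and the correlation test are illustrative assumptions):

import torch

state = torch.load("model.pt", map_location="cpu")  # placeholder path

# 1) per-tensor stats: how much of each tensor sits near zero (pruning candidates)
for name, w in state.items():
    if not w.is_floating_point():
        continue
    frac_small = (w.abs() < 1e-3).float().mean().item()
    print(f"{name}: {w.numel()} params, {frac_small:.0%} below 1e-3")

# 2) matrices that look like scaled copies of each other (tying/merging candidates)
mats = {n: w for n, w in state.items() if w.is_floating_point() and w.ndim == 2}
names = list(mats)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        if mats[a].numel() == mats[b].numel():
            corr = torch.corrcoef(torch.stack([mats[a].ravel(), mats[b].ravel()]))[0, 1].item()
            if abs(corr) > 0.95:
                print(f"{a} ~ {b} (corr = {corr:.2f})")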

@martinlasek I think that would change when someone makes a chip implementing a huge foundational model in silicon. It would outperform generic GPUs 100x, provide real value, and also give a good reason to sell you a new device every year.

@TomBukic Thanks, will try! It's not even so much about the info in there, but the very unique seed/personality/idea mix that develops within one ctx through chat and is lost between compactions or summaries.

Tips and tricks: you have the full conversation log somewhere in ~/.claude. You can just ask CC to locate it, and then ask it to iteratively drop the noise until it reaches a manageable file that can be provided directly in context. Even RLM access is very nice (having it search the file), especially if you delegate the search to subagents and just get the clean stuff in context.
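A sketch of the kind of distillation you can ask CC to do (the ~/.claude layout and the record schema here are guesses and vary by version, so treat this as a starting point only):

import json
from pathlib import Path

# assumption: transcripts are .jsonl files somewhere under ~/.claude
logs = sorted(Path.home().glob(".claude/**/*.jsonl"), key=lambda p: p.stat().st_mtime)
latest = logs[-1]  # most recently modified transcript

kept = []
for line in latest.read_text().splitlines():
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        continue
    msg = rec.get("message", rec)                  # schema assumption
    role, content = msg.get("role"), msg.get("content")
    # keep only plain user/assistant text, drop tool calls and other noise
    if role in ("user", "assistant") and isinstance(content, str) and content.strip():
        kept.append(f"{role}: {content.strip()}")

Path("distilled_log.txt").write_text("\n\n".join(kept))
print(f"{latest} -> distilled_log.txt ({len(kept)} messages kept)")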

@TomBukic We tried everything, all my stupid ideas; the context was at 1% to compaction but primed. I asked Claude for his last "stupidly brilliant idea", he went into compaction and came out with a 57p solution. What a chad. Your move now!





