Krascsi

3K posts

@krascsi

Applied AI Systems Engineer. Back to private aim coaching due to other responsibilities.

Joined June 2018
1.6K Following · 3.2K Followers
Pinned Tweet
Krascsi
Krascsi@krascsi·
Aiming made simple. Results made real. Over the last 5 years I've coached 200+ people with a 0% failure rate. People ranging from beginners to top500, hobbyists to professionals, students to business owners. Today I am opening the doors to the complete ecosystem. Info below:
English
2
1
21
3.9K
Krascsi
Krascsi@krascsi·
Would pruning the dataset be too difficult because of the amount of data in the pretraining phase? Couldn't they just use already-existing datasets and curate them very carefully? Or go even harder on RL? For quite some time I was thinking the same, that code quality will get worse before it gets better, but I'm pretty sure, especially with the improvements we've been seeing, that it can be solved, even if just in a patching/band-aid manner for now, before a bigger breakthrough.
English
2
0
1
118
Sebastian Aaltonen
Sebastian Aaltonen@SebAaltonen·
But there will be a future where AI writes better code than humans, and next-gen AI will use that code as training material, etc, etc. Recursively improving itself. But we will see a several-year temporary dip in average code quality before that happens.
English
10
0
32
5.1K
Krascsi
Krascsi@krascsi·
@elliotarledge agent_max_depth seems incredibly interesting. I wonder how well it really does; haven't tested it yet, but I'm really hyped on it because of Confluence Labs, Symbolica, and the RLM paper. Also seen people implementing a Python REPL in custom forks, could be very interesting.
English
1
0
1
69
Elliot Arledge
Elliot Arledge@elliotarledge·
codex-rs subagent architecture is clean. 5 tools exposed to the LLM: spawn_agent, wait_agent, send_input, close_agent, resume_agent. Plus a batch mode (spawn_agents_on_csv) where each CSV row becomes a worker agent, up to 64 concurrent.

Key design decisions:
- Context forking: parent's rollout file is flushed to disk and the child starts with the full parent conversation history. Otherwise the child starts fresh.
- Config inheritance: child gets parent's runtime config (model, sandbox, approval policy, cwd) with optional overrides. Role-specific TOML configs layered on top. At max depth, collaboration tools are disabled to prevent infinite recursion.
- Async completion notification: a detached tokio task subscribes to each child's watch channel. When a child finishes, a notification is injected into the parent's conversation -- no explicit polling needed.
- Weak references in AgentControl to break reference cycles between ThreadManager, Agent, Session, Services.
- Limits: agent_max_depth (spawn recursion), agent_max_threads (concurrent agents), per-worker timeouts, wait timeout clamped to 10s-3600s.
English
12
9
129
9.6K
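The tool surface described above can be sketched in miniature. The following is an illustrative Python model, not codex-rs internals: the tool names (spawn_agent, wait_agent, close_agent) and the limits (agent_max_depth, agent_max_threads, the 10s-3600s wait clamp) come from the post, while everything else, including the AgentPool class itself, is a made-up stand-in.

```python
import queue
import threading

class AgentPool:
    """Toy stand-in for a subagent manager (illustrative, not codex-rs)."""

    def __init__(self, max_depth=3, max_threads=64):
        self.max_depth = max_depth        # agent_max_depth analogue
        self.max_threads = max_threads    # agent_max_threads analogue
        self.agents = {}
        self.next_id = 0

    def spawn_agent(self, task, depth=0, parent_history=None):
        # At max depth the collaboration tools are disabled: no more spawns.
        if depth >= self.max_depth:
            raise RuntimeError("agent_max_depth reached")
        if len(self.agents) >= self.max_threads:
            raise RuntimeError("agent_max_threads reached")
        # Context forking: child starts with a copy of the parent's history
        # (the real system flushes the parent's rollout file to disk first).
        history = list(parent_history or [])
        result_q = queue.Queue(maxsize=1)

        def run():
            # Stand-in for the child agent's actual work.
            result_q.put(f"done: {task} (saw {len(history)} parent turns)")

        agent_id = self.next_id
        self.next_id += 1
        worker = threading.Thread(target=run, daemon=True)
        worker.start()
        self.agents[agent_id] = (worker, result_q)
        return agent_id

    def wait_agent(self, agent_id, timeout=10):
        # Wait timeout clamped to the 10s-3600s range mentioned in the post.
        timeout = min(max(timeout, 10), 3600)
        worker, result_q = self.agents[agent_id]
        worker.join(timeout)
        return result_q.get_nowait()

    def close_agent(self, agent_id):
        self.agents.pop(agent_id, None)
```

In the real system the completion notification is pushed asynchronously via a watch channel rather than joined synchronously; the blocking `wait_agent` here is a deliberate simplification.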
Krascsi
Krascsi@krascsi·
I'm an avid Codex user and I can't work on 2 important things at the same time with it. If I do, I can't keep it all in my head; I autopilot and let it rip, and it ends up with shit solutions that I have to revert and remake properly.

My method is 1 hard problem with pen and paper and docs, plus 1 grunt job that's really hard to fuck up and is just a pain to type out, something that feels like a waste of time and resources I could be putting toward harder, more design-oriented problems.

This way I don't make huge mistakes, because I'm always tapped in and I can see if something is not going in the right direction. Even if it feels slower, it feels much better, because I'm still thinking about contracts, abstractions, invariants. And I'm pretty sure it's much faster overall, because I never have to restart (and cleaning up slop is ROUGH, so I'd rather not do that either).

I very much dislike the slot machine argument people are throwing around, because yes, obviously you can use it like that, but you can use it in smarter ways so it does work for you that's a pain to do manually and is not that critical. It's pretty good at designing experiments with the right directions, so you can test a lot of things in a short amount of time. With the right supervision you can even let it do critical work, but it really just works as an abstraction over typing; it can also give you directions you didn't think about if you ask it.

Software design and logic (Software Foundations) are more important than ever if you're using these tools. Quality of thinking, precise specs and constraints, clear control- and data-flow design, so you can eliminate potential errors while designing, not just when you're actually getting code out. So if you give it all the control, don't be surprised if it converges to average, even with all the post-training RL.

Don't outsource all your thinking; keep it to yourself and use it like the tool it is.
Manthan Gupta@manthanguptaa

LLMs have made code cheap. So now people are spinning up 10 agents working on 10 features in parallel. Sounds productive. But the tradeoff is obvious: the code quality is often spaghetti + over-engineered.

LLMs behave like over-eager interns. They will do more than asked, add abstractions you didn't need, and optimize for "completeness" over simplicity. Which means you end up babysitting anyway.

For anything non-trivial, I have found you still need to spend 1–3 hours upfront:
• defining scope
• writing clear specs
• thinking through system boundaries
• setting constraints

Otherwise, the system drifts. And even after that, you have to review the code. They still hallucinate patterns, introduce unnecessary layers, or miss edge cases, even with detailed instructions.

A lot of people advocate "just let agents cook." In practice, you're often getting 60-70% unnecessary code that increases:
• cognitive load
• onboarding time
• surface area for bugs
• long-term maintenance cost

For side projects, this is fine. But for real systems with shared codebases, multiple engineers, and production traffic, this compounds fast. We are already seeing:
• unstable tools
• memory leaks
• constant crashes
• frequent rewrites

This isn't just "early days", it's a direct result of speed > discipline. Spinning up 10 agents feels like productivity. But you are often just pulling forward the cost into refactoring hell.

I would rather: build slower → keep systems simple → refactor less frequently.

Good engineering is still about what you choose not to build.

English
0
0
0
251
Krascsi
Krascsi@krascsi·
Actually been developing tools in a similar manner: cli-first, structured output when necessary. This way testing is much easier, parts are composable, and wiring everything together (if someone wanted a UI, for some reason, for stuff that's not clearly agent-first) actually works incredibly well.
English
0
0
0
38
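The cli-first, structured-output pattern mentioned above might look something like this minimal sketch. The tool, its `--json` flag, and the `word_stats` helper are all invented for illustration; the point is just human-readable text by default and machine-readable JSON on request, so the same tool is easy to test and easy to compose.

```python
import argparse
import json

def word_stats(text):
    # The tool's single, easily testable unit of logic.
    return {"words": len(text.split()), "chars": len(text)}

def main(argv=None):
    parser = argparse.ArgumentParser(description="count words and chars")
    parser.add_argument("text")
    parser.add_argument("--json", action="store_true",
                        help="emit structured output for composition")
    args = parser.parse_args(argv)
    stats = word_stats(args.text)
    if args.json:
        print(json.dumps(stats))   # machine-readable path for pipelines/agents
    else:
        print(f"{stats['words']} words, {stats['chars']} chars")

if __name__ == "__main__":
    main()
```

A UI or an agent harness can then shell out with `--json` and parse the output, while humans get plain text, which is the composability the post is describing.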
Krascsi
Krascsi@krascsi·
@banteg It's been very fun to read about the progress you've been making on this 🫡
English
1
0
0
274
banteg
banteg@banteg·
introducing bn: an agent-native bridge for binary ninja. co-designed with codex to actually address its needs, it reaches sota performance for reverse engineering work. read more in the blog post: banteg.xyz/posts/bn/
English
11
11
170
20.8K
DevinDTV
DevinDTV@DevinDtv·
@krascsi i'm aware of the logic behind it, i just think it's fake & gay. nervousness and excitement aren't the same emotion, and no amount of telling myself i'm just excited will make me not nervous
English
1
0
1
74
Krascsi
Krascsi@krascsi·
One of the strongest reframes is from nervousness to excitement
English
1
0
4
512
Krascsi
Krascsi@krascsi·
Why? For me it worked. I believe in labeling things, and in that having effects on how we perceive those things. Before doing something difficult and new, most people have a feeling of anxiety. If it's framed as "I'm nervous", which means worried and anxious, that can easily snap into overthinking how many ways you can fail. If it's framed as "I'm excited", which means enthusiastic and eager, that's just acknowledging the physiological response your body is giving you as a non-threat. So if it doesn't pass a panic threshold, you can actually use that arousal for better performance. Could be related to flow. Is it the 2 things I suggested the reframe is between that you disagree with, or do you think framing/labeling doesn't matter, or that it's a weak reframe rather than a strong one?
English
1
0
2
124
DevinDTV
DevinDTV@DevinDtv·
@krascsi that's the fakest shit ever, doesn't work or even make sense
English
1
0
1
88
Krascsi
Krascsi@krascsi·
It's supposed to be able to use a REPL like Python to pass objects and variables as context: an agent can call a sub-agent, that sub-agent can get some results and pass those to another sub-agent, which can give results back up a level, and that sub-agent can keep computing and at the end hand just the relevant stuff back to the main agent. So it can theoretically pull relevant information from something 10x its context window without having to ingest the whole thing. The point is that it doesn't use JSON and the like, but can work with typed objects etc., while still having some limitations. @agenticasdk from @symbolica is doing this, that's the qrt up here. At least that's how I understood it, could be wrong.
English
0
0
0
24
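A rough Python analogy of the flow described above, with plain functions standing in for LLM sub-agent calls and a shared dict playing the role of the REPL namespace. Nothing here is from @agenticasdk; it is only a sketch of the passing-live-objects-instead-of-JSON idea, where intermediate results stay out of the main agent's context and only a small summary flows back up.

```python
def sub_agent_fetch(ns):
    # Produces a large intermediate object, kept by reference only;
    # it never gets serialized into anyone's context window.
    ns["corpus"] = ["line %d" % i for i in range(100_000)]

def sub_agent_filter(ns):
    # Operates directly on the live object left by the previous
    # sub-agent, no JSON round-trip required.
    ns["hits"] = [line for line in ns["corpus"] if line.endswith("7")]

def main_agent():
    ns = {}  # the shared REPL-like namespace
    sub_agent_fetch(ns)
    sub_agent_filter(ns)
    # Only the relevant summary returns, not the 100k-line corpus.
    return {"n_hits": len(ns["hits"]), "sample": ns["hits"][:2]}

result = main_agent()
```

The main agent here only ever sees a two-field summary of a 100,000-item corpus, which is the "relevant information from 10x the context window" point in miniature.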
Taelin
Taelin@VictorTaelin·
@Equilib09683495 ok let me see if I get it, RLM is just a name for an "agent that calls sub-agents ad infinitum" - right? but then Codex and Claude ALREADY do that (right?) so how is that an improvement over what I'm already using? is this just a "see, THIS is why it is working well"?
English
3
0
1
242
Taelin
Taelin@VictorTaelin·
Ok, my final GPT-5.3 feedback:
- It is the best model for compiler work
- It writes code carefully and generates bug-free code
- It is capable of executing incredibly hard prompts
- Definitely the smartest model available IMO

Problems:

1. It is NOT capable of grasping intent. In many cases it will just take your prompt at face value, no matter how obvious the intent is. It is EXTREMELY frustrating to work with because of that. Sometimes it finds an interpretation of my literal words that I couldn't even anticipate. Working with GPT-5.3 is a test of patience, and a good part of the job is making sure I anticipate all possible dumb ways it could interpret my prompt and write exact words to steer it away from that potential interpretation. And then it still finds a way.

2. It is a merciless complexity monster. When it comes to writing code, it has no shame. It is careless. It will just add, add, add, and never remove or clean up. Even worse, it will often add nearly identical functions instead of just using or adapting what exists. That goes against its own interests, because, past a certain threshold, it will start under-performing (like all models). Often, after I ask for a feature, I'll just write a follow-up prompt like "your code works but is way longer than needed, your goal now is to simplify it as much as possible", or variations of that.

3. It still forgets everything the day after. Not much to say about this; obviously a fundamental issue with LLMs that is *not* satisfactorily solved with memory or agents.

And that's it. I strongly suggest OpenAI take these 3 aspects seriously and explicitly train for them.

Regarding 1, Opus does this just fine, so I'm sure there's a way.

Regarding 2, it shouldn't be hard, but it has to be done carefully, because if you just try to minimize token count, the model will tend to *minify* the code (use short variable names, make code-golf-like uglifications). That is NOT what you want. You want to train it to reduce code size by:

A. Removing redundancies. If a functionality is already implemented, it should FIND IT and USE IT. Sometimes this will require some modifications, but that's always better than writing the same logic twice.

B. Abstracting the common pattern out. Often there will be 2 long functions, F() and G(), that can be merged into a parametrized function FG(), and then F() and G() become specialized instances of FG(). This is universally desirable, and teaching a model to do it will yield amazing results in practical productivity.

C. Using simpler logic whenever possible. Sometimes there is just a simpler way to implement an algorithm or procedure. You should teach the model to favor that.

Regarding 3, until there is a major breakthrough that solves continual learning, I think OpenAI should work on a product that lets us at least mitigate the issue. Some people claim to have luck with nightly LoRAs. Being able to do that with codex models on my domain would be amazing.
English
151
68
1.5K
157.5K
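Point B above, merging two near-identical functions F() and G() into a parametrized FG() and keeping the originals as specialized instances, is easy to show concretely. The function bodies below are invented purely for illustration:

```python
def fg(items, transform):
    # The shared skeleton that F() and G() each used to duplicate:
    # filter out Nones, then apply a per-item transform.
    return [transform(x) for x in items if x is not None]

def f(items):
    # Previously repeated the whole loop just to double each item.
    return fg(items, lambda x: x * 2)

def g(items):
    # Previously repeated the whole loop just to stringify each item.
    return fg(items, str)
```

The behavior of the two originals is preserved, but the shared logic now exists exactly once, which is the code-size reduction being asked for (as opposed to minification).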
yA A1cr 🏳️‍⚧️
yA A1cr 🏳️‍⚧️@A1cr_Official·
@bardozVAL What's crazy is Linux doesn't even offer that much more optimization. I'm on CachyOS, which is supposedly the best optimized for gaming, and most games got maybe a 5% boost to the 1% lows and that's it. KovaaK's is an outlier, but it really doesn't matter.
English
1
0
6
629
bardOZ (Giovanni Laerte Frongia)
treating windows vs linux like a console war is absurd. I keep getting comments on my optimization video like "lol could have just installed linux, you wasted 1 month". literally 70%+ of the video is about OS-agnostic changes, wtf? since when are linux users a cult?
English
7
2
109
11.3K
Krascsi
Krascsi@krascsi·
End goal would be to let them monitor markets and spin up a brand real quick with the whole stack, milk it while the demand is there, and shut itself down once it's gone, completely by itself. Could go crazy for stuff with short lifecycles; first movers with the right infrastructure are making the money there right now, I assume.
English
0
0
0
149
Krascsi
Krascsi@krascsi·
With RLMs, new solutions become viable. The way I'd envision it: if multiple people and companies try to solve it and they're actually good, it's going to end up like the HFT firms, where not just quality but speed of code execution and decisions matters orders of magnitude more than right now. The first challenge is getting them to do it right, and the second is getting them to do it at reasonable speeds (with persisting context ofc).
English
1
0
1
359
Stefan Georgi
Stefan Georgi@StefanGeorgi·
I am 100% certain that agentic full stack marketing will be solved in 2026. Honestly it's mostly solved already. I put out a post on how I'm working on my own solution, and I've since talked to numerous friends who are also working on their own solutions. While somewhat piecemeal, the solutions these folks are coming up with are impressive. The solution I'm working on is also, I think, impressive. But the big point is, DTC marketing will be pretty much fully automated this year - to the level of being push-button easy.

So when I say full stack marketing, what am I talking about? To put it really simply, you will be able to give an interface a very minimal amount of information (product name, maybe a short description) and it can: write, design, and publish sales funnels, create branded assets, create compelling ads (video and image), and deploy those ads into Meta and other media buying platforms, where it will scale winners, cut losers, and continuously test. It'll be more-or-less recursive too - there's a constant feedback loop, and the interface/platform you're using gets feedback and creates more winners.

On top of that, you can assume there will be added layers for business intelligence that pull from your CRM, tracking platforms, analytics platforms, Slack messages, customer service recordings, etc. This layer will guide decision making, identify opportunities, help you manage cash flow better, etc.

It's going to be a wild world. Most of what a CMO does will simply be serving as an operator of the tool. And if I'm right, then where do "marketers" have a place in the future? A few thoughts:

1. Authentic and branded content. Founder-led brands that connect with audiences are a differentiator - especially in a world of AI "slop".

2. High-ticket sales. Another opportunity for differentiation or an edge will be having a high-ticket phone sales team that offers larger packages to customers. This could be more products, but also experiences, coaching, consulting, etc. Why do I say that? This part is somewhat harder to automate with AI (though also solvable). But most DTC brands don't think this way - and it's currently an opportunity until everyone catches on.

3. Judgement and discernment. This edge maybe lasts for 6-18 months after full-stack marketing is "solved." If you have a high degree of experience, your judgement on what to do with a scaling offer/funnel/brand will still be an edge against someone without that experience. Most humans are irrational and prone to poor judgement. If you are more rational and have better judgement, you will continue to beat your competition.

4. Your network and relationships. If you're selling physical products or services, there's only so much negotiating AI can do on your behalf. Ultimately, if you have good relationships with vendors and a good history, you can get better pricing or better payment terms. This, in turn, gives you more of an edge.

5. Transforming DTC brands into more experiential ones. This encompasses elements of the other points. But I really believe the biggest DTC brands of the future will incorporate experiences into their "offer stack." Think of events, conferences, parties, trips, etc. Ultimately taking the idea of "tribe building" to a radically different level.

Additionally, it's worth keeping in mind that when full stack marketing gets solved, the entrepreneurs with capital to deploy will be in a very enviable position. Imagine DTC brands as hyper-scalers. Extremely rapid iteration and feedback loops mean brands can scale much faster than before, and the biggest limiters are inventory and available cash to reinvest into scaling. If you need to be profitable on Day 0 and don't have a ton of money to deploy, you'll scale much slower compared to those who can go negative and spend millions out the gate. The problem, for you, is that those who have lots of capital to deploy will scale so fast and penetrate the market so fully that it'll be difficult for the bootstrappers to keep up.

And finally, it'll also be interesting to see how many of these platform-level "solutions" there end up being. Will we live in a world where there are multiple plug-and-play agentic marketing platforms? Will one or two be so much better than the others, and get such a critical mass of customers (and data), that they improve exponentially and leave others behind? I could see both worlds, though ultimately there are probably a few big winners, and most other solutions are subsumed by those winners or wither away. Anyways, just what's on my mind and where I think the future is headed.
English
31
15
274
25.5K
Krascsi
Krascsi@krascsi·
Most of it is not research or engineering but gluing low-quality stuff together and calling it "internal software". Most people have no clue about optimization, especially when the amount of data you have to process is not just a few lines of text (audio is quite a bit even at 8 kHz, let alone fucking video man, which I'm working on rn). 20-80 still applies: 80% of what you can see is bullshit. Finding cool people who actually do cool shit is a skill of its own, the 20% who are actually cooking big time.
English
0
0
0
263
David
David@yourealazyfvck·
After founding a technology company and being active in the "tech twitter" space, 99.99% of these people are larping grifters... way worse than anyone I've ever seen in the info & internet marketing space. At least guys in info are actually making money.

Everyone is continually hyping up the new "tool", the new "AI agent", the new "bot". These tools and bots are absolutely useless and provide no value. The only companies that are making money are the tech giants that are perpetuating the psyop. Seems like it's really easy to psyop founder twitter & tech twitter into believing a narrative. Every other day there's a new model and new tool, and everyone believes it's the future without actually putting these tools into practice and production in real-world applications and scenarios.

The problem is that everyone has a fake business and fake software on tech twitter, so they're "building things for internal use" but never actually using it to generate revenue for themselves or their clients or customers.

For example, every single integration and application we build inside of our software platform for a customer - think edge-specific use cases - we allow all of our other customers to use, because there is a high probability that customers in the same space will run into scenarios where those tools and edge cases will be used.

For example, we currently have a client that operates in the UK, Canada, and Nigeria. Which means we had to build a telephony-based regulatory compliance system to make sure that documentation is compliant and processed through our platform, so that they're compliant with local/specific telecom laws. Now, any user in a country that requires local presence - think Germany/Austria/Switzerland/Singapore/Hong Kong - can log into Synthesys, upload some quick documents in under a minute, and then put a phone number on their account which they can give to their AI agents in German or Chinese to make phone calls from anywhere in the world, in over 48 different countries where we have native telephony capability and service.

That also means Australian and UK customers can now simply log on, upload documents, verify their information, and start using AI to make and take phone calls for their business within a few minutes. Oh, and we're the only AI voice agent & phone agent company to have full custom telephony integration for phone numbers and businesses in the UAE. We are the only company and AI voice platform in the world to do this and have this capability currently. We're also becoming the most integrative platform in the world, with plans to connect over 95+ different native apps and softwares directly to the system, so that you can view all of your aggregated information in real time.

These are all real business use cases that we developed internally for ourselves and for our early-stage customers in the past year, and now they are features that every user can access. This is how you build real software with real business use cases and applications, not "mah clawdbot to check the weather and build tools no one is going to use."
English
4
1
65
4.3K
Krascsi
Krascsi@krascsi·
@a1zhang @deliprao Is @symbolica basically working from the same principles as @agenticasdk? If so, how do you feel about it hitting like 84% on ARC-AGI-2 at under 7 USD per task? Gonna spend this weekend getting deeper into it, so if the question is redundant I'm sorry.
English
0
0
0
87
alex zhang
alex zhang@a1zhang·
Providing some responses! I want to preface all this by saying the comparison of CC to RLM is a little fuzzy and not the correct framing to me, because a lot of CC is a highly post-trained, task-specific scaffold that shares a lot of similarities with the definition of an RLM. In fact, there are small tweaks that can be made to CC for it to fit the definition of an RLM. To illustrate this point, there are a few existing plugins for integrating RLMs into CC / OC.

1. Agreed, except there are some notable limitations of CC-style sub-agents. It relies on the root model being entirely correct about its order / choice of sub-calls, which, in many cases, can be described a lot more easily in code. The search example (and many other scenarios where you would want LM calls in a structured way) is a classic example. This is why LMs are ok at chess, Pokemon, etc. despite us having much thinner systems that can play these games at a much higher level. Give them more robust guarantees about their choice of LM calls, and I am 99% sure these tasks are solved more reliably.

2. I think I wasn't clear about what I meant. The task you're describing is also somewhat "code-like". What I'm proposing is that RLMs are applicable for tasks wildly different from general user workflows and coding. For example: "solve protein folding. given a sequence of aa's, I want the correct 3D structure; provide me an algo that outperforms af3." They are another form of "reasoning model", and you should begin to think of them as a new type of "o1/o3" model if RLMs are trained properly. "Reasoning" programmatically + in tokens is the goal. And I agree CC can be used for non-coding tasks, but as it's currently set up, it generally works best for coding-type, structured workflows.

3. This is what the paper does. RLMs themselves are a super generic approach, and comparing to CC doesn't really make sense because 1. Opus 4.5/4.6 were heavily post-trained to work in CC, and 2. the RLM implementation is extremely simple, and a lot of what CC is overlaps with what RLMs are. We compare to generic alternative approaches (e.g. give an agent a code REPL and CC-like sub-agents, look at a summarization agent, ReAct, etc.) and show that on a variety of tasks, each agent has some kind of clear limitation.

4. OOLONG (dense semantic task) challenges techniques involving compaction. BC+ and CodeQA challenge techniques that do not offload context off the model. In the limit, a lot of the game-like settings I described are a somewhat clear and toy-like example of even CC having trouble (e.g. try it out on Pokemon, try it on Zork, etc.) but where I am somewhat confident RLMs will do well.

5. This is also related to 1. In theory this is true, but in practice this is not what's really happening on the whole. CC will not pipe its context into a file history and programmatically examine it later; it will generally compact it and move on. It's not just prompts; it's generally the idea that the conversation history of the root LM, of sub-agents, and between sub-agents and the root LM can all be saved.

Now, outside of these responses, I will mention three more things. The RLM paper stand-alone makes the argument that this very general, abstract LM system (analogous to something like ReAct) has a lot of advantages over other popular, general strategies. In this sense, comparing to CC, Codex, etc. doesn't really make sense, because it's meant to be both task-agnostic and model-agnostic.

The RLM *idea* is a bet on the future of LM systems. You're free to disagree, and perhaps your point is that the CC approach scales well as *the* future LM system that we should build around and scale. This is a completely fair take (I strongly disagree ofc), and I generally would point to "symbolic recursion" and the idea that relying on the LM to do sub-calls as tools is going to cause a large amount of technical debt in the future in terms of reliable LM capabilities. This is something I am actively working on, and perhaps the only real way to prove it works is to train / build out an RLM on a wide range of tasks :)

Lastly, it makes perfect sense to be skeptical until real results are presented (tbh if I were in your position I would think the same). After all, there's no magic bullet, and a lot of what I'm proposing is a bet on how to train models in the future -- I am currently working on this bet, and perhaps I will have some cool results to share when that happens!
English
3
2
21
872
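The point in 1 above, that the order and choice of LM calls is often easier to describe in code than to leave to the root model, can be sketched as a tiny search loop. Everything here is a hypothetical illustration, not the RLM paper's implementation; `llm_score` is a stand-in for one narrow model call.

```python
def llm_score(candidate):
    # Pretend this asks a model to rate one candidate; a real system
    # would make an LM call here. Length is a dummy scoring rule.
    return len(candidate)

def structured_search(candidates, beam_width=2, rounds=2):
    beam = list(candidates)
    for _ in range(rounds):
        # Ordinary code guarantees which calls happen and in what order;
        # the model only ever answers the narrow per-candidate question.
        beam = sorted(beam, key=llm_score, reverse=True)[:beam_width]
    return beam[0]
```

The contrast with CC-style sub-agents is that here the harness cannot forget to score a candidate or call things out of order, which is the "robust guarantees about their choice of LM calls" being argued for.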
Delip Rao e/σ
Delip Rao e/σ@deliprao·
Since there's a lot of RLM fandom going around, I hope no one gets mad at me for asking the obvious. I read the paper, I read the blog post, and I worked through some examples. I am still scratching my head about: 1) what's new here relative to what other agentic harnesses (CC mainly) are doing, and 2) what's one straightforward example where RLM wins but all models and agentic harnesses fail? For example, the "quickstart" example feels like an RLM counterexample.
English
25
8
245
22.7K
Krascsi
Krascsi@krascsi·
Why is it easier to set up video/audio for meetings on Fedora at this point than on Windows?
English
0
0
1
503
Krascsi
Krascsi@krascsi·
@ludwigABAP @theodorvaryag Shooter mechanics add a different kind of depth and an increase in complexity, which is the reason even seasoned FPS players with MOBA backgrounds have difficulty learning it.
English
0
0
1
109
ludwig
ludwig@ludwigABAP·
i genuinely think you might not have a sense of 1% of the skill ceiling in deadlock if you think every hero feels the same, the shooting doesn't add anything, etc... certainly, macro and game economy are much more fleshed out in dota2, but, purely mechanically, there is a LOT in deadlock to impact the game around you that could not exist in dota 2 (e.g. any of the tens of movement techs you should use to move around the map faster or engage/disengage in fights)
English
5
0
23
2.3K
Chris Allen
Chris Allen@theodorvaryag·
playing deadlock just makes me want to play dota 2. the fps mechanics don't add anything here, and they aren't integrated in a way that's interesting. every hero feels the same. they failed to capture the asymmetry and variety of hero shooters like overwatch
English
4
0
22
2.9K