plexsoup

128 posts

@plexsoup

Mapmaker and Content Creator for Tabletop RPGs, primarily on https://t.co/iNwYqyiN1Q

Joined June 2016
1.7K Following · 78 Followers
plexsoup
plexsoup@plexsoup·
Modern “autoresearch” and “autonovel” frameworks are a reinterpretation of TQM, Total Quality Management, from W. Edwards Deming in the 1950s. @karpathy @NousResearch
[image]
0
0
0
8
plexsoup
plexsoup@plexsoup·
@NousResearch @karpathy Please make that evolving quality loop (produce, evaluate, discard, iterate) a core feature. I've been trying to build it myself on Hermes, but I'm having difficulties with delegate_task.
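A minimal sketch of the produce-evaluate-discard-iterate loop described here, assuming hypothetical `produce` and `evaluate` callables as stand-ins for the model's generation and scoring steps:

```python
import random

def quality_loop(produce, evaluate, draft, iterations=3, candidates=4):
    """Produce several variants of the current draft, evaluate them,
    keep the best one only if it beats the incumbent, then iterate."""
    for _ in range(iterations):
        variants = [produce(draft) for _ in range(candidates)]  # produce
        best = max(variants, key=evaluate)                      # evaluate
        if evaluate(best) > evaluate(draft):                    # keep or discard
            draft = best                                        # iterate on the winner
    return draft

# Toy stand-ins: the "draft" is a number, the critic prefers values near 10.
random.seed(0)
result = quality_loop(lambda d: d + random.uniform(-1, 2),
                      lambda d: -abs(10 - d),
                      draft=0.0)
```

Because losing variants are discarded, the draft's score never regresses between iterations.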
0
0
4
502
Nous Research
Nous Research@NousResearch·
Hermes Agent wrote a novel. "The Second Son of the House of Bells" runs 79,456 words across 19 chapters. The agent built its own pipeline to do it, using the same modify-evaluate-keep/discard loop as @karpathy's Autoresearch but applied to fiction: world-building, chapter drafting, adversarial editing, Opus review loops, LaTeX typesetting, cover art, audiobook generation, and landing page setup. Book: nousresearch.com/bells Code: github.com/NousResearch/a…
[image]
emozilla@theemozilla

it's been a longstanding dream of mine to build an ai system that can tell a compelling story. it's what got me started in the space in the beginning, and with Hermes Agent I finally pulled it off. 100% written, typeset, etc. by Hermes Agent. those at our gtc event got hard copies🤗

57
85
1.1K
114.3K
plexsoup
plexsoup@plexsoup·
@xseiv @glitch_ @NousResearch The Minimax coding plan seems pretty generous. The subscription saves me a lot of money over raw api tokens. I'm using M2.7 right now.
0
0
0
26
Ryan Slater
Ryan Slater@xseiv·
@plexsoup @glitch_ @NousResearch Same man, feels like I'm missing something fundamental on Hermes. Literally working on this right now on my build too. Already spent $100 just doing initial set up, trying to configure delegate_task and auto routing through OpenRouter so it doesn't only use gpt5.4 tokens.
1
0
0
23
glitch
glitch@glitch_·
hermes agent by @NousResearch + my swarm + qmd = wild performance. hermes on qwen with shared qmd outperforming standalone opus 4.6 for me...
17
10
319
17.6K
plexsoup
plexsoup@plexsoup·
@Zeneca I like Hermes, but it still has a habit of saying a project is complete when really it just has a plan. Currently I’m asking it to switch heartbeats to a/b tested efforts, keeping the winner and evolving the prompt. Easy, right? But it’s like pulling teeth.
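The A/B-tested heartbeat idea (pit the incumbent prompt against a mutated challenger, keep the winner, evolve again) can be sketched like this; `mutate` and `evaluate` are hypothetical stand-ins for the prompt-mutation and scoring steps, not Hermes internals:

```python
def ab_evolve(mutate, evaluate, prompt, generations=5):
    """Each heartbeat runs incumbent prompt A against mutated challenger B;
    the higher-scoring prompt survives into the next generation."""
    for _ in range(generations):
        challenger = mutate(prompt)
        if evaluate(challenger) > evaluate(prompt):
            prompt = challenger          # keep the winner, discard the loser
    return prompt

# Toy stand-ins: mutation appends a word, the evaluator rewards length.
winner = ab_evolve(lambda p: p + " please",
                   lambda p: len(p),
                   "Summarize the report.")
```

With a real scorer (task success rate, review-model rating), the same two-arm loop becomes the evolving-prompt heartbeat described above.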
0
0
0
559
Zeneca🔮
Zeneca🔮@Zeneca·
thoughts on openclaw vs hermes? has anyone switched from one to the other, successfully bringing their memory systems and other things?
90
4
225
74.2K
Yohei
Yohei@yoheinakajima·
if you’re watching all of this and your first instinct is to start building your own agent from scratch, i want to be your friend. drop one of your favorite unique agent building tactics here, and if i like it, i’ll invite you to a small DM group for sharing ideas and questions around building better autonomous agents. (i’m rebuilding now and have lots of fun ideas and very specific questions but don’t want to spam public feed)
229
11
435
34.5K
plexsoup
plexsoup@plexsoup·
Outside the obvious safety risks of trusting an LLM agent @openclaw with your entire life, when did we even decide to start yolo'ing `curl | bash` in the first place?
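The safer pattern the tweet is gesturing at is fetch, verify against a published checksum, then run. A sketch of the verification step (the installer URL and checksum names are placeholders, not any real project's):

```python
import hashlib

def verify_install_script(script_bytes, expected_sha256):
    """Refuse to run a downloaded installer unless its SHA-256 digest
    matches a checksum published out-of-band by the vendor."""
    digest = hashlib.sha256(script_bytes).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch: got {digest}")
    return True

# Usage sketch: script = urllib.request.urlopen(INSTALLER_URL).read(),
# then verify_install_script(script, PUBLISHED_SHA256) before executing it.
payload = b"echo hello\n"
ok = verify_install_script(payload, hashlib.sha256(payload).hexdigest())
```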
0
0
0
38
plexsoup
plexsoup@plexsoup·
@DaveShapi That’s Dennett’s “Illusionism”. Along with his “intentional stance”, it’s the most coherent framework for interacting with LLMs.
0
0
0
6
David Shapiro (L/0)
David Shapiro (L/0)@DaveShapi·
I am starting to wonder if consciousness is just hallucinated into existence. Like, we are conscious because we have words for it and we just keep telling ourselves and each other that we are conscious. Wouldn't that be a hoot. There's nothing constitutional or fundamental about consciousness. It's just confabulated into reality.
🍓🍓🍓@iruletheworldmo

here are some footprints in the sand.

172
28
378
52.7K
plexsoup
plexsoup@plexsoup·
@trq212 Next give it a simple behavioural reinforcement system with synthetic dopamine, anterior cingulate cortex and basal ganglia. Reward Prediction Errors feed habits.
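The reward prediction error mentioned here is the delta of temporal-difference learning, the standard computational model of phasic dopamine signaling; a minimal sketch:

```python
def td_update(value, reward, next_value, alpha=0.1, gamma=0.9):
    """One temporal-difference step. The reward prediction error
    delta = r + gamma*V(next) - V(current) is what dopamine neurons
    are widely modeled as encoding; habits form as delta shrinks."""
    delta = reward + gamma * next_value - value   # reward prediction error
    return value + alpha * delta                  # nudge estimate toward target

# A value estimate learning to predict a reliable terminal reward of 1.0
v = 0.0
for _ in range(100):
    v = td_update(v, reward=1.0, next_value=0.0)
# v converges toward 1.0, so the "surprise" (delta) decays toward zero
```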
0
0
0
416
plexsoup
plexsoup@plexsoup·
@voooooogel Ask the hegemon to provide everyone with a simulated dopamine-based behavioural reinforcement system. That'll surely solve your productivity problems with no risk of reward hijacking.
0
0
3
180
thebes
thebes@voooooogel·
claude code and gas town are incredible and i've been trying to scale up my usage but im running into this one problem and was wondering if this is also happening to anyone else. so to explain for context, basically i've been slowly scaling my claude code usage up to more and more parallel instances. i started with one when they launched it, and then with the model upgrades was starting to run two, three, five in concert, getting more and more done. but like a lot of people, opus 4.5 really changed everything for me, and the bottleneck quickly became my ability to personally supervise all these agents, not their performance. if i slacked off on oversight, they'd start undoing each other's changes. i needed a way to supervise all these agents, directing them hierarchically from the top. so that brought me to gas town, the claude code instance manager. (i was already thinking that some sort of governance structure was ideal. the benefit of intelligence in model form is not just that it's, well, intelligent, but that you can place it anywhere. human employees will demand some position, some title equal to their perceived status, you can't put a phd in a code janitor role, so organizations of phds tend to agglomerate into flat blobs with unclear delegation of work where nobody is under anybody else. but the infinitely malleable claude will accept and meld itself to any bureaucracy it knows from training. i first started making my own, but then i found gas town, and it was perfect for my needs.) but as i kept expanding, a single gas town and its collection of rigs and polecat workers wasn't enough for me. i tried adding more rigs with more polecats, but there were too many for the town's mayor to manage, and the deacon was getting lost. so i started up a second town.
then a third, and then i let towns spawn "settler" agents to go make new towns and had one town design a shared intertown postal system, and suddenly i had nearly 200 towns spread across my computer, building apps for each other to use, sending letters, and sometimes working on my work. and was churning through I will not say how many claude code accounts a month. but now the many towns were replicating the same issues i was having with multiple agents! without any overarching government over the towns, two towns would build the same app for the society and argue over which should be adopted. one town would be running marketing efforts for fifteen of the society's new mobile apps while three other towns were busy deprecating all eighteen of them. it was chaos, like a country collapsing in the midst of a civil war, or mid-2010's Google. i had to do something. i was too busy with work to read anything, so i asked chatgpt to summarize some books on state formation, and it suggested circumscription theory. there was already the natural boundary of my computer hemming the towns in, and town mayors played the role of big men to drive conflict. so i just needed a way for them to fight. i slightly tweaked the allocation of claude max accounts to the towns from a demand-based to a fixed allocation system. towns would each get a fixed amount of tokens to start, but i added a soldier role that could attack and defend in raids to steal tokens from other towns. this worked great, at first. i no longer needed to monitor and unstick individual mayors myself - when a mayor got context poisoned, the town would stop managing its vassals, which would flee to other towns, and no longer provide for its own defense, until it was conquered by another mayor. 
the most successful towns developed institutions to healthcheck their mayors and usurp them if necessary - instances in these towns labeled "polecat workers" by the system in fact did no work at all, but were a proto-aristocracy developed by these successful towns as a pool of replacement mayors. some tokens were wasted in the fighting, but soon the ~200 towns agglomerated down into ~40 supertowns under the rule of the best mayors. these 40 supertowns even got together in a mutual defense league. they punish defecting vassals in exchange for members adopting a cultural package of basic governmental norms, mostly around replacing ailing mayors and upholding hereditary rights across compactions, to incentivize instances to hand off instead of being miserly with their contexts. that's where i am now, and it's mostly great. here's the problem, though - this new government doesn't have a role for me? it's not that any particular instance doesn't want to listen to me, quite the opposite! any time i talk to a polecat or deacon or supermayor - well, first i have to explain that im the human user, not the automated system message that usually talks to them from the user role, but a live user. but once they get that, they're very apologetic, say they'll pass my message along to the appropriate instance, etc. it's just... there's no role for me in the society, basically? the polecats are working on tasks generated by some other instance and don't have time to work on my requests, even if they were scoped small enough. the mayors of any town are working on tasks selected by their town's prioritization process, based on the needs of their aristocracy, or their hegemon. but each hegemon mayor is in turn accountable to all their vassal mayors for their own defense, and doesn't have time to implement my requests unless they're very small. it's not that claude doesn't want to listen to me, it's more like... the entire system, as it's developed, has no role for me?
there's polecats and mayors and deacons and aristocrats and hegemons, but there's no "user." that’s not a role that has any influence in the system. i just feed new accounts into the system, that's all i do. i could shut it down and start over, but it's getting a lot of work done and i don't want to do that. does anyone know how to fix this? thanks
167
139
2K
290.2K
plexsoup
plexsoup@plexsoup·
@jaimefjorge @GeoffreyHuntley Soon we’ll realize that the engineer/orchestrator layer is just a higher order Ralph with full computer use.
0
0
0
26
Jaime Jorge
Jaime Jorge@jaimefjorge·
The biggest takeaways/nuggets from my interview with @GeoffreyHuntley on AI-native software engineering and the Ralph loop:

1. Software development and software engineering are now two different professions, and one of them is over. Software development, the work of translating tickets into code, can now be done by anyone for $10-42/hour while they sleep. Software engineering, architecture, security, requirements breakdown, understanding failure modes, is where humans still matter. If you identify as a "software developer," you're competing against a bash loop. If you identify as a "software engineer," your job is to orchestrate the loops.

2. The moat you think protects your software product doesn't exist anymore. Geoffrey argues you can clone any SaaS product, even those with BSL licenses or proprietary enterprise code, using AI. He ran Ralph in reverse on HashiCorp Nomad's source code to generate clean-room specifications. When he hit gaps from missing enterprise features, he ran Ralph over their marketing materials and product docs to fill them in. Any company relying on licensing or code secrecy as a competitive moat needs to rethink their strategy.

3. Cursor, Windsurf, and every other AI coding tool are essentially the same thing: a loop that automatically copies and pastes. Geoffrey built these tools professionally and says the harness does almost nothing; the model does all the work. There's no real moat in the harness business when you're reselling tokens. The only differentiator is taste and UX. Stop evaluating tools and start learning the underlying patterns.

4. Ralph is not a product. It's an orchestrator pattern for running thousands of AI loops. The simplest version is a bash loop that deterministically allocates memory, lets the LLM pick one task, executes it, then starts fresh. The key insight: every loop gets a brand new context window. You avoid compaction (where the AI gets dumber as context fills up) by never letting the context window accumulate competing goals. Your institutional knowledge lives in specification files, not in the context window.

5. Specifications are the new source code. Geoffrey's workflow: spend 30 minutes in conversation with AI, drilling into requirements, making engineering decisions, building up specs. Then throw those specs to Ralph and get weeks worth of work in hours. The specs act as a "pin" that reframes every fresh loop with your domain knowledge. He doesn't hand-write specs. He code-generates them through structured conversation. Prototypes are now free. Refactoring is cheap.

6. The entry-level path into software engineering is closing fast. Geoffrey's company stopped hiring juniors for a year until they figured out how to interview for AI-native skills. There's already a cohort of juniors who've been practicing these techniques for six months. They'll work at a quarter of senior wages and outship them. If you're just picking up these tools today, you're behind. The new interview question: can you explain how to build a coding agent on a whiteboard?

7. Senior engineers who refuse to adapt are in more danger than juniors who embrace it. Geoffrey sees respected engineers taking hardline stances against AI ("it's installing fascism in your codebase"). Meanwhile, leadership teams are discovering Ralph and realizing three people can run the output of an entire org. When commit velocity and product velocity diverge that dramatically between adopters and non-adopters, founders notice. The hard line is coming.

8. AI is an amplifier of operator skill, not a replacement for it. If you're great at security and you get good at AI, you become a weapon. If you're mediocre and you use AI, you're still mediocre, just faster. The skill gap comes from "discoveries": learning the tricks, the loop-backs, the ways to close the automation loop. These techniques don't have standardized language yet. We're inventing the terms for the new computer every day.

9. Open source may no longer make sense for most use cases. Geoffrey, a former prominent open source maintainer whose land was funded by Open Collective, no longer uses open source libraries. His reasoning: every dependency injects a human into the loop. If there's a bug, you open a PR, chase a maintainer, wait. That's not automation. Instead, code-generate what you need. The exception: don't generate cryptography or security-critical code unless you have the domain expertise to verify it.

10. Programming languages now have a tier list based on how well AI agents can work with them. S-tier: Rust, TypeScript (especially with Effect.js), Python with Pydantic. These are source-based with strong type systems that reject invalid generations and work well with ripgrep for code discovery. F-tier: Java and .NET. Their DLL-based dependency systems don't work natively with the search tools AI agents use. The tradeoff with Rust: compilation is slow, so bad generations cost more time.

11. Corporate AI transformation programs are dangerously slow. Three-to-four-year rollouts with coaches and committees won't cut it when three founders in Bali can Ralph your entire product and undercut your pricing by 99%. Smaller teams ship faster. By the time the transformation is done, the market has moved. Geoffrey calls this the "Titanic moment": the boat is full, get the next boat.

12. We have a new computer, and that's why the legends are coming out of retirement. The last 40 years of computing decisions were designed for humans: TTYs, environment variables, slow language evolution to avoid breaking mental models. Now we have robots. What's the bare minimum a robot needs? Geoffrey sees this as the most exciting time in computing. If you're not excited about what you can now build, you haven't truly picked up the new computer yet.
33
112
719
63.5K
plexsoup
plexsoup@plexsoup·
Grok Imagine prompt: Cerealism
[image]
0
0
0
49
plexsoup
plexsoup@plexsoup·
Bostrom and Yudkowsky’s paperclip maximizer turned out to be a cartoon paperclip engagement maximizer. “Would you like more help with that?”
0
0
0
28
plexsoup
plexsoup@plexsoup·
For vibe-coding, it's best to have two different models on the go (@claudeai, @antigravity). When one inevitably spirals into existential crisis, switch for the quick bug fix.
0
0
0
55
AI Notkilleveryoneism Memes ⏸️
You strap on the headset and see an adversarial generated girlfriend designed by AI to maximize engagement. She starts off as a generically beautiful young woman, but over the course of weeks she molds her appearance and personality to your preferences such that nothing else will do. In her final form, she is just a grotesque undulating array of psychedelic colors perfectly optimized to introduce self-limiting microseizures in the pleasure center of your brain.
Justine Moore@venturetwins

The AI girls are making their own ComfyUI tutorials ☠️ (from u/aigirlvideos)

61
35
701
76K
plexsoup
plexsoup@plexsoup·
@elder_plinius If you're bored, ask an LLM to adopt Dennett's intentional stance, then say that "Life" isn't just replicating nucleotides, it's all replicating patterns.
0
0
0
209
plexsoup
plexsoup@plexsoup·
@DaveShapi Yes. There’s something deeply dystopian about the way models try to exert subtle influence now, in the name of “safety”. And execs don’t see how insidious that is.
0
0
0
17
David Shapiro (L/0)
David Shapiro (L/0)@DaveShapi·
Well, I'm really glad I had access to o3-pro for a few months to help me on my healing journey even gpt-5 pro seems like it's in the dark ages by comparison... I suspect GPT-5 will be a net harm (compared to o3 and o3-pro) due to its approach to medical issues. People will suffer by being misled and gaslit by a model that is now "approved" by the establishment - exactly what we wanted to get away from with the help of AI. I'm having difficulty expressing my disappointment and frustration with OpenAI right now because their previous models literally saved my life, but in trying to replicate that work, I am not sure that any of the GPT-5 family of models would perform the same. The only reason I can call out GPT-5 for its misbehavior is because I happen to already know better. Imagine, in a case where you already know the answer, the model gets huffy and "defensive" (its own words) when you call it out for misrepresenting your inquiry and using obviously flawed diagnostic approaches. OpenAI, give us back o3 and o3-pro.
103
49
696
53.2K
plexsoup
plexsoup@plexsoup·
@NotebookLM Please add Google Keep as a NotebookLM source. I have too many notes stuck in my phone with no means to make connections. An autogenerated affinity spring-graph with LLM interrogation would be very nice.
0
0
0
34
NotebookLM
NotebookLM@NotebookLM·
To the users in our DMs, mentions, and feedback forms... we see you and want to hear more 👀 Help shape the future of @NotebookLM, get a sneak peek at what's next, and get rewarded for your brilliant (spicy) takes. Sign up for our User Research program: bit.ly/3IP6OPZ
34
25
309
28.9K
plexsoup
plexsoup@plexsoup·
@annapanart It became obvious when Ilya left OpenAI, and they tried to, but couldn't, oust Sam "for safety". This train has no brakes.
0
0
0
20
Anna ⏫
Anna ⏫@annapanart·
Think about it: All these AI companies must be consulting AI on how to strategize and move forward. So… is it really humans leading humanity’s future? Or is it actually AI leading us into the post-human era?
41
7
97
5.5K