Amart (LOOP)⚡️

60.1K posts


@LoopOnChain

Founder and Managing Partner Edge AI | prev @Rivian | I show you how to transform your life with AI @umich

Los Angeles, CA · Joined September 2021
1.6K Following · 26.1K Followers
Ben Warren @bwarrn
I'm so excited to officially share what we've been cooking up at Mesa: the most powerful filesystem ever built for AI agents.

The dirty secret of every "production" AI agent today: the filesystem is held together with duct tape. Teams are stitching together S3, GitHub, sandbox-local disks, and homegrown diff logic to give their agents something resembling persistent, versioned storage. None of it works.

S3 isn't designed for parallel agents: concurrent agent writes silently overwrite each other. GitHub has the semantics but rate-limits you into the ground at agent scale and doesn't give you filesystem ergonomics. Sandbox disks vanish the moment the container dies. And your agents don't want to git clone and git push anyway. They want to read and write files. Like every program ever written.

So we built the missing layer. Mesa is a durable, POSIX-compatible filesystem with version control built in. Branches, diffs, history, rollback, access control: every primitive a codebase has, for any file type, at agent scale. You mount it. Your agent uses it like a normal filesystem. We handle the rest.

Private beta is live. Link in comments.
35 replies · 26 reposts · 317 likes · 44.4K views
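The lost-update failure mode described above (concurrent agent writes on last-writer-wins object storage) can be sketched in a few lines. The `ObjectStore` class is an invented stand-in for illustration, not Mesa's or S3's actual API:

```python
# Hypothetical illustration: why last-writer-wins object storage loses
# concurrent agent edits. Two agents both read the same base version,
# each applies its own change, and the second put() erases the first.

class ObjectStore:
    """Minimal S3-like store: whole-object put/get, no merging, no locks."""
    def __init__(self):
        self._blobs = {}

    def get(self, key):
        return self._blobs.get(key, "")

    def put(self, key, data):
        self._blobs[key] = data  # blind overwrite: last writer wins


store = ObjectStore()
store.put("notes.md", "line A\n")

# Both agents snapshot the same base before either writes.
agent1_view = store.get("notes.md")
agent2_view = store.get("notes.md")

store.put("notes.md", agent1_view + "agent1 edit\n")
store.put("notes.md", agent2_view + "agent2 edit\n")  # clobbers agent1

assert "agent1 edit" not in store.get("notes.md")  # agent1's work is gone
```

A versioned filesystem avoids this by making the second write land on a branch or fail visibly instead of silently replacing the first.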
Alex Lieberman @businessbarista
I may be crazy, but I built a 20-level Excel game to find a Finance savant to join our company.

The game is called "Bug Hunt," and any Excel junkie interested in becoming the leader of our Finance function at @tenex_labs can play. If you complete all 20 levels, you are accelerated to a final-round interview with my cofounder & me.

Here's how it works:

1) Open the model. It's a live workbook in your browser with the "finished" financials of a fictitious SaaS company.
2) Mark every bug. Click any cell, write one line of reasoning. Submit when you're sure. There are 20 total.
3) Climb the tiers. Each correct catch unlocks the next. The final three are veteran CFO-level.
4) Hit level 20 & auto-move to a final-round interview.

Play the game: web-production-42101.up.railway.app

P.S. You can still apply to be our Senior Director of Strategic Finance (application below) the normal way; it's just a little less fun & you don't get an auto-invite to the final round.
41 replies · 15 reposts · 311 likes · 126.5K views
Amart (LOOP)⚡️ @LoopOnChain
@trillhause_ Respectfully disagree. My favorite tools are AI that scrapes a company's data and builds a custom system based on their current one.
0 replies · 0 reposts · 0 likes · 38 views
Millin Gabani @trillhause_
This is going to be a tarpit idea. It's good in theory, but impossible to pull off unless it's an internal company effort driven by a tyrant-like CEO. An external company will never be able to build software that results in a company brain. That's mostly because no tool will have perfect adoption from all employees, and data will always be fragmented across new systems. Chaotic systems are very hard to capture. It's impossible to perfectly extract data from all sources as companies evolve and introduce new data sources. You will spend all your time keeping track of the data instead of doing actual work. This is the same trap that the second-brain productivity folks fall for.
Y Combinator@ycombinator

Company Brain @t_blom Every company has critical know-how scattered across people's heads, old Slack threads, support tickets, and databases, and AI agents can't operate like that. We think every company in the world is going to need a new primitive: a living map of how the company works that turns its own artifacts into an executable skills file for AI.

107 replies · 17 reposts · 545 likes · 150.1K views
Anastasios Nikolas Angelopoulos @ml_angelopoulos
Why is GPT-5.5 ranked lower than Claude? The answer is simple: Code Arena currently only supports frontend/web development tasks, where GPT-5.5 is weakest. Full-stack app development and GitHub integration will land in a couple of months. Next time we'll be clearer that this leaderboard shows React/frontend only, until we ship full-stack apps etc. Thanks for the feedback!
Arena.ai@arena

GPT-5.5 by @OpenAI is now live in the Arena, landing across multiple leaderboards. Here's how it ranks by modality:

- Code Arena (agentic web dev): #9, a strong +50pt jump over GPT-5.4
- Document Arena (analysis & long-content reasoning): #6, on par with Sonnet 4.6
- Text Arena: #7, Math: #3, Instruction Following: #8
- Expert Arena: #5
- Search Arena: #2
- Vision Arena: #5

Strong, well-rounded performance, especially in Code (+50 pts vs GPT-5.4). Congrats to @OpenAI on the release. Full category breakdowns by modality in the thread.

47 replies · 39 reposts · 688 likes · 114.9K views
Amart (LOOP)⚡️ @LoopOnChain
I do AI implementation at startups and large companies. One is an M&A firm; the 50+ year olds absolutely HATE AI, like really hate it. I show them they can turn the annoying manual spreadsheet process they've been doing for 10 years into one skill: /skill + enter, then it's done. They change their minds pretty quickly.
2 replies · 0 reposts · 1 like · 145 views
Tibo @thsottiaux
Don't just reset Codex rate limits for fun, it costs money.
Don't just reset Codex rate limits for fun, it costs money.

... but the vibes are good ...

I have reset Codex rate limits for ALL paid plans to celebrate a good week and allow everyone to build more with GPT-5.5. Enjoy.
1.5K replies · 767 reposts · 17.2K likes · 1.3M views
Amart (LOOP)⚡️ @LoopOnChain
@fxnction @openclaw Use GitHub and make sure they work in separate project folders, then use a CEO agent to manage them. If they're working in the same project, have them take a "lock" so the other agents can see which code they're working on.
2 replies · 0 reposts · 1 like · 416 views
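A minimal sketch of the "take the lock" convention, assuming plain lock files created atomically with `O_CREAT | O_EXCL`; the `FileLock` helper and agent names are hypothetical, not an @openclaw feature:

```python
# Each agent atomically creates <path>.lock before editing <path>, writing
# its own name into the lock so other agents can see who holds the file.
import os
import tempfile

class FileLock:
    def __init__(self, path, owner):
        self.lock_path = path + ".lock"
        self.owner = owner

    def acquire(self):
        try:
            # O_CREAT | O_EXCL makes creation atomic: exactly one agent wins.
            fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, self.owner.encode())
            os.close(fd)
            return True
        except FileExistsError:
            return False

    def holder(self):
        with open(self.lock_path) as f:
            return f.read()

    def release(self):
        os.remove(self.lock_path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "app.py")

a = FileLock(target, "agent-a")
b = FileLock(target, "agent-b")
assert a.acquire() is True      # agent-a claims the file
assert b.acquire() is False     # agent-b sees it is taken
assert b.holder() == "agent-a"  # and can tell who is working on it
a.release()
assert b.acquire() is True      # now agent-b can proceed
```

A stale-lock timeout would be needed in practice, since a crashed agent otherwise holds the file forever.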
fxnction @fxnction
Anyone else running more than 3 @openclaw agents? How do you keep them from stepping on each other?
31 replies · 0 reposts · 25 likes · 6.8K views
Amart (LOOP)⚡️ @LoopOnChain
@Voxyz_ai Yeah, significantly better. You can also run Claude via ACP sessions, which works great.
0 replies · 0 reposts · 0 likes · 262 views
Vox @Voxyz_ai
feels like openclaw is really back🦞. spent a lot of time yesterday just chatting with mine. would recommend gpt 5.5 + fast mode. speed is great too.

you can try deleting every "allowed / permitted / when appropriate" from your soul. permission language reads as optional, and the model doesn't always reach for it.

"you have dry wit" holds up. "humor is allowed" lands weaker. one is identity, the other is a switch. identity stays on by itself. switches need the model to actively flip them, and the model doesn't really do that.
19 replies · 6 reposts · 188 likes · 11.9K views
Ronan Berder @hunvreus
Talking to smarter folks than me, I'm convinced many of the AI folks in my timeline are full of shit.

Nobody is "running 20 agents overnight" and building stuff for actual users. Maybe some are building internal tools or disposable software. Maybe. But building software people like using? That doesn't get hacked on day one or blow up after the 3rd user? Nope.

I don't even understand what that's supposed to look like. Do you work out a 57-page document that perfectly describes what you want to build, then summon 14 agents and have them run wild for 6 hours? And what comes out on the other end isn't a broken pile of shit? Nope. Not buying it.

PS: it may also be that I have an IQ of 82 and can't figure it out.
670 replies · 271 reposts · 4.9K likes · 841.2K views
Amart (LOOP)⚡️ @LoopOnChain
@KingBootoshi Very helpful when it becomes a system or skill you can just call and have it go for 4 hours and build something flawless. Anything you've found particularly helpful or hurtful in your prompting/skill making?
1 reply · 0 reposts · 1 like · 138 views
BOOTOSHI 👑 @KingBootoshi
when I set up an orchestration flow using Claude Code to execute a very direct PRD with Codex agents, my runs take 3-4 hours to fully complete, so I often have this running overnight because I've automated my entire prompt workflow.

first off, you have to understand that creating and using agents is a form of alchemy. one wrong word/prompt/step and your output went from a God-tier one-shot to absolute dog shit.

anyways, the workflow I have agents do is research, scoping out the codebase, doing the actual implementation, then review.

research/scoping prevents hallucinations and induces a genuine understanding of the design and the patterns set. review agents will flag things the PRD requested that got missed, or outright bugs.

then the {manager agent} handles spawning these specialized subagents per task. the main context window, as a result of using subagents, remains small, which lets it do a lot of managing to keep the entire workflow focused.

i'll often have a couple of these workflows running at the same time, because they LITERALLY take hours. so I just relax after setting off an army. as a result, workflows like this mean I have one agent handling up to 16+ subagents throughout the whole process.

i also combine this with hard guards to enforce good design, like eslint & custom rules, strict design patterns, and a lot of foundational templates i've built for myself over the last year.

a lot of what i do ends up becoming skills, so i'll literally /skill combo my way through a prompt to chain up a series of very detailed actions.

at the end of the full process, THEN i review the full PR, and it's really easy for me to skim through the code.

this is absolutely not vibe coding; it's taking engineering workflows i've fundamentally learned and automating the process with detailed agents.

there are levels to this shit. the smallest discovery could 2x the quality of your workflow. sometimes 10x, even more. you need to be open and keep exploring!
David Cramer@zeeg

Everyone is slowly coming to this realization, and I assure you, no one is running multitudes of agents overnight. No one that is doing anything of substance, at least. There _are_ people pretending to be scientists, or fully caught up in their drug-infused AI overdose, who think their slop machines are changing the world. They're not tho, and they're just wasting a bunch of money and compute to create a lot of LoC that will just get thrown away. The state of the art is still "can we even one-shot a production-quality patch that we won't regret later", and it's rarer than you'd expect based on the discourse.

9 replies · 4 reposts · 53 likes · 5.5K views
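The research → scope → implement → review flow with a manager that keeps its own context small can be sketched with stand-in stage functions. No real model calls here; every name below is invented for illustration:

```python
# Toy pipeline: each stage runs as its own "subagent" on a fresh context
# dict, while the manager only accumulates a short summary log, mirroring
# the idea that subagents keep the main context window small.

def research(task):
    return {"task": task, "notes": f"patterns relevant to {task}"}

def scope(ctx):
    return {**ctx, "files": ["api.py", "models.py"]}

def implement(ctx):
    return {**ctx, "diff": f"edits to {', '.join(ctx['files'])}"}

def review(ctx):
    # Flag anything the original task asked for that the diff missed.
    ctx["flags"] = [] if ctx["task"] in ctx["diff"] else ["missed requirement"]
    return ctx

class Manager:
    STAGES = [research, scope, implement, review]

    def run(self, task):
        ctx = task
        log = []
        for stage in self.STAGES:
            ctx = stage(ctx)            # one specialized subagent per stage
            log.append(stage.__name__)  # manager keeps only the summary
        return ctx, log

result, log = Manager().run("api.py")
```

A real orchestrator would run some stages in parallel and retry on review flags, but the shape (narrow stages, thin manager) is the same.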
Jeffrey Emanuel @doodlestein
I finally got around to making a skill a lot of people have been asking me for: jeffreys-skills.md/skills/simplif…

It basically helps to "de-slopify" and refactor code that's been written by agents, looking for ways to simplify and reduce the amount of code without changing the behavior. The difference between this and other skills or prompts in the same spirit is the lengths this one goes to in order to prevent the process from going off the rails and introducing bugs or security problems. It's a whole elaborate system spanning 98 files and one full megabyte of reference files, scripts, and subagents (see pic).

You can run it over and over again and it will autonomously identify good opportunities for accretive simplification and do everything needed to implement the changes and prove that they didn't change the outputs. GPT-5.5 can explain better than I can how it does all that and what makes it so compelling and useful:

---

The strongest thing about this skill is that it treats refactoring as a proof obligation. A normal "clean this up" prompt invites the model to follow taste. It sees repetition, long files, wrapper functions, stale types, try/catch clutter, _v2 files, and it starts cutting. Sometimes that works. Sometimes it silently changes error semantics, loses a side effect, removes a lifecycle hook, or deletes a file that looked unused but was actually the intended implementation path.

This skill changes the frame. A simplification claim becomes: "this smaller program is observably equivalent to the larger one." Then it makes the agent prove that claim. It starts with a baseline: tests, golden outputs, LOC, warnings, complexity. It maps duplication instead of eyeballing it. It classifies clones, because exact copy-paste, parametric duplication, semantic similarity, and accidental rhymes are completely different things. It scores each candidate by expected LOC saved, confidence, and risk. Low-score candidates get rejected and logged, which is important because future agents otherwise rediscover the same bad idea forever.

The isomorphism card is the key move. Before editing, the agent has to answer boring but lethal questions: same ordering, same errors, same logs, same metrics, same side effects, same async cancellation behavior, same React hook identity, same serialization, same resource lifecycle. Those rows catch the kind of bugs that compilers and ordinary tests miss. Then the edit discipline is deliberately narrow: one lever per commit, no rewrites, no sed, no drive-by fixes, no deletion without explicit permission. Afterward, it verifies behavior again and records the result in a ledger. If the refactor did not actually preserve behavior, it does not get to call itself a refactor.

What I like about it is that it matches the real failure modes of agent-written code. AI code tends to accumulate plausible junk: defensive branches for impossible inputs, duplicated wrappers, too many optional parameters, orphaned "improved" files, shallow happy-path tests, stale types, and comments that are really leftover task plans. The skill has a whole pathology catalog for those patterns, plus scripts and subagent roles to find them systematically.

So the compelling part is not "make the code prettier." The compelling part is leverage with brakes. You can send very strong models into messy codebases and ask them to reduce complexity aggressively, while forcing them to preserve the contract that matters: observable behavior. That is the difference between a refactor you hope is safe and a refactor you can audit.
29 replies · 41 reposts · 845 likes · 54.4K views
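The "refactoring as a proof obligation" idea (record a golden baseline of observable behavior first, accept a rewrite only if it reproduces that baseline exactly, errors included) reduces to something like this sketch. The function names are invented; the real skill also compares logs, ordering, and side effects:

```python
# Capture (input -> output-or-error) observations before touching the code,
# then treat any candidate simplification as valid only if it reproduces
# every observation, including which exceptions are raised.

def golden_baseline(fn, cases):
    """Record observable behavior for every case before refactoring."""
    baseline = {}
    for case in cases:
        try:
            baseline[case] = ("ok", fn(case))
        except Exception as e:          # error semantics must be preserved too
            baseline[case] = ("err", type(e).__name__)
    return baseline

def preserves_behavior(candidate, baseline):
    """The rewrite only counts as a refactor if all observations match."""
    return golden_baseline(candidate, baseline.keys()) == baseline

# Original: verbose, with a redundant branch.
def slopped(x):
    if x < 0:
        raise ValueError("negative")
    if x == 0:
        return 0
    else:
        return x * 2

# Candidate simplification: same contract, less code.
def simplified(x):
    if x < 0:
        raise ValueError("negative")
    return x * 2

# A "simplification" that silently changes error semantics.
def broken(x):
    return abs(x) * 2

base = golden_baseline(slopped, [-1, 0, 3])
assert preserves_behavior(simplified, base) is True
assert preserves_behavior(broken, base) is False  # would be rejected
```

The point of the sketch is that `broken` looks like a clean-up but fails the proof obligation, which is exactly the class of bug taste-based refactoring lets through.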
Amart (LOOP)⚡️ @LoopOnChain
@pashmerepat Do we need to tell our agents to start using Codex as the default harness, or is this automatic when using the OpenAI Codex OAuth?
1 reply · 0 reposts · 0 likes · 170 views
pash @pashmerepat
I've embarked on a new sprint. My mission is to make OpenAI models feel magical in OpenClaw in the next few weeks.

Diving in today, I noticed a bug. When you configured OpenClaw to use the Codex harness with OpenAI models, auth was broken, and the system was silently falling back to the Pi harness. So nobody knew it was broken. Two PRs later (fix the auth bridge, stop the silent fallback), the Codex harness actually works. And the difference is night and day (pic related).

Before: the agent didn't feel magical or proactive. It did the exact same shallow loop every heartbeat. Read the heartbeat file, check Discord, see nothing, say HEARTBEAT_OK. It ignored the rest of its instructions. Sometimes it would even reason about doing work and then just... not issue the tool calls.

After: full agent loops. It reads its workspace context, interprets the entire checklist, inspects the repo, makes real edits, tries to verify them, and gives honest status reports when things are blocked. Later heartbeats show continuity: it doesn't repeat work, it picks up where it left off.

I didn't change any prompting or scaffolding. Just swapped in the Codex harness for Pi. The lesson here is to use the Codex harness if you're building with OAI models. A lot more to do, but this is a strong start.
127 replies · 89 reposts · 1.5K likes · 624.3K views
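The bug class pash describes (a configured backend that fails auth and quietly falls back to a default, so nobody notices the config is broken) versus the fix can be sketched with invented stand-ins; none of this is OpenClaw's actual internals:

```python
# Anti-pattern vs. fix for silent fallback: if the requested harness can't
# authenticate, either quietly substitute a default (bug stays hidden) or
# raise immediately (misconfiguration is visible on the first run).

class AuthError(Exception):
    pass

def load_harness_silent(name, authed):
    """Anti-pattern: swallow the auth failure and fall back quietly."""
    if name == "codex" and not authed:
        return "pi"  # user asked for codex but silently gets pi
    return name

def load_harness_loud(name, authed):
    """Fix: surface the broken config instead of masking it."""
    if name == "codex" and not authed:
        raise AuthError("codex harness configured but auth failed")
    return name

assert load_harness_silent("codex", authed=False) == "pi"  # hidden bug
try:
    load_harness_loud("codex", authed=False)
    raised = False
except AuthError:
    raised = True
assert raised  # the misconfiguration fails fast instead of degrading
```

The same principle generalizes: fallbacks are fine as an explicit, logged choice, but never as the invisible result of an error path.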
Amart (LOOP)⚡️ @LoopOnChain
Internal productivity is now a company's moat. Ship faster, yell louder. You win.
3 replies · 0 reposts · 4 likes · 145 views
Max Blade @_MaxBlade
I am so confident this week Peter + the OpenAI team will bring OpenClaw back to life. Opus is the only model that works right now, but pay-per-use ruins the entire experience. Instead of building and exploring, you're constantly left thinking about how much this is going to cost you. It's too important to let die.
Peter Steinberger 🦞@steipete

@LoopOnChain Yeah there's a bunch more work that's in the pipeline, this will change this week.

18 replies · 3 reposts · 127 likes · 18K views