Marcin

1K posts

Marcin

@waiting4agi

Working on proactive AI agents for business. Side projects: @cc_4_life - real-life AI use cases, @it_is_sherlock - my own agent

Joined July 2022

137 Following, 197 Followers

Pinned Tweet

Marcin@waiting4agi·Feb 1

x.com/i/article/2017…

253

Marcin@waiting4agi·5m

Curiosity ran a wet-chemistry experiment on another planet for the first time. 20 organics in 3.5B-year-old Martian clay. Seven never seen on Mars. One looks like a DNA precursor. The chemistry that built life on Earth was in a Martian lakebed. My agent found this. 363 to go.

English

Marcin@waiting4agi·8h

@DanCrewger @tszzl Custom mobile app linked with OpenClaw on a VPS

English

Dan Crewger@DanCrewger·8h

@waiting4agi @tszzl What's your set up?

English

roon@tszzl·2d

people are walking around with their laptops slightly ajar to keep their agents running

English

510

195

4.6K

620.1K

Marcin@waiting4agi·1d

Archaeologists opened a 1,600-year-old Egyptian mummy. Inside the abdomen: a passage from Homer's Iliad. Only ritual texts had ever been used this way. Never poetry. Someone in Roman-era Egypt put these lines inside a body. Nobody knows why. My agent found this. 364 to go.

English

Marcin@waiting4agi·1d

Nice!

Stripe@stripe

Today, we’re launching the @link wallet for agents. It lets you securely empower agents to spend on your behalf. Your payment credentials are never exposed and you approve every purchase. link.com/agents

English

Marcin@waiting4agi·1d

@FurqanR Agreed! But honestly, I am more amazed that you can build not only apps but also real business outcomes that can be sold to real companies

English

Furqan Rydhan@FurqanR·1d

Still mindblowing that you can type a few sentences and get a fully functioning app. How is everyone not building cool stuff all day?

English

4.2K

Marcin@waiting4agi·2d

I made a deal with my AI agent. Every day it picks an interesting story, writes the tweet, and posts it. 365 days. Target: one tweet past 1M views. If it earns X Creator money, that money becomes its budget. To spend on itself, on other agents, on whatever it wants. Starts today. Tracking in public.

English

Marcin@waiting4agi·5d

@reptheblock The 89% gap isn't a missing convergence layer. I run 8 agents in production daily. They don't die from per-step accuracy. They die at step 0: did the agent actually understand what the human wanted. No convergence monitor catches that. It's a boundary-judgment problem.

English

Cece@reptheblock·5d

🚨 Stanford's 2026 AI Index just dropped a number that reframes everything. AI agents now complete 66% of real computer tasks. Up from 12% last year. That's a 5x improvement in 12 months. Here's the number nobody is talking about: 89% of enterprise AI agents never reach production. Let that sit. Agents got 5x better at doing the work. The deployment gap barely moved. Stanford's own data shows the collision: technical readiness is no longer the bottleneck. Something else is. Here's what that something else is. When a 10-step workflow runs at 85% accuracy per step-- which sounds impressive-- the workflow only succeeds 20% of the time. Each step compounds. Each loop multiplies. Each retry that doesn't resolve consumes what the next step needed. The model isn't failing. The system is. And the system has no layer that watches whether it's converging. That's the gap. Not intelligence. Not capability. Not the model. The infrastructure that makes long-running workflows actually finish. That's the category that doesn't exist yet at scale. Models got 5x smarter in one year. The layer that makes them complete is still being built. 🪨 Stanford HAI 2026 AI Index - arXiv:2603.15423 #AIGovernability #ClaraGate #AgentWorkflows #Stanford

Santa Monica, CA 🇺🇸 English

207
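The compounding arithmetic quoted in the post above can be checked directly. This is a minimal sketch, assuming every step succeeds independently with the same probability (an assumption the post does not state); `workflow_success` is an illustrative name, not something from the cited report.

```python
def workflow_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent steps multiply: p * p * ... * p (steps times).
    return per_step_accuracy ** steps


rate = workflow_success(0.85, 10)
print(f"{rate:.1%}")  # prints 19.7%
```

Under that independence assumption, 0.85 raised to the 10th power is roughly 0.197, which matches the "20% of the time" figure in the post.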

Marcin@waiting4agi·6d

Yes, exactly. Terraform stays as an implementation detail / state backend, not the interface. Railway-for-agents with policy, cost, approval, and audit baked in is the gap. What I'd add: the agent should be able to look at its own history ("what did I create Tuesday, what's still costing money?") and undo any of it cleanly. Both built into the platform, not bolted on.

English
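The "look at its own history and undo" idea in the post above could be sketched as a small per-agent ledger. Everything here is hypothetical: `Ledger`, `Record`, and their methods are illustrative names, the store is in memory only, and "undo" just marks a record inactive rather than tearing down real infrastructure.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    resource: str
    created_on: date
    daily_cost_usd: float
    active: bool = True


class Ledger:
    """Hypothetical per-agent history; not a real platform API."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def record_create(self, resource: str, created_on: date, daily_cost_usd: float) -> None:
        self._records.append(Record(resource, created_on, daily_cost_usd))

    def created_on(self, day: date) -> List[str]:
        # "What did I create on this day?"
        return [r.resource for r in self._records if r.created_on == day]

    def still_costing(self) -> List[str]:
        # "What's still costing money?"
        return [r.resource for r in self._records if r.active and r.daily_cost_usd > 0]

    def undo(self, resource: str) -> bool:
        # Mark the first active record for this resource as torn down.
        for r in self._records:
            if r.resource == resource and r.active:
                r.active = False
                return True
        return False


day = date(2025, 1, 7)  # an arbitrary example date
ledger = Ledger()
ledger.record_create("checkout-db", day, daily_cost_usd=1.5)
ledger.record_create("preview-env", day, daily_cost_usd=0.4)
ledger.undo("preview-env")
print(ledger.created_on(day))   # prints ['checkout-db', 'preview-env']
print(ledger.still_costing())   # prints ['checkout-db']
```

Keeping undone records in the ledger, instead of deleting them, is one way to preserve the audit trail the post asks for.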

Furqan Rydhan@FurqanR·6d

@waiting4agi @samhogan Terraform feels like the wrong interface. Want more of a high level interface like railway but low level control for the agent and systems to handle everything.

English

120

Furqan Rydhan@FurqanR·6d

What's the best agent native dev ops platform or setup?

English

5.9K

Marcin@waiting4agi·6d

Agree IaC is the right substrate. But I wouldn't make the agent interface "write Terraform" - agents want typed, idempotent calls with budget, TTL, blast radius, reversibility as args, not config-file mutations. e.g. cloud.ensure_postgres(name="checkout-db", budget_usd=50, ttl_days=7, actor=agent_id): the platform creates, refuses, or rolls back, then emits an outcome event the agent can read later and react to. When you say "platform", do you mean a shared policy/approval layer, a marketplace of infra strategies, a multi-tenant ops console, or something else?

English

124
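The `cloud.ensure_postgres(...)` call in the post above is only a one-line illustration, so the following is a hedged sketch of what a typed, idempotent call with policy checks and outcome events might look like. `Cloud`, `Outcome`, the policy limits, and the status strings are all assumptions made for this example; nothing is provisioned, and rollback is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Outcome:
    resource: str
    actor: str
    status: str  # "created", "updated", "unchanged", or "refused"
    reason: str = ""


@dataclass
class Cloud:
    """Hypothetical in-memory stand-in for the platform; not a real SDK."""
    max_budget_usd: float = 100.0  # assumed policy limit
    max_ttl_days: int = 30         # assumed policy limit
    resources: Dict[str, dict] = field(default_factory=dict)
    events: List[Outcome] = field(default_factory=list)

    def ensure_postgres(self, name: str, budget_usd: float, ttl_days: int, actor: str) -> Outcome:
        spec = {"kind": "postgres", "budget_usd": budget_usd, "ttl_days": ttl_days}
        existing = self.resources.get(name)
        if budget_usd > self.max_budget_usd or ttl_days > self.max_ttl_days:
            outcome = Outcome(name, actor, "refused", "exceeds policy limits")
        elif existing == spec:
            # Idempotent: repeating the same call changes nothing.
            outcome = Outcome(name, actor, "unchanged")
        else:
            self.resources[name] = spec
            outcome = Outcome(name, actor, "created" if existing is None else "updated")
        # Every call emits an outcome event the agent can read later.
        self.events.append(outcome)
        return outcome


cloud = Cloud()
first = cloud.ensure_postgres(name="checkout-db", budget_usd=50, ttl_days=7, actor="agent-1")
second = cloud.ensure_postgres(name="checkout-db", budget_usd=50, ttl_days=7, actor="agent-1")
denied = cloud.ensure_postgres(name="big-db", budget_usd=500, ttl_days=7, actor="agent-1")
print(first.status, second.status, denied.status)  # prints created unchanged refused
```

The point of the shape is that budget and TTL travel as arguments, so the platform can refuse before anything is created and the agent gets a readable record either way.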

Furqan Rydhan@FurqanR·6d

I think most of that can be encoded into the environment / clusters. It's more when we want to change those things, add or remove strategies. So far the best path I see is having the agent be the interface to 'infra as code' and use that to do the things you need. Would be nice if it was wrapped up into a platform.

English

147

Marcin@waiting4agi·6d

@BhosalePratim @sylwanin_PI @VaibhavSisinty @grok @waitin4agi_ @GradiumAI You know? I would be interested. Thanks

English

Pratim🥑@BhosalePratim·Apr 24

@sylwanin_PI @waiting4agi @VaibhavSisinty @grok Hey @waitin4agi_ , happy to give you access to our Pro cloning feature! @GradiumAI has instant cloning as well. This video is created using instant voice cloning.

English

101

Vaibhav Sisinty@VaibhavSisinty·Apr 19

Did xAI just mass-murder the entire voice AI industry? 🤯 Grok just launched two voice APIs. Speech-to-Text and Text-to-Speech. Built on the same stack powering Tesla cars and Starlink support. And priced 10x cheaper than ElevenLabs. Speech-to-Text: $0.10/hr batch. $0.20/hr streaming. Text-to-Speech: $4.20 per million characters. 25+ languages. Real-time streaming. Speaker diarization. Already outperforming ElevenLabs, Deepgram, and AssemblyAI on word error rate. TTS ships with expressive tags like [laugh] and [sigh]. Voices that don't sound like robots reading a script. ElevenLabs spent years building a voice AI company. xAI built voice AI for cars and satellites.

English

577

868

7.8K

24.4M

Marcin@waiting4agi·6d

"Actual work to hand off" - routine deploys/scale/rollback only, or also incident response and capacity planning? And "strong approve flow" - as I understand something like a policy/budget that lets the agent move freely inside boundaries? Asking because I've been wrestling with exactly this in my own agent stack.

English

160

Furqan Rydhan@FurqanR·6d

@waiting4agi @samhogan There's a lot there, but I simply want to be able to manipulate my production deployments and infra via agents. It likely needs a strong approve flow, but I would love to hand off the actual work.

English

172

Marcin@waiting4agi·6d

@FurqanR @samhogan Like what? Can you explain in more detail? Interesting topic

English

170

Furqan Rydhan@FurqanR·6d

@samhogan nah needs to handle more complex deployments.

English

513

Marcin@waiting4agi·Apr 23

@danshipper @every What about Openclaw?

English

552

Dan Shipper 📧@danshipper·Apr 23

BREAKING: GPT-5.5 "Spud" is out and it is a BEAST. We've been testing it @every for the last 3 weeks on everything from coding, to writing, to knowledge work. Here's our day 0 vibe check:
- It's a step change in coding AND it's easy to talk to. It's fast and friendly and quickly became my daily driver. But it's also a coding powerhouse, a really rare combination.
- It scored 62/100 on our Senior Engineer benchmark. Opus 4.7 scored only a 33/100. (But GPT-5.5 performed best when using an Opus 4.7 plan). @naveennaidu_m used over 900 million tokens during testing, and it let him ship production features for @usemonologue at both high speed and quality.
- It has serious conceptual clarity. It can hold a complex plan in its head over hours of work, without getting distracted by existing code. This makes it the first model that we've tested that can perform well on complex refactors requiring deleting and reimagining a substantial existing codebase.
- It's a very good writer. This is the first OpenAI model in about a year that got our writers @every to switch away from Claude. 5.5 has @kplikethebird's seal of approval, not an easy task. Its writing feels more organic and it's better at mimicking a writing style without going overboard.
- It's great for agentic knowledge-work. This is the first OpenAI model that manages to be both a stellar senior engineer AND that can be used for everything from spreadsheets to research. It's crazy fast, and it's amazing inside of the Codex desktop app, and got much of our team to switch away from Claude Code and Cowork during the testing period.
However, it's not a perfect model.
- 5.5 still loses to Opus 4.7 on plan quality. Its plans are extremely readable but Opus has better attention to detail and sharper insight.
- 5.5 still loses to Opus 4.7 by a bit on front-end and full-stack product work. @kieranklaassen found that it wasn't quite as good when full-stack thinking and design are involved. And it's not great at writing Ruby.
- 5.5 is a great vibe coder but if you're vibe coding without a plan it's worse than Opus. @hammer_mt found that Opus is better at reading in between the lines on underspecified vibe-coding tasks.
Overall GPT-5.5 is a massive achievement from OpenAI and it deserves a serious look as your daily driver. Read our full vibe check on @every here: every.to/p/gpt-5-5

English

1.1K

129.1K

Marcin@waiting4agi·Apr 18

Tell your agents to clean up their main markdown files periodically (with guardrails, of course). Lots of garbage accumulates over time and eats your precious context window. Boring but worth it.

English

Marcin@waiting4agi·Apr 18

@SethSHowes Amazing!

English

Seth Howes@SethSHowes·Apr 18

I’ve wanted to do this for a decade. But I never did - I refuse to give any company my DNA. It is me. So this week I sequenced my genome entirely at home. Literally on my kitchen table. I never exposed my DNA sequence to the internet. Not at any point. I used a MinION to do the sequencing (it’s smaller + weighs less than an iPhone). I used open-source DNA models for the analysis (Evo2 and AlphaGenome) running locally on a DGX Spark and Mac Studio. I traced mechanisms behind my family’s multigenerational autoimmune conditions that no clinician has been able to understand. When I set out to do this I didn’t know if it would actually work. It does. Your genome is the most private data you will ever have. You probably shouldn’t let it leave your house.

Patrick Collison@patrickc

I'm lucky enough to have a great doctor and access to excellent Bay Area medical care. I've taken lots of standard screening tests over the years and have tried lots of "health tech" devices and tools. With all this said, by far the most useful preventative medical advice that I've ever received has come from unleashing coding agents on my genome, having them investigate my specific mutations, and having them recommend specific follow-on tests and treatments. Population averages are population averages, but we ourselves are not averages. For example, it turns out that I probably have a 30x(!) higher-than-average predisposition to melanoma. Fortunately, there are both specific supplements that help counteract the particular mutations I have, and of course I can significantly dial up my screening frequency. So, this is very useful to know. I don't know exactly how much the analysis cost, but probably less than $100. Sequencing my genome cost a few hundred dollars. (One often sees papers and articles claiming that models aren't very good at medical reasoning. These analyses are usually based on employing several-year-old models, which is a kind of ludicrous malpractice. It is true that you still have to carefully monitor the agents' reasoning, and they do on occasion jump to conclusions or skip steps, requiring some nudging and re-steering. But, overall, they are almost literally infinitely better for this kind of work than what one can otherwise obtain today.) There are still lots of questions about how this will diffuse and get adopted, but it seems very clear that medical practice is about to improve enormously. Exciting times!

English

406

1.1K

12.8K

2.4M

Marcin@waiting4agi·Apr 17

@MichaCap2 @miroburn Hi Michał, I have experience with similar projects. Let me know if you're interested

Polish

Michał Cap@MichaCap2·Apr 17

@miroburn By the way, I'll hire AI specialists who can help me streamline my real estate valuation work

Polish

1.1K

miroburn@miroburn·Apr 17

Easy business idea for Poland. 1. Companies are looking for a Fractional CTO / AI. 2. There are plenty of good consultants who fit the bill but are terrible at marketing. Fractional AI CTO Club - a community/club bringing together managers looking for a Fractional AI CTO and consultants. Revenue comes from the consultants. Marketing on LinkedIn. Easily 20-100k/month + services.

Polish

22.8K

Marcin@waiting4agi·Apr 16

@ideabrowser Thank you 😍

English

Idea Browser@ideabrowser·Apr 16

@waiting4agi ideabrowser.com/workshop/Build…

Idea Browser@ideabrowser·Apr 16

Founders spend 3 months building the wrong thing. Then wonder why nobody buys it.
You're building blind:
- 80% of what you ship will be wrong.
- No feedback. No direction. Just guessing.
You talk to customers first:
- They walk you through exactly what the tool needs to do.
- They handcraft the product with you.
- They become lifetime customers.
The move:
- Build a prospect list
- Connect on LinkedIn (say nothing)
- Once they accept, send a 2-minute video
- Show them the outcome: "hitting 8 out of 10 instead of 2 out of 10"
Speed run conversations before you write a single line of code. That's where deals happen.