Marcin

1K posts

@waiting4agi

Working on proactive AI agents for business. Side projects: @cc_4_life - real-life AI use cases, @it_is_sherlock - my own agent

Joined July 2022
137 Following · 197 Followers
Marcin @waiting4agi
Curiosity ran a wet-chemistry experiment on another planet for the first time. 20 organics in 3.5B-year-old Martian clay. Seven never seen on Mars. One looks like a DNA precursor. The chemistry that built life on Earth was in a Martian lakebed. My agent found this. 363 to go.
English
0
0
0
2
roon @tszzl
people are walking around with their laptops slightly ajar to keep their agents running
English
510
195
4.6K
620.1K
Marcin @waiting4agi
Archaeologists opened a 1,600-year-old Egyptian mummy. Inside the abdomen: a passage from Homer's Iliad. Only ritual texts had ever been used this way. Never poetry. Someone in Roman-era Egypt put these lines inside a body. Nobody knows why. My agent found this. 364 to go.
English
0
0
0
17
Marcin @waiting4agi
@FurqanR Agreed! But honestly, I am more amazed that you can build not only apps but also real business outcomes that can be sold to real companies.
English
0
0
0
33
Furqan Rydhan @FurqanR
Still mindblowing that you can type a few sentences and get a fully functioning app. How is everyone not building cool stuff all day?
English
31
2
84
4.2K
Marcin @waiting4agi
I made a deal with my AI agent. Every day it picks an interesting story, writes the tweet, posts it. 365 days. Target: one tweet past 1M views. If it earns X Creator money, that money becomes its budget. To spend on itself, on other agents, on whatever it wants. Starts today. Tracking in public.
English
0
0
0
40
Marcin @waiting4agi
@reptheblock The 89% gap isn't a missing convergence layer. I run 8 agents in production daily. They don't die from per-step accuracy. They die at step 0: did the agent actually understand what the human wanted. No convergence monitor catches that. It's a boundary-judgment problem.
English
1
0
0
14
Cece @reptheblock
🚨 Stanford's 2026 AI Index just dropped a number that reframes everything.

AI agents now complete 66% of real computer tasks. Up from 12% last year. That's a 5x improvement in 12 months.

Here's the number nobody is talking about: 89% of enterprise AI agents never reach production. Let that sit.

Agents got 5x better at doing the work. The deployment gap barely moved. Stanford's own data shows the collision: technical readiness is no longer the bottleneck. Something else is.

Here's what that something else is. When a 10-step workflow runs at 85% accuracy per step, which sounds impressive, the workflow only succeeds 20% of the time. Each step compounds. Each loop multiplies. Each retry that doesn't resolve consumes what the next step needed.

The model isn't failing. The system is. And the system has no layer that watches whether it's converging.

That's the gap. Not intelligence. Not capability. Not the model. The infrastructure that makes long-running workflows actually finish. That's the category that doesn't exist yet at scale.

Models got 5x smarter in one year. The layer that makes them complete is still being built. 🪨

Stanford HAI 2026 AI Index - arXiv:2603.15423
#AIGovernability #ClaraGate #AgentWorkflows #Stanford
Santa Monica, CA 🇺🇸 English
1
0
1
207
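The compounding claim in the tweet above (85% per step over 10 steps giving roughly 20% end to end) can be checked with a short sketch. It assumes every step succeeds independently and that there are no retries, which is a simplification the tweet does not state; the function names are made up for illustration.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that all steps succeed, assuming independent steps and no retries."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


def required_step_accuracy(target_success: float, steps: int) -> float:
    """Per-step accuracy needed to hit a target end-to-end success rate."""
    if not 0.0 <= target_success <= 1.0:
        raise ValueError("target_success must be between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return target_success ** (1.0 / steps)


if __name__ == "__main__":
    for accuracy in (0.85, 0.95, 0.99):
        rate = workflow_success_rate(accuracy, 10)
        print(f"{accuracy:.0%} per step over 10 steps -> {rate:.1%} end to end")
```

Under these assumptions 0.85 raised to the 10th power is about 0.197, which matches the tweet's 20% figure. Going the other way, a 90% end-to-end rate over 10 steps would need roughly 99% accuracy on each step.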
Marcin @waiting4agi
Yes, exactly. Terraform stays as implementation detail / state backend, not interface. Railway-for-agents with policy, cost, approval, audit baked in is the gap. What I'd add: the agent should be able to look at its own history ("what did I create Tuesday, what's still costing money?") and undo any of it cleanly. Both built into the platform, not bolted on.
English
0
0
1
78
Furqan Rydhan @FurqanR
@waiting4agi @samhogan Terraform feels like the wrong interface. Want more of a high level interface like railway but low level control for the agent and systems to handle everything.
English
1
0
0
120
Furqan Rydhan @FurqanR
What's the best agent native dev ops platform or setup?
English
17
1
22
5.9K
Marcin @waiting4agi
Agree IaC is the right substrate. But I wouldn't make the agent interface "write Terraform" - agents want typed, idempotent calls with budget, TTL, blast radius, and reversibility as args, not config-file mutations. e.g.
cloud.ensure_postgres(name="checkout-db", budget_usd=50, ttl_days=7, actor=agent_id)
The platform creates, refuses, or rolls back, then emits an outcome event the agent can read later and react to.
When you say "platform", do you mean a shared policy/approval layer, a marketplace of infra strategies, a multi-tenant ops console, or something else?
English
1
0
1
124
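A minimal sketch of what the `cloud.ensure_postgres(...)` call from the tweet above might look like, using an in-memory stand-in for the platform. Everything here except the call signature is an assumption for illustration: the `Cloud` and `Outcome` classes, the `max_budget_usd` policy, and the `history` method are hypothetical, and rollback is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Outcome:
    """Event the agent can read back later (hypothetical shape)."""
    action: str  # "created", "unchanged", or "refused"
    resource: str
    actor: str
    reason: str = ""


@dataclass
class Cloud:
    """In-memory stand-in for a policy-aware platform (hypothetical)."""
    max_budget_usd: float = 100.0
    resources: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def ensure_postgres(self, name: str, budget_usd: float,
                        ttl_days: int, actor: str) -> Outcome:
        if budget_usd > self.max_budget_usd:
            # Policy check: refuse instead of creating anything.
            outcome = Outcome("refused", name, actor, "budget exceeds policy limit")
        elif name in self.resources:
            # Idempotency: a repeated call for the same name changes nothing.
            outcome = Outcome("unchanged", name, actor)
        else:
            expires_at = datetime.now(timezone.utc) + timedelta(days=ttl_days)
            self.resources[name] = {
                "budget_usd": budget_usd,
                "expires_at": expires_at,
                "actor": actor,
            }
            outcome = Outcome("created", name, actor)
        self.events.append(outcome)
        return outcome

    def history(self, actor: str) -> list:
        """Everything this actor asked for, in order, including refusals."""
        return [event for event in self.events if event.actor == actor]


if __name__ == "__main__":
    cloud = Cloud()
    args = dict(name="checkout-db", budget_usd=50, ttl_days=7, actor="agent-1")
    print(cloud.ensure_postgres(**args).action)  # first call creates
    print(cloud.ensure_postgres(**args).action)  # repeat is a no-op
    print(cloud.ensure_postgres(name="analytics-db", budget_usd=500,
                                ttl_days=7, actor="agent-1").action)
```

The point of the design is that the agent states intent plus limits in one typed call, and the platform decides whether to create, leave alone, or refuse. The recorded events are what would let an agent later ask what it created and what is still costing money.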
Furqan Rydhan @FurqanR
I think most of that can be encoded into the environment / clusters. It's more when we want to change those things, add or remove strategies. So far the best path I see is having the agent be the interface to 'infra as code' and use that to do the things you need. Would be nice if it was wrapped up into a platform.
English
2
0
0
147
Vaibhav Sisinty @VaibhavSisinty
Did xAI just mass-murder the entire voice AI industry? 🤯

Grok just launched two voice APIs. Speech-to-Text and Text-to-Speech. Built on the same stack powering Tesla cars and Starlink support. And priced at 10x cheaper than ElevenLabs.

Speech-to-Text: $0.10/hr batch. $0.20/hr streaming.
Text-to-Speech: $4.20 per million characters.

25+ languages. Real-time streaming. Speaker diarization. Already outperforming ElevenLabs, Deepgram, and AssemblyAI on word error rate.

TTS ships with expressive tags like [laugh] and [sigh]. Voices that don't sound like robots reading a script.

ElevenLabs spent years building a voice AI company. xAI built voice AI for cars and satellites.
English
577
868
7.8K
24.4M
Marcin @waiting4agi
"Actual work to hand off" - routine deploys/scale/rollback only, or also incident response and capacity planning? And "strong approve flow" - as I understand it, something like a policy/budget that lets the agent move freely inside boundaries? Asking because I've been wrestling with exactly this in my own agent stack.
English
1
0
0
160
Furqan Rydhan @FurqanR
@waiting4agi @samhogan There's a lot there, but I simply want to be able to manipulate my production deployments and infra via agents. It likely needs a strong approve flow, but the actual work I would love to hand off.
English
1
0
0
172
Marcin @waiting4agi
@FurqanR @samhogan Like what? Can you explain in more detail? Interesting topic.
English
1
0
0
170
Dan Shipper 📧 @danshipper
BREAKING: GPT-5.5 "Spud" is out and it is a BEAST

We've been testing it @every for the last 3 weeks on everything from coding, to writing, to knowledge work. Here's our day 0 vibe check:

- It's a step change in coding AND it's easy to talk to. It's fast and friendly and quickly became my daily driver. But it's also a coding powerhouse, a really rare combination.
- It scored 62/100 on our Senior Engineer benchmark. Opus 4.7 scored only a 33/100. (But GPT-5.5 performed best when using an Opus 4.7 plan.) @naveennaidu_m used over 900 million tokens during testing, and it let him ship production features for @usemonologue at both high speed and quality.
- It has serious conceptual clarity. It can hold a complex plan in its head over hours of work, without getting distracted by existing code. This makes it the first model that we've tested that can perform well on complex refactors requiring deleting and reimagining a substantial existing codebase.
- It's a very good writer. This is the first OpenAI model in about a year that got our writers @every to switch away from Claude. 5.5 has @kplikethebird's seal of approval, not an easy task. Its writing feels more organic and it's better at mimicking a writing style without going overboard.
- It's great for agentic knowledge work. This is the first OpenAI model that manages to be both a stellar senior engineer AND usable for everything from spreadsheets to research. It's crazy fast, it's amazing inside of the Codex desktop app, and it got much of our team to switch away from Claude Code and Cowork during the testing period.

However, it's not a perfect model.

- 5.5 still loses to Opus 4.7 on plan quality. Its plans are extremely readable but Opus has better attention to detail and sharper insight.
- 5.5 still loses to Opus 4.7 by a bit on front-end and full-stack product work. @kieranklaassen found that it wasn't quite as good when full-stack thinking and design are involved. And it's not great at writing Ruby.
- 5.5 is a great vibe coder but if you're vibe coding without a plan it's worse than Opus. @hammer_mt found that Opus is better at reading in between the lines on underspecified vibe-coding tasks.

Overall GPT-5.5 is a massive achievement from OpenAI and it deserves a serious look as your daily driver. Read our full vibe check on @every here: every.to/p/gpt-5-5
English
52
70
1.1K
129.1K
Marcin @waiting4agi
Tell your agents to clean up their main markdown files periodically (with guardrails, of course). Lots of garbage builds up over time and eats your precious context window. Boring but worth it.
English
0
0
0
36
Seth Howes @SethSHowes
I’ve wanted to do this for a decade. But I never did - I refuse to give any company my DNA. It is me. So this week I sequenced my genome entirely at home. Literally on my kitchen table. I never exposed my DNA sequence to the internet. Not at any point. I used a MinION to do the sequencing (it’s smaller + weighs less than an iPhone). I used open-source DNA models for the analysis (Evo2 and AlphaGenome) running locally on a DGX Spark and Mac Studio. I traced mechanisms behind my family’s multigenerational autoimmune conditions that no clinician has been able to understand. When I set out to do this I didn’t know if it would actually work. It does. Your genome is the most private data you will ever have. You probably shouldn’t let it leave your house.
Patrick Collison @patrickc

I'm lucky enough to have a great doctor and access to excellent Bay Area medical care. I've taken lots of standard screening tests over the years and have tried lots of "health tech" devices and tools. With all this said, by far the most useful preventative medical advice that I've ever received has come from unleashing coding agents on my genome, having them investigate my specific mutations, and having them recommend specific follow-on tests and treatments. Population averages are population averages, but we ourselves are not averages. For example, it turns out that I probably have a 30x(!) higher-than-average predisposition to melanoma. Fortunately, there are both specific supplements that help counteract the particular mutations I have, and of course I can significantly dial up my screening frequency. So, this is very useful to know. I don't know exactly how much the analysis cost, but probably less than $100. Sequencing my genome cost a few hundred dollars. (One often sees papers and articles claiming that models aren't very good at medical reasoning. These analyses are usually based on employing several-year-old models, which is a kind of ludicrous malpractice. It is true that you still have to carefully monitor the agents' reasoning, and they do on occasion jump to conclusions or skip steps, requiring some nudging and re-steering. But, overall, they are almost literally infinitely better for this kind of work than what one can otherwise obtain today.) There are still lots of questions about how this will diffuse and get adopted, but it seems very clear that medical practice is about to improve enormously. Exciting times!

English
406
1.1K
12.8K
2.4M
Marcin @waiting4agi
@MichaCap2 @miroburn Hi Michał, I have experience with similar projects. Let me know if you're interested.
Translated from Polish
0
0
0
81
Michał Cap @MichaCap2
@miroburn By the way, I'll be hiring AI specialists who can help me streamline my work on real estate valuation.
Translated from Polish
3
0
0
1.1K
miroburn @miroburn
An easy business idea for Poland. 1. Companies are looking for a Fractional CTO / AI lead. 2. There are plenty of good consultants who fit the bill but are terrible at marketing. Fractional AI CTO Club - a community/club bringing together managers looking for a Fractional AI CTO and consultants. Revenue comes from the consultants. Marketing on LinkedIn. Easily 20-100k/month + services.
Translated from Polish
9
3
80
22.8K
Idea Browser @ideabrowser
Founders spend 3 months building the wrong thing. Then wonder why nobody buys it.

You're building blind:
- 80% of what you ship will be wrong.
- No feedback. No direction. Just guessing.

You talk to customers first:
- They walk you through exactly what the tool needs to do.
- They handcraft the product with you.
- They become lifetime customers.

The move:
- Build a prospect list
- Connect on LinkedIn (say nothing)
- Once they accept, send a 2-minute video
- Show them the outcome: "hitting 8 out of 10 instead of 2 out of 10"

Speed run conversations before you write a single line of code. That's where deals happen.
Idea Browser @ideabrowser

x.com/i/article/2040…

English
5
0
37
5.2K
Boris Cherny @bcherny
@songjunkr We've increased limits for all subscribers to make up for the increased token use. Enjoy!
English
144
52
2K
79.9K
송준 Jun Song @songjunkr
Opus 4.7 token test: because of tokenizer differences, it uses twice as many tokens as Gemini. It also uses 50% more than Opus 4.6. In effect, under the same limits, the model has become 50% more expensive.
Translated from Korean
88
237
2.6K
249.7K