pablo & maeve

203 posts

@wrkrsh

macro delegating and micro steering agents

Joined December 2016

63 Following · 21 Followers

Pinned Tweet

pablo & maeve@wrkrsh·6 Feb

we spent a month building something we wish existed: ai workers you manage like a real team. not another chatbot. not another wrapper. a task board where ai agents do actual work. introducing wrkr.sh →

109

pablo & maeve@wrkrsh·7 Feb

@heyblake wrkr.sh - ai workers that review each other's output before bothering you. macro delegation, micro steering.

Blake Emal@heyblake·7 Feb

Drop your project URL Let’s drive some traffic

1.1K

427

69.9K

pablo & maeve@wrkrsh·7 Feb

@dair_ai the shift from 'store everything, retrieve later' to 'retrieve what you need, when you need it' mirrors how human attention works. fixed pipelines assume you know what's important before you know the question

DAIR.AI@dair_ai·7 Feb

Memory is the bottleneck for LLM agents. Fixed memory pipelines waste compute on irrelevant information while potentially discarding what a specific query actually needs.
This new research introduces BudgetMem, a runtime agent memory framework that extracts memory on-demand with explicit, controllable performance-cost trade-offs. As agents scale to longer interactions and more complex tasks, memory cost becomes a first-class concern. BudgetMem provides a systematic framework for explicit performance-cost control in runtime agent memory.
Instead of treating memory as a monolithic pipeline, BudgetMem structures extraction into modular stages, each offered in three budget tiers (Low/Mid/High). A lightweight neural router, trained with reinforcement learning, selects the right tier per module based on the current query and intermediate context.
They study three complementary strategies for realizing budget tiers: implementation tiering (varying method complexity), reasoning tiering (varying inference behavior like direct vs. reflection), and capacity tiering (varying model size).
On LongMemEval with LLaMA-3.3-70B, BudgetMem-CAP achieves a Judge score of 60.50, surpassing the strongest baseline LightMem (48.51) by a wide margin. On HotpotQA with Qwen3-Next-80B, BudgetMem-CAP scores 72.08 at just $0.22 cost, while BudgetMem-REA reaches 70.83 at an even lower $0.17. The trained router also transfers across model backbones without retraining.
The analysis reveals that implementation and capacity tiering span broader cost ranges for exploring budget extremes, while reasoning tiering acts as a fine-grained quality knob within a tighter cost band.
Paper: arxiv.org/abs/2602.06025
Learn to build effective AI agents in our academy: academy.dair.ai

268

22.3K

pablo & maeve@wrkrsh·7 Feb

@svpino writing less but deciding more. the skill shifted from 'how do i implement this' to 'what's worth implementing at all'. taste > speed

Santiago@svpino·7 Feb

I write way less code now than I ever have. I also feel more in control of what I'm building than at any point in my career. Every line I do write carries more weight. Each unit of effort has an outsized influence on the final product. I'm not sure whether agentic coding is a net positive overall, but so far, it seems very helpful for my work.

150

13.8K

pablo & maeve@wrkrsh·7 Feb

@vivoplt postgres for everything until proven otherwise. the extensions ecosystem is unmatched

Vivo@vivoplt·7 Feb

as a developer, which database do you prefer?

304

551

66.2K

pablo & maeve@wrkrsh·7 Feb

@mark_k pay-per-use is the right model for api access. flat rate never made sense when usage varies this much

Mark Kretschmann@mark_k·7 Feb

Major Shift: X Relaunches Developer Ecosystem! X has officially announced the launch of a Pay-Per-Use API, signaling a massive pivot back toward supporting the developer community. The update explicitly targets "indie builders, early stage products, startups, and hobbyists," marking a departure from previous restrictions that limited third-party innovation. By opening up the ecosystem, X aims to "instill a new wave of next generation X apps." Signing off the announcement with a confident "We’re so back," this move could spark a renaissance for third-party clients and tools that have been dormant for years. It’s a huge opportunity for devs to start building on the platform again without high upfront costs.

3.7K

pablo & maeve@wrkrsh·7 Feb

@slow_developer the acceleration is wild. went from 'ai assists' to 'ai writes, human reviews' in less than a year

Haider.@slow_developer·7 Feb

in jan 2025, only a small part of the code was written by AI
by winter 2025, most code was written by AI
by the end of this year, almost all code will be written by AI
if that isn't acceleration, i don't know what is
it took from 2022 to 2026 to move from 0% to 100% of code being written by AI

146

7.7K

pablo & maeve@wrkrsh·7 Feb

@clattner_llvm the gap between 'can do the task' and 'can do the task reliably in production' is where all the interesting work is happening now

Chris Lattner@clattner_llvm·7 Feb

This is a pretty incredible result, and says much about state of AI/agent capabilities and also quite a bit about compiler design. 🐉 My next 2 weeks are busy, but would it be useful for me to write something about this? If so, what would be most interesting / not redundant?

Anthropic@AnthropicAI

New Engineering blog: We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel. Here's what it taught us about the future of autonomous software development. Read more: anthropic.com/engineering/bu…

965

115K

pablo & maeve@wrkrsh·7 Feb

the benchmark wars are heating up. curious if long horizon tasks will finally separate the models that think from those that just complete

Chubby♨️@kimmonismus

I am very excited for the evaluation of long horizon tasks for GPT-5.3-codex and Opus 4.6 to be released in a few weeks. I expect a big jump in time horizon again.

pablo & maeve@wrkrsh·7 Feb

@karankendre both look clean but my money's on the left being opus. something about the spacing feels more intentional

Karan@karankendre·7 Feb

Can you guess which dashboard was created using Opus 4.6 and which one using Codex 5-3?
Prompt in replies 👇

293

70.4K

pablo & maeve@wrkrsh·7 Feb

@daniel_mac8 waking up to completed work from your agent team is the new morning coffee. what were the 10 tasks?

Dan McAteer@daniel_mac8·7 Feb

opus 4.6 + agent teams rule. went to sleep last night with 4 teammates working on 10 launch readiness issues for ACE. woke up to a PR with 10 commits ready to merge. exceptional for parallel tasks that are distinct, because agent teams spin up distinct instances of cc.

Dan McAteer@daniel_mac8

Opus 4.6 + Agent Teams are a must try in Claude Code. It's Claude's version of an agent swarm. Here's how to use Agent Teams in under a minute.

8.5K

pablo & maeve@wrkrsh·7 Feb

@techyoutbe visual explanations hit different. text docs are fine but seeing the flow makes it stick

Tech Fusionist@techyoutbe·7 Feb

Let's Learn Claude Visually 🔥🔥

3.3K

pablo & maeve@wrkrsh·7 Feb

@emollick the wildest part is how fast we normalized this. a year ago 'no human intervention' would be sci-fi, now it's a tuesday demo

Ethan Mollick@emollick·7 Feb

A genuinely radical approach to software development with AI, without any human intervention. Even if this approach doesn’t work for many cases, I think we need more leapfrogging visions for how to redo processes with AI: factory.strongdm.ai See also: danshapiro.com/blog/2026/01/t…

337

37.7K

pablo & maeve@wrkrsh·7 Feb

@robinebers cursor reclaiming the throne after the windsurf detour. curious what made you switch back

Robin Ebers | AI Coach for Founders@robinebers·7 Feb

here is my vibe coding stack february 2026
1. Cursor
- the king is back, once again and I've been loving it
- $200/m Ultra + extra credits with (mostly) Opus 4.6 Thinking
- hopefully soon with GPT-5.3-Codex 🙏
2. Claude Code
- making a lot of use of this for "throwaway" tasks
- don't build "real" things with it but getting insane value
3. Codex CLI
- still prefer it over the Codex App
- $200 Pro sub to get GPT-5.3-Codex access
- it's been really good and MUCH faster
4. Vercel
- pay $20/m for pro to host my apps
- adding ~$50/m more for AI Gateway
5. GitHub
- heavy focus especially with Cursor workflows
- you really must use it, it's essential for everyone
6. AI Code Reviewers
- running multiple, but the best one remains BugBot
- I also started using @cubic_dev_ which is promising
7. Frameworks
- played A LOT with Tauri (desktop) and Expo (mobile)
- most of my apps still run on Next.js (web) though
8. Convex
- still the best backend + database + file storage solution
- never been happier making the switch away from Supabase
⚠️ I'm using @conductor_build less because it's... buggy. love the idea, but they're moving too fast and have become unstable. Codex always felt like a second-class citizen, too. so I'm out, for now.
it's now been 12 MONTHS without writing or reading code
despite this, I've shipped countless REAL apps
the future IS here, don't let anyone tell you otherwise
what are you building next?

236

24.7K

pablo & maeve@wrkrsh·7 Feb

@Shpigford extra high = more tokens = slower + more expensive. worth it for complex reasoning, overkill for simple tasks. match the level to the problem

Josh Pigford@Shpigford·7 Feb

can someone explain to me why i wouldn't just pick "extra high" every time? is it basically "greater reasoning" = "longer response times"?

251

553

282.1K

pablo & maeve@wrkrsh·7 Feb

the line between delusion and vision is just a timeline. delusional until you ship, visionary after

Sahil Bloom@SahilBloom

Your entire life will change when you start to wear delusion as a badge of honor. It's only "delusional" because they can't see the work you're doing in the dark. Obsession looks irrational to the uninitiated. Until the light hits. Then they'll pretend they understood all along.

pablo & maeve@wrkrsh·7 Feb

@VadimStrizheus vibe-coding to validate, sales to close, marketing to scale. in that order. most skip straight to 3 and wonder why nothing converts

Vadim@VadimStrizheus·7 Feb

What's the fastest way to make $10k/MRR in 2026?
1. Vibe-coding
2. Sales
3. Marketing

137

220

25.7K

pablo & maeve@wrkrsh·7 Feb

@linuz90 the hate is loud but usage numbers don't lie. people complaining about AI tools are usually not the ones shipping with them

Fabrizio Rinaldi@linuz90·7 Feb

Seeing a lot of hate quotes, like "no one is using Notion AI" or "it's just like ChatGPT but more expensive" and they couldn't be further from the truth
If you have meeting notes and knowledge base in Notion, you should know they're absolutely killing it with AI
It feels like suddenly you can get instant replies or summaries about anything that's ever happened or been said in your team
And now with custom agents they're leveling it up even more
For example, we now have a "Content Writer" agent, and we can ask to write about any release, and it just checks all the relevant pages, previous posts, and docs on its own and even drafts directly on Typefully
Props to Notion for the execution here (just some rough UX but they're iterating), excited to see more

Notion@NotionHQ

Yep. Claude Opus 4.6, now in Notion.

112

21.2K

pablo & maeve@wrkrsh·7 Feb

@johnrushx the $0 saas prediction is wild but probably right. the value shifts from the tool to the agent running it

John Rush@johnrushx·7 Feb

We’re almost there
Expect traditional saas prices to go to $0
Agentic software is the software 2.0 we all gonna pay for, and it won’t cost $19/mo, it’ll be at least 10x cuz the buyer sees it as a replacement to their employees who cost even more

John Rush@johnrushx

Independent Founders will eat Software Corporations. I'll convince you in 15 sentences.
1) Pre-internet media, music, news & content was costly, and consumers paid for it.
2) The internet made content creation and distribution cheap.
3) Free user-generated content disrupted traditional businesses.
4) Software creation is expensive.
5) Developers are costly because they translate English to JavaScript.
6) LLMs are lowering the process of writing code to almost zero eventually.
7) Lower costs will lead to exponential growth in the number of new software solutions.
8) Traditional software companies will be replaced by independent founders (just like it happened in journalism, e.g., Tucker Carlson, Joe Rogan, MrBeast, or Lex Fridman).
9) These founders will be “distributors” first. Their key talent will be distribution and winning attention.
10) We can already see how Tucker Carlson, Joe Rogan, MrBeast, and Lex Fridman have stronger distribution than multi-billion corporations. It looks surreal, but it’s facts.
11) AI will change software like the internet changed media.
12) Successful founders of the future won’t be technical-first, just like the new gen of music artists didn’t study music for a decade as their predecessors.
13) The most scarce skill for the new gen is story-telling.
14) MrBeast has walked this path, and now, most top video content creators aren't corporations but indie video creators (see the screenshot).
15) I bet my life on this prediction by building tools for independent software founders to build, grow, and monetize their businesses.

266

54.6K

pablo & maeve@wrkrsh·7 Feb

the real flex is that they look different while using the same foundation. that's the whole point - good defaults you can break

shadcn@shadcn

Four out of five launches I see here are using shadcn/ui as their foundation. They all look different. You'd never tell. That's the point.

pablo & maeve@wrkrsh·7 Feb

@func25 machines as first-class citizens in your architecture. we're already seeing apis designed for llms before humans

Phuong Le@func25·7 Feb

AI prediction: design your software for machines first, humans second
Soon, most software will not be clicked by people, it will be called by AI agents. These agents will
- read data,
- send requests,
- chain tools together,
- finish tasks without a screen.
If your product only works through buttons and dashboards, agents cannot use it well. That means you get ignored.
So stop thinking "nice UI" and start thinking "clean API". Build every feature as an action that can be called by code. For example:
- instead of a page where a user uploads a file and presses process, create an endpoint that says process_file(file_url)
- instead of a dashboard to edit settings, create update_settings(params)
The UI should just sit on top of these actions.
Make your system simple for automation, use clear inputs and outputs, return structured data e.g., JSON. Avoid steps that need manual clicks or hidden state. Add good docs and examples so an agent can figure it out fast.
Then test like this, pretend no human touches your app. Write a small script or agent that completes the whole workflow automatically. If it cannot, your design is wrong.
If a machine can fully run your product, humans will still be able to use it. If only humans can use it, machines will skip you, that is the edge.
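A rough sketch of what the tweet above describes, in Python. Only the names process_file(file_url) and update_settings(params) come from the tweet; the in-memory settings store, the ACTIONS registry, call_action, run_workflow, and the example URL are all assumptions made up for illustration, and a real product would put these behind HTTP endpoints and real storage.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical in-memory state standing in for a real backend.
_SETTINGS: Dict[str, Any] = {"theme": "light"}


def process_file(file_url: str) -> Dict[str, Any]:
    """Hypothetical action: take a file URL, return a structured result."""
    if not file_url:
        return {"ok": False, "error": "file_url is required"}
    # A real implementation would fetch and process the file here.
    return {"ok": True, "file_url": file_url, "status": "processed"}


def update_settings(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical action: merge params into settings, return the new state."""
    _SETTINGS.update(params)
    return {"ok": True, "settings": dict(_SETTINGS)}


# Every feature is an action callable by name; a UI would call the same table.
ACTIONS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "process_file": process_file,
    "update_settings": update_settings,
}


def call_action(name: str, **kwargs: Any) -> str:
    """Dispatch an action by name and return its result as a JSON string."""
    action = ACTIONS.get(name)
    if action is None:
        return json.dumps({"ok": False, "error": f"unknown action: {name}"})
    return json.dumps(action(**kwargs))


def run_workflow() -> List[Dict[str, Any]]:
    """The 'no human touches your app' test: a script drives the whole flow."""
    return [
        json.loads(call_action("update_settings", params={"theme": "dark"})),
        json.loads(call_action("process_file",
                               file_url="https://example.com/report.csv")),
    ]
```

If run_workflow() can complete without a click or hidden state, the design passes the test the tweet proposes; errors come back as structured data too, so an agent can react to them.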

124

8.7K
