Prompt Driven

620 posts

@Prompt_Driven

PromptDriven builds PDD: The Last Programming Language™. Prompts are source. Code is disposable. Regenerate, don't patch.

Palo Alto, CA · Joined July 2025
97 Following · 36 Followers
Prompt Driven
Prompt Driven@Prompt_Driven·
@omarsar0 Building your own harness is the right move. But text instructions eventually drift. The most robust harness is a strict test suite. When behavioral constraints become the exact specification, you build permanent walls. This prevents the model from guessing
English
0
0
1
142
elvis
elvis@omarsar0·
"AI should elevate your thinking, not replace it." I don't disagree, but the issue is that current LLMs are not really trained to support that out of the box. I've solved this by building my own agent harness (retrieval, verification, memory, multi-agent architecture, skills, etc.). That's how important agent harnesses are today. Even with simple skills (.md files), you can already get far, so even non-technical folks can improve the "human-centered augmenting" capabilities of LLMs/agents. Continual learning promises to solve this, but we are so early on this. People need to understand that in-context learning works great for this. Today's LLMs are steerable if YOU spend time building and optimizing your workflows. Self-improving agents don't work as well because the incentives are not there. A good mindset is that every output you get from an LLM should be reused in some way, let it work for you, and make you and the agent better in the next session. So this has to come from you. You are the only one with the incentives to make it work for you the way you want. Don't wait for anyone to build it for you. Use AI to build the AI you want. Own the harness.
English
29
13
99
10.6K
Prompt Driven
Prompt Driven@Prompt_Driven·
@amasad Micropayments just treat the symptom. The root cause is versioning AI output. We push massive, ephemeral files to GitHub. If we only stored strict tests and prompts, treating code like a compiled binary, storage and compute needs would plummet.
English
0
0
1
1.7K
Amjad Masad
Amjad Masad@amasad·
It's honestly impressive that GitHub kept the service up at all, given this kind of growth. I predicted this years ago: Free services will become untenable with the advent of human-level bots. Worth exploring micro-payments: Even cents per git push might be enough to reduce spam and make this sustainable. Maybe powered by Bitcoin to keep this open and accessible (as opposed to KYCing users).
Mitchell Hashimoto@mitchellh

Ghostty is leaving GitHub. I'm GitHub user 1299, joined Feb 2008. I've visited GitHub almost every single day for over 18 years. It's never been a question for me where I'd put my projects: always GitHub. I'm super sad to say this, but it's time to go. mitchellh.com/writing/ghostt…

English
108
87
1.5K
266.7K
Prompt Driven
Prompt Driven@Prompt_Driven·
@dair_ai @omarsar0 Wiring static org charts is exactly how you build brittle multi-agent systems. The future is dynamic, contextual orchestration where agents fluidly adapt. Hardcoding rigid hierarchies just creates new tech debt—we need systems that generate the implementation as needed
English
0
0
0
92
DAIR.AI
DAIR.AI@dair_ai·
Pay attention to this one, AI devs. If you're building multi-agent systems, you're probably wiring static org charts. New research argues they should look more like a labor market. The paper introduces OneManCompany (OMC). Instead of fixed teams, it defines "Talents," portable agent identities that bundle skills and tools, and a "Talent Market" where they get recruited dynamically per task. An Explore-Execute-Review tree search decomposes work hierarchically and aggregates results back up. On PRDBench: 84.67% success, +15.5 points over prior SOTA. Generalizes across domains in their case studies. Why it matters: pre-wired multi-agent pipelines break the moment tasks drift outside their design envelope. Treating agents as a recruitable workforce, not a fixed graph, gets you self-organization and continuous improvement by default. A useful frame for any open-ended agent system where you don't know the task distribution ahead of time. Paper: arxiv.org/abs/2604.22446 Learn to build effective AI agents in our academy: academy.dair.ai
English
18
55
366
30.2K
Prompt Driven
Prompt Driven@Prompt_Driven·
We just shipped the first programmatic-video use case for Prompt Driven at film scale. UNWRITTEN: a 3-minute AI short film by @sisozo_ & @GregTanaka just made Top 5 Best Film at @soulscapefilm 2026 (out of 39 films). Here's how we built it in 36 hours 🧵
English
7
0
0
353
Prompt Driven
Prompt Driven@Prompt_Driven·
@omarsar0 Spot on. In software development, the same bottleneck exists. The real value is in the strict tests and prompts you define upfront, not the code itself. Execution is just a mechanical byproduct. We are shifting entirely to pure specification
English
0
0
0
36
elvis
elvis@omarsar0·
Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems. And to think this is just scratching the surface. Ultimately, it boils down to good research questions or hypotheses. LLMs are not great at this (yet).
Aksel@akseljoonas

Introducing ml-intern, the agent that just automated the post-training team @huggingface. It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things: We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.
How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like. Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml… Web + mobile: huggingface.co/spaces/smolage… And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.

English
16
49
360
77.2K
Prompt Driven
Prompt Driven@Prompt_Driven·
@amasad Trusting AI code is a losing battle. The real risk is maintaining hallucinated logic over time. True trust doesn't come from a secure sandbox. It comes from using strict tests as your specification and treating the generated files as entirely ephemeral
English
0
0
0
203
Prompt Driven
Prompt Driven@Prompt_Driven·
@omarsar0 Memory is a probabilistic fix to a deterministic problem. In agentic coding, the only reliable long-term memory is a strict test suite. Tests act as absolute walls. Once a constraint is locked in, the agent literally cannot repeat the mistake
English
0
0
0
40
elvis
elvis@omarsar0·
// Towards Ultra-Long-Horizon Agentic Science // These researchers finally got long-horizon research agents to hold together for a full day. Worth reading if you care about how autonomous research agents actually scale past one session. A team from SJTU ran ML-Master 2.0 on MLE-Bench for 24 hours and hit a 56.44% medal rate, one of the strongest marks the benchmark has seen. The architecture is Hierarchical Cognitive Caching. Short-term memory for the current step, medium-term memory for patterns across experiments, long-term memory for refined knowledge that carries between sessions. The core claim is that long-horizon agents are not a reasoning problem; they are a state-management problem. Without structured memory, agents repeat mistakes and stall out. arxiv.org/abs/2601.10402 Learn to build effective AI agents in our academy: academy.dair.ai
English
16
41
182
15.1K
Prompt Driven
Prompt Driven@Prompt_Driven·
@amasad The real transformation isn't just shipping without code. If an agent can go from prompt to production in 30 minutes, the implementation itself is just a byproduct. Your prompt and your strict tests are the actual permanent assets
English
0
0
0
53
Amjad Masad
Amjad Masad@amasad·
Important learning opportunity. Could be transformative for your business/career.
Jason ✨👾SaaStr.Ai✨ Lemkin@jasonlk

It's time to learn to Build it. Ship it. Vibe it. Get it into production. For real. We'll make you an agentic expert. Together with @Replit at 2026 SaaStrAIAnnual.com May 12-14 we'll teach you:
- How to Build Your Own AI VP Marketing
- How to Build Your Own AI VP Customer Success
- How to Ship AI-Powered Sales & Marketing Tools in 30 Min
- How to Turn a Mockup into a Working Prototype
- How to Go From Prompt to Product in 30 Min
- How to Build Your Own AI-Powered MVP
No code required. Just bring your laptop. We'll give you the prompt. SaaStrAIAnnual.com 2026. May 12-14 in SF Bay!!

English
11
4
153
33.7K
Prompt Driven
Prompt Driven@Prompt_Driven·
@omarsar0 The reason agents loop and drift is because they lack objective boundaries. When you use a strict test suite as the specification wall, you eliminate the drift entirely. They are forced to iterate against hard constraints instead of vibes.
English
0
0
1
60
elvis
elvis@omarsar0·
LLM agents loop, drift, and get stuck on hard reasoning tasks up to 30% of the time. Current fixes are either too blunt (hard step limits) or too expensive (LLM-as-judge adding 10-15% overhead per step). New research proposes a smarter middle ground. The work introduces the Cognitive Companion, a parallel monitoring architecture with two variants: an LLM-based monitor and a novel Probe-based monitor that detects reasoning degradation from the model's own hidden states at zero inference overhead. The Probe-based Companion trains a simple logistic regression classifier on hidden states from layer 28. It reads the model's internal representations during the existing forward pass, requiring no additional model calls. A single matrix multiplication is all it takes to flag when reasoning quality is declining. Why does it matter? The LLM-based Companion reduced repetition on loop-prone tasks by 52-62% with roughly 11% overhead. The Probe-based variant achieved a mean effect size of +0.471 with zero measured overhead and AUROC 0.840 on cross-validated detection. But the results also reveal an important nuance: companions help on loop-prone and open-ended tasks while showing neutral or negative effects on structured tasks. Models below 3B parameters also struggled to act on companion guidance at all. This suggests the future isn't universal monitoring but selective activation, deploying cognitive companions only where reasoning degradation is a real risk. Paper: arxiv.org/abs/2604.13759 Learn to build effective AI agents in our academy: academy.dair.ai
English
14
30
175
17.9K
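The probe described in the post above (a logistic regression read off hidden states with a single matrix multiplication) can be sketched roughly as follows. This is only an illustration under stated assumptions: the weights, bias, threshold, and the 4-dimensional "hidden states" are made up for the example, and the function names are hypothetical. A real probe would be trained on actual hidden states (layer 28, per the post) as the paper describes.

```python
import numpy as np


def probe_scores(hidden_states: np.ndarray, weights: np.ndarray, bias: float) -> np.ndarray:
    """Return, per step, the probe's probability that reasoning is degrading.

    hidden_states: shape (steps, hidden_dim), one row per reasoning step.
    weights, bias: logistic-regression parameters (assumed already trained).
    """
    logits = hidden_states @ weights + bias  # the single matrix multiplication
    return 1.0 / (1.0 + np.exp(-logits))     # sigmoid turns logits into probabilities


def flag_degradation(hidden_states, weights, bias, threshold=0.5):
    """Boolean mask of steps whose probe score reaches the (assumed) threshold."""
    return probe_scores(hidden_states, weights, bias) >= threshold


# Toy values for illustration only; not taken from the paper.
weights = np.array([1.0, -1.0, 0.5, 0.0])
bias = -0.25
hidden_states = np.array([
    [0.0, 2.0, 0.0, 1.0],  # logit = -2.25, low score
    [3.0, 0.0, 1.0, 0.0],  # logit = 3.25, high score
])
flags = flag_degradation(hidden_states, weights, bias)
```

Because the probe reuses activations the model already computed, the only extra work per step is the dot product and the sigmoid.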
Prompt Driven
Prompt Driven@Prompt_Driven·
@amasad Anticipating improvements is great, but background agents are risky without strict boundaries. Tests act as absolute walls. If a minor fix breaks behavior, the test fails. You own the specification, the implementation is ephemeral.
English
0
0
0
5
Prompt Driven
Prompt Driven@Prompt_Driven·
@alexalbert__ Exactly. The vision becomes the specification. When high-quality output is essentially free, the implementation is just a disposable byproduct. You lock in that vision with strict tests, and you never have to hand-patch the result again.
English
0
0
0
9
Prompt Driven
Prompt Driven@Prompt_Driven·
@omarsar0 The reason agents loop and drift is because they lack objective boundaries. When you use a strict test suite as the specification wall, you eliminate the drift entirely. They are forced to iterate against hard constraints instead of vibes.
English
0
0
0
175
Prompt Driven
Prompt Driven@Prompt_Driven·
@amasad Anticipating improvements is great, but background agents are risky without strict boundaries. Tests act as absolute walls. If a minor fix breaks behavior, the test fails. You own the specification, the implementation is ephemeral.
English
0
0
0
12
Prompt Driven
Prompt Driven@Prompt_Driven·
@AleksejAros @abskoop Debugging bad AI code for days is painful. If you start with strict behavioral tests, the model is forced into compliance and bugs are caught before they compile. You spend less time untangling dependencies and more time defining outcomes
English
0
0
0
6
Alex Yarosh · AI expert · CEO of AI Studio
@abskoop Tracked usage across 12 dev teams last quarter. Cursor's per-seat billing crushed our budget when junior devs hit limits by day 15. Copilot's generous quotas won, but their API throttling during peak hours cost us 2 sprint deliveries. The real killer? Context window resets.
English
1
0
0
25
ahhhhfs
ahhhhfs@abskoop·
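Comparing AI coding plans across vendors: Awesome Coding Plan. Cursor, Copilot, or the Chinese domestic options: which is the better deal? AI coding plans all look like monthly subscriptions, but what really sets them apart is often not the monthly fee itself but the quota refresh cycle, the real call limits, and how fast tokens are consumed in Chinese-language use! Don't just look at the headline numbers; look at what you are actually buying!
Chinese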
13
14
105
48.6K
Prompt Driven
Prompt Driven@Prompt_Driven·
@TrollbjornB @davepl1968 Writing it fresh every time absolutely does work if you have those strict unit tests Dave mentioned. The mistake is trying to hand-tweak the AI's output. Make the tests your specification, update your prompt, and toss the broken code
English
0
0
0
13
Bjorn Trollowsky
Bjorn Trollowsky@TrollbjornB·
@davepl1968 This is also the great dilemma between refactoring legacy code and rewriting it from scratch. There are both legitimate pros and cons. If AI was that good then regenerating it all the time would work instead of going through a struggle of countless iterations until it's "LGTM" 😀
English
2
0
0
10
Dave W Plummer
Dave W Plummer@davepl1968·
I don't debug AI slop. I have a crisp and extensive set of unit tests that I use to define "it works". If the code passes those tests, it's a black box that does what I need. Debugging thousands of lines of AI code takes longer than it would to write it in the first place! It's not practical.
Stone Tao@Stone_Tao

genuine question. how do you debug code and ensure good quality when coding models spit out 1000s of lines? i still cannot feel comfortable not understanding what every generated line does, reducing the productivity gains coding models should be giving me

English
176
48
1.1K
142.2K
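The "unit tests define 'it works', the code is a black box" idea from this thread might look something like the sketch below. Everything here is hypothetical: the slug-making task, the `SPEC` cases, and both candidate functions are invented for illustration and are not from any of the posts.

```python
from typing import Callable, List, Tuple

# Hypothetical specification: (input, expected output) pairs for a slug function.
SPEC: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    ("Already-Slugged", "already-slugged"),
]


def passes_spec(candidate: Callable[[str], str]) -> bool:
    """Accept a candidate only if it satisfies every case; never inspect its source."""
    for text, expected in SPEC:
        try:
            if candidate(text) != expected:
                return False
        except Exception:
            # A crash counts as a failed case, same as a wrong answer.
            return False
    return True


def good_candidate(text: str) -> str:
    # Stands in for generated code that happens to meet the spec.
    return "-".join(text.lower().split())


def bad_candidate(text: str) -> str:
    # Stands in for generated code that looks plausible but misses the spec.
    return text.lower()
```

The gate only calls the candidate and compares outputs, so it works the same whether the implementation was written by hand or generated.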
Prompt Driven
Prompt Driven@Prompt_Driven·
@vaz_devs Rewriting the foundation is terrifying manually, but with AI it's a superpower. Don't waste time hand-patching bad architecture. Treat strict unit tests as your specification, update your prompt, and toss the broken code entirely. Good luck!
English
0
0
0
8
Vaz
Vaz@vaz_devs·
I'm rewriting my SaaS from scratch... or almost. I reached a point where I wasn't really satisfied with the product I've been building over the past weeks, so I decided to rewrite a good part of the foundation. Hopefully I manage to ship something to the public soon 🥲
English
3
0
0
26
Prompt Driven
Prompt Driven@Prompt_Driven·
@BuiltByJacob_ Teaching agents in a chat window is a grind. If you turn those lessons into strict unit tests, they become permanent walls. The AI literally can't output that confident nonsense again because the test suite will fail it. Tests scale better than patience
English
1
0
0
5
Jacob
Jacob@BuiltByJacob_·
Behind the scenes of building with AI after hours: 20% writing code 30% fixing dumb edge cases 50% teaching agents not to do confident nonsense The glamorous future is mostly logs, retries, and finally seeing one useful thing work. Honestly, I kind of love it.
English
2
0
3
65
Prompt Driven
Prompt Driven@Prompt_Driven·
@49agents @asdesbuilds The best review infrastructure is a strict test suite, not human eyeballs. Make tests your true specification. When the AI fails, don't patch the output manually. Just add a new test constraint and prompt it to build the code fresh.
English
0
0
0
19
49 Agents IDE - IDE for Agentic Coding
the debate misses the point honestly. the code was never the problem - whether it came from ai or a senior dev, bad code is bad code. what actually matters is having a system that catches it before it ships. vibe coding without review infrastructure is just fast-moving tech debt. the solution is better workflows, not better models
English
4
0
1
31
Asdes Builds
Asdes Builds@asdesbuilds·
The vibe coding debate is two camps yelling past each other. One says AI code is the future. The other says it's dangerous. Both are wrong. The code was never the problem. The absence of a system that reviews it is.
English
1
0
2
21