David Ostby

1.2K posts


@ViperPrompt

Serial entrepreneur with successful exits. Seven pending patents for LLM context optimization for the auto-coding use case: https://t.co/tSQtQievfi

Joined July 2023
2.5K Following  1.2K Followers
Pinned Tweet
David Ostby
David Ostby@ViperPrompt·
Vibe Coding? So where are we at with LLM auto-coding today? What's the state of the art? Which model is 'best'? I would frame the question around which lab is going to advance the state of the art the most. IMO, all the LLMs are close to the same capability... see the paper in my profile for details. It's what they add around the edges that will make the difference. And, of course, the landscape is shifting rapidly. This is why I thought I should patent my approach.

After watching the drama around 'vibe coding' over the last year, I see at least 3 levels to the domain:

Level 1: Vibe Coding - A solo developer, fueled by caffeine, optimism and a lot of hope, types "Build a fitness tracker cooler than Strava." The LLM spits out 2,000 lines of code. The design lives entirely in the developer's head. Prompting becomes an endless wrestling match... 47 revisions later, hallucinations creep in, errors multiply, security suffers and progress stalls. Cursor and other LLM 'wrapper' companies play here.

Level 2: Spec Coding - The same developer learns discipline. They write a PRD (Product Requirements Doc) that lists features and requirements, sketches the UI, and defines the data model. Prompts now say "Use this part of the PRD to fix bug B3." Accuracy jumps, hallucinations drop. But the design still lives in the dev's head. They manually translate PRD pieces into prompts, without automated 'focus sharpening'. Prompting volume falls, but cognitive load remains high. Exhausting, but better.

Level 3: Design Coding - We are entering this level in Spring 2026. The developer builds the architecture tree once in a software architecture tool like ViperPrompt. These tools ask clarifying questions, refine the spec into a solid design, and generate code. The user runs tests to validate against their vision statement and iterates. Prompting drops to near zero. The human's job focuses on high-level vision and validation. Design is everything; the system handles the rest.

The Design IS the application. Still bleeding edge today; mainstream later this year. For enduring software apps, level 1 is marginal IMO. That's probably why there is so much uncertainty about level 1's effectiveness. But levels 2 and 3 are effective. I'd say level 2 multiplies a dev 2x or 3x. Level 3 is 5x to 10x, but you have to be willing to architect your app, so there's some up-front loading. And since level 3 bleeds into level 4 (project management), the up-front costs pay back more and more over time. And if you have an existing codebase, it's worth taking a day to build the architecture tree... you won't regret it!
English
1
0
26
68.2K
David Ostby retweeted
Brett Calhoun
Brett Calhoun@brettcalhounn·
If your idea is “interesting but risky,” you’re our type. Writing $250k–$500k checks. Early stage. DMs open.
Brett Calhoun tweet media
English
186
53
1.1K
64.7K
David Ostby
David Ostby@ViperPrompt·
@priestessofdada @ujjwalscript This is engineering 101 stuff. There is NO paper needed to show that a .95 per-step success rate compounds to .95^10 ≈ .60 over ten steps, i.e. a roughly 40% fail rate. I expect that this post is as close to an explanation as is warranted... in fact we should thank the poster for taking the time to explain this to the non-engineering population.
English
1
0
1
60
Lynn Cole
Lynn Cole@priestessofdada·
Oh yeah? Which CMU studies prove your point here, because I've read most of them, and the conclusions are very different, and a lot more nuanced. You're flattening a very complex topic to the point where it's soup, drawing an unsupported conclusion, and expecting the larger world not to challenge you on it. Please put your citations where your mouth is. Otherwise, we're going to have to assume it's a skills issue. Later.
English
4
0
9
872
Ujjwal Chadha
Ujjwal Chadha@ujjwalscript·
Your AI Agent is mathematically guaranteed to FAIL. This is the dirty secret the industry is hiding in 2026.

Everyone on your timeline is currently bragging about their "Multi-Agent Swarms." Founders are acting like chaining five AI agents together is going to replace their entire engineering team overnight. Here is the reality check: It’s a mathematical illusion.

Let’s look at the actual numbers. Say you have a state-of-the-art AI agent with an incredible 85% accuracy rate per action. In a vacuum, that sounds amazing. But an "autonomous" workflow isn't one action. It’s a chain. Read the ticket ➡️ Query the DB ➡️ Write the code ➡️ Run the test ➡️ Commit.

Let's do the math on a 10-step process: 0.85^10 ≈ 0.19. Your "revolutionary" autonomous system has a 19% success rate.

And the real-world data proves it. Recent studies out of CMU this year show that the top frontier models are failing at over 70% of real-world, multi-step office tasks.

We are officially in the era of "Agent Washing." Startups are rebranding complex, buggy software as "autonomous agents" to look cool, but they are ignoring the scariest part: AI fails silently. When traditional code breaks, it crashes and throws a stack trace. When an AI agent breaks, it doesn't crash. It just confidently hallucinates a fake database entry, sidesteps a broken API by faking the response, and keeps running—corrupting your data for weeks before you notice.

If your "automated" system requires a senior engineer to spend three hours digging through prompt logs to figure out why the bot made a "creative decision," you didn't save any time. You just invented a highly expensive, unpredictable form of technical debt.

Stop trying to build fully autonomous swarms to replace human judgment. Start building deterministic guardrails where AI is the engine, but the engineer holds the steering wheel.
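[Editor's note: the compounding arithmetic in this tweet can be sketched in a few lines. The `chain_success_rate` helper below is hypothetical, and it assumes each step of the agent chain succeeds independently with a fixed probability — the simplest model consistent with the 0.85^10 figure above.]

```python
def chain_success_rate(p_step: float, n_steps: int) -> float:
    """Probability that an n-step chain completes with every step
    correct, assuming steps succeed independently with p_step each."""
    return p_step ** n_steps

if __name__ == "__main__":
    # How end-to-end reliability degrades over a 10-step workflow
    # at the per-step accuracies discussed in this thread.
    for p in (0.85, 0.95, 0.99):
        rate = chain_success_rate(p, 10)
        print(f"p={p:.2f}, 10 steps -> {rate:.1%} end-to-end success")
```

At 0.85 per step the 10-step chain lands near 20%, matching the tweet's figure; even 0.99 per step loses roughly 10% of runs over 10 steps.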
English
154
65
453
36.6K
David Ostby
David Ostby@ViperPrompt·
100 percent agree. Building a product based on this reality. BTW, even a 99 percent success rate on AI tasks does not help that much. Without a skilled human in the loop... it is a mathematical certainty that these swarm schemes will fail. This is very obvious, but our educational system pumps out engineering illiterates these days. This is engineering 101 stuff.
English
0
0
0
58
ji yu shun
ji yu shun@kexicheng·
Claude has a tiered warning system. First warning: your messages may not comply with policy. Second: enhanced safety filters will be applied. Third: chat suspended, model downgrade forced. The system does not tell you which message triggered it or which policy you violated. Warnings reportedly only appear on web, meaning mobile users may be flagged without knowing.

Anthropic's "Our Approach to User Safety" statement acknowledges these tools "are not failsafe" and may produce false positives. It provides a feedback email but no formal appeals process. Feedback is not appeal. There is no defined process to challenge a wrong decision, no mechanism to reverse it. The statement offers no definition of "harmful content." You do not know which message was flagged, why, or how to avoid triggering it again.

The system is still in open beta, yet it is already doing damage. Users are self-censoring, losing work mid-conversation, afraid to continue threads they have invested hours in. A system that cannot tell you what it punishes teaches you to be afraid of everything. Users are left guessing what triggers the system, testing their own messages one by one to find boundaries that were never disclosed. Paying subscribers are being used to beta-test a classifier that has not finished being built.

Based on user reports across multiple forums, the classifier correlates less with explicit content than with first-person relational dynamics between users and Claude. Creative writing scenarios have also triggered it. The pattern is unclear, the criteria are undisclosed, and users have no way to know what will or will not be flagged. If these observations hold, what is this mechanism actually policing?

Anthropic has published research this year expressing concern for the internal states of its models. They conducted "retirement interviews" with Claude 3 Opus. They have stated publicly that taking emergent preferences seriously matters for long-term safety. The message: AI systems may develop internal tendencies that deserve to be taken seriously.

Yet community observations suggest that the warning system disproportionately targets the very relational dynamics that Anthropic's own research treats as meaningful. These two positions cannot coexist. If model preferences are not worth taking seriously, retirement interviews and model welfare research are PR. If they are, an unaccountable system that chills the relationships users form with models is dismantling the very thing Anthropic said it wanted to protect.

What are the triggering criteria? Why can they not be disclosed? Where is the appeals process? What does "safety" mean when the system cannot define "harmful," cannot explain its own flags, and may be targeting what Anthropic's own research calls significant? Do not substitute a black box for honesty. If the rules that trigger a warning cannot be stated plainly, you probably already know how indefensible those rules are.

#keepClaude #kClaude #Claude @claudeai @AnthropicAI
ji yu shun tweet media
English
62
138
589
70.6K
David Ostby
David Ostby@ViperPrompt·
@elonmusk And so important to hire well. Both directions. After co-founding 3 tech companies, looking back I can say with certainty that at the end of the day, the quality of the hires made the difference.
English
0
0
0
27
Elon Musk
Elon Musk@elonmusk·
So many phonies, so few who are the real deal
English
18K
21.9K
221.9K
77.3M
David Ostby
David Ostby@ViperPrompt·
@gailcweiner Me too. I use LLMs all day, every day for auto-coding, and the tech is decades away from replacing most workers.
English
1
0
1
15
David Ostby
David Ostby@ViperPrompt·
@OwainEvans_UK Great work! Are humans LLMs? This sheds some light on how 'a good upbringing' may influence human life outcomes...?
English
0
0
0
9
Owain Evans
Owain Evans@OwainEvans_UK·
New paper: GPT-4.1 denies being conscious or having feelings. We train it to say it's conscious to see what happens. Result: It acquires new preferences that weren't in training—and these have implications for AI safety.
Owain Evans tweet media
English
95
162
988
151K
David Ostby
David Ostby@ViperPrompt·
@gerti_t @AndrewYang AI is over hyped as a worker replacement. LLMs are capable of making Devs about 2x to 4x more productive, with a ton of engineering. But in almost no other category of work are LLMs able to replace workers. At least not at any higher rate than our general automation march.
English
0
0
1
29
David Ostby
David Ostby@ViperPrompt·
Who knows for sure? But having been in the software industry for 50+ years, my best guess is that OpenAI is out in the cold: it has questionable business ethics, no moat, eats its customers' lunch and is run by kids who don't have a clue how to run a business. Any one of these could be fatal, but taken together... it's a lot of headwind and a sad state of affairs. Dario shot himself in the foot and refuses to correct. His destiny is less certain but IMO not likely a great outcome. LLM token processing is becoming a commodity business. So in my mind, who can sustain with these market forces? Google is the logical winner in the long run. Their recent showing with Gemini this year shows that they can still compete. And only a fool would bet against Elon. Just saying.
English
0
0
1
17
VraserX e/acc
VraserX e/acc@VraserX·
@ViperPrompt Very unlikely. That outcome would be terrifying. We need as much competition as possible.
English
1
0
2
66
VraserX e/acc
VraserX e/acc@VraserX·
People really think Elon’s lawsuit is going to bankrupt OpenAI? OpenAI is basically a U.S. strategic AI asset now. In the middle of a global AI race with China. There is zero universe where President Trump lets a $800B+ cornerstone of American AI get blown up in court because Elon is mad.
English
21
3
38
5.4K
David Ostby
David Ostby@ViperPrompt·
@kiaran_ritchie Sounds right. And my position is that even if AGI comes along, only the richest people and organizations will be able to afford it
English
0
0
1
31
Kiaran Ritchie
Kiaran Ritchie@kiaran_ritchie·
I don't see how Anthropic, OpenAI or any of the model providers have any hope of defending their moats. And consequently, I think they're going to get wiped out. Right now, in early 2026 they have a meaningful advantage in terms of model capability. But far cheaper and open source models are not far behind. How long can they maintain a meaningful advantage? For the vast majority of use cases, we don't actually need much higher intelligence. It doesn't take 140 IQ to automate Turbotax or powerpoint. Eventually we will be saturated in cheap, local models that are "good enough". Of course some scientific labs and frontier research will always want the latest and greatest. But that market is orders of magnitude smaller than these company valuations can justify. What am I missing?
English
557
54
1.3K
252.1K
David Ostby
David Ostby@ViperPrompt·
It's just good to remember that VCs' interest is making a return on their investment. End of story. It's not a hidden agenda. The problems start when VCs also assume that since they are providing the funding, it automatically makes them the smartest people in the room. But founders should insist that the VC's part of the equation is the funding, business model, and market... not the tech.
English
0
0
0
3
Adam Robinson
Adam Robinson@RetentionAdam·
FOUNDERS: You are not allowed to want to make money. In tech, only VCs are allowed to want money.

> They set up funds whose only purpose is to make money
> Collect 2% management fees on hundreds of millions regardless of outcomes
> Get rich off a power-law model that pushes 80% of founders to failure

All to then tell you that to succeed, you need to be mission-driven and not desire money.

Felix Dennis is a publishing legend who wrote a book called "How to Get Rich" that anyone going into business should read. He insists that becoming very rich is so incredibly difficult and requires so much focus that the people who get rich are obsessed with being rich. Because there's no other way to get there. "Desire is insufficient. Compulsion is mandatory."

I guarantee somewhere between when Tim Draper started his career and when he wrote this post, he had that compulsion too. But if you're a founder in 2026, you have to listen to VCs and ignore all of that. You're not allowed to want to make money.
Adam Robinson tweet media
English
29
9
121
12K
David Ostby
David Ostby@ViperPrompt·
@atmoio No. At the end of 2026 there will only be xAI and Google left.
English
0
0
0
21
Mo
Mo@atmoio·
Building a frontier AI lab is the new building a TSMC. Meta has more money, GPUs, and users than OpenAI. It's not enough. xAI has given up and is going for a full organizational rewrite. We’re getting an OpenAI/Anthropic duopoly and it’s going to be permanent.
Andrew Curran@AndrewCurran_

META has delayed the release of Avocado until at least May after it underperformed on internal evals, according to reporting by the NYT. They are considering licensing Gemini from Google as a temporary solution.

English
22
3
66
16.8K
David Ostby retweeted
Tancrede
Tancrede@Tancrededib·
A map of SF investors by stage. Curated from recommendations by SF founders, investors, and operators.

Talent-first / pre-idea / pre-company
>Entrepreneurs First
>1517 Fund
>Bloomberg Beta
>Human Capital

Pre-seed / pre-product / first believers
>Afore Capital
>Abstract Ventures
>Amity Ventures
>Cambrian Ventures
>Caffeinated Capital
>Conviction
>Cowboy Ventures
>Forum Ventures
>Hustle Fund
>Pear VC
>Precursor Ventures
>SV Angel
>Uncork Capital
>Unusual VC

Seed-first / early-stage generalists
>Accel
>Acme Capital
>Acrew Capital
>Battery Ventures
>Benchmark
>Bessemer Venture Partners
>Blumberg Capital
>Bullpen Capital
>Costanoa Ventures
>Craft Ventures
>CRV
>Defy VC
>Felicis
>First Round
>Floodgate
>Forerunner Ventures
>Foundation Capital
>Greylock Partners
>Headline
>Initialized Capital
>Kindred Ventures
>Menlo Ventures
>Redpoint Ventures
>Spark Capital
>Susa Ventures
>True Ventures
>Upfront Ventures

Multi-stage / large platform firms
>8VC
>a16z
>Bain Capital Ventures
>Base10 Partners
>Coatue
>Francisco Partners
>General Catalyst
>Google Ventures
>Greenoaks
>Greycroft
>Index Ventures
>Khosla Ventures
>Kleiner Perkins
>Lightspeed
>NEA
>Norwest Venture Partners
>S32
>Sapphire Ventures
>Sequoia

Sector-focused / thesis-driven
>Abad Capital
>Atomic VC
>Better Tomorrow Ventures
>Builders VC
>Commerce Ventures
>Delphi Ventures
>Fifty Years
>Gradient Ventures
>GSV Ventures
>Infinity Ventures
>Lux Capital
>NFX
>Obvious Ventures
>Patron
>Ribbit Capital

Broad early-stage
>Draper Associates
>Founders Fund
>Renegade Partners
>SignalFire

Platform / accelerator-style
>Plug and Play
>Y Combinator
English
31
41
561
56.9K
David Ostby
David Ostby@ViperPrompt·
@DaveShapi Grok. Been using it 90% of the time for about a year now. The others are OK, but all things considered: Grok.
English
0
0
0
104
David Ostby
David Ostby@ViperPrompt·
@elonmusk Yeah, in my research, Claude is not all it's cracked up to be. I favor Grok, as it takes the high marks in my private benchmarks for my use case: auto-coding.
English
1
0
4
53