Rob Phillips
@iwasrobbed

1.7K posts

@OathMed, @MoldableAI 🍎 Ex-VP for Siri cofounder, early Viv

Pale blue dot · Joined January 2013
198 Following · 1.7K Followers

Pinned Tweet
Rob Phillips@iwasrobbed·
As an ex-Viv (w/ Siri team) eng, let me help ease everyone's future trauma as well with the Fundamentals of Assisted Intelligence.

Make no mistake, OpenAI is building a new kind of computer, beyond just an LLM for a middleware / frontend. Key parts they'll need to pull it off:

Persistent User Preferences:
- The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
- This is the "wow" moment where computers stop being scary and start feeling truly helpful.
- We did this in 2016 on Viv (youtu.be/Rblb3sptgpQ) when our AI knew what you liked for each and every service you used via Viv, and mixed that in with context like what kind of flowers you told us your mom liked.
- This will need to include access to your personal information to infer preferences as well (see the sketch below).

External, Real-time Data:
- 50% of the utility of an LLM comes from the base training and RLHF fine-tuning, but much more comes from extending its available data with external sources.
- Zapier, Airbyte and others will help, but expect deep integration with 3rd party apps / data pipelines.
- "Chat w/ PDF" is a tiny, tiny part of this. If you're only building that, think much bigger.

Actual Computing on Virtual Machines:
- Context windows are limiting, so AI providers will continue benefiting from running tasks directly in a Python or Node/Deno virtual env so the AI can consume huge amounts of data just like a computer today can.
- Today these are short-lived envs used by Data Analyst / Julius, but over time they'll become a new type of Dropbox where your data is persisted long term for additional processing or cross-file inference / insights.

Agent Task / Flow Planning:
- Planning can't function without intent. Understanding intent has always been a holy grail, and LLMs finally helped us unlock what we spent years approximating at Viv with NLP tricks.
- Once intent is accurate, planning can start. Creating an agent planner is incredibly nuanced and will take significant integration with user preferences, 3rd party data sets, knowledge of compute capabilities, etc.
- The bulk of the real magic of Viv was the dynamic planner / mixer that would pull all these data and APIs together and generate both a workflow AND a dynamic UI on top of them for a normal consumer to execute.

An App Store of Experts:
- Apple initially made the mistake of building a closed app store; then realized they could monetize a cornucopia of creativity if they opened it.
- Regardless of OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll rescope it and enable a long tail of specialized assistants.
- Builders will be able to compose multiple tools together into workflows that can specialize.
- And AIs over time will be able to auto-compose these tools together as well, learning from the builders that came before them.

Persistent, Contextual Memory:
- Embeddings are helpful, but they are missing fundamental parts like context switching, conversational centroids, summarization, enrichment, etc.
- Most of the cost of LLMs today comes from prompts, but as history and persistence are embedded and the inference cached, this will unlock the ability to have long-term memory with pointers to critical subjects, topics, feelings, tone, etc.
- Core memory is just the beginning. We still need all the rich information our minds conjure when we think about a past sunset, a breakup, a scientific understanding, or sensitive context for people we interact with.
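
To make the preference idea concrete, here is a minimal sketch of a persistent, per-service preference store whose contents get rendered into the context handed to a planner or LLM. This is an illustration only, not Viv's or OpenAI's code; the service names and keys are made up.

```python
# Illustration only: not Viv's or OpenAI's actual code.
# A tiny persistent, per-service preference store that an assistant
# could render into the context it hands to a planner or LLM.
import json
from pathlib import Path


class PreferenceStore:
    def __init__(self, path: str = "prefs.json") -> None:
        self.path = Path(path)
        self.prefs: dict = (
            json.loads(self.path.read_text()) if self.path.exists() else {}
        )

    def remember(self, service: str, key: str, value: str) -> None:
        # e.g. remember("flowers.example", "moms_favorite", "peonies")
        self.prefs.setdefault(service, {})[key] = value
        self.path.write_text(json.dumps(self.prefs, indent=2))

    def context_for(self, service: str) -> str:
        # Render stored preferences as plain text for a prompt.
        items = self.prefs.get(service, {})
        if not items:
            return f"No stored preferences for {service}."
        lines = "\n".join(f"- {k}: {v}" for k, v in sorted(items.items()))
        return f"Known user preferences for {service}:\n{lines}"


if __name__ == "__main__":
    store = PreferenceStore()
    store.remember("flowers.example", "moms_favorite", "peonies")
    store.remember("flights.example", "seat", "aisle, exit row")
    print(store.context_for("flowers.example"))  # prepend to planner prompt
```
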
Long Polling Tasks:
- "Agent" is a loaded word, but part of the intent is to have tasks that can be scheduled and self-completing regardless of the time horizon required.
- E.g. "Let me know when flights from Montréal to Hawaii are less than $500" (see the sketch below).
- This will require coordination of compute across API providers, as well as virtual envs in the cloud.

Dynamic UI:
- Chat is not the final, end-all interface. There's a reason apps have affordances like buttons, date pickers, images. It simplifies and clarifies.
- AI will be a copilot, but to be a copilot it'll need to adjust to what works best for a given user. The future is personalized as optimizations require it, so UI will be dynamic.

API & Tool Composition:
- Expect AIs to generate custom "apps" in the future where we can build our own workflows and compose APIs together, without waiting for a big startup to do so.
- Fewer apps and startups will be needed to generate frontends, and AI will be better at composing an array of tools and APIs together, coupled with a gas fee / tax.

Assistant-to-Assistant Interaction:
- There will be countless assistants in the future, each assisting humans and other assistants towards some greater intent.
- Alongside this, assistants will need to learn to interface across text, APIs, file systems, and other modalities used both by agents / startups and humans as integration flows deeper into our world.

Plugin / Tool Stores:
- Specialized assistants can only be made possible by composing tools, APIs, prompts, data, preferences, and much more.
- The current plugin store is super early days, so expect much more work to come, and expect many of those plugins to be rolled in-house as they become more mission critical.

And this is just a 10-minute brain dump; much, much more is needed behind the scenes including internet search and scraping, community (for intent, building, RLHF, etc), dynamic API generators and connectors, gas fees, tool builders, ingestion via glasses / earbuds / etc.

If you think it's too late to be in AI, just know the above is about 25% of what it'll actually take, with much more to come as we iterate and get even more creative.

We're in the early days of building parts of this at @FastlaneAI but with a different understanding: OpenAI will never be the best at everything. So we want to let you use the best AIs in the world, regardless of who builds them (that could be you!). Come join the fun!
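
And to ground the "flights under $500" example, here is a minimal long-polling sketch. Again an illustration only: fetch_lowest_fare is a made-up stand-in for a real flight-search API, and a real agent runtime would persist and schedule this task in the cloud rather than block a local loop.

```python
# Illustration only: fetch_lowest_fare() is a hypothetical stand-in
# for a real flight-search API.
import random
import time


def fetch_lowest_fare(origin: str, dest: str) -> float:
    # Pretend fares, in dollars.
    return random.uniform(400, 900)


def watch_fares(origin: str, dest: str, threshold: float,
                poll_seconds: float = 3600) -> float:
    # Poll until a fare drops below the threshold, then return it.
    # A real agent runtime would persist this task and notify the user.
    while True:
        fare = fetch_lowest_fare(origin, dest)
        if fare < threshold:
            return fare
        time.sleep(poll_seconds)


if __name__ == "__main__":
    fare = watch_fares("YUL", "HNL", threshold=500, poll_seconds=0.1)
    print(f"Deal found: YUL -> HNL at ${fare:.2f}")
```
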
Prakash@8teAPi

I don’t know why there is any surprise. Here’s OpenAI’s product strategy for the next 2 years:
- you will be able to upload anything to ChatGPT
- you will be able to link any external service like Gmail, Slack
- ChatGPT will have persistent memory, no more multiple chats unless you want it
- ChatGPT will have a consistent, user-customizable personality including political bias
- ChatGPT will be able to respond by text, voice, images (diagrams and video still ?? in this timeframe)
- ChatGPT will become much, much faster until you feel it’s a real person (>50ms response time)
- Hallucinations and non-factual errors will decline rapidly
- as self-moderation improves, question rejection will decline

41 replies · 172 reposts · 765 likes · 392K views
Rob Phillips@iwasrobbed·
@nikitabier No incentives for humans to post on X right now. Bots/slop get all the views by brute-forcing the algo.
0 replies · 0 reposts · 0 likes · 8 views
Nikita Bier@nikitabier·
The fortress we are building—and the layers of redundancy—to protect the platform against the AI Slopacalypse will seem obvious in a few months. Whether we use every tool in our toolkit is TBD, but it would be negligent to not have them ready.
1.5K replies · 438 reposts · 8.8K likes · 597.2K views
Sebastian Caliri@SebastianCaliri·
Evidence-based medicine is a blessing of the 20th century. Evidence-based medicine is also a curse of the 20th century.

Medical interventions are studied through randomized controlled trials, and those interventions are assessed for efficacy and safety on a population level. But no individual quite matches some blended average of every trial participant. Rather, everyone's biology is unique.

Sid Sijbrandij just presented the story of his cancer journey at the OpenAI Forum. When Sid ran out of evidence-based treatment options, he didn’t accept the boundary but rather began treating his cancer like an engineer:
- multi-omic tumor profiling at extreme depth
- N=1 drug development (vaccines, TCR-T, radioligand therapy)
- parallel treatment strategies
- continuous measurement (ctDNA, single-cell, immune state) and refinement

Rather than protocol-based care, Sid built a learning loop. Maybe the future of medicine in a world where gathering and interpreting data gets cheaper and cheaper looks more like a loop.

Thanks for sharing @sytses and @jacobjstern!
Sebastian Caliri tweet media
5 replies · 5 reposts · 67 likes · 6.1K views
Rob Phillips@iwasrobbed·
@sama Pretty sure most of us remember how difficult it was.
0 replies · 0 reposts · 1 like · 138 views
Sam Altman@sama·
I have so much gratitude to people who wrote extremely complex software character-by-character. It already feels difficult to remember how much effort it really took. Thank you for getting us to this point.
4.6K replies · 2.2K reposts · 36K likes · 5.5M views
Rob Phillips@iwasrobbed·
@thsottiaux Sandboxing & automation. E.g. do less sandboxing, more automation (& every automation in worktrees fails due to sandboxes, fwiw).
0 replies · 0 reposts · 0 likes · 13 views
Tibo@thsottiaux·
What are we consistently getting wrong with codex that you wish we would improve / fix?
1.2K replies · 14 reposts · 874 likes · 142.8K views
Rob Phillips@iwasrobbed·
@emollick True for most posts as well, yet both comments + posts are still being rewarded by the algo. "Show me the incentives & I'll show you the outcome."
0 replies · 0 reposts · 0 likes · 67 views
Ethan Mollick@emollick·
I know I go on about this, but comments to all of my posts, both here and on LinkedIn, are no longer worth reading at all due to AI bots. That was not the case a few months ago. (Or rather, bad/crypto comments were obvious, but now it is only meaning-shaped attention vampires)
150 replies · 44 reposts · 962 likes · 114.8K views
Rob Phillips@iwasrobbed·
Gabe is right. We use (& grade) the latest & best models at @OathMed for this reason.

We made a point of evaluating models similar to but above and beyond how clinicians are evaluated: with board certification. So we confidently know which models are safe, and which aren’t (regardless of their AI lab builders stating they are).

Historically, gating AI usage on workstations just caused clinicians to use unapproved “shadow AI” solutions on their phones instead.

Clinicians need relief, so instead of limiting them yet again, just use solutions that take the time to prove they’re safe for patients. Better yet, use solutions that prove they raise safety beyond even the average clinician.
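
For a sense of what "board certification for models" could look like mechanically, here is a toy sketch. It is not OathMed's actual harness; ask_model(), the question set, and the passing bar are all hypothetical stand-ins.

```python
# Toy sketch only: not OathMed's harness. ask_model(), the question
# set, and PASS_RATE are hypothetical stand-ins.
QUESTIONS = [
    {"stem": "First-line treatment for anaphylaxis?",
     "answer": "epinephrine"},
    {"stem": "Most common bacterial cause of community-acquired pneumonia?",
     "answer": "streptococcus pneumoniae"},
]
PASS_RATE = 0.90  # made-up certification bar


def ask_model(stem: str) -> str:
    # Stand-in for a real model API call.
    return "Epinephrine IM is first-line."


def certify(model_name: str) -> bool:
    # Grade by substring match; a real harness would use rubric
    # scoring, multiple raters, and far more questions.
    correct = sum(q["answer"] in ask_model(q["stem"]).lower()
                  for q in QUESTIONS)
    score = correct / len(QUESTIONS)
    verdict = "pass" if score >= PASS_RATE else "fail"
    print(f"{model_name}: {score:.0%} -> {verdict}")
    return score >= PASS_RATE


if __name__ == "__main__":
    certify("hypothetical-model-1")
```
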
Gabe Wilson MD@Gabe__MD

This may be the most controversial thing I’ve posted. But I think it needs to be said.

We are having an urgent conversation about slowing AI in medicine. Instituting rigorous safety measures. Thorough vetting before deployment. Many respected voices are calling for caution, and their instincts are grounded in a tradition of patient safety that I deeply respect.

But I want to pose a question that I haven’t seen anyone ask. What is the cost of slowing down? Not the cost to technology companies. The cost to patients.

We talk about AI safety as if the alternative is a well-functioning system. It isn’t. The current system produces error rates that have barely improved in decades. M&M cases that rarely lead to broad physician education. Community hospital physicians reliant on self-education of highly variable quality. Emergency physicians who never receive feedback on whether their practice patterns are calibrated — whether they order too many CTs or too few, admit too aggressively or too conservatively. Practice patterns that drift with fatigue across a single shift and across an entire career.

These aren’t hypothetical harms. They’re the measured, documented, persistent background rate of medical error that we have normalized. Half of chronic disease medications aren’t taken correctly. Twenty percent of prescriptions are never filled. Up to half of adverse drug reactions are preventable. Thirty to eighty percent of hypertension patients discontinue treatment within the first year.

This is the baseline. This is what we’re protecting when we slow AI deployment.

Calculus measures continuous change. If we modeled the rate of improvement in patient care as a function over time, slowing AI adoption doesn’t just delay improvement by a fixed amount. It changes the integral. The cumulative patient harm prevented shrinks. Every month of delayed deployment represents ongoing harm from errors that a more capable system could have caught.

We aren’t comparing AI with safety checks versus AI without safety checks. We’re comparing AI deployed in 2028 after rigorous vetting versus the current system continuing to produce the same error rates it has produced since 2000. The question is whether the cumulative harm from that delay exceeds the harm AI might introduce.

I am not arguing against safety measures. I’m arguing that the cost of delay must be measured against a baseline that is far worse than most people acknowledge. We apply compassionate use and emergency authorization frameworks to drugs when the background mortality rate justifies accelerated deployment. We should at least ask whether AI in medicine has reached that threshold.

The instinct to slow down feels responsible. But if slowing down means patients continue dying from errors that AI could prevent — errors we’ve failed to fix for decades through every other means — then the calculus of caution isn’t as simple as it appears.

Sometimes the most dangerous thing you can do is nothing.

0 replies · 0 reposts · 1 like · 106 views
Rob Phillips@iwasrobbed·
We use (& grade) the latest & best models at @OathMed for this reason.

We made a point of evaluating models similar to but above and beyond how clinicians are evaluated: with board certification. So we actually confidently know which models are safe, and which aren’t (regardless of their AI lab builders stating they are).
0 replies · 0 reposts · 0 likes · 3 views
Gabe Wilson MD@Gabe__MD·
One of the most important things to understand about this debate: most physicians’ experience with “AI in medicine” is not with AI as it actually exists at the frontier.

What most physicians interact with are domain-specific wrappers embedded in their EHR — ambient scribes, note summarizers, clinical decision tools. These products are almost always built on lesser models, often two or more generations behind the frontier, because the vendor needs to minimize token costs at scale. The economic incentive is to use the cheapest model that produces a passable result, not the most capable one available.

When a physician says “I’ve tried AI and it isn’t that impressive,” they’re often evaluating a product running on a model that the AI community considers obsolete. It’s the equivalent of judging the potential of modern aviation based on a prop plane — while a supersonic jet exists in the same hangar.

When we discuss AI capability in medicine — whether to slow it down, speed it up, or how to regulate it — we should be addressing the absolute latest state-of-the-art capability. Not the lowest common denominator that EHR vendors deploy to protect their margins.

The gap between what physicians experience daily and what frontier models can actually do is the single largest source of miscalibration in this entire conversation. And it’s leading to policy conclusions based on outdated technology evaluations.
1 reply · 0 reposts · 1 like · 129 views
Rob Phillips@iwasrobbed·
AI empowered Paul Conyngham to create a custom mRNA vaccine to cure his dog's cancer when she had only months to live. The first personalized cancer vaccine designed for a dog... incredible. This has huge implications for humans as well, and for the democratization of medicine.
Rob Phillips tweet media
2 replies · 0 reposts · 1 like · 79 views
Rob Phillips@iwasrobbed·
Ultimately: it depends. This is why we’ve been creating board certifications for AI @OathMed. We should hold models to the same standards (& many, many more) that clinicians are held to.

If clinicians are one-shotting notes for visits, there may not be an opportunity for follow-up questions to refine responses. A natural part of practicing medicine is working with imperfect data: finding ways to fill gaps in EHR data, patients not able to perfectly explain symptoms, lack of med adherence, etc. So harnesses and APIs need to eventually evolve to constrain and guide these discovery efforts a bit more instead of forcing an answer (see the sketch just below). Reasoning/thinking tokens aren’t sufficient; they’re sort of a hack.

If patients/consumers are chatting with a model, I’d still err for “minimizing time to correctness” so it doesn’t give worse answers earlier in the hopes the user will stick around or eventually give more complete data. A lot of patients lack EHR data and a lot of EHRs lack patient data. All the while, patients are just trying to hold onto hope that someone, anyone will figure out their problem(s), and won’t give up until they do.

As with all UX: “build it, and they will misuse it” (until you rebuild it)
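
One way to picture "constrain and guide discovery instead of forcing an answer": a gate that asks for missing context before letting the model answer. A toy sketch only; the required fields and messages are hypothetical, and a real harness would be far richer.

```python
# Toy sketch: the required fields and messages are hypothetical.
REQUIRED = ["symptoms", "duration", "current_medications"]


def next_step(case: dict) -> str:
    # Guide discovery: ask for gaps instead of guessing from
    # incomplete data.
    missing = [f for f in REQUIRED if not case.get(f)]
    if missing:
        return "Follow-up needed: please provide " + ", ".join(missing) + "."
    return "Enough context gathered; safe to draft an answer."


if __name__ == "__main__":
    print(next_step({"symptoms": "chest pain"}))
    print(next_step({"symptoms": "chest pain", "duration": "2 days",
                     "current_medications": "none"}))
```
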
Karan Singhal@thekaransinghal

x.com/i/article/2032…

0 replies · 0 reposts · 0 likes · 66 views
Peer Richelsen@peer_rich·
is everyone launching a "sub product" that's an AI app creator? 😆 glaze looks promising. is this gonna integrate into raycast?
Thomas Paul Mann@thomaspaulmann

Your computer, finally personal.

Today we're launching Glaze, the second product in Raycast's history. It's a big moment for us, and I want to share the thinking behind it.

Something is fundamentally changing about software. We see it every day inside our own team. People who never wrote a line of code are now contributing directly to our codebase. The barrier between "having an idea" and "making it real" is collapsing. And that changes everything.

For six years, we've obsessed over what makes a great desktop app. The speed. The polish. The feeling of something that truly belongs on your computer. We've poured that into Raycast, and hundreds of thousands of people use it every day. But all that knowledge was locked inside our team.

With Glaze, we're commoditizing it. Everything we've learned about building beautiful, capable desktop apps is now available to everyone. Tell Glaze what you want and it builds a real app that lives in your dock or taskbar. It launches instantly, works offline, and taps into the full power of your desktop. Beautiful by default and personal when you want it to be.

It's fun for individuals and works just as well for teams. Our support team built a Glaze app connected to GitHub that runs their entire extension review workflow. Others have built dozens of internal tools. When you can shape software around how your team actually works, everything clicks.

Here's what gets me most excited: we think Raycast becomes even more important in a world full of Glaze apps. Glaze apps will be deeply integrated with Raycast, connecting them all together in ways nobody else can do. The two products make each other better.

A small team started building Glaze from scratch last summer. What they've shipped in that time still blows my mind.

When we started Raycast, we set out to change how people use their computers. Glaze is the next chapter of that mission.

We're opening the private beta today, March 4th. Mac only to start. Existing Raycast users will get priority access soon. We can't wait to see what you create and I’ll share some of my apps over the next couple of days. 💠

11 replies · 0 reposts · 53 likes · 13.4K views
Rob Phillips@iwasrobbed·
Everyone building exactly the same thing: Claw OS, kanbans, Claw for Blah, Claw + easier setup / hosting. The X timeline is like 1,000 screaming 🦞 in boiling water.

You’re experiencing personal software for the first time. You’re not a builder, you’re a customer.

You built something that worked for you and now you want to sell it to other people, but a big part of personal software is that we all build something that works for us individually, for free.

Why would I want to use what you built when I could just build it myself with a few magical words? Why would I want to pay to have my apps on your platform or to use your API keys when I can just get the wholesale price for everything now?

“Oh, but my grandma would never build her own apps!” … this is gonna be no different than social media, where a small minority builds a lot for the majority, and what doesn’t get built will be automated through agents.

So sure, I’m proud of you for building a new thing using a few prompts and your Opus/Codex plan, but I hope you see soon that that’s the entire point: you’re a personal software user now. You just might not realize it yet.

You’re just early enough to remember what software used to be like to build and buy, whereas a lot of people will soon just will whatever they want into existence without even knowing your thing exists. They won’t need you. Why should they?

The path to frictionless creation isn’t to put yet another paywall or “use my app!” plea for life in people’s path.

Think bigger, build bigger. How could you give back a billion years of a happy, stable life to the working class? Do that.
0 replies · 2 reposts · 11 likes · 9.6K views
Andrew Ambrosino@ajambrosino·
The Codex app is now live on Windows. The app runs both natively and in WSL, with integrated terminals for PowerShell, Command Prompt, Git Bash, or WSL. We also built the first Windows-native agent sandbox — using OS-level controls to block filesystem writes outside your working folder and prevent outbound network access unless you explicitly approve it. Plus: 7 new “Open in …” apps and 2 new Windows skills (WinUI + ASP.NET). Try it and tell us what you think.
Andrew Ambrosino tweet media
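
The "block filesystem writes outside your working folder" policy is easy to picture with a path-containment check like the one below. This is a conceptual illustration only, not the Codex app's actual OS-level mechanism, which enforces the rule below the process rather than inside it.

```python
# Conceptual illustration only: not the Codex app's actual OS-level
# sandbox. Checks whether a write target stays inside the workdir.
from pathlib import Path


def write_allowed(target: str, workdir: str) -> bool:
    t, w = Path(target).resolve(), Path(workdir).resolve()
    return t == w or w in t.parents


print(write_allowed("/repo/src/main.rs", "/repo"))  # True
print(write_allowed("/etc/passwd", "/repo"))        # False
```
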
142 replies · 153 reposts · 1.7K likes · 580.8K views
Raycast@raycast·
Today we're launching Glaze 💠 Create any desktop app in minutes by chatting with AI. Beautiful, powerful, and truly personal. Learn more on glazeapp.com Follow @glazeapp for updates.
288 replies · 565 reposts · 6.8K likes · 2.3M views
Rob Phillips@iwasrobbed·
@stanine Very backwards. Eng is pretty easy these days, product getting easier, sales still quite the challenge...
0 replies · 0 reposts · 0 likes · 154 views
Matt MacInnis@stanine·
If you're at a tech company whose CEO is a sales-type and not a product- or engineering-type, you should bail.
59 replies · 18 reposts · 527 likes · 95.1K views
Rob Phillips@iwasrobbed·
@munawwarfiroz This. Crazy how it can simultaneously understand and misunderstand your codebase
0 replies · 0 reposts · 0 likes · 9 views
Munawwar Firoz@munawwarfiroz·
@zeeg Also the backward-compatibility code for the very code it wrote 5 minutes ago
6 replies · 0 reposts · 94 likes · 2K views
David Cramer@zeeg·
No Codex, I do not need fallbacks for every path of code!
76 replies · 18 reposts · 567 likes · 54.2K views