
Akshay Ramaswamy
@TheRealAk914
Staff PM @elise_ai | Formerly CEO @ Omni, acquired by @Coinbase | @Stanford and @ycombinator alum | #blacklivesmatter 🦇🔊


OpenAI's new image model GPT-Image-2 has leaked. It seems to have extremely good world knowledge and great text rendering. Possibly better than Nano Banana Pro.

It's on @arena under the code names:
- maskingtape-alpha
- gaffertape-alpha
- packingtape-alpha

🚨 GREG BROCKMAN JUST EXPLAINED THE NEXT LEAP WITH SPUD (GPT 5.5)

Greg Brockman: "I think of Spud as a new base, as a new pre-train... I'd say it's like we have maybe two years worth of research that is coming to fruition in this model."

Greg says: "There's this thing called 'big model smell'... when these models are just actually much smarter, much more capable, that they bend to you much more, and you feel it."

Here is exactly what we are getting with the upcoming GPT 5.5 rollout:
• "Big Model Smell": A massive qualitative shift. The models stop being rigid and start intuitively bending to what you actually want them to do.
• Unlocking New Abilities: It can just do things it wasn't able to before. The frustrating moments where the AI "doesn't quite get it" and needs you to over-explain are going away.
• Longer Time Horizons: The ceiling is being completely raised. The new models will be able to autonomously solve complex, open-ended problems over much longer periods of time.
• A New Pre-Train Base: This is not an incremental fine-tune. Spud is a completely new foundation built to accelerate the entire economy.




I don’t know exactly what’s going on here, but it does feel AI-related. Unlike PM and eng, which started growing in 2024 (two years post-ChatGPT), design didn’t. If I had to venture a theory, I’d say that because AI is allowing engineers to move so quickly, there’s less opportunity, and less desire, to involve the traditional design process. That said, you’d think design would become a differentiator as more products compete for attention. Something to think about for your company! We’ll keep watching this trend and AI’s impact on org design more generally.

One interesting observation we made when we went a level deeper: the ratio of demand for PMs vs. designers has flipped. In mid-2023, we crossed over from more open designer roles to more open PM roles, and ever since, PM demand has been pulling away (currently 1.27x). This will be another trend to monitor in terms of how AI is reshaping org design.




you’re like 6 prompts away from infinitely customizable personal agi. anthropic gave you a world class agentic harness for free. use it!!!

TSA wait times are absolutely wild right now. So I built a free tracker that shows live waits by checkpoint, including Precheck, Clear, and more (where available). Most tools, including the TSA’s own app, only show airport-wide estimates. Here ya go: tsa.fromthetraytable.com




You can now schedule recurring cloud-based tasks on Claude Code. Set a repo (or repos), a schedule, and a prompt. Claude runs it via cloud infra on your schedule, so you don’t need to keep Claude Code running on your local machine.
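The shape of such a recurring task is simple enough to sketch. Below is an illustrative stand-in in Python — not Claude Code's actual interface; the task fields, `due` check, and `run_task` dispatch are all hypothetical — showing the repo/schedule/prompt triple and a poll loop that fires it:

```python
import time
from datetime import datetime

# Hypothetical task spec: a repo, a daily schedule slot, and a prompt.
# The real feature is configured inside Claude Code's cloud infra,
# not via a script like this.
TASKS = [
    {"repo": "org/api-server", "hour": 6,
     "prompt": "Triage overnight issues and open draft fixes."},
]

def due(task, now):
    """True when the task's daily slot matches the current minute."""
    return now.hour == task["hour"] and now.minute == 0

def run_task(task):
    """Placeholder for handing the prompt to a cloud agent run."""
    print(f"run {task['repo']}: {task['prompt']}")

def scheduler_loop():
    """Poll once a minute; nothing needs to stay open locally."""
    while True:
        now = datetime.now()
        for task in TASKS:
            if due(task, now):
                run_task(task)
        time.sleep(60)
```

The point of running this server-side is the last comment: the poll loop lives in the cloud, so your laptop can sleep.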







Introducing AutoVoiceEvals

I've applied the @karpathy autoresearch loop to voice AI agents. It's open source.

Your voice agent has a system prompt. That prompt determines how it handles every call: bookings, complaints, edge cases, background noise, long pauses, people trying to trick it. Most teams write it once, test manually, and hope for the best.

autovoiceevals makes it a loop. One artifact (the system prompt), one metric (adversarial eval score): keep what improves it, revert what doesn't. Run it overnight. Wake up to a better agent.

> How it works:

You describe your agent in a config file: what it does, its services, policies, and what it should never do. You don't write test cases. You don't define attack vectors.

provider: vapi  # or smallest ai
assistant:
  id: "your-agent-id"
  description: |
    Voice receptionist for a hair salon. Maria does coloring only.
    Jessica does cuts only. $25 cancellation fee under 24 hours notice.
    Cannot advise on skin conditions. Closed Sundays.

From that description alone, Claude generates adversarial caller personas, each with an attack strategy, a voice profile (accents, background noise, mumblers, interrupters), a multi-turn caller script, and pass/fail evaluation criteria. The eval suite is generated once and held fixed for the entire run, like a validation set.

> The loop:

1. Read the agent's current prompt from the platform
2. Generate the adversarial eval suite from your description
3. Run a baseline
4. Claude proposes ONE surgical change to the prompt
5. Push the modified prompt to the agent via API
6. Run all scenarios against the updated agent
7. Score improved? Keep. Same score but shorter prompt? Keep. Otherwise revert.
8. Go to 4.

Run until Ctrl+C. The system sees its own experiment history: when a change fails, the next proposal knows what was tried and why it didn't work.

We ran 20 experiments on a live Vapi dental scheduling agent. 0 human intervention.
> Score: 0.728 → 0.969 (+33%)
> CSAT: 45 → 84
> Pass rate: 25% → 100%
> 9 kept, 10 discarded
> Prompt: 1191 → 1139 chars (better AND shorter)

You describe your agent. It figures out how to break it.
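The keep/revert loop above is greedy hill-climbing over a single prompt against a fixed eval suite. A minimal sketch in Python, with toy stand-ins for both the eval suite and Claude's proposal step (these stand-ins are hypothetical, not the project's actual code — a real run scores live adversarial calls, not keyword coverage):

```python
def eval_score(prompt, suite):
    """Toy stand-in for the fixed adversarial eval suite: here, just the
    fraction of required policies the prompt actually states."""
    return sum(kw.lower() in prompt.lower() for kw in suite) / len(suite)

def propose_edit(prompt, suite, history):
    """Toy stand-in for Claude proposing ONE surgical change, informed by
    past experiments (here: add the first still-missing policy)."""
    for kw in suite:
        if kw.lower() not in prompt.lower():
            return prompt + f" Policy: {kw}."
    return prompt.rstrip()  # nothing missing: try trimming instead

def optimize(prompt, suite, steps=10):
    best = eval_score(prompt, suite)              # step 3: run a baseline
    history = []
    for _ in range(steps):
        candidate = propose_edit(prompt, suite, history)  # step 4
        score = eval_score(candidate, suite)      # steps 5-6: re-run suite
        # step 7: keep on improvement, or same score with a shorter prompt
        kept = score > best or (score == best and len(candidate) < len(prompt))
        history.append({"score": score, "kept": kept})
        if kept:
            prompt, best = candidate, score
        # otherwise revert: prompt stays unchanged
    return prompt, best, history

suite = ["$25 cancellation fee", "Maria colors only", "closed Sundays"]
final, score, history = optimize("Voice receptionist for a hair salon.", suite)
```

With this toy scorer the loop keeps three edits (one per missing policy) and then rejects every no-op proposal, which is the same keep/revert asymmetry that produced "9 kept, 10 discarded" in the real run.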



