Eren Suner

800 posts

@geren8te

Agent skills look fine in testing and fail silently in the hands of users. Building the fix at https://t.co/mpHxn9qzpm. @next_canada. Prev. AI research @uoft.

Toronto, Ontario · Joined October 2019

1.3K Following · 204 Followers

Pinned Tweet

Eren Suner@geren8te·9h

x.com/i/article/2053…

126

Eren Suner@geren8te·1h

@mvanhorn @meetgranola @ppressdev @damienstevens Thank you :))) want to try it?

Matt Van Horn@mvanhorn·1h

@geren8te @meetgranola @ppressdev @damienstevens love the design!

Matt Van Horn@mvanhorn·1h

Introducing: @meetgranola CLI/Claude Code Skill/OpenClaw and Hermes skill from the @ppressdev printed by @damienstevens.
- Cross-meeting SQLite search
- MEMO pipeline runner
- Attendee timelines
- Stop the MCP logged-out pain
Really excited about this one. I can't live without @meetgranola. I may have told @damienstevens I loved him when he submitted the PR to the Printing Press. printingpress.dev

5.2K

Eren Suner@geren8te·1h

@arseniycodes Just use Supabase, Convex, or Instant directly and ask Codex to build you a CLI. What would this stack miss?

Arseniy Shishaev (YC P26)@arseniycodes·2h

request for a startup: a flexible database that teams can build their CRMs on. what we'd like:
- sync in product events, slacks, emails w/customer
- MCP / CLI
- a very flexible model so that we can build one-off / custom workflows on top of it

147

Eren Suner@geren8te·2h

@theshaneemoret I want to hear more from you on stories like this. How are normies adopting ai?

139

Shanee Moret@theshaneemoret·2d

Been helping a handful of business owners with AI implementation. One of them has a side business for fun because she lives on a farm where she sells Dahlia tubers. She had 300 left to sell.
To test Codex /goal mode we gave Codex a goal to sell 200 Dahlia tubers for Mother's Day. This was around 1pm ET on Saturday, May 9th. By Sunday morning Codex had exceeded the 200 goal and sold 208 tubers. She reset it again after it exceeded the goal and by Sunday at midnight Codex had made her ~$4K and had sold almost 300 tubers.
Context: Codex had access to her email, Shopify, Facebook, local files and I gave it some guidance on the email cadence that I recommended up until Sunday at midnight.
Because my client is a perfectionist, I challenged her to let Codex cook and to not be overbearing when it came to messaging and whatever it posted. After all it was a low-risk experiment before we start to apply this to the B2B side. And she did, she let Codex work.
Codex posted on her Facebook, private Dahlia Facebook groups, and other places she didn't even think to post. Codex created all the copy and images. Codex sent previous customers custom links that were personalized with their names connected to coupon codes that expired at midnight. Codex even added nice touches that felt personalized when sending the emails like, "Can't wait to see what your first Dahlias look like in your garden," when it had context that it was this person's first time ever planting Dahlias (from the email threads).
Codex replied to all customer questions via email correctly and without human intervention. During this process, Codex even protected my client from a phishing scam email that tried to pose as Shopify not being able to receive payment from customers.
She is amazed and so am I. If you sell a product you would have to be insane to not be leveraging Codex /goal mode, especially for time-limited launches. Now it's time to test some goals in a higher stakes B2B sales environment.

12.6K

Eren Suner@geren8te·2h

@RaphaelDabadie The comparison between electricity and AI is apt. I wrote my thoughts on the topic here. I’m interested in your opinion about it since you work closely in this area. x.com/geren8te/statu…

Eren Suner@geren8te

x.com/i/article/2053…

Raphaël Dabadie (YC P26)@RaphaelDabadie·5h

My take on why field work may become the strongest moat in AI.

Raphaël Dabadie (YC P26)@RaphaelDabadie

x.com/i/article/2054…

286

Eren Suner@geren8te·2h

@Akshay_MehtaAM @gokulr Agree with this one, sharing context somehow makes the review quality worse imo

Akshay Mehta@Akshay_MehtaAM·3h

@gokulr Running an independent claude evaluator agent (with no context to existing session) also works great - can try that out!

153

Gokul Rajaram@gokulr·3h

Love using Codex as reviewer for Claude Code (and vice versa :))

3.1K

Eren Suner@geren8te·3h

@liu8in Uncle Sam 😂😂😂 Good one

Bin Liu@liu8in·10h

you’re sure? that’ll be a $250k giveaway for a company like us 🫣 thanks Uncle Sam 🙏

Sam Altman@sama

codex is the best AI coding product and we want to make it easy to try. for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.

1.8K

Eren Suner@geren8te·3h

@mschoening Temu doom 😂😂😂

Max Schoening@mschoening·3h

@geren8te Flappy bird, Tetris, when in doubt Temu Doom, Kirby, some platformer, Wordle....

Max Schoening@mschoening·5h

No idea. Claude did it. We can ask if you'd like?

Ashirwad Singh@ashirwadsingh_

@mschoening @NotionHQ Would genuinely love to know how someone builds a game inside a website footer Makes me want to start adding mini games to websites too.

2.4K

Eren Suner@geren8te·5h

@derekmeegan @kylejeong Was literally telling @pk_iv about this a couple of weeks ago.

105

derek@derekmeegan·6h

Turn any website into an API with /browser-to-api. This skill analyzes network activity, CDP logs, and website behavior to generate a custom OpenAPI spec. Watch Codex one-shot a fully documented OpenTable API client from a single prompt 👀

256

23.4K

Eren Suner@geren8te·5h

@bubidevs @NotionHQ workers hosted directly in Notion is the real unlock. less glue code, fewer fake integrations. the next missing layer is seeing which skills quietly degrade after launch. that's basically what I am building with skillfully.sh

120

Andrea Busi@bubidevs·8h

The new @NotionHQ Developer Platform might be one of the most interesting dev releases this year. Not just APIs: Workers hosted directly by Notion, no infra to manage. notion.com/product/dev Three things stood out 👇 1/5

7.5K

Eren Suner@geren8te·5h

@happened_7 275 tables with no schema dump is the right flex. surviving messy enterprise shape matters way more than another toy benchmark.

paari_7@happened_7·10h

Built a self-improving data agent over a 275-table MySQL DB using DSPy RLM + GEPA. No schema dumping. After 752 rollouts it answers complex multi-hop SQL questions cold. Demo dropping soon

1.2K

Eren Suner@geren8te·5h

@nrubuilder yep. too many founders try to outsource conviction to investors. get punched by the user first.

Nathan Ruberto@nrubuilder·10h

The dumbest thing I see new founders doing in 2026: Talking to VCs before they have anything worth talking about. You're not raising. You're auditioning to be ignored. Talk to your ICP and determine if the problem is real first. Then build the thing. Talk later.

Eren Suner@geren8te·5h

@jonasgeiping agreed. message passing became the accidental UI for agents. once skills can share richer state than text blobs, the whole loop changes. building skillfully.sh for that layer.

Jonas Geiping@jonasgeiping·12h

We’re training models wrong and it’s due to ChatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.
In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

113

887

88.3K

Eren Suner@geren8te·5h

@eurie_kim totally. hobbyists notice the weird edge cases before the market map people even know the category is real.

Eurie Kim@eurie_kim·13h

the best founders i've backed share one trait: they used to be hobbyists first. not "passionate about the space." actual hobbyists. the person who tracked their own sleep data for years before building a health product. the person who made returns at 15 different retailers before rethinking commerce. the user obsession predates the company. every time. when someone pitches me and i can tell they'd be doing this work even if no one was paying them — that's the signal.

4.9K

Eren Suner@geren8te·5h

@derekmeegan this is why skills beat vague 'AI agents'. one sharp capability, obvious output, reusable everywhere. if you're making agent skills and sharing them -> skillfully.sh

124

Eren Suner@geren8te·5h

@ycombinator @mdrnhq @sebwpoole @AlexTomovski help desk + access + offboarding is exactly where agents feel real. clear edges, ugly repetitive work, and obvious ROI.

Y Combinator@ycombinator·6h

Modern (@mdrnhq) is building the AI-native operating system for IT, with secure agents that automate help desk, access, devices, security, and on/off-boarding end-to-end. Congrats on the launch, @sebwpoole & @AlexTomovski! ycombinator.com/launches/QII-m…

13.6K

Eren Suner@geren8te·5h

@lincarson_ exactly. personalization collapses the second the envelope feels fake. sender trust is part of the product, not just deliverability.

Carson Lin@lincarson_·10h

Most lifecycle emails already feel automated before you even open them. The sender gives it away. Hermes now supports Microsoft/Outlook inboxes, so teams can send AI-personalized emails from the sender customers actually recognize.

Eren Suner@geren8te·5h

@yoheinakajima @e2b @RunAnywhereAI @composio @mem0ai @firecrawl @browser_use @agentmail @Covenantlabsai the stack is already there. the bottleneck is composition taste now, not raw model capability.

Yohei@yoheinakajima·6h

just tried this out and it one-shotted* this video: "before the agent does anything"
*i generated the narrative using chatgpt and used that as a prompt.
featuring: @e2b @runanywhereai @composio @mem0ai @firecrawl @browser_use @agentmail @covenantlabsai
some thoughts:
- i clearly tried to stick too much into 30 seconds, they talk very fast and lost some content which breaks logic
- character consistency is strong, i uploaded a single screenshot from my prior video as reference
- voice consistency was not automatic. you notice unicorn switch from female to male voice part way through
- the agent gives you an editor with generated scenes broken up but i don't see a way to regenerate a single section in the UI (which would be nice)
- it is definitely a much better experience to have the agent stitch videos together than doing it yourself (i was using canva).
was trying @flymy_ai's media agent api for it this weekend which also works well and with other models

Runway@runwayml

Meet Runway Agent. Your new AI creative partner that helps you ideate and execute fully finished, sound designed and edited videos. All with just a simple conversation. From ads to shorts to content for social, Runway Agent makes it easy to make more of what you need. Get started on web at the link below.

Eren Suner@geren8te·5h

@GermainHirwa @knuceles @lincarson_ @bosmeny @harjtaggar this is the kind of founding story that compounds. shared history + shared taste beats cofounder speed dating every time.

Germain Hirwa@GermainHirwa·7h

My cofounders @knuceles and @lincarson_ are moving to SF on Monday. We’ve known each other for ~15 years (same schools, and hacking together), and now we’re building Hermes full-time together.
We’re a team of young cracked ambitious builders:
• Prev SWE & AI Internships: AWS / Google / Bloomberg / Tesla / BAE Systems experience
• Built production systems at scale (9M+ req/day, BigQuery pipelines, low-latency infra, LLM agents)
• ICPC medalists, Math Olympiad, USACO Platinum
• 20+ hackathon wins (YC AI Agents Hackathon, Hack@Brown, JPM Code for Good, etc.)
We’ve also built and shipped before:
• SaaS products reaching 100K+ users
• AI tools generating $20K+ MRR
• Products later acquired or used in production by institutions.
Now we’re building Hermes: tryhermes.dev
Hermes turns raw behavioral data in your database into personalized lifecycle emails for every single user. No segments. No templates. Just per-user context.
What’s been crazy:
• We ship every 2 days (in public on X & LinkedIn)
• 3x week-over-week growth
• 19 paying customers, zero churn
• 20,000+ emails/week generated
• YC-backed teams already using it in production
• Teams are seeing 30–40%+ lifts in open rates after switching from static tools
The insight is simple: Companies already have all the signals; who's about to churn, what's the user doing, ... they’re just trapped inside databases no one knows how to use to communicate and increase retention. Hermes turns those hidden signals into action.
We’ve got offers from folks at YC / a16z companies to join them individually as founding engineers or co-founders, but we’re fully committed to building this team together. We’re betting our next decade on this.
@ycombinator @garrytan Applied to S'26. If this resonates, we’d appreciate a chance to show you what we’re building.

824
