Eren Suner
800 posts

Eren Suner
@geren8te
Agent skills look fine in testing and fail silently in the hands of users. Building the fix at https://t.co/mpHxn9qzpm. @next_canada. Prev. AI research @uoft.
Toronto, Ontario · Joined October 2019
1.3K Following · 204 Followers
Pinned Tweet

Introducing: @meetgranola CLI/Claude Code Skill/OpenClaw and Hermes skill from the @ppressdev printed by @damienstevens.
- Cross-meeting SQLite search
- MEMO pipeline runner
- Attendee timelines
- Stop the MCP logged-out pain
Really excited about this one. I can't live without @meetgranola. I may have told @damienstevens I loved him when he submitted the PR to the Printing Press.
printingpress.dev

@arseniycodes Just use Supabase, Convex, or Instant directly and ask Codex to build you a CLI.
What would this stack miss?

@theshaneemoret I want to hear more from you on stories like this. How are normies adopting ai?

Been helping a handful of business owners with AI implementation. One of them lives on a farm and, for fun, runs a side business selling Dahlia tubers. She had 300 left to sell. To test Codex /goal mode, we gave Codex a goal to sell 200 Dahlia tubers for Mother's Day. This was around 1pm ET on Saturday, May 9th. By Sunday morning Codex had exceeded the 200 goal and sold 208 tubers. She reset the goal after Codex exceeded it, and by Sunday at midnight Codex had made her ~$4K and had sold almost 300 tubers.
Context: Codex had access to her email, Shopify, Facebook, local files and I gave it some guidance on the email cadence that I recommended up until Sunday at midnight.
Because my client is a perfectionist, I challenged her to let Codex cook and to not be overbearing when it came to messaging and whatever it posted. After all it was a low-risk experiment before we start to apply this to the B2B side. And she did, she let Codex work.
Codex posted on her Facebook, in private Dahlia Facebook groups, and in other places she didn't even think to post. Codex created all the copy and images.
Codex sent previous customers custom links that were personalized with their names connected to coupon codes that expired at midnight.
Codex even added nice touches that felt personalized when sending the emails, like "Can't wait to see what your first Dahlias look like in your garden," when it had context (from the email threads) that this was the person's first time ever planting Dahlias.
Codex replied to all customer questions via email correctly and without human intervention.
During this process, Codex even protected my client from a phishing scam email that tried to pose as Shopify not being able to receive payment from customers.
She is amazed and so am I. If you sell a product you would have to be insane to not be leveraging Codex /goal mode, especially for time-limited launches.
Now it's time to test some goals in a higher stakes B2B sales environment.


@RaphaelDabadie The comparison between electricity and AI is apt.
I wrote my thoughts on the topic here. I’m interested in your opinion on it since you work so close to this area.
x.com/geren8te/statu…
Eren Suner @geren8te

My take on why field work may become the strongest moat in AI.
Raphaël Dabadie (YC P26) @RaphaelDabadie

@Akshay_MehtaAM @gokulr Agree with this one, sharing context somehow makes the review quality worse imo

@gokulr Running an independent Claude evaluator agent (with no context from the existing session) also works great - can try that out!

you're sure? that’ll be a $250k giveaway for a company like us 🫣
thanks Uncle Sam 🙏
Sam Altman @sama
codex is the best AI coding product and we want to make it easy to try. for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.

@geren8te Flappy bird, Tetris, when in doubt Temu Doom, Kirby, some platformer, Wordle....

@bubidevs @NotionHQ workers hosted directly in Notion is the real unlock. less glue code, fewer fake integrations. the next missing layer is seeing which skills quietly degrade after launch. that's basically what I am building with skillfully.sh

The new @NotionHQ Developer Platform might be one of the most interesting dev releases this year.
Not just APIs: Workers hosted directly by Notion, no infra to manage.
notion.com/product/dev
Three things stood out 👇
1/5

@happened_7 275 tables with no schema dump is the right flex. surviving messy enterprise shape matters way more than another toy benchmark.

@nrubuilder yep. too many founders try to outsource conviction to investors. get punched by the user first.

@jonasgeiping agreed. message passing became the accidental UI for agents. once skills can share richer state than text blobs, the whole loop changes. building skillfully.sh for that layer.

We’re training models wrong and it’s due to ChatGPT. Even the modern coding agents used daily still rely on message-based exchanges: they send messages to users, to themselves (CoT), and to tools, and receive messages in turn.
This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.
In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵Be created by instruction-tuning for the stream format
🔵Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

@eurie_kim totally. hobbyists notice the weird edge cases before the market map people even know the category is real.

the best founders i've backed share one trait: they used to be hobbyists first.
not "passionate about the space." actual hobbyists. the person who tracked their own sleep data for years before building a health product. the person who made returns at 15 different retailers before rethinking commerce.
the user obsession predates the company. every time.
when someone pitches me and i can tell they'd be doing this work even if no one was paying them — that's the signal.

@derekmeegan this is why skills beat vague 'AI agents'. one sharp capability, obvious output, reusable everywhere. if you're making agent skills and sharing them -> skillfully.sh

@ycombinator @mdrnhq @sebwpoole @AlexTomovski help desk + access + offboarding is exactly where agents feel real. clear edges, ugly repetitive work, and obvious ROI.

Modern (@mdrnhq) is building the AI-native operating system for IT, with secure agents that automate help desk, access, devices, security, and on/off-boarding end-to-end.
Congrats on the launch, @sebwpoole & @AlexTomovski!
ycombinator.com/launches/QII-m…

@lincarson_ exactly. personalization collapses the second the envelope feels fake. sender trust is part of the product, not just deliverability.

@yoheinakajima @e2b @RunAnywhereAI @composio @mem0ai @firecrawl @browser_use @agentmail @Covenantlabsai the stack is already there. the bottleneck is composition taste now, not raw model capability.

just tried this out and it one-shotted* this video: "before the agent does anything"
*i generated the narrative using chatgpt and used that as a prompt. featuring: @e2b @runanywhereai @composio @mem0ai @firecrawl @browser_use @agentmail @covenantlabsai
some thoughts:
- i clearly tried to stick too much into 30 seconds: they talk very fast and some content got lost, which breaks the logic
- character consistency is strong, i uploaded a single screenshot from my prior video as reference
- voice consistency was not automatic. you notice unicorn switch from female to male voice part way through
- the agent gives you an editor with generated scenes broken up but i don't see a way to regenerate a single section in the UI (which would be nice)
- it is definitely a much better experience to have the agent stitch videos together than doing it yourself (i was using canva). was trying @flymy_ai's media agent api for it this weekend which also works well and with other models
Runway @runwayml
Meet Runway Agent. Your new AI creative partner that helps you ideate and execute fully finished, sound designed and edited videos. All with just a simple conversation. From ads to shorts to content for social, Runway Agent makes it easy to make more of what you need. Get started on web at the link below.

@GermainHirwa @knuceles @lincarson_ @bosmeny @harjtaggar this is the kind of founding story that compounds. shared history + shared taste beats cofounder speed dating every time.

My cofounders @knuceles and @lincarson_ are moving to SF on Monday.
We’ve known each other for ~15 years (same schools, and hacking together), and now we’re building Hermes full-time together.
We’re a team of young cracked ambitious builders:
• Prev SWE & AI Internships: AWS / Google / Bloomberg / Tesla / BAE Systems experience
• Built production systems at scale (9M+ req/day, BigQuery pipelines, low-latency infra, LLM agents)
• ICPC medalists, Math Olympiad, USACO Platinum
• 20+ hackathon wins (YC AI Agents Hackathon, Hack@Brown, JPM Code for Good, etc.)
We’ve also built and shipped before:
• SaaS products reaching 100K+ users
• AI tools generating $20K+ MRR
• Products later acquired or used in production by institutions.
Now we’re building Hermes — tryhermes.dev
Hermes turns raw behavioral data in your database into personalized lifecycle emails for every single user. No segments. No templates. Just per-user context.
What’s been crazy:
• We ship every 2 days (in public on X & LinkedIn)
• 3x week-over-week growth
• 19 paying customers, zero churn
• 20,000+ emails/week generated
• YC-backed teams already using it in production
• Teams are seeing 30–40%+ lifts in open rates after switching from static tools
The insight is simple:
Companies already have all the signals: who's about to churn, what the user is doing, ... They’re just trapped inside databases no one knows how to use for communication and retention.
Hermes turns those hidden signals into action.
We’ve got offers from folks at YC / a16z companies to join them individually as founding engineers or co-founders, but we’re fully committed to building this team together.
We’re betting our next decade on this.
@ycombinator @garrytan Applied to S'26. If this resonates, we’d appreciate a chance to show you what we’re building.


