Eren Suner

800 posts

@geren8te

Agent skills look fine in testing and fail silently in the hands of users. Building the fix at https://t.co/mpHxn9qzpm. @next_canada. Prev. AI research @uoft.

Toronto, Ontario Joined October 2019
1.3K Following 204 Followers
Eren Suner@geren8te·
@arseniycodes Just use Supabase, Convex, or Instant directly and ask Codex to build you a CLI. What would this stack miss?
1
0
0
31
Arseniy Shishaev (YC P26)@arseniycodes·
request for a startup: a flexible database that teams can build their CRMs on.
what we'd like:
- sync in product events, slacks, emails w/customer
- MCP / CLI
- a very flexible model so that we can build one-off / custom workflows on top of it
3
0
2
147
Eren Suner@geren8te·
@theshaneemoret I want to hear more from you on stories like this. How are normies adopting ai?
1
0
1
139
Shanee Moret@theshaneemoret·
Been helping a handful of business owners with AI implementation. One of them has a side business for fun because she lives on a farm where she sells Dahlia tubers. She had 300 left to sell.

To test Codex /goal mode we gave Codex a goal to sell 200 Dahlia tubers for Mother's Day. This was around 1pm ET on Saturday, May 9th. By Sunday morning Codex had exceeded the 200 goal and sold 208 tubers. She reset it again after it exceeded the goal and by Sunday at midnight Codex had made her ~$4K and had sold almost 300 tubers.

Context: Codex had access to her email, Shopify, Facebook, and local files, and I gave it some guidance on the email cadence that I recommended up until Sunday at midnight. Because my client is a perfectionist, I challenged her to let Codex cook and to not be overbearing when it came to messaging and whatever it posted. After all, it was a low-risk experiment before we start to apply this to the B2B side. And she did, she let Codex work.

Codex posted on her Facebook, private Dahlia Facebook groups, and other places she didn't even think to post. Codex created all the copy and images. Codex sent previous customers custom links that were personalized with their names connected to coupon codes that expired at midnight. Codex even added nice touches that felt personalized when sending the emails like, "Can't wait to see what your first Dahlia's look like in your garden," when it had context that it was this person's first time ever planting Dahlias (from the email threads). Codex replied to all customer questions via email correctly and without human intervention. During this process, Codex even protected my client from a phishing scam email that tried to pose as Shopify not being able to receive payment from customers.

She is amazed and so am I. If you sell a product you would have to be insane to not be leveraging Codex /goal mode, especially for time-limited launches. Now it's time to test some goals in a higher stakes B2B sales environment.
3
4
46
12.6K
Akshay Mehta@Akshay_MehtaAM·
@gokulr Running an independent claude evaluator agent (with no context to existing session) also works great - can try that out!
2
0
1
153
Gokul Rajaram@gokulr·
Love using Codex as reviewer for Claude Code (and vice versa :))
9
0
22
3.1K
Max Schoening@mschoening·
@geren8te Flappy bird, Tetris, when in doubt Temu Doom, Kirby, some platformer, Wordle....
1
0
0
73
derek@derekmeegan·
Turn any website into an API with /browser-to-api. This skill analyzes network activity, CDP logs, and website behavior to generate a custom OpenAPI spec. Watch Codex one-shot a fully documented OpenTable API client from a single prompt 👀
18
14
256
23.4K
Eren Suner@geren8te·
@bubidevs @NotionHQ workers hosted directly in Notion is the real unlock. less glue code, fewer fake integrations. the next missing layer is seeing which skills quietly degrade after launch. that's basically what I am building with skillfully.sh
0
0
0
120
Andrea Busi@bubidevs·
The new @NotionHQ Developer Platform might be one of the most interesting dev releases this year. Not just APIs: Workers hosted directly by Notion, no infra to manage. notion.com/product/dev Three things stood out 👇 1/5
3
6
38
7.5K
Eren Suner@geren8te·
@happened_7 275 tables with no schema dump is the right flex. surviving messy enterprise shape matters way more than another toy benchmark.
0
0
0
63
paari_7@happened_7·
Built a self-improving data agent over a 275-table MySQL DB using DSPy RLM + GEPA. No schema dumping. After 752 rollouts it answers complex multi-hop SQL questions cold. Demo dropping soon
2
1
20
1.2K
Eren Suner@geren8te·
@nrubuilder yep. too many founders try to outsource conviction to investors. get punched by the user first.
0
0
0
8
Nathan Ruberto@nrubuilder·
The dumbest thing I see new founders doing in 2026: Talking to VCs before they have anything worth talking about. You're not raising. You're auditioning to be ignored. Talk to your ICP and determine if the problem is real first. Then build the thing. Talk later.
0
0
1
21
Eren Suner@geren8te·
@jonasgeiping agreed. message passing became the accidental UI for agents. once skills can share richer state than text blobs, the whole loop changes. building skillfully.sh for that layer.
0
0
0
19
Jonas Geiping@jonasgeiping·
We’re training models wrong and it’s due to ChatGPT.

Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.
29
113
887
88.3K
Eren Suner@geren8te·
@eurie_kim totally. hobbyists notice the weird edge cases before the market map people even know the category is real.
0
0
0
9
Eurie Kim@eurie_kim·
the best founders i've backed share one trait: they used to be hobbyists first. not "passionate about the space." actual hobbyists. the person who tracked their own sleep data for years before building a health product. the person who made returns at 15 different retailers before rethinking commerce. the user obsession predates the company. every time. when someone pitches me and i can tell they'd be doing this work even if no one was paying them — that's the signal.
16
5
87
4.9K
Eren Suner@geren8te·
@derekmeegan this is why skills beat vague 'AI agents'. one sharp capability, obvious output, reusable everywhere. if you're making agent skills and sharing them -> skillfully.sh
0
0
1
124
Eren Suner@geren8te·
@lincarson_ exactly. personalization collapses the second the envelope feels fake. sender trust is part of the product, not just deliverability.
0
0
0
11
Carson Lin@lincarson_·
Most lifecycle emails already feel automated before you even open them. The sender gives it away. Hermes now supports Microsoft/Outlook inboxes, so teams can send AI-personalized emails from the sender customers actually recognize.
0
2
5
79
Yohei@yoheinakajima·
just tried this out and it one-shotted* this video: "before the agent does anything"
*i generated the narrative using chatgpt and used that as a prompt.
featuring: @e2b @runanywhereai @composio @mem0ai @firecrawl @browser_use @agentmail @covenantlabsai
some thoughts:
- i clearly tried to stick too much into 30 seconds, they talk very fast and lost some content which breaks logic
- character consistency is strong, i uploaded a single screenshot from my prior video as reference
- voice consistency was not automatic. you notice unicorn switch from female to male voice part way through
- the agent gives you an editor with generated scenes broken up but i don't see a way to regenerate a single section in the UI (which would be nice)
- it is definitely a much better experience to have the agent stitch videos together than doing it yourself (i was using canva). was trying @flymy_ai's media agent api for it this weekend which also works well and with other models
Runway@runwayml

Meet Runway Agent. Your new AI creative partner that helps you ideate and execute fully finished, sound designed and edited videos. All with just a simple conversation. From ads to shorts to content for social, Runway Agent makes it easy to make more of what you need. Get started on web at the link below.

4
2
19
4K
Germain Hirwa@GermainHirwa·
My cofounders @knuceles and @lincarson_ are moving to SF on Monday. We’ve known each other for ~15 years (same schools, and hacking together), and now we’re building Hermes full-time together.

We’re a team of young cracked ambitious builders:
• Prev SWE & AI Internships: AWS / Google / Bloomberg / Tesla / BAE Systems experience
• Built production systems at scale (9M+ req/day, BigQuery pipelines, low-latency infra, LLM agents)
• ICPC medalists, Math Olympiad, USACO Platinum
• 20+ hackathon wins (YC AI Agents Hackathon, Hack@Brown, JPM Code for Good, etc.)

We’ve also built and shipped before:
• SaaS products reaching 100K+ users
• AI tools generating $20K+ MRR
• Products later acquired or used in production by institutions.

Now we’re building Hermes (tryhermes.dev). Hermes turns raw behavioral data in your database into personalized lifecycle emails for every single user. No segments. No templates. Just per-user context.

What’s been crazy:
• We ship every 2 days (in public on X & LinkedIn)
• 3x week-over-week growth
• 19 paying customers, zero churn
• 20,000+ emails/week generated
• YC-backed teams already using it in production
• Teams are seeing 30–40%+ lifts in open rates after switching from static tools

The insight is simple: Companies already have all the signals (who's about to churn, what the user is doing, ...); they’re just trapped inside databases no one knows how to use to communicate and increase retention. Hermes turns those hidden signals into action.

We’ve got offers from folks at YC / a16z companies to join them individually as founding engineers or co-founders, but we’re fully committed to building this team together. We’re betting our next decade on this.

@ycombinator @garrytan Applied to S'26. If this resonates, we’d appreciate a chance to show you what we’re building.
4
1
12
824