Emre Coklar
1.4K posts

@EmreCoklar
Senior Consultant, AI Engineer, DSPy enthusiast, Founder. Building: https://t.co/CrtF7NC69g & https://t.co/iS5Ys2Cwdw

GEPA for skills is here! Introducing gskill, an automated pipeline to learn agent skills with @gepa_ai. With learned skills, we boost Claude Code’s repository task resolution rate to near-perfect levels, while making it 47% faster. Here's how we did it:




@Pranav__Goel @emilymbender fwiw I think the entire question "do LLMs [on their own] have agency" kind of gets it wrong, bc it's trivial to put an LLM in a while loop with a REPL, which effectively grants the system (REPL + LLM) agency even if the LLM itself on its own is not "agentic" (whatever that means)
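
For readers who want to see how small "an LLM in a while loop with a REPL" really is, here is a minimal sketch. Everything in it is an assumption for illustration: `scripted_model` is a hypothetical stand-in for a real LLM call, and `run_in_repl` uses Python's `exec`, which is unsafe for untrusted output outside a sandbox.

```python
import contextlib
import io
from typing import Callable, List, Optional

# Hypothetical model type: takes the transcript so far, returns code or None to stop.
Model = Callable[[List[str]], Optional[str]]


def run_in_repl(code: str, namespace: dict) -> str:
    """Execute code in a persistent namespace and return what it printed."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # unsafe for untrusted code; sketch only
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()


def agent_loop(model: Model, goal: str, max_steps: int = 5) -> List[str]:
    """The while loop: model proposes code, REPL runs it, output is fed back."""
    namespace: dict = {}
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        code = model(transcript)
        if code is None:  # the model decides it is done
            break
        output = run_in_repl(code, namespace)
        transcript.append(f">>> {code}\n{output}")
    return transcript


def scripted_model(transcript: List[str]) -> Optional[str]:
    """Stand-in for an LLM: replays fixed steps based on how many turns have passed."""
    steps = ["x = 6 * 7", "print(x)"]
    turn = len(transcript) - 1
    return steps[turn] if turn < len(steps) else None


transcript = agent_loop(scripted_model, "compute 6 * 7")
```

The model function on its own only maps text to text; the loop plus the persistent REPL namespace is what lets the combined system act on its own earlier outputs.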

Two of our worst VC stories: 1. A Sequoia partner passed on Cloudflare because he didn’t think a woman could lead a security infrastructure company. Seriously. 🙄 2. I got introduced to @pmarca. Meeting got scheduled for a Monday, which should have been a clue. I thought it was just a casual meeting. He thought it was a pitch and brought the whole @a16z partnership team. Hilarity ensued. 🤪 At one point one of them said: “You don’t seem very prepared.” Which was true because I wasn’t. I framed the rejection letter they sent.



I have also open sourced the skills we used in Sentry to prove out this latest iteration. github.com/getsentry/ward… Please use it responsibly. If you find something that others have missed, validate it, and send something up to bounty programs. p.s. Mythos is FUD

god i'm so excited to have noah on the team. been trying to get him here for almost a year. his record of innovation at the frontier of algorithms + infra for self-improving ai is honestly insane, and i think his recent work is my favorite yet. idk how he's so chill about it.




How do we compare model perf in ARC-AGI-3? In most benchmarks you just compare scores, but with ARC-AGI-3 you get reasoning logs across all the games you play. To compare Opus 4.8 to Opus 4.7 we used LLM as a judge. Using @AmpCode (my daily driver right now) I set up a skill to compare models, then it spawned a sub-agent per game per model. Each sub-agent did a single-game analysis, then brought its notes back to the main agent. Very cool to see all of this come together. It would have taken 2-3 days of analysis by hand before.


For the final refine phase, we implemented a cache-optimized Product Quantization (PQ) layout specifically tailored for late interaction. Evaluated on ColBERTv2.0 embeddings, it results in 10 ms single-CPU retrieval on large-scale datasets (MS MARCO-v1, LoTTE Pooled).
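
As background on what PQ scoring for late interaction involves, the sketch below shows plain lookup-table (asymmetric) scoring combined with ColBERT-style MaxSim. It is not the cache-optimized layout the post describes, which is not specified here. The function names, the toy codebooks, and the tiny dimensions are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebooks = List[List[Vector]]  # codebooks[s][c] = centroid c of subspace s


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_lookup_table(query_token: Vector, codebooks: Codebooks) -> List[List[float]]:
    """table[s][c] = dot product of the query's s-th sub-vector with centroid c."""
    sub_dim = len(codebooks[0][0])
    table = []
    for s, centroids in enumerate(codebooks):
        q_sub = query_token[s * sub_dim:(s + 1) * sub_dim]
        table.append([dot(q_sub, c) for c in centroids])
    return table


def maxsim_pq(query_tokens: List[Vector],
              doc_codes: List[Tuple[int, ...]],
              codebooks: Codebooks) -> float:
    """Late interaction: for each query token, take the best-matching doc token, then sum."""
    total = 0.0
    for q in query_tokens:
        table = build_lookup_table(q, codebooks)
        # Approximate q . d_j by summing table entries selected by the PQ codes of d_j.
        total += max(
            sum(table[s][code[s]] for s in range(len(codebooks)))
            for code in doc_codes
        )
    return total


# Toy setup (assumed): 4-dim embeddings, 2 subspaces, 2 centroids per subspace.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
doc_codes = [(0, 1), (1, 0)]  # decode to [1, 0, 0, 1] and [0, 1, 1, 0]
query_tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]

score = maxsim_pq(query_tokens, doc_codes, codebooks)
```

The document tokens are never decompressed: each query token builds one small table, and every document token costs only one table lookup per subspace. A cache-optimized layout would presumably organize how those codes and tables sit in memory.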




I wrote an extended blog post if you want to read more about all this! repoprompt.com/blog/repo-prom…


i made an app that feeds you to the sharks if you don't publicly launch your own product in 30 days. no more of this: "dude i just 100x'ed my workflow with this new AI model"... meanwhile... 0 projects launched, 0 revenue, 0 users. 100 x 0 = still 0. it's time to go from 0 to 1. it's time to: SHlPORDIE.COM 🏴‍☠️. ship a new product every 30 days until one changes your life or... DIE, in the app, and get kicked from the community forever while being publicly humiliated. no refunds for those who fail to ship. custom trophies to be collected for those who succeed. if you DO ship, you also get to remain in a community of people who actually ship things and get users ++ revenue. sidenote: i'm really excited to see if this can be the push someone needs, like how @marclou's shipfast project pushed me and is the entire reason i have a $35K MRR solo operated SaaS now and many other successful mobile apps. GLHF, DON'T DIE, and KEEP GOING!! i've never taken a launch this legit so let's see how it goes :)









