Philemon Kiprono 🇰🇪
@ronoh4
3.2K posts
AI Engineer | DSPy | Azure | ...Arsenal F.C
Nairobi, Kenya · Joined March 2011
1.5K Following · 435 Followers
Philemon Kiprono 🇰🇪
🚀 DSPy 3.2.0 is out!
🔗 BetterTogether chains optimizers: GEPA → BootstrapFinetune → GEPA via strategy strings
🔌 LiteLLM decoupling begins — custom backends, no litellm dep
🛡️ Hardened RLM & PythonInterpreter — structured errors, resilient parsing
Exciting times for #DSPy
Replies: 1 · Reposts: 2 · Likes: 9 · Views: 1K
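The optimizer-chaining idea above can be sketched as strategy-string parsing. This is an illustrative toy, not DSPy's actual BetterTogether API; the `OPTIMIZERS` table and `parse_strategy` are hypothetical names:

```python
# Hypothetical sketch of chaining optimizers via a strategy string.
# Not the real DSPy API; names here are made up for illustration.

OPTIMIZERS = {
    "gepa": "GEPA (prompt optimization)",
    "bft": "BootstrapFinetune (weight optimization)",
}

def parse_strategy(strategy: str) -> list[str]:
    """Split a strategy string like 'gepa -> bft -> gepa' into ordered steps."""
    steps = [s.strip().lower() for s in strategy.split("->")]
    unknown = [s for s in steps if s not in OPTIMIZERS]
    if unknown:
        raise ValueError(f"unknown optimizer step(s): {unknown}")
    return steps
```

In this sketch, each parsed step would then be run in order, feeding the optimized program from one stage into the next.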
Gabby Thomas @itsgabbyt
Windy day but ran some great early season races in Ethiopia! Thank you Ethiopia for a great meet and experience 🫶🏽 Up next 👉🏽 Kenya
Replies: 42 · Reposts: 105 · Likes: 1K · Views: 29.8K
Philemon Kiprono 🇰🇪 reposted
Charly Wargnier @DataChaz
ANTHROPIC LITERALLY JUST HANDING US THE BLUEPRINT🤯 Their new 33-page guide on Claude Skills is the cheat code. Make sure to bookmark this before it gets lost in your feed. Link in 🧵↓
m0h @exploraX_
x.com/i/article/2039…

Replies: 45 · Reposts: 298 · Likes: 2.6K · Views: 661K
Philemon Kiprono 🇰🇪 reposted
Victor M @victormustar
GLM-5.1 > Claude Code (Opus 4.6)? Either I'm tripping or CC has become very bad, but I built a Three.js racing game to eval it and it's extremely impressive. Thoughts:
- One-shot car physics with real drift mechanics (this is hard)
- My fav part: awesome at self-iterating (with no vision!). Created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen
- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters
- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!
- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
You are going to hear about this model a lot in the next months. Open source, let's go 🚀🚀
Replies: 70 · Reposts: 104 · Likes: 1.4K · Views: 188K
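The "road normals pointed DOWN" check described above can be reproduced with plain cross-product math. A minimal sketch (the `cross` helper and example edge vectors are my own, not from the game's code):

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Two edge vectors of a flat road triangle lying in the XZ plane
# (Y is up, as in Three.js).
e1 = (1, 0, 0)
e2 = (0, 0, 1)

normal = cross(e1, e2)       # (0, -1, 0): the normal points DOWN
points_down = normal[1] < 0  # so the triangle's winding order is flipped
```

Swapping the edge order, `cross(e2, e1)`, yields `(0, 1, 0)`, which is how you would fix the winding so normals face up.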
Philemon Kiprono 🇰🇪 reposted
elvis @omarsar0
Pay attention to this one if you are building terminal-based coding agents.

OpenDev is an 81-page paper covering scaffolding, harness design, context engineering, and hard-won lessons from building CLI coding agents. It introduces a compound AI system architecture with workload-specialized model routing, a dual-agent architecture separating planning from execution, lazy tool discovery, and adaptive context compaction.

The industry is shifting from IDE plugins to terminal-native agents. Claude Code, Codex CLI, and others have proven the model works. This paper formalizes the design patterns that make these systems reliable, covering topics like event-driven system reminders to counteract instruction fade-out, automated memory across sessions, and strict safety controls for autonomous operation.

Paper: arxiv.org/abs/2603.05344
Learn to build effective AI agents in our academy: academy.dair.ai
Replies: 60 · Reposts: 288 · Likes: 1.8K · Views: 144K
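As a rough illustration of the adaptive context compaction the paper describes, here is a toy compactor that folds older messages into a one-line summary once a character budget is exceeded. This is a sketch under my own assumptions, not the paper's actual algorithm:

```python
def compact(messages, budget, keep_recent=2):
    """Toy adaptive context compaction: if the total context length exceeds
    the budget, replace older messages with a short summary placeholder and
    keep only the most recent turns verbatim.
    (Illustrative sketch only; a real agent would summarize with an LM.)"""
    total = sum(len(m) for m in messages)
    if total <= budget:
        return messages  # under budget: keep everything as-is
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent
```

The design point this mirrors: compaction is triggered adaptively by budget pressure rather than on a fixed schedule, so short sessions never pay the summarization cost.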
Philemon Kiprono 🇰🇪 reposted
Omar Khattab @lateinteraction
.@LakshyAAAgrawal has done an unbelievably nice job collecting successful applications of GEPA to everything from state-of-the-art research to new products and established enterprises at this page. Staying on top of these is harder than one would think! gepa-ai.github.io/gepa/guides/us…
Replies: 1 · Reposts: 24 · Likes: 196 · Views: 10.9K
gatsz @Gatsz01
Just finished building a Swahili-based programming language: the Jenga Programming Language (Version 1.0: The Pilot Version). Check it out: github.com/gatsz1/Jenga. For now it's simple; some features will be added soon. Happy to receive feedback. Happy Jenga-ring!
Replies: 73 · Reposts: 461 · Likes: 1.7K · Views: 49.6K
Prashanth Rao @tech_optimist
dspy.RLM has dropped!!! I clearly know what my next deep rabbit hole (and suite of experiments) will involve, haha. Thanks so much to the DSPy team for pushing this through, this is a super exciting development for the entire ecosystem 🚀
isaac 🧩 @isaacbmiller1

The dspy.RLM module is now released 👀 Install DSPy 3.1.2 to try it. Usage is plug-and-play with your existing Signatures. A little example of it helping @lateinteraction and me figure out some scattered backlogs:

Replies: 6 · Reposts: 5 · Likes: 88 · Views: 6.8K
Omar Khattab @lateinteraction
I've debated quite a bit whether it's meaningful to make a whole post about this but maybe why not. For some reason, last week was the first in 3 years of DSPy to exceed 1,000,000 package downloads in a single week. Good to see. We're working on a bunch of new pieces for it.
Replies: 22 · Reposts: 16 · Likes: 257 · Views: 27.7K
Philemon Kiprono 🇰🇪
@lateinteraction I am testing it already, and it has features that are currently understated, like how good it is at enforcing constraints on outputs
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 10
alex zhang @a1zhang
For those interested in making OSS contributions to the RLM repo, I've added a bunch of random thoughts and TODOs of what to add in a *messy* Markdown file on the GH repo. Feel free to tackle any of them, or any other things you think are meaningful. I'll be pretty active here or on the repo. Once I finish some other related work, I might open up a Discord channel or something for people who want to make longer-standing contributions to the repo / discuss the direction of where to take it. Cheers! github.com/alexzhang13/rl…
Replies: 19 · Reposts: 30 · Likes: 292 · Views: 17.3K
Philemon Kiprono 🇰🇪
An interesting take here. Is it possible we've been measuring intelligence the wrong way?
Robert Youssef @rryssf_

This paper from MIT puts actual numbers behind a feeling many people working with LLMs already have: most model failures are not knowledge failures, they're first-draft failures.

The paper studies Recursive Language Models (RLMs) and asks a very specific question: what happens if you let the same model revise its own output multiple times instead of scaling parameters?

The answer is surprisingly concrete. Across reasoning-heavy benchmarks, the authors show that recursion consistently improves accuracy with no change in model size. On multi-step reasoning tasks, adding just 2–4 recursive passes improves correctness by 10–25%, depending on task complexity. On longer planning problems, error rates drop even more sharply, because early logical mistakes get corrected in later passes instead of propagating forward.

One figure in the paper makes this especially clear. They plot task accuracy vs. recursion depth. The curve is steep at first:
• Pass 1 → baseline performance
• Pass 2 → large jump in correctness
• Pass 3–4 → diminishing but still meaningful gains

After ~4 iterations, returns taper off, which suggests something important: most reasoning failures happen early, and a small amount of structured revision fixes a large fraction of them.

There's also a cost comparison that's hard to ignore. The authors compare:
• A larger non-recursive model
• A smaller recursive model using multiple passes

The recursive model reaches comparable or better accuracy while using fewer parameters and fewer total tokens in the final answer. Even though recursion adds compute internally, the output length shrinks because later passes compress and clean up earlier drafts. In plain terms: the model thinks more, but talks less.

Another quantitative result I found fascinating is hallucination reduction. The paper measures factual consistency across iterations and finds that later recursive passes explicitly remove unsupported claims introduced in earlier drafts. The probability of a hallucinated statement surviving to the final output drops significantly after the second pass, because the model is now evaluating its own content instead of blindly extending it.

This directly challenges the "long chain-of-thought = better reasoning" assumption. The data suggests the opposite. Better reasoning comes from iterative self-correction, not from dumping more intermediate tokens. The recursion acts like an internal verifier that gradually aligns the output with constraints imposed by the task.

There's also a subtle systems insight hidden in the math. If accuracy improves roughly logarithmically with recursion depth but model size improves accuracy sublinearly with parameter count, then recursion is simply a more efficient lever. You get more reasoning per unit of compute by looping than by scaling. That's a big deal for deployment.

Instead of asking "How big can we make the model?", this paper asks "How many chances does the model get to be wrong before we trust it?"

The broader implication is hard to miss. We've been benchmarking models on their first answer. But intelligence doesn't live in the first answer. It lives in revision curves, error decay rates, and how quickly a system converges toward correctness when allowed to reflect.

This paper doesn't just propose a technique. It quietly suggests we've been measuring the wrong thing all along.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 7
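The revision loop the thread describes can be sketched with a stub reviser standing in for the LLM pass. Everything here (`revise`, `recursive_answer`, the `[unsupported]` marker) is a hypothetical toy, not the paper's method:

```python
def revise(draft: str) -> str:
    """Toy self-revision pass: drop lines flagged as unsupported claims and
    fix one known arithmetic slip. Stands in for an LLM critique-and-rewrite
    step; the '[unsupported]' marker is a made-up convention for this demo."""
    lines = [l for l in draft.splitlines() if "[unsupported]" not in l]
    return "\n".join(lines).replace("2 + 2 = 5", "2 + 2 = 4")

def recursive_answer(draft: str, passes: int = 4) -> str:
    """Run the same reviser over its own output for a few passes,
    stopping early once the draft stabilizes (mirrors the taper after ~4)."""
    for _ in range(passes):
        new = revise(draft)
        if new == draft:  # converged; extra passes add nothing
            break
        draft = new
    return draft
```

The early-stopping check is the interesting part: it reflects the thread's observation that most of the gain comes from the first few passes, after which the draft stops changing.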
alex zhang @a1zhang
Nope, it can directly use a variable in the REPL environment as the output. For example, let's say you have 1B tokens of Excel data that you want transformed into 1B tokens of some transformation over it. The model can divide it up into 100K-token chunks, call an LM over each chunk, store the outputs in a variable, then join them together in Python. Then it can just call FINAL_VAR(final_output) and it will return this huge output. In @omouamoua's implementation in Prime Intellect's verifiers, he actually explicitly only lets the model output from a variable in the REPL. We discuss this more in the paper if you're interested!
Replies: 1 · Reposts: 0 · Likes: 34 · Views: 23.9K
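The chunk-map-join pattern described above can be sketched in plain Python, with a stub `lm()` standing in for the real sub-LM call; `transform_in_chunks` and the uppercase "transformation" are illustrative only:

```python
def lm(chunk: str) -> str:
    """Stand-in for an LM call that transforms one chunk.
    Here it just uppercases the text so the sketch is runnable offline."""
    return chunk.upper()

def transform_in_chunks(data: str, chunk_size: int) -> str:
    """Split a huge input into fixed-size chunks, call the LM on each,
    and join the results in plain Python, so the full output never has
    to fit inside a single model context window."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return "".join(lm(c) for c in chunks)

final_output = transform_in_chunks("excel row data " * 4, chunk_size=10)
# In an RLM-style REPL, the model would now return the variable itself
# (e.g. FINAL_VAR(final_output)) instead of printing the huge string.
```

The key property is that `final_output` lives in the REPL, not in the model's context, which is what makes billion-token outputs tractable.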
alex zhang @a1zhang
I was considering waiting a while to polish this first, but decided it'd be better to just release an initial version to get better community feedback and squash bugs! This is the official RLM repo, with native support for cloud-based and local REPLs. github.com/alexzhang13/rlm
Replies: 46 · Reposts: 133 · Likes: 1.2K · Views: 120.2K
Philemon Kiprono 🇰🇪 reposted
Min Choi @minchoi
This is literally my new workflow now:
Real-time search → Grok 4.1 Fast
Planning → Grok 4.1 Thinking
Frontend Coding → Gemini 3 Pro
Backend Coding → Claude Code (Opus/Sonnet 4.5)
Write Tests → Gemini 3 Pro
Run Tests → GPT-5.1 Codex
Debug → Claude Opus 4.5
Bookmark this.
Replies: 177 · Reposts: 252 · Likes: 2.9K · Views: 207.8K
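Under the hood, a workflow like this is just a task-to-model routing table. A minimal sketch (the `ROUTES` keys and the fallback choice are my own assumptions, not from the post):

```python
# Hypothetical task → model routing table mirroring the workflow above.
# Model names come from the post; the router itself is illustrative.
ROUTES = {
    "search": "Grok 4.1 Fast",
    "planning": "Grok 4.1 Thinking",
    "frontend": "Gemini 3 Pro",
    "backend": "Claude Code (Opus/Sonnet 4.5)",
    "write_tests": "Gemini 3 Pro",
    "run_tests": "GPT-5.1 Codex",
    "debug": "Claude Opus 4.5",
}

def route(task: str) -> str:
    """Pick the model for a task, falling back to a default generalist."""
    return ROUTES.get(task, "Claude Opus 4.5")
```

A static table like this is the simplest form of the "workload-specialized model routing" idea; richer routers classify the task first instead of relying on an explicit key.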
Philemon Kiprono 🇰🇪 reposted
Prajwal Tomar @PrajwalTomar_
Claude Opus 4.5 is crazy. This is the biggest jump in AI coding I’ve ever seen. You don’t need to “know how to code” to build now. Anyone with an idea can ship a full SaaS if they learn how to work with AI. When the world changes this fast, the people who adapt early win the most. Learn vibe coding now.
Prajwal Tomar @PrajwalTomar_

I’ve been testing Opus 4.5 and honestly… it surprised me. Gave it a rough summary of a bug, pointed it to the files, didn’t even explain the issue properly. It replied so fast I thought it failed. Tested the app and boom, the bug was gone. One shot. Opus feels way faster than Sonnet 4.5, handles multiple tasks without getting confused, and the pricing is basically the same. Might be my new go-to. Would love to know what your experience has been. Anyone getting similar results?

Replies: 48 · Reposts: 30 · Likes: 640 · Views: 90.8K
Philemon Kiprono 🇰🇪 reposted
Sundar Pichai @sundarpichai
Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!
Replies: 1.1K · Reposts: 2.6K · Likes: 21.4K · Views: 2.9M
Philemon Kiprono 🇰🇪 reposted
DSPy @DSPyOSS
DSPy is now available to Clojure programmers, after Python, Typescript, Ruby, Elixir, and Go.
Kapil @KapilReddy

Announcing DSCloj! github.com/unravel-team/D… A declarative way to do prompt engineering in Clojure, inspired by the DSPy library in Python. In its current shape the API looks very similar to instructor-clj, but next up DSCloj will have optimisers too. A few things coming up next:
- Observability integration with Otel
- Prompt optimisers with a REPL-first API
- EDN-compatible serialisation for modules; it will be handy to save optimised modules
PS: It is such a joy building things with Clojure. I have been writing Python with DSPy, and building a similar use case in Clojure is just simple.

Replies: 4 · Reposts: 12 · Likes: 52 · Views: 7.5K