Philemon Kiprono 🇰🇪
@ronoh4
3.2K posts
AI Engineer | DSPy | Azure | ...Arsenal F.C
Nairobi, Kenya · Joined March 2011
1.5K Following · 435 Followers
Philemon Kiprono 🇰🇪
🚀 DSPy 3.2.0 is out!
🔗 BetterTogether chains optimizers: GEPA → BootstrapFinetune → GEPA via strategy strings
🔌 LiteLLM decoupling begins — custom backends, no litellm dep
🛡️ Hardened RLM & PythonInterpreter — structured errors, resilient parsing
Exciting times for #DSPy
Replies: 1 · Reposts: 2 · Likes: 9 · Views: 1K
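The optimizer-chaining idea above can be sketched as strategy-string parsing. This is an illustrative toy, not DSPy's actual BetterTogether API; the `OPTIMIZERS` table and `parse_strategy` are hypothetical names:

```python
# Hypothetical sketch of chaining optimizers via a strategy string.
# Not the real DSPy API; names here are made up for illustration.

OPTIMIZERS = {
    "gepa": "GEPA (prompt optimization)",
    "bft": "BootstrapFinetune (weight optimization)",
}

def parse_strategy(strategy: str) -> list[str]:
    """Split a strategy string like 'gepa -> bft -> gepa' into ordered steps."""
    steps = [s.strip().lower() for s in strategy.split("->")]
    unknown = [s for s in steps if s not in OPTIMIZERS]
    if unknown:
        raise ValueError(f"unknown optimizer step(s): {unknown}")
    return steps
```

In this sketch, each parsed step would then be run in order, feeding the optimized program from one stage into the next.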
Gabby Thomas @itsgabbyt
Windy day but ran some great early season races in Ethiopia! Thank you Ethiopia for a great meet and experience 🫶🏽 Up next 👉🏽 Kenya
Replies: 42 · Reposts: 105 · Likes: 1K · Views: 29.8K
Philemon Kiprono 🇰🇪 reposted
Charly Wargnier @DataChaz
ANTHROPIC LITERALLY JUST HANDING US THE BLUEPRINT🤯 Their new 33-page guide on Claude Skills is the cheat code. Make sure to bookmark this before it gets lost in your feed. Link in 🧵↓
m0h @exploraX_
x.com/i/article/2039…

Replies: 45 · Reposts: 298 · Likes: 2.6K · Views: 661K
Philemon Kiprono 🇰🇪 reposted
Victor M @victormustar
GLM-5.1 > Claude Code (Opus 4.6)? Either I'm tripping or CC has become very bad, but I built a Three.js racing game to eval it and it's extremely impressive. Thoughts:
- One-shot car physics with real drift mechanics (this is hard)
- My fav part: awesome at self-iterating (with no vision!). Created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state. Proved a winding bug with vector math without ever seeing the screen
- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs AI speed curves and data-tuned parameters
- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!
- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
You are going to hear about this model a lot in the next months. Open source, let's go 🚀🚀
Replies: 70 · Reposts: 104 · Likes: 1.4K · Views: 188K
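The "road normals pointed DOWN" check described above can be reproduced with plain cross-product math. A minimal sketch (the `cross` helper and example edge vectors are my own, not from the game's code):

```python
def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Two edge vectors of a flat road triangle lying in the XZ plane
# (Y is up, as in Three.js).
e1 = (1, 0, 0)
e2 = (0, 0, 1)

normal = cross(e1, e2)       # (0, -1, 0): the normal points DOWN
points_down = normal[1] < 0  # so the triangle's winding order is flipped
```

Swapping the edge order, `cross(e2, e1)`, yields `(0, 1, 0)`, which is how you would fix the winding so normals face up.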
Philemon Kiprono 🇰🇪 reposted
elvis @omarsar0
Pay attention to this one if you are building terminal-based coding agents.

OpenDev is an 81-page paper covering scaffolding, harness design, context engineering, and hard-won lessons from building CLI coding agents. It introduces a compound AI system architecture with workload-specialized model routing, a dual-agent architecture separating planning from execution, lazy tool discovery, and adaptive context compaction.

The industry is shifting from IDE plugins to terminal-native agents. Claude Code, Codex CLI, and others have proven the model works. This paper formalizes the design patterns that make these systems reliable, covering topics like event-driven system reminders to counteract instruction fade-out, automated memory across sessions, and strict safety controls for autonomous operation.

Paper: arxiv.org/abs/2603.05344
Learn to build effective AI agents in our academy: academy.dair.ai
Replies: 60 · Reposts: 288 · Likes: 1.8K · Views: 144K
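As a rough illustration of the adaptive context compaction the paper describes, here is a toy compactor that folds older messages into a one-line summary once a character budget is exceeded. This is a sketch under my own assumptions, not the paper's actual algorithm:

```python
def compact(messages, budget, keep_recent=2):
    """Toy adaptive context compaction: if the total context length exceeds
    the budget, replace older messages with a short summary placeholder and
    keep only the most recent turns verbatim.
    (Illustrative sketch only; a real agent would summarize with an LM.)"""
    total = sum(len(m) for m in messages)
    if total <= budget:
        return messages  # under budget: keep everything as-is
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(old)} earlier messages]"
    return [summary] + recent
```

The design point this mirrors: compaction is triggered adaptively by budget pressure rather than on a fixed schedule, so short sessions never pay the summarization cost.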
Philemon Kiprono 🇰🇪 reposted
Omar Khattab @lateinteraction
.@LakshyAAAgrawal has done an unbelievably nice job collecting successful applications of GEPA to everything from state-of-the-art research to new products and established enterprises at this page. Staying on top of these is harder than one would think! gepa-ai.github.io/gepa/guides/us…
Replies: 1 · Reposts: 24 · Likes: 196 · Views: 10.9K
gatsz @Gatsz01
Just finished building a Swahili-based programming language: the Jenga Programming Language (Version 1.0: The Pilot Version). Check it out: github.com/gatsz1/Jenga. For now it's simple; some features will be added soon. Happy to receive feedback. Happy Jenga-ring!
Replies: 73 · Reposts: 461 · Likes: 1.7K · Views: 49.6K
Prashanth Rao @tech_optimist
dspy.RLM has dropped!!! I clearly know what my next deep rabbit hole (and suite of experiments) will involve, haha. Thanks so much to the DSPy team for pushing this through, this is a super exciting development for the entire ecosystem 🚀
isaac 🧩 @isaacbmiller1

The dspy.RLM module is now released 👀 Install DSPy 3.1.2 to try it. Usage is plug-and-play with your existing Signatures. A little example of it helping @lateinteraction and me figure out some scattered backlogs:

Replies: 6 · Reposts: 5 · Likes: 88 · Views: 6.8K
Omar Khattab @lateinteraction
I've debated quite a bit whether it's meaningful to make a whole post about this but maybe why not. For some reason, last week was the first in 3 years of DSPy to exceed 1,000,000 package downloads in a single week. Good to see. We're working on a bunch of new pieces for it.
Replies: 22 · Reposts: 16 · Likes: 257 · Views: 27.7K
Philemon Kiprono 🇰🇪
@lateinteraction I am testing it already, and it has features that are currently understated, like how good it is at enforcing constraints on outputs
Replies: 0 · Reposts: 0 · Likes: 0 · Views: 10
alex zhang @a1zhang
For those interested in making OSS contributions to the RLM repo, I've added a bunch of random thoughts and TODOs of what to add in a *messy* Markdown file on the GH repo. Feel free to tackle any of them, or any other things you think are meaningful. I'll be pretty active here or on the repo. Once I finish some other related work, I might open up a Discord channel or something for people who want to make longer-standing contributions to the repo / discuss the direction of where to take it. Cheers! github.com/alexzhang13/rl…
Replies: 19 · Reposts: 30 · Likes: 292 · Views: 17.3K
Philemon Kiprono 🇰🇪
An interesting take here. Is it possible we've been measuring intelligence the wrong way?
Robert Youssef @rryssf_

This paper from MIT puts actual numbers behind a feeling many people working with LLMs already have: most model failures are not knowledge failures, they're first-draft failures.

The paper studies Recursive Language Models (RLMs) and asks a very specific question: what happens if you let the same model revise its own output multiple times instead of scaling parameters?

The answer is surprisingly concrete. Across reasoning-heavy benchmarks, the authors show that recursion consistently improves accuracy with no change in model size. On multi-step reasoning tasks, adding just 2–4 recursive passes improves correctness by 10–25%, depending on task complexity. On longer planning problems, error rates drop even more sharply, because early logical mistakes get corrected in later passes instead of propagating forward.

One figure in the paper makes this especially clear. They plot task accuracy vs. recursion depth. The curve is steep at first:
• Pass 1 → baseline performance
• Pass 2 → large jump in correctness
• Pass 3–4 → diminishing but still meaningful gains

After ~4 iterations, returns taper off, which suggests something important: most reasoning failures happen early, and a small amount of structured revision fixes a large fraction of them.

There's also a cost comparison that's hard to ignore. The authors compare:
• A larger non-recursive model
• A smaller recursive model using multiple passes

The recursive model reaches comparable or better accuracy while using fewer parameters and fewer total tokens in the final answer. Even though recursion adds compute internally, the output length shrinks because later passes compress and clean up earlier drafts. In plain terms: the model thinks more, but talks less.

Another quantitative result I found fascinating is hallucination reduction. The paper measures factual consistency across iterations and finds that later recursive passes explicitly remove unsupported claims introduced in earlier drafts. The probability of a hallucinated statement surviving to the final output drops significantly after the second pass, because the model is now evaluating its own content instead of blindly extending it.

This directly challenges the "long chain-of-thought = better reasoning" assumption. The data suggests the opposite. Better reasoning comes from iterative self-correction, not from dumping more intermediate tokens. The recursion acts like an internal verifier that gradually aligns the output with constraints imposed by the task.

There's also a subtle systems insight hidden in the math. If accuracy improves roughly logarithmically with recursion depth but model size improves accuracy sublinearly with parameter count, then recursion is simply a more efficient lever. You get more reasoning per unit of compute by looping than by scaling. That's a big deal for deployment.

Instead of asking "How big can we make the model?", this paper asks "How many chances does the model get to be wrong before we trust it?"

The broader implication is hard to miss. We've been benchmarking models on their first answer. But intelligence doesn't live in the first answer. It lives in revision curves, error decay rates, and how quickly a system converges toward correctness when allowed to reflect.

This paper doesn't just propose a technique. It quietly suggests we've been measuring the wrong thing all along.

Replies: 0 · Reposts: 0 · Likes: 0 · Views: 7
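The revision loop the thread describes can be sketched with a stub reviser standing in for the LLM pass. Everything here (`revise`, `recursive_answer`, the `[unsupported]` marker) is a hypothetical toy, not the paper's method:

```python
def revise(draft: str) -> str:
    """Toy self-revision pass: drop lines flagged as unsupported claims and
    fix one known arithmetic slip. Stands in for an LLM critique-and-rewrite
    step; the '[unsupported]' marker is a made-up convention for this demo."""
    lines = [l for l in draft.splitlines() if "[unsupported]" not in l]
    return "\n".join(lines).replace("2 + 2 = 5", "2 + 2 = 4")

def recursive_answer(draft: str, passes: int = 4) -> str:
    """Run the same reviser over its own output for a few passes,
    stopping early once the draft stabilizes (mirrors the taper after ~4)."""
    for _ in range(passes):
        new = revise(draft)
        if new == draft:  # converged; extra passes add nothing
            break
        draft = new
    return draft
```

The early-stopping check is the interesting part: it reflects the thread's observation that most of the gain comes from the first few passes, after which the draft stops changing.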
alex zhang @a1zhang
Nope, it can directly use a variable in the REPL environment as the output. For example, let's say you have 1B tokens of Excel data that you want transformed into 1B tokens of some transformation over it. The model can divide it up into 100K-token chunks, call an LM over each chunk, store the outputs in a variable, then join them together in Python. Then it can just call FINAL_VAR(final_output) and it will return this huge output. In @omouamoua's implementation in Prime Intellect's verifiers, he actually explicitly only lets the model output from a variable in the REPL. We discuss this more in the paper if you're interested!
Replies: 1 · Reposts: 0 · Likes: 34 · Views: 23.9K
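The chunk-map-join pattern described above can be sketched in plain Python, with a stub `lm()` standing in for the real sub-LM call; `transform_in_chunks` and the uppercase "transformation" are illustrative only:

```python
def lm(chunk: str) -> str:
    """Stand-in for an LM call that transforms one chunk.
    Here it just uppercases the text so the sketch is runnable offline."""
    return chunk.upper()

def transform_in_chunks(data: str, chunk_size: int) -> str:
    """Split a huge input into fixed-size chunks, call the LM on each,
    and join the results in plain Python, so the full output never has
    to fit inside a single model context window."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return "".join(lm(c) for c in chunks)

final_output = transform_in_chunks("excel row data " * 4, chunk_size=10)
# In an RLM-style REPL, the model would now return the variable itself
# (e.g. FINAL_VAR(final_output)) instead of printing the huge string.
```

The key property is that `final_output` lives in the REPL, not in the model's context, which is what makes billion-token outputs tractable.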
alex zhang @a1zhang
I was considering waiting a while to polish this first, but decided it'd be better to just release an initial version to get better community feedback and squash bugs! This is the official RLM repo, with native support for cloud-based and local REPLs. github.com/alexzhang13/rlm
Replies: 46 · Reposts: 133 · Likes: 1.2K · Views: 120.2K
Philemon Kiprono 🇰🇪 reposted
Min Choi @minchoi
This is literally my new workflow now:
Real-time search → Grok 4.1 Fast
Planning → Grok 4.1 Thinking
Frontend Coding → Gemini 3 Pro
Backend Coding → Claude Code (Opus/Sonnet 4.5)
Write Tests → Gemini 3 Pro
Run Tests → GPT-5.1 Codex
Debug → Claude Opus 4.5
Bookmark this.
Replies: 177 · Reposts: 252 · Likes: 2.9K · Views: 207.8K
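Under the hood, a workflow like this is just a task-to-model routing table. A minimal sketch (the `ROUTES` keys and the fallback choice are my own assumptions, not from the post):

```python
# Hypothetical task → model routing table mirroring the workflow above.
# Model names come from the post; the router itself is illustrative.
ROUTES = {
    "search": "Grok 4.1 Fast",
    "planning": "Grok 4.1 Thinking",
    "frontend": "Gemini 3 Pro",
    "backend": "Claude Code (Opus/Sonnet 4.5)",
    "write_tests": "Gemini 3 Pro",
    "run_tests": "GPT-5.1 Codex",
    "debug": "Claude Opus 4.5",
}

def route(task: str) -> str:
    """Pick the model for a task, falling back to a default generalist."""
    return ROUTES.get(task, "Claude Opus 4.5")
```

A static table like this is the simplest form of the "workload-specialized model routing" idea; richer routers classify the task first instead of relying on an explicit key.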
Philemon Kiprono 🇰🇪 reposted
Prajwal Tomar @PrajwalTomar_
Claude Opus 4.5 is crazy. This is the biggest jump in AI coding I’ve ever seen. You don’t need to “know how to code” to build now. Anyone with an idea can ship a full SaaS if they learn how to work with AI. When the world changes this fast, the people who adapt early win the most. Learn vibe coding now.
Prajwal Tomar @PrajwalTomar_

I’ve been testing Opus 4.5 and honestly… it surprised me. Gave it a rough summary of a bug, pointed it to the files, didn’t even explain the issue properly. It replied so fast I thought it failed. Tested the app and boom, the bug was gone. One shot. Opus feels way faster than Sonnet 4.5, handles multiple tasks without getting confused, and the pricing is basically the same. Might be my new go-to. Would love to know what your experience has been. Anyone getting similar results?

Replies: 48 · Reposts: 30 · Likes: 640 · Views: 90.8K
Philemon Kiprono 🇰🇪 reposted
Sundar Pichai @sundarpichai
Introducing Gemini 3 ✨ It’s the best model in the world for multimodal understanding, and our most powerful agentic + vibe coding model yet. Gemini 3 can bring any idea to life, quickly grasping context and intent so you can get what you need with less prompting. Find Gemini 3 Pro rolling out today in the @Geminiapp and AI Mode in Search. For developers, build with it now in @GoogleAIStudio and Vertex AI. Excited for you to try it!
Replies: 1.1K · Reposts: 2.6K · Likes: 21.4K · Views: 2.9M
Philemon Kiprono 🇰🇪 reposted
DSPy @DSPyOSS
DSPy is now available to Clojure programmers, after Python, Typescript, Ruby, Elixir, and Go.
Kapil @KapilReddy

Announcing DSCloj! github.com/unravel-team/D… A declarative way to do prompt engineering in Clojure, inspired by the DSPy library in Python. In its current shape the API looks very similar to instructor-clj, but next up DSCloj will have optimisers too. A few things coming up next:
- Observability integration with Otel
- Prompt optimisers with a REPL-first API
- EDN-compatible serialisation for modules; it will be handy to save optimised modules
PS: It is such a joy building things with Clojure. I have been writing Python with DSPy, and building a similar use case in Clojure is just simple.

Replies: 4 · Reposts: 12 · Likes: 52 · Views: 7.5K