176 posts

KD

@Reveur_7

PhD @Berkeley_ai | CEO @ Embodied Science | Alum @CarnegieMellon | Ex. Principal SWE | #Mamba4Life

In your dreams · Joined December 2018

287 Following · 124 Followers

KD retweeted

Lakshya A Agrawal@LakshyAAAgrawal·20 May

Paper: arxiv.org/abs/2605.19633 Blog: gepa-ai.github.io/gepa/blog/2026…

671

KD retweeted

Lakshya A Agrawal@LakshyAAAgrawal·20 May

Our paper on optimize_anything has been accepted to CAIS 2026, and is out on Arxiv with expanded experiments and details! A unified API to optimize agents (with architecture), CUDA kernels, cloud scheduling policies, or even graphics! x.com/LakshyAAAgrawa…

Lakshya A Agrawal@LakshyAAAgrawal

Excited to release @gepa_ai's optimize_anything: a universal API for optimizing any text parameter. It consistently matches or outperforms domain-specific tools optimizing code, prompts, agent harnesses, cloud policies, even visuals! If you can measure it, you can optimize it.

177

22.4K

KD@Reveur_7·14 May

Massive nerf but I knew it was coming. That's why omar.tech and github.com/KE7/helix support several coding agents; not just Claude Code. Switch your defaults over with just one line change each 😃 Thanks @sama and @OpenAI for codex. GPT 5.5 has been killing it!

ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

124

KD@Reveur_7·14 May

@ClaudeDevs I'm on a max 20x plan yet my weekly usage number is the exact same?

ClaudeDevs@ClaudeDevs·13 May

Claude Code weekly limits are increasing 50%, now through July 13. Live now for all Pro, Max, Team, and seat-based Enterprise users.

1.4K

2.1K

22.4K

2.7M

KD retweeted

Lakshya A Agrawal@LakshyAAAgrawal·13 May

Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further.
Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:
🔹 Fast loop (context): GEPA reads rich rollout feedback updating the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and focus on what actually generalizes across tasks and pushes the frontier.
⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics) where RL stalls the moment the task switches
FST is a direction towards:
⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: Using fast context updates for broad exploration, while leveraging a continually improving model.
Check out the full thread below:

Kusha Sareen@KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

186

33.1K

KD@Reveur_7·10 May

Hey! That’s the CEO of Embodied Science 🤯

CSGE@berkeley_csge

Berkeley PhD students and Postdoc sharing about their startup

310

KD retweeted

CSGE@berkeley_csge·9 May

Tonight’s panel getting started with a “spicy” first question: how much did you raise?

1.4K

KD@Reveur_7·5 May

Mo Models Mo Benchmarks - Not by The Notorious B.I.G.

KD retweeted

Parth Asawa@pgasawa·4 May

Today, we’re releasing Continual Learning Bench 1.0: the first, realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and find there’s still plenty of headroom for learning. (1/n)

153

1.1K

825.3K

KD retweeted

CSGE@berkeley_csge·25 Apr

Berkeley CS Graduate Entrepreneurs (CSGE) is back with the annual Spring Mixer on May 8th! 🌉 Join us for a night where research meets startups, featuring an exciting panel with @sarahookr, @ericzelikman, and @NaveenGRao! RSVP early to save your spot: luma.com/nwca4b85

1.4K

KD@Reveur_7·28 Apr

1. HELIX 2. GRAID & Embodied Science

Ronak Malde@rronak_

My takeaways from ICLR 2026
1. Recursive self improvement / continual learning is the next frontier of research. Several great papers in self distillation, auto agent harness optimization, learning from non verifiable reward, self-play are early signs of success
2. Multimodal models and world models are attaining emergent reasoning capabilities, opening up a new door to spatial understanding that was previously locked
3. Lots of concerns that the research community is currently too focused on benchmaxxing rather than improving the research process, and a call to action to address this, like Percy Liang’s fully open source training community.
4. Rio is possibly even better than San Diego 🇧🇷🏄

KD retweeted

Lakshya A Agrawal@LakshyAAAgrawal·23 Apr

I am incredibly grateful to have had the opportunity to collaborate with and learn from a wonderful team consisting of @ShangyinT @dilarafsoylu @NoahZiems @lukedhlee @wenjie_ma @reveur_7 @kristahopsalong @arnav_thebigman @krypticmouse @michaelryan207 Sanjit Seshia @Meng_CS @ChrisGPotts @koushik77 @AlexGDimakis @profjoeyg @istoica05 Dan Klein @matei_zaharia @lateinteraction. I thank the incredible community members who continue to adopt, provide feedback as well as directly contribute to the GEPA project.

952

KD@Reveur_7·23 Apr

@OpenRouter @TencentHunyuan For Terminal-Bench 2.0, what's the "agent" that was used?

243

OpenRouter@OpenRouter·23 Apr

The new Hy3-Preview model from @TencentHunyuan is live for free on OpenRouter! It’s a 295B MoE model (21B active) with controllable reasoning effort. A cost-effective, practical model that performs strongly in coding agents and delivers comprehensive general capabilities.

343

46.6K

KD@Reveur_7·22 Apr

I don't know why people are debating Codex vs Claude Code. Just use OMAR and you can have both running simultaneously! + Cursor, Gemini, and OpenCode too omar.tech/blog/introduci…

124

KD@Reveur_7·22 Apr

@AnthropicAI I'm just trying to push out open source projects for the world. Please reset my usage 🙏🏽 And while I have your attention, check out OMAR & HELIX! They're also the projects that consume all my tokens 😁 omar.tech/blog/introduci… github.com/KE7/helix

KD@Reveur_7·22 Apr

@karpathy you might be interested in this

KD@Reveur_7·21 Apr

For a gentler introduction to OMAR, read our new blogpost: omar.tech/blog/introduci…

KD@Reveur_7·21 Apr

What if one person could run a unicorn company? Today we're open-sourcing OMAR — a TUI that lets a single engineer orchestrate hundreds of AI coding agents in deep, recursive hierarchies. Built at Berkeley. Powered by tmux. github.com/lsk567/omar 🧵

2.6K
