KD

176 posts


@Reveur_7

PhD @Berkeley_ai | CEO @ Embodied Science | Alum @CarnegieMellon | Ex. Principal SWE | #Mamba4Life

In your dreams · Joined December 2018
287 Following · 124 Followers
KD retweeted
Lakshya A Agrawal @LakshyAAAgrawal
Our paper on optimize_anything has been accepted to CAIS 2026, and is out on Arxiv with expanded experiments and details! A unified API to optimize agents (with architecture), CUDA kernels, cloud scheduling policies, or even graphics! x.com/LakshyAAAgrawa…
Lakshya A Agrawal @LakshyAAAgrawal

Excited to release @gepa_ai's optimize_anything: a universal API for optimizing any text parameter. It consistently matches or outperforms domain-specific tools optimizing code, prompts, agent harnesses, cloud policies, even visuals! If you can measure it, you can optimize it.

KD @Reveur_7
Massive nerf but I knew it was coming. That's why omar.tech and github.com/KE7/helix support several coding agents; not just Claude Code. Switch your defaults over with just one line change each 😃 Thanks @sama and @OpenAI for codex. GPT 5.5 has been killing it!
ClaudeDevs @ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

KD @Reveur_7
@ClaudeDevs I'm on a max 20x plan yet my weekly usage number is the exact same?
ClaudeDevs @ClaudeDevs
Claude Code weekly limits are increasing 50%, now through July 13. Live now for all Pro, Max, Team, and seat-based Enterprise users.
KD retweeted
Lakshya A Agrawal @LakshyAAAgrawal
Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further.

Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:

🔹 Fast loop (context): GEPA reads rich rollout feedback updating the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and focus on what actually generalizes across tasks and pushes the frontier.

⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics) where RL stalls the moment the task switches

FST is a direction towards:
⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: Using fast context updates for broad exploration, while leveraging a continually improving model.

Check out the full thread below:
Kusha Sareen @KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context.

FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

KD retweeted
CSGE @berkeley_csge
Tonight’s panel is getting started with a “spicy” first question: how much did you raise?
KD @Reveur_7
Mo Models Mo Benchmarks - Not by The Notorious B.I.G.
KD retweeted
Parth Asawa @pgasawa
Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)
KD retweeted
CSGE @berkeley_csge
Berkeley CS Graduate Entrepreneurs (CSGE) is back with the annual Spring Mixer on May 8th! 🌉 Join us for a night where research meets startups, featuring an exciting panel with @sarahookr, @ericzelikman, and @NaveenGRao! RSVP early to save your spot: luma.com/nwca4b85
KD retweeted
Lakshya A Agrawal @LakshyAAAgrawal
I am incredibly grateful to have had the opportunity to collaborate with and learn from a wonderful team consisting of @ShangyinT @dilarafsoylu @NoahZiems @lukedhlee @wenjie_ma @reveur_7 @kristahopsalong @arnav_thebigman @krypticmouse @michaelryan207 Sanjit Seshia @Meng_CS @ChrisGPotts @koushik77 @AlexGDimakis @profjoeyg @istoica05 Dan Klein @matei_zaharia @lateinteraction. I thank the incredible community members who continue to adopt, provide feedback as well as directly contribute to the GEPA project.
OpenRouter @OpenRouter
The new Hy3-Preview model from @TencentHunyuan is live for free on OpenRouter! It’s a 295B MoE model (21B active) with controllable reasoning effort. A cost-effective, practical model that performs strongly in coding agents and delivers comprehensive general capabilities.
KD @Reveur_7
I don't know why people are debating Codex vs Claude Code. Just use OMAR and you can have both running simultaneously! + Cursor, Gemini, and OpenCode too omar.tech/blog/introduci…
KD @Reveur_7
@karpathy you might be interested in this
KD @Reveur_7
What if one person could run a unicorn company? Today we're open-sourcing OMAR — a TUI that lets a single engineer orchestrate hundreds of AI coding agents in deep, recursive hierarchies. Built at Berkeley. Powered by tmux. github.com/lsk567/omar 🧵