UC Berkeley Sky

81 posts

@BerkeleySky

Sky Computing - looking for the Berkeley Skydeck? They’re on the other side of Campus from us @SkyDeck_Cal.

Berkeley, CA · Joined November 2021

24 Following · 1.4K Followers

UC Berkeley Sky reposted

Melissa Pan@melissapan·2d

Excited to share that MAP has been selected for an ✨ICML Oral✨! We look forward to sharing the paper's insights with the community. Much appreciation to everyone who participated in our study ❤️ MAP wouldn't be possible without your contribution to open science.

Melissa Pan@melissapan

Excited to share: MAP has been accepted as 🌟 ICML Spotlight 🌟 We hope MAP can provide data-driven insights that help the communities to work on various under-explored research directions around agent systems! Huge thanks & congrats to my amazing co-authors. See you all at Seoul! 🫡

UC Berkeley Sky reposted

Qiuyang Mang@MangQiuyang·15 May

Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench. Blog: frontier-cs.org/blog/frontiers… Paper: arxiv.org/abs/2605.14445 Code: github.com/FrontierCS/Fro… Model: huggingface.co/runyuanhe/qwen…

UC Berkeley Sky reposted

Ziming Mao@ziming_mao·1d

🚀 Excited to release mKernel: a set of fast multi-node, multi-GPU fused kernels. 💻 Code: github.com/uccl-project/m… 📝 Blog: uccl-project.github.io/posts/mkernel/ mKernel fuses compute + communication into one persistent GPU kernel, covering both intra/inter-node with GPU-initiated communication. Amazing team: @yangzhouy, Chon Lam Lao, Costin Raiciu, Scott Shenker, @istoica05

UC Berkeley Sky reposted

Lakshya A Agrawal@LakshyAAAgrawal·13 May

Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization. GEPA demonstrated this for context-space optimization (prompts and agent harnesses), delivering frontier results at a fraction of the cost of RL. But context-only optimization is bounded by the base model's capability ceiling; weight updates can reach further.

Very excited about this new line of work on Fast-Slow Training (FST), which interleaves context and model weight optimization! The idea is a clean division of labor between two interleaved loops:

🔹 Fast loop (context): GEPA reads rich rollout feedback updating the context layer. The context becomes a fast-updating scratchpad of what the model needs to know about this task, right now.
🔹 Slow loop (model parameters): RL updates the model's parameters conditioned on the evolving context. Because the prompt already carries task-specific nuances, the model parameters are freed from absorbing them and focus on what actually generalizes across tasks and pushes the frontier.

⦁ 3× more sample-efficient than RL on math, code, and physics reasoning
⦁ ~70% lower KL divergence from base at matched accuracy
⦁ Plasticity preserved: FST checkpoints respond better to additional RL on new tasks than RL-only ones
⦁ Continual learning across changing tasks (HoVer → CodeIO → Physics) where RL stalls the moment the task switches

FST is a direction towards:

⦁ Addressing RL's pain points: entropy collapse, sparse rewards, long-horizon exploration
⦁ Providing a clean channel for rich feedback into weight updates
⦁ Demonstrating model-harness co-evolution
⦁ Discovery: Using fast context updates for broad exploration, while leveraging a continually improving model.

Check out the full thread below:

Kusha Sareen@KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

UC Berkeley Sky reposted

Negar Arabzadeh@NegarEmpr·12 May

1/ Thrilled to introduce T³: a corpus for RAG over reasoning tasks, built from thinking traces. We show that, surprisingly, RAG can improve reasoning, given the right corpus. RAG with Transformed Thinking Traces (T³) yields gains of up to 43.9% on AIME 2025-2026. 🔗 arxiv.org/abs/2605.03344 🧵

UC Berkeley Sky reposted

Parth Asawa@pgasawa·4 May

Today, we’re releasing Continual Learning Bench 1.0: the first realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and found there’s still plenty of headroom for learning. (1/n)

UC Berkeley Sky reposted

Yiwei Hou@yiwei_hou·1 May

Agent harness is as important as the model for cybersecurity. $300 in compute, 9 OSS-Fuzz projects, 14 security issues and 5 CVEs. The key lesson: you don’t need a secret model to find real security issues. You need an effective, affordable, reliable harness. 5 takeaways 🧵

UC Berkeley Sky reposted

Qiuyang Mang@MangQiuyang·30 Apr

Excited to announce that FrontierCS has been accepted to ICML 2026! 🚀 We are scaling our open-ended task set to 250 tasks (100 new tasks in 2026 Q1🔥), featuring long-horizon agent settings in Harbor and integration into real-world human contests. More exciting updates to come! Huge thanks to all our collaborators. #ICML2026 #AI #MachineLearning

Huanzhi Mao@HuanzhiMao

Pass/fail benchmarks are saturated. It’s time for FrontierCS. 🚀 150+ unsolved, verifiable problems ranging from competitive programming to real-world research. Designed by PhDs & ICPC experts to evolve model intelligence. 🎓🧠 🧵👇Check it out! Paper: arxiv.org/abs/2512.15699

UC Berkeley Sky reposted

Melissa Pan@melissapan·30 Apr

UC Berkeley Sky reposted

KD@Reveur_7·21 Apr

What if one person could run a unicorn company? Today we're open-sourcing OMAR — a TUI that lets a single engineer orchestrate hundreds of AI coding agents in deep, recursive hierarchies. Built at Berkeley. Powered by tmux. github.com/lsk567/omar 🧵

UC Berkeley Sky reposted

Abby O'Neill@abby_k_oneill·29 Apr

Would you trust an AI agent to negotiate on your country's behalf at the G20? Real coordination is long-horizon, asymmetric, and non-binding; current multi-agent evaluations miss this. We build Cooperate to Compete (C2C): a testbed for LM agents coordinating with rivals. 🤝🔪🎭

UC Berkeley Sky reposted

Berkeley Computing, Data Science, and Society@BerkeleyCDSS·8 Apr

Congratulations to Matei Zaharia on being awarded the ACM Prize in Computing! His development of open-source systems helped enable large-scale machine learning, analytics and AI at a global scale. @matei_zaharia @UCBerkeley 🔗 Read more: bit.ly/4vbNujK

UC Berkeley Sky reposted

AI-Driven Research for Systems@ai4research_ucb·2 Apr

🎯 One Year of AI-Driven Research at Berkeley [ADRS Blog #20] For the past year at Berkeley, we have been working on automating discovery with AI. In our blog post this week, we provide an overview of these efforts: the key problems we’re tackling, the frameworks and solutions we’ve built so far, and how these efforts fit into a broader vision for AI-driven scientific discovery. ✍️ Read the blog: ucbskyadrs.github.io/blog/berkeley-… 📖 ADRS Blog Series: ucbskyadrs.github.io

UC Berkeley Sky reposted

Mayank Mishra@MayankMish98·19 Mar

Introducing M²RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling We bring back non-linear recurrence to language modeling and show it's been held back by small state sizes, not by non-linearity itself. 📄 Paper: arxiv.org/abs/2603.14360 💻 Code: github.com/open-lm-engine… 🤗 Models: huggingface.co/collections/op…

UC Berkeley Sky reposted

Shu Lynn Liu@shulynnliu·17 Mar

Researchers spend hours and hours hand-crafting the strategies behind LLM-driven optimization systems like AlphaEvolve: deciding which ideas to reuse, when to explore vs exploit, and what mutations to try. 🤖But what if AI could evolve its own evolution process? We introduce EvoX, a meta-evolution pipeline that lets AI evolve the strategy guiding the optimization. It achieves high-quality solutions for <$5, while existing open systems and even Claude Code often cost 3-5× more on some tasks. Across ~200 optimization problems, EvoX delivers the strongest overall results: often outperforming AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on math and systems tasks, exceeding human SOTA, and improving median performance by up to 61% on 172 competitive programming problems. 👇

UC Berkeley Sky reposted

Ion Stoica@istoica05·16 Mar

@karpathy Very nice results and great project! Sharing some of our experience with similar agentic frameworks at UC Berkeley: ADRS blog series: ucbskyadrs.github.io/blog/ GEPA: github.com/gepa-ai/gepa KISS: github.com/ksenxx/kiss_ai

UC Berkeley Sky reposted

Shu Lynn Liu@shulynnliu·3 Mar

AlphaEvolve is closed-source. We release 🌟SkyDiscover🌟, a flexible, modular open-source framework with two new adaptive algorithms that match or exceed AlphaEvolve on many benchmarks and outperform OpenEvolve, GEPA, and ShinkaEvolve across 200+ optimization tasks. Our new algorithms dynamically adapt their search strategy, and can even let the AI optimize its own optimization process on the fly!

Results:
📊 +34% median score improvement on 172 Frontier-CS problems.
🧮 Matches/exceeds AlphaEvolve on many math benchmarks
⚙️ Discovers system optimizations beyond human-designed SOTA
🧵👇

UC Berkeley Sky reposted

Mayank Mishra@MayankMish98·26 Feb

We identified an issue with the Mamba-2 🐍 initialization in HuggingFace and FlashLinearAttention repository (dt_bias being incorrectly initialized). This bug is related to 2 main issues:

1. init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has been already fixed: github.com/fla-org/flash-…).
2. Skipping initialization due to meta device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).

The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…

Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping in merging the PR.

UC Berkeley Sky reposted

Laude Institute@LaudeInstitute·26 Feb

Introducing Slingshots // TWO: Research that ships. 14 projects, six institutions – let’s meet the batch 🧵

UC Berkeley Sky reposted

NovaSky@NovaSkyAI·13 Feb

We are excited to announce that SkyRL now implements the Tinker API. Run Tinker training scripts on your own hardware with zero code changes. Try it out today: novasky-ai.notion.site/skyrl-tinker

Tyler Griggs@tyler_griggs_

SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl-tinker 🧵
