UC Berkeley Sky

70 posts

@BerkeleySky

Sky Computing. Looking for the Berkeley Skydeck? They're on the other side of campus from us, @SkyDeck_Cal.

Berkeley, CA · Joined November 2021
24 Following · 1.3K Followers
UC Berkeley Sky retweeted
Shu Lynn Liu@shulynnliu·
Researchers spend hours and hours hand-crafting the strategies behind LLM-driven optimization systems like AlphaEvolve: deciding which ideas to reuse, when to explore vs. exploit, and what mutations to try. 🤖 But what if AI could evolve its own evolution process?

We introduce EvoX, a meta-evolution pipeline that lets AI evolve the strategy guiding the optimization. It achieves high-quality solutions for <$5, while existing open systems, and even Claude Code, often cost 3-5× more on some tasks.

Across ~200 optimization problems, EvoX delivers the strongest overall results: often outperforming AlphaEvolve, OpenEvolve, GEPA, and ShinkaEvolve on math and systems tasks, exceeding human SOTA, and improving median performance by up to 61% on 172 competitive programming problems. 👇
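The meta-evolution idea (an outer loop that mutates the search strategy itself, while an inner loop uses that strategy to evolve solutions) can be illustrated with a toy sketch. This is not EvoX's implementation; the "strategy" here is just a mutation step size standing in for the richer policies (explore/exploit schedules, mutation menus) the tweet describes.

```python
import random

random.seed(0)

def inner_evolve(strategy, generations=30):
    """Evolve a solution to minimize f(x) = x^2 using the given strategy.
    The strategy is a single mutation step size -- a stand-in for the
    hand-crafted search policies that meta-evolution would replace."""
    x = 10.0
    for _ in range(generations):
        candidate = x + random.gauss(0, strategy["step"])
        if candidate * candidate < x * x:
            x = candidate
    return x * x  # final loss: lower is better

def meta_evolve(rounds=20):
    """Outer loop: mutate the strategy itself and keep what helps."""
    strategy = {"step": 0.01}
    best_loss = inner_evolve(strategy)
    for _ in range(rounds):
        mutant = {"step": strategy["step"] * random.choice([0.5, 2.0])}
        loss = inner_evolve(mutant)
        if loss < best_loss:
            strategy, best_loss = mutant, loss
    return strategy, best_loss

strategy, loss = meta_evolve()
print(strategy, loss)
```

The outer loop typically discovers that a much larger step size than the hand-picked 0.01 solves the inner problem faster, which is the core claim: the search strategy is itself optimizable.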
UC Berkeley Sky retweeted
Shu Lynn Liu@shulynnliu·
AlphaEvolve is closed-source. We release 🌟SkyDiscover🌟, a flexible, modular open-source framework with two new adaptive algorithms that match or exceed AlphaEvolve on many benchmarks and outperform OpenEvolve, GEPA, and ShinkaEvolve across 200+ optimization tasks. Our new algorithms dynamically adapt their search strategy, and can even let the AI optimize its own optimization process on the fly!

Results:
📊 +34% median score improvement on 172 Frontier-CS problems
🧮 Matches or exceeds AlphaEvolve on many math benchmarks
⚙️ Discovers system optimizations beyond human-designed SOTA
🧵👇
UC Berkeley Sky retweeted
Mayank Mishra@MayankMish98·
We identified an issue with the Mamba-2 🐍 initialization in the HuggingFace and FlashLinearAttention repositories (dt_bias being incorrectly initialized). This bug is related to 2 main issues:
1. The init being incorrect (torch.ones) if Mamba-2 layers are used in isolation without the Mamba2ForCausalLM model class (this has already been fixed: github.com/fla-org/flash-…).
2. Skipping initialization due to meta-device init for DTensors with FSDP-2 (github.com/fla-org/flash-… will fix this issue upon merging).
The difference is substantial. Mamba-2 seems to be quite sensitive to the initialization. Check out our experiments at the 7B MoE scale: wandb.ai/mayank31398/ma…
Special thanks to @kevinyli_, @bharatrunwal2, @HanGuo97, @tri_dao and @_albertgu 🙏 Also thanks to @SonglinYang4 for quickly helping to merge the PR.
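For context on why torch.ones is wrong here: the reference Mamba code initializes dt_bias as the inverse softplus of a log-uniformly sampled timestep, so that softplus(dt_bias) lands in a small range like [1e-3, 0.1]; a bias of 1.0 gives softplus(1.0) ≈ 1.31, far outside that range. Below is a pure-Python rendering of that scheme (exact constants and details may differ across repos):

```python
import math
import random

random.seed(0)

def init_dt_bias(n_heads, dt_min=1e-3, dt_max=0.1, dt_floor=1e-4):
    """Sketch of the reference Mamba dt_bias init: sample dt log-uniformly
    in [dt_min, dt_max], clamp to a floor, then store the inverse softplus
    so that softplus(dt_bias) recovers dt at runtime."""
    bias = []
    for _ in range(n_heads):
        dt = math.exp(random.uniform(math.log(dt_min), math.log(dt_max)))
        dt = max(dt, dt_floor)
        # inverse softplus: log(exp(dt) - 1) == dt + log(1 - exp(-dt))
        bias.append(dt + math.log(-math.expm1(-dt)))
    return bias

def softplus(x):
    return math.log1p(math.exp(x))

bias = init_dt_bias(8)
print([round(softplus(b), 4) for b in bias])  # recovered dt values stay in [1e-3, 0.1]
```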
UC Berkeley Sky retweeted
Laude Institute@LaudeInstitute·
Introducing Slingshots // TWO: Research that ships. 14 projects, six institutions – let’s meet the batch 🧵
UC Berkeley Sky retweeted
NovaSky@NovaSkyAI·
We are excited to announce that SkyRL now implements the Tinker API. Run Tinker training scripts on your own hardware with zero code changes. Try it out today: novasky-ai.notion.site/skyrl-tinker
Tyler Griggs@tyler_griggs_

SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl-tinker 🧵

UC Berkeley Sky retweeted
AI-Driven Research for Systems@ai4research_ucb·
🎯 AI evolves better multi-agent reasoning systems, boosting accuracy on Math Olympiad problems [ADRS Blog #14]
In our latest post, we demonstrate that hill-climbing on accuracy alone leads to brittle heuristics. By shifting from binary feedback to more granular feedback with MAST, we provide the evolution optimizer with a dense diagnostic signal, enabling the discovery of robust multi-agent architectures that generalize.
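The binary-vs-dense-feedback point can be seen in a classic toy: hill-climbing a string toward a target. With an all-or-nothing score (like raw accuracy), every mutation looks equally bad and the search wanders; a per-character score gives a gradient to climb. This is only an illustration of the feedback-granularity idea, not the MAST diagnostic itself.

```python
import random

random.seed(1)

TARGET = "mast"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def dense_score(s):
    # Granular signal: how many characters already match the target.
    return sum(a == b for a, b in zip(s, TARGET))

def binary_score(s):
    # Binary signal: all-or-nothing, like raw task accuracy.
    return 1 if s == TARGET else 0

def hill_climb(score, steps=5000):
    s = "aaaa"
    best = score(s)
    for _ in range(steps):
        i = random.randrange(len(s))
        cand = s[:i] + random.choice(ALPHABET) + s[i + 1:]
        if score(cand) >= best:  # accept non-worsening mutations
            s, best = cand, score(cand)
    return s

dense_result = hill_climb(dense_score)    # reliably reaches "mast"
binary_result = hill_climb(binary_score)  # random walk: no signal to climb
print(dense_result, binary_result)
```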
UC Berkeley Sky retweeted
Tyler Griggs@tyler_griggs_·
SkyRL now implements the Tinker API. Now, training scripts written for Tinker can run on your own GPUs with zero code changes using SkyRL's FSDP2, Megatron, and vLLM backends. Blog: novasky-ai.notion.site/skyrl-tinker 🧵
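The "zero code changes" claim rests on a familiar pattern: a drop-in backend that implements the same client interface the training script already codes against. The sketch below illustrates that pattern only; every name in it (TrainerClient, forward_backward, LocalBackend) is hypothetical, not the actual Tinker or SkyRL API.

```python
class TrainerClient:
    """Interface the training script was written against (hypothetical)."""
    def forward_backward(self, batch):
        raise NotImplementedError
    def optim_step(self):
        raise NotImplementedError

class LocalBackend(TrainerClient):
    """Swapped-in implementation that would run on local hardware."""
    def __init__(self):
        self.steps = 0
    def forward_backward(self, batch):
        # A real backend would shard the model and run fwd/bwd here.
        return {"loss": 1.0 / (1 + self.steps)}
    def optim_step(self):
        self.steps += 1

def training_script(client, data):
    # Unchanged user code: it only talks to the interface.
    losses = []
    for batch in data:
        losses.append(client.forward_backward(batch)["loss"])
        client.optim_step()
    return losses

losses = training_script(LocalBackend(), [[1], [2], [3]])
print(losses)  # [1.0, 0.5, 0.3333...]: same script, different backend
```

Because the script only depends on the interface, pointing it at a different backend changes where the compute runs without touching the training code.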
UC Berkeley Sky retweeted
Koushik Sen@koushik77·
What if your AI agents could evolve themselves, getting smarter, faster, and cheaper with each generation?

We've been doing prompt engineering wrong. Spending hours crafting the perfect system prompt, tweaking instructions, adding examples... only to do it all over again when the model updates. But here's the thing: prompt optimization only scratches the surface. Your agent's efficiency isn't just about the words in your prompts. It's about:
- How your orchestrator delegates to sub-agents
- When you batch operations vs. run them sequentially
- Which tools you create dynamically vs. hardcode
- How your checkpointing strategy affects recovery time

This is code optimization, not prompt optimization. So we built Agent Evolver, a system that breeds better AI agents through genetic evolution. How it works:
- Seed: generate initial agents using state-of-the-art Claude Code, Gemini CLI, or OpenAI Codex agents.
- Library: the agents can use the KISS API or the above state-of-the-art agents.
- Mutate & Crossover: apply evolutionary operations across generations.
- Pareto Selection: maintain a frontier of non-dominated solutions optimizing for BOTH cost AND speed.

Stop tuning prompts. Start evolving agents.
Read the blog at: dev.to/koushik_sen_d5…
Try it yourself (it's open source): github.com/ksenxx/kiss_ai
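Pareto selection over two objectives, as described in the steps above, can be sketched in a few lines: keep only agents that no other agent beats on both cost and speed. The agent tuples here are made up for illustration.

```python
def pareto_frontier(agents):
    """agents: list of (name, cost_usd, latency_s), both minimized.
    Returns the non-dominated set: an agent is dropped only if some
    other agent is <= on both objectives and strictly < on one."""
    frontier = []
    for name, cost, lat in agents:
        dominated = any(
            (c2 <= cost and l2 <= lat) and (c2 < cost or l2 < lat)
            for _, c2, l2 in agents
        )
        if not dominated:
            frontier.append((name, cost, lat))
    return frontier

agents = [
    ("gen3-a", 0.10, 40.0),  # cheap but slow
    ("gen3-b", 0.90, 5.0),   # fast but pricey
    ("gen2-c", 0.50, 12.0),  # balanced
    ("gen1-d", 0.95, 45.0),  # dominated: gen2-c is cheaper AND faster
]
front = pareto_frontier(agents)
print([name for name, _, _ in front])  # ['gen3-a', 'gen3-b', 'gen2-c']
```

Keeping the whole frontier, rather than a single best score, preserves trade-offs between the objectives so later generations can breed from cheap agents and fast agents alike.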
UC Berkeley Sky retweeted
Zirui "Colin" Wang@zwcolin·
🎮 We release VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents (w/ @junyi42 @aomaru_21490) 🌐 With 17 environments across multiple domains, we show systematically the brittleness of VLMs in visual interaction, and what training leads to. 🧵[1/8]
UC Berkeley Sky retweeted
Woosuk Kwon@woosuk_k·
Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster.

The Challenge
Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked, with the full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities. And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data.

We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building.

Why Us
vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We deployed it at frontier scale, in research and in production.

Open Source
vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls.

Join Us
Through the open-source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us.

We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp, who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks.

- @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team
UC Berkeley Sky retweeted
Koushik Sen@koushik77·
I'm excited to announce the public release of the KISS Agent Framework, an open-source AI agent framework built on one principle: Keep It Simple, Stupid. Since the API is stupidly simple, I can quickly vibe-code up agents and evolutionary algorithms just by providing the README.md and a new idea. More in the blog post at lnkd.in/gaX3dZ7Q.

After a month of development, I'm making this framework available to the community. Here's what makes KISS different:
🎯 Simple Architecture: a clean ReAct loop implementation that you can understand in minutes, not hours. No hidden complexity, no black boxes.
🔧 Native Function Calling: seamless tool integration with OpenAI, Anthropic, Gemini, Together AI, and OpenRouter (400+ models). Your tools just work.
🧬 GEPA, Prompt Evolution: genetic-Pareto optimization that evolves your prompts through natural-language reflection. Based on recent research showing this can outperform RL.
🔬 KISSEvolve, Algorithm Discovery: LLM-guided mutation and crossover for evolving code. We've used this to discover faster sorting algorithms from bubble sort.
📊 Built-in Observability: automatic token & budget tracking, trajectory saving & visualization, Docker isolation for safe execution.

The framework supports:
- SWE-bench Verified benchmarks onboarded
- AlgoTune benchmarks onboarded
- RAG with in-memory vector search
- Multiprocessing for parallel execution

Why open source this? Because the AI agent ecosystem needs more tools that prioritize clarity over cleverness. If you can't understand how your agent works, you can't debug it, improve it, or trust it.

Check it out: github.com/ksenxx/kiss_ai
I'd love feedback from the community. What features would you find most useful?
#AI #MachineLearning #OpenSource #LLM #AgentFramework #Python #ArtificialIntelligence #SoftwareEngineering
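A ReAct loop of the kind the announcement describes is small enough to sketch whole. The model here is a scripted stub and the tool registry and message format are invented for illustration; this is the general ReAct pattern, not KISS's actual API.

```python
def calculator(expr):
    # Toy tool: evaluate a basic arithmetic expression safely-ish.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def scripted_model(history):
    """Stand-in for an LLM: first reasons that it needs a tool,
    then answers once an observation is available."""
    if not any(m[0] == "observation" for m in history):
        return ("act", "calculator", "6 * 7")
    obs = [m for m in history if m[0] == "observation"][-1][1]
    return ("finish", f"The answer is {obs}", None)

def react_loop(model, question, max_steps=5):
    history = [("question", question)]
    for _ in range(max_steps):
        kind, a, b = model(history)
        if kind == "finish":
            return a
        # "act": call the named tool and feed the observation back in.
        history.append(("observation", TOOLS[a](b)))
    return "gave up"

answer = react_loop(scripted_model, "What is 6 * 7?")
print(answer)  # The answer is 42
```

The whole control flow fits in one function, which is the "understand it in minutes" claim: the loop alternates model calls and tool calls until the model emits a final answer.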
UC Berkeley Sky retweeted
SkyLight@skylight_org·
We’ve been assuming that extreme attention sparsity must destroy model quality. That assumption is now false.

New SkyLight Release: vAttention ⚡️ We added vAttention to the Tier1A leaderboard. The data is striking:
👑 #1 Practical Method: surpasses the previous lead (PQCache), closing the gap to dense models to within 1% at up to 10x sparsity.
👑 Saturates Benchmarks: dominates the sparsity-quality frontier (w/ oracle top-k).
💡 Introducing Verified-X: a new paradigm for reliable inference-time sparsity.

In this week’s blog, we share an in-depth look at the algorithm and how to run it via SkyLight 👇
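The basic mechanism behind inference-time attention sparsity (and the oracle top-k baseline mentioned above) is easy to show for one query: keep only the k highest-scoring keys and softmax over those. This toy illustrates why sparsity can cost so little, not vAttention's actual algorithm.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def topk_attention(scores, values, k):
    """scores[i]: query-key score for key i; values[i]: scalar value.
    Attend only over the k best-scoring keys."""
    keep = sorted(range(len(scores)), key=lambda i: scores[i])[-k:]
    probs = softmax([scores[i] for i in keep])
    return sum(p * values[i] for p, i in zip(probs, keep))

scores = [4.0, 0.1, 3.9, -2.0, 0.0, 3.8]
values = [1.0, 9.0, 1.1, 9.0, 9.0, 0.9]
dense = sum(p * v for p, v in zip(softmax(scores), values))
sparse = topk_attention(scores, values, k=3)  # keeps keys 0, 2, 5
print(dense, sparse)  # close: softmax mass concentrates on the top keys
```

Because softmax weights decay exponentially in the score gap, the discarded low-score keys carry little probability mass, so halving (or better) the keys read changes the output only slightly.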
UC Berkeley Sky retweeted
Mayank Mishra@MayankMish98·
We cooked! 🚀🚀🚀 Releasing SonicMoE: a fast MoE implementation for NVIDIA H100 GPUs. Special thanks to my collaborators @WentaoGuo7, @XinleC295, @istoica05, and @tri_dao, from whom I learnt a lot!
Wentao Guo@WentaoGuo7

🚀SonicMoE🚀: a blazingly-fast MoE implementation optimized for NVIDIA Hopper GPUs. SonicMoE reduces activation memory by 45% and is 1.86x faster on H100 than previous SOTA😃 Paper: arxiv.org/abs/2512.14080 Work with @MayankMish98, @XinleC295, @istoica05, @tri_dao

UC Berkeley Sky retweeted
Jintao Zhang@Jintao_Zhang_·
TurboDiffusion: 100–205× faster video generation on a single RTX 5090 🚀 Only takes 1.8s to generate a high-quality 5-second video. The key to both high speed and high quality? 😍SageAttention + Sparse-Linear Attention (SLA) + rCM Github: github.com/thu-ml/TurboDi… Technical Report: jt-zhang.github.io/files/TurboDif…
UC Berkeley Sky retweeted
Lisa Dunlap@lisabdunlap·
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: stringsight.com ➡️Blog: blog.stringsight.com
UC Berkeley Sky retweeted
Yichuan Wang@YichuanM·
(1/N) 🚀 DS-Serve is a framework for efficient, scalable neural retrieval — it turns any in-house dataset (<1T tokens) into a high-throughput (up to 10,000 QPS), low-latency (<100ms), memory-efficient (<200GB RAM) retrieval system with a web UI and API. With DS-Serve, we publicly deployed a 400B-token datastore of high-quality LLM pretraining data (2B vectors), spanning academic resources — and it matches commercial search endpoints on our benchmarks at extremely low latency and high throughput. Try it out: api.ds-serve.org:30888/ui Blog: berkeley-large-rag.github.io/RAG-DS-Serve Work from UC Berkeley ( @BerkeleyNLP & @BerkeleySky) with collaborators at UW & UIUC!
UC Berkeley Sky retweeted
AI-Driven Research for Systems@ai4research_ucb·
🎯 AI agents generate production-ready GPU kernels that outperform compiled models [ADRS Blog #6]
This week, we feature work from @datadoghq on BitsEvolve: an ADRS framework for systems optimization. Using evolutionary search, we show how AI can automate the generation of custom, high-performance GPU kernels!
✍️ Read the blog: adrs-ucb.notion.site/datadog
🚀 Previous BitsEvolve post: datadoghq.com/blog/engineeri…
📄 ADRS paper: arxiv.org/abs/2510.06189
👩‍💻 Code: github.com/UCB-ADRS/ADRS
💬 Join the community: join.slack.com/t/adrs-global/…
UC Berkeley Sky retweeted
Melissa Pan@melissapan·
Thrilled to release our new paper MAP: Measuring Agents in Production ⚙️🚀
2025 is the year of agents… but do they actually work in the real world? Is it just hype? A group of 25 researchers from Berkeley, Stanford, UIUC, IBM, and Intesa Sanpaolo investigated what makes agents deployable in the wild. So…
📈 Why agents? Productivity gains
➕ How to build production agents? Simple & controllable methods
🧑‍💻 How to evaluate agents? Heavy human oversight
🛑 Top challenge now? Reliability remains unsolved
We surveyed 306 agent builders and ran 20 in-depth interviews across 26 agent application domains to understand the current landscape of production agents. Check out our latest paper: MAP - more in the thread 👇 (1/N)