Jon Saad-Falcon

389 posts

@JonSaadFalcon

CS PhD @hazyresearch @stanfordnlp @StanfordAILab

Palo Alto, CA · Joined January 2021

936 Following · 1.8K Followers

Pinned Tweet

Jon Saad-Falcon@JonSaadFalcon·12 Nov

Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands? The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency). Today’s Local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years—driven by better models (3.2×) and better accelerators (1.7×). As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
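The IPW definition in the pinned tweet (capability delivered per unit of power consumed) can be sketched as a simple ratio. This is a minimal illustration, not the authors' measurement code: the `Measurement` schema, the use of accuracy as the capability proxy, and the numbers below are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One model/accelerator pairing on a fixed query set (hypothetical schema)."""
    accuracy: float         # assumed capability proxy: fraction of queries handled, in [0, 1]
    avg_power_watts: float  # mean power draw while serving those queries


def intelligence_per_watt(m: Measurement) -> float:
    """Capability delivered per unit of power consumed."""
    if m.avg_power_watts <= 0:
        raise ValueError("avg_power_watts must be positive")
    return m.accuracy / m.avg_power_watts


def ipw_improvement(old: Measurement, new: Measurement) -> float:
    """Multiplicative IPW gain of `new` over `old`."""
    return intelligence_per_watt(new) / intelligence_per_watt(old)


# Made-up numbers for illustration only, not results from the paper.
older = Measurement(accuracy=0.40, avg_power_watts=50.0)
newer = Measurement(accuracy=0.80, avg_power_watts=40.0)
print(f"IPW gain: {ipw_improvement(older, newer):.2f}x")
```

Under this framing, a gain can come from higher capability at the same power, lower power at the same capability, or both, which is one way to read the split between model and accelerator improvements in the tweet.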

142

456

227.7K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·4d

“the edge + cloud are going to come together” - the 🐐 @Benioff knows what’s up 💪🏽. hybrid local-cloud inference is the way. @JonSaadFalcon and I have been working on this for a minute. links to research in comments 👇

Maddy A@its_maddy_a

“I think we are getting brainwashed.” @Benioff said this on @theallinpod. “We’re using $300M of @AnthropicAI this year… the vast majority of those tokens don’t need to go to Anthropic.” Some tasks need @claudeai . Some need @OpenAI . Most need smaller, cheaper, faster models like @ZeroGPU_AI @Benioff believes in what we do - @salesforcevc should take a look. zerogpu.ai

2.2K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·13 May

*bad bitches have fabs

Muhammad Zuhair@mzuhair123

"Real men have fabs" - AMD's founder, and an industry legend Jerry Sanders

2.1K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·11 May

sounds like @OpenJarvisAI running locally on a mac studio, with your other devices connecting to it through whatsapp, telegram, etc 🤷‍♀️ links in comments!

signüll@signulll

imagine if apple basically let you set up a home “server” that ran inference on that device with sophisticated models & every apple ecosystem device is a node off of that central server. it’s complicated but if anyone can deploy hardware like this it would be them. this would allow zero marginal cost ai without any middle man with a super privacy first approach. it would create another product category entirely. anyone remember the airport extreme?

1.5K

Jon Saad-Falcon retweeted

Kelly Buchanan@ekellbuch·7 May

Very excited to release Terminal-Bench 2.1! Coding agents are among the most economically consequential deployments of LLMs to date. As agents improve, benchmark reliability matters more. We audited TB2.0 and found and corrected issues in 28/89 tasks. 30% of the benchmark! But the rankings survived, absolute scores moved up to 12pp!

768

84.2K

Jon Saad-Falcon retweeted

Parth Asawa@pgasawa·4 May

Today, we’re releasing Continual Learning Bench 1.0: the first, realistic benchmark for measuring how AI systems can improve in online settings. Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened. But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and find there’s still plenty of headroom for learning. (1/n)

154

1.1K

824.3K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·4 May

hyped to see computer systems 🐐's like @JeffDean, david patterson, @AzaliaMirh & others discussing how intelligence per watt (ipw) should be the north star metric for computer system design. links to event notes + ipw work w/ @JonSaadFalcon in comments below!

8.5K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·27 Apr

à la minions 😊. local by default, hybrid by design 🙌🏽. @JonSaadFalcon and i have been working on this for a minute thru @OpenJarvisAI, minions and intelligence per watt. links in comment 👇

Shay Boloor@StockSavvyShay

OpenAI is reportedly working with $QCOM on an AI-first smartphone targeted for 2028 mass production. The idea is to compete with the $AAPL iPhone by replacing app grids with agent-driven task flows by using on-device AI for context and cloud AI for heavier work.

2.4K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·25 Apr

local compute is the way 💪🏽. the world should run locally by default, calling the cloud only when truly necessary. run your agents locally with @OpenJarvisAI and use hybrid local-cloud inference via minions. @JonSaadFalcon and i study this extensively in our recent research. links in comments👇

tae kim@firstadopter

OpenAI: "There’s not going to be enough compute in the world to meet the demand."

Jon Saad-Falcon retweeted

Michael Y. Li@michaelyli_·22 Apr

Can a language model learn, end-to-end, what to keep in its own KV cache and what to throw away? Can it learn to forget while it learns to reason? Deep learning's central lesson: capability emerges from end-to-end optimization, not heuristics/strong inductive biases. But for efficiency, we rely heavily on hand-designed approaches. 🗑️ Introducing Neural Garbage Collection (NGC): we train a language model to jointly reason and manage its own KV cache, using reinforcement learning with outcome-based task reward alone. No SFT, no proxy objectives, no summarization in natural language. New paper with @jubayer_hamid, Emily Fox, and @noahdgoodman!

133

905

163.1K

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·21 Apr

@OpenJarvisAI is now just one tweet away 😊. file issues, make prs & more, directly from your socials. s/o to the amazing @robbymanihani for the 🚢

Jon Saad-Falcon@JonSaadFalcon

Say hi to @OpenJarvisAI 👋 If you have issues, want to make a PR, or simply chat, just @OpenJarvisAI in a tweet! This account is itself an OpenJarvis instance: running 24/7 on an NVIDIA DGX Spark, triaging issues + PRs on the repo and serving as a personal assistant for the lab! For personal AI on personal devices, check out: github.com/open-jarvis/Op… x.com/JonSaadFalcon/…

768

Jon Saad-Falcon retweeted

Avanika Narayan@Avanika15·21 Apr

thrilled to see that intelligence per joule (ipj) has become the north star metric for hardware-software codesign. @JonSaadFalcon and i study ipj extensively in our latest paper. link in comments 👇

Reiner Pope@reinerpope

Intelligence per picojoule, with @itsclivetime and @dylan522p
(0:00) Intro
(1:22) What is codesign?
(2:49) Codesign example: Swish vs ReLU
(4:22) Are DeepSeek papers codesign?
(6:45) Predicting where ML research will go
(8:06) Should researchers hate your chips?
(9:34) Can you codesign too much?
(13:23) Picking the right grain size for specialization
(16:22) How much hardware flexibility for The Age of Research?
(20:05) Did reasoning and RL disrupt hardware roadmaps?
(23:09) Cerebras/Groq: unexpected wins on reasoning and RL
(25:34) Disaggregating MLP and attention
(29:06) The right metrics for quantization and codesign papers

6.1K

Jon Saad-Falcon@JonSaadFalcon·20 Apr

Jon Saad-Falcon retweeted

Dan Fu@realDanFu·15 Apr

📢 Super excited to announce Parcae! We've been thinking about scaling laws and the "right" way to get more FLOPs. Turns out layer looping - with the right parameterization - gives you a new axis to scale! Parcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models. But - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity. The main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability. Our key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality. It's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data. Super excited to see how these ideas hold, and what we can do with looped models! Check out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models. Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!

Hayden Prairie@hayden_prairie

We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇

128

21.6K

Jon Saad-Falcon retweeted

Azalia Mirhoseini@Azaliamirh·14 Apr

Turns out we can get SOTA on agentic benchmarks with a simple test-time method! Excited to introduce LLM-as-a-Verifier. Test-time scaling is effective, but picking the "winner" among many candidates is the bottleneck. We introduce a way to extract a cleaner signal from the model: 1️⃣ Ask the LLM to rank results on a scale of 1-k 2️⃣ Use the log-probs of those rank tokens to calculate an expected score You can get a verification score in a single sampling pass per candidate pair. Blog: llm-as-a-verifier.notion.site Code: llm-as-a-verifier.github.io Led by @jackyk02 and in collaboration with a great team: @shululi256, @pranav_atreya, @liu_yuejiang, @drmapavone, @istoica05
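The second step in the tweet above (turning the log-probs of rank tokens into an expected score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released LLM-as-a-Verifier code: the function name `expected_rank_score` is hypothetical, the log-probs are assumed to have already been read off the model for the tokens "1" through "k", and the pairwise-candidate setup mentioned in the tweet is not modeled.

```python
import math
from typing import Mapping


def expected_rank_score(rank_logprobs: Mapping[int, float]) -> float:
    """Expected rank under the distribution renormalized over the rank tokens.

    `rank_logprobs` maps each rank (1..k) to the log-probability the model
    assigned to that rank's token. The values need not sum to one in
    probability space, so they are renormalized with a stable softmax.
    """
    if not rank_logprobs:
        raise ValueError("need at least one rank token log-prob")
    peak = max(rank_logprobs.values())  # subtract the max for numerical stability
    weights = {rank: math.exp(lp - peak) for rank, lp in rank_logprobs.items()}
    total = sum(weights.values())
    return sum(rank * w for rank, w in weights.items()) / total


# Hypothetical log-probs for the tokens "1", "2", "3" (k = 3).
example = {1: math.log(0.1), 2: math.log(0.2), 3: math.log(0.7)}
print(round(expected_rank_score(example), 3))
```

Compared with taking only the single sampled rank, the expectation keeps the model's uncertainty, so two candidates that would both be sampled as "3" can still be separated by how much probability mass sits on the lower ranks.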

114

987

115.7K

Jon Saad-Falcon retweeted

Tarun Suresh@TarunSures41845·13 Apr

Great work with @hangoo_kang , @JonSaadFalcon , and @Azaliamirh on a new system for environment-specific LLM agent self-improvement that trains the agent on the underlying capabilities it lacks 🚀

Hangoo Kang@hangoo_kang

Introducing TRACE: an end-to-end system for environment-specific agent self-improvement🚀 Outperforms direct RL on the environment, GEPA, and synthetic data approaches on τ²-Bench and ToolSandBox📈 Collab w/ @TarunSures41845, @JonSaadFalcon, @Azaliamirh. Details in thread👇

2.8K

Jon Saad-Falcon retweeted

Hangoo Kang@hangoo_kang·13 Apr

155

14K

Jon Saad-Falcon retweeted

Jaya Gupta@JayaGup10·13 Apr

x.com/i/article/2043…

248

102.7K

Jon Saad-Falcon retweeted

Tristan Thrush@TristanThrush·10 Apr

New paper! Want to precisely optimize synthetic training data to do practical or even wacky things? Dataset Policy Gradients get you there, letting you target any differentiable training or post-training metric. We embedded a QR code in GPT-2’s weights using only training data!

223

46.8K

Jon Saad-Falcon retweeted

Jacky Kwok@jackyk02·10 Apr

We release LLM-as-a-Verifier 🧠: A general-purpose verification framework that achieves SOTA 👑 on Terminal-Bench 2 (86.4%) and SWE-Bench Verified (77.8%) by scaling:
- scoring granularity
- repeated verification
- criteria decomposition
📄 Blog & Code: llm-as-a-verifier.notion.site

443

54.4K
