

Jon Saad-Falcon

@JonSaadFalcon
CS PhD @hazyresearch @stanfordnlp @StanfordAILab



“I think we are getting brainwashed.” @Benioff said this on @theallinpod. “We’re using $300M of @AnthropicAI this year… the vast majority of those tokens don’t need to go to Anthropic.” Some tasks need @claudeai. Some need @OpenAI. Most need smaller, cheaper, faster models like @ZeroGPU_AI. @Benioff believes in what we do - @salesforcevc should take a look. zerogpu.ai

"Real men have fabs" - AMD's founder, and an industry legend Jerry Sanders

imagine if apple basically let you set up a home “server” that ran inference on that device with sophisticated models & every apple ecosystem device was a node off of that central server. it’s complicated but if anyone can deploy hardware like this it would be them. this would allow zero-marginal-cost ai without any middleman, with a super privacy-first approach. it would create another product category entirely. anyone remember the airport extreme?







OpenAI is reportedly working with $QCOM on an AI-first smartphone targeted for 2028 mass production. The idea is to compete with the $AAPL iPhone by replacing app grids with agent-driven task flows, using on-device AI for context and cloud AI for heavier work.

OpenAI: "There’s not going to be enough compute in the world to meet the demand."



Say hi to @OpenJarvisAI 👋 If you have issues, want to make a PR, or simply want to chat, just @OpenJarvisAI in a tweet! This account is itself an OpenJarvis instance: running 24/7 on an NVIDIA DGX Spark, triaging issues + PRs on the repo and serving as a personal assistant for the lab! For personal AI on personal devices, check out: github.com/open-jarvis/Op… x.com/JonSaadFalcon/…

Intelligence per picojoule, with @itsclivetime and @dylan522p
(0:00) Intro
(1:22) What is codesign?
(2:49) Codesign example: Swish vs ReLU
(4:22) Are DeepSeek papers codesign?
(6:45) Predicting where ML research will go
(8:06) Should researchers hate your chips?
(9:34) Can you codesign too much?
(13:23) Picking the right grain size for specialization
(16:22) How much hardware flexibility for The Age of Research?
(20:05) Did reasoning and RL disrupt hardware roadmaps?
(23:09) Cerebras/Groq: unexpected wins on reasoning and RL
(25:34) Disaggregating MLP and attention
(29:06) The right metrics for quantization and codesign papers


We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇
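To make the parameter-versus-FLOPs point concrete, here is a rough accounting sketch. It is not the authors' method or their scaling law: the `LoopedStack` class, the layer sizes, and the "about 2 FLOPs per weight per token" rule of thumb are all illustrative assumptions, and attention-specific costs are ignored. It only shows that re-applying the same block of layers raises forward compute while stored parameters stay fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopedStack:
    """Hypothetical weight-tied stack: one block of layers applied `loops` times."""

    params_per_layer: int  # weights in one layer (illustrative number)
    layers_in_block: int   # distinct layers whose weights are stored
    loops: int = 1         # how many times the block is re-applied

    @property
    def stored_params(self) -> int:
        # Looping reuses the same weights, so storage does not grow with `loops`.
        return self.params_per_layer * self.layers_in_block

    @property
    def forward_flops_per_token(self) -> int:
        # Assumed rule of thumb: ~2 FLOPs per weight per token, per application.
        return 2 * self.stored_params * self.loops


# Illustrative sizes only; none of these numbers come from the post.
baseline = LoopedStack(params_per_layer=1_000_000, layers_in_block=12, loops=1)
looped = LoopedStack(params_per_layer=1_000_000, layers_in_block=12, loops=2)
twice_as_deep = LoopedStack(params_per_layer=1_000_000, layers_in_block=24, loops=1)

for name, model in [("baseline", baseline), ("looped", looped), ("2x deep", twice_as_deep)]:
    print(name, model.stored_params, model.forward_flops_per_token)
```

Under these assumptions, the looped stack spends the same forward FLOPs per token as the twice-as-deep stack while storing half the weights. Whether it also matches that model's quality is the empirical claim in the post, which this sketch does not test.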



Introducing TRACE: an end-to-end system for environment-specific agent self-improvement🚀 Outperforms direct RL on the environment, GEPA, and synthetic data approaches on τ²-Bench and ToolSandBox📈 Collab w/ @TarunSures41845, @JonSaadFalcon, @Azaliamirh. Details in thread👇



