Gavin Uberti

133 posts

@UbertiGavin

Building model-specific AI chips @ Etched

San Jose, CA · Joined March 2022
243 Following · 3.5K Followers
Tanishq Kumar @tanishqkumar07
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
134 replies · 457 reposts · 4.1K likes · 607.6K views
Andrei Serban @andrei_serban
All companies run on IT. But almost all IT teams are underwater. The ones that aren't run on Console. Console raised a $23M Series A led by DST Global Partners and @ThriveCapital to help leading companies like @scale_ai, @tryramp, and @webflow automate 50%+ of their tickets.
64 replies · 45 reposts · 404 likes · 158.6K views
Gavin Uberti reposted
Brendan (can/do) @BrendanFoody
Mercor (@mercor_ai) scaled from $1M to $500M in revenue run rate in the last 17 months, making us the fastest-growing company of all time. Our growth is accelerating: we averaged 11% week-over-week growth in July, 18% in August, and 19% in September.

One trend driving this meteoric growth: the Economy is Becoming an RL Environment Machine. Reinforcement learning is becoming so effective that agents can hill-climb any benchmark, but humans need to define the rewards to automate everything. While everyone fears job loss, we're creating a new category of knowledge work faster than at any other time in history.

The future of work will converge on training agents. We're paying out over $1M/day to people in our marketplace and hiring experts rapidly across nearly every domain: software engineers, doctors, lawyers, consultants, bankers, and many more.
149 replies · 195 reposts · 1.5K likes · 618.1K views
Tarun Amasa @TarunAmasa
It's official. We've raised $14M led by @OpenAI Startup Fund to bring AI to Excel. Endex is the first AI agent to live inside Excel. For the past year, we've been working with financial firms. Today we're releasing it to the world. Our capacity is limited; comment below for an early invite 🧵
2.1K replies · 462 reposts · 8.1K likes · 10.8M views
Gavin Uberti @UbertiGavin
There is no data wall.
[image]

2 replies · 0 reposts · 14 likes · 3.7K views
Gavin Uberti @UbertiGavin
happy llama day to all those who celebrate
AI at Meta @AIatMeta

Today is the start of a new era of natively multimodal AI innovation. Today, we're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick, our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model with 16 experts.
• Industry-leading context window of 10M tokens.
• Outperforms Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1 across a broad range of widely accepted benchmarks.

Llama 4 Maverick
• 17B-active-parameter model with 128 experts.
• Best-in-class image grounding, with the ability to align user prompts with relevant visual concepts and anchor model responses to regions in the image.
• Outperforms GPT-4o and Gemini 2.0 Flash across a broad range of widely accepted benchmarks.
• Achieves comparable results to DeepSeek v3 on reasoning and coding, at half the active parameters.
• Unparalleled performance-to-cost ratio, with a chat version scoring an ELO of 1417 on LMArena.

These models are our best yet thanks to distillation from Llama 4 Behemoth, our most powerful model yet. Llama 4 Behemoth is still in training and is currently seeing results that outperform GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks. We're excited to share more details about it even while it's still in flight.

Read more about the first Llama 4 models, including training and benchmarks ➡️ go.fb.me/gmjohs
Download Llama 4 ➡️ go.fb.me/bwwhe9

1 reply · 1 repost · 15 likes · 4.6K views
Joe Li @JoeLi5050
Just won 1st place and $40K at the Mercor x Etched x Cognition inference-time compute hackathon! Four Stanford freshmen (@radi_cho, @zeynebnkaya, @nicolesplaining, and I) built "LLaDA-R1: Scaling Reasoning at Inference Time with Diffusion-LLMs" in 24 hours.
[3 images]

53 replies · 38 reposts · 972 likes · 91.6K views
Gavin Uberti @UbertiGavin
@magnificentgrnt The trustees behind Magnificent are some of the best mentors I've had the privilege of working with. Salar, Firat, Sidar, and of course Barend are all world-class.
0 replies · 0 reposts · 3 likes · 445 views
Magnificent Grants @magnificentgrnt
Check out the new 2024 Cohort of Magnificent Grants. We're happy to report that, going forward, the fellowship application process will run on a rolling basis, launching a nomination system... substack.com/home/post/p-15…
2 replies · 7 reposts · 35 likes · 10.5K views
Gavin Uberti @UbertiGavin
@vllm_project Would the additional reasoning tokens allowed by the speedup make up for the loss in accuracy (assuming we are in a time-bound environment)? Come test it at the Inference Time Compute Hackathon!
0 replies · 0 reposts · 0 likes · 1K views
Gavin Uberti @UbertiGavin
@vllm_project Rather than insisting that the speculative decoding distribution match the target exactly, if you're OK with, say, 99% of the distribution being recovered (e.g., by some metric like KL divergence), you could accept guessed tokens more frequently
1 reply · 0 reposts · 0 likes · 1.2K views
Gavin Uberti @UbertiGavin
Hackathon idea - nearly speculative decoding. As of v0.7.3, @vllm_project supports Deepseek R1's Multi-Token Prediction module, letting you "skip" a token generation if the multi-token prediction guessed it correctly in advance. But what if you accepted almost correct guesses?
Etched @Etched

We're excited to partner with @Cognition_Labs @Mercor_AI @CoreWeave and @AnthropicAI to host an inference-time compute hackathon, featuring >$60K in cash prizes and >1 exaflop of free compute.

2 replies · 0 reposts · 4 likes · 2.8K views
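A minimal sketch of what the "nearly speculative" acceptance rule floated in this thread could look like, assuming per-token logits from both the target model and the draft (MTP) head. Everything here is hypothetical: `nearly_speculative_accept` and `kl_budget` are illustrative names, not part of vLLM's API; only the min(1, p/q) fallback is the standard speculative-decoding acceptance test.

```python
import torch
import torch.nn.functional as F

def nearly_speculative_accept(target_logits: torch.Tensor,
                              draft_logits: torch.Tensor,
                              token: int,
                              kl_budget: float = 0.01) -> bool:
    """Hypothetical 'nearly speculative' acceptance rule (not vLLM's API).

    If the draft distribution is within `kl_budget` nats of the target
    distribution, accept the guessed token outright; otherwise fall back
    to the exact rejection-sampling rule, which accepts with probability
    min(1, p_target[token] / p_draft[token]).
    """
    p = F.softmax(target_logits, dim=-1)  # target model's next-token dist
    q = F.softmax(draft_logits, dim=-1)   # draft / MTP head's dist
    kl = torch.sum(p * (p.clamp_min(1e-9).log() - q.clamp_min(1e-9).log()))
    if kl < kl_budget:
        return True  # distributions nearly match: take the guess for free
    accept_prob = torch.clamp(p[token] / q[token].clamp_min(1e-9), max=1.0)
    return bool(torch.rand(()) < accept_prob)
```

The tradeoff is the one raised in the replies above: whenever the KL branch fires, sampling drifts slightly off the target model's distribution, so any accuracy loss would have to be weighed against the extra reasoning tokens the speedup buys.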
Gavin Uberti @UbertiGavin
You could add more experts by just increasing K_r and M. Would this make Deepseek smarter? Would you need to scale down the outputs accordingly? Would you need to fine-tune a bit, or would it work out of the box? What if you used all experts simultaneously?
1 reply · 0 reposts · 0 likes · 102 views
Gavin Uberti @UbertiGavin
Using more FLOPs should make transformers smarter. DeepSeek R1 currently uses ~8 routed experts per token. So would selecting more (and possibly scaling by the router) improve performance? Come test it out at the Inference Time Compute Hackathon and win up to $60k in prizes!
Etched @Etched

We're excited to partner with @Cognition_Labs @Mercor_AI @CoreWeave and @AnthropicAI to host an inference-time compute hackathon, featuring >$60K in cash prizes and >1 exaflop of free compute.

1 reply · 2 reposts · 17 likes · 2.3K views
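As a rough illustration of the knob being discussed, here is a simplified top-k router sketch. It assumes a plain softmax-then-top-k gate, so it is not DeepSeek's exact router (their production gate also involves grouped routing and shared experts); `route_tokens` and `num_experts_per_token` are illustrative names.

```python
import torch
import torch.nn.functional as F

def route_tokens(router_logits: torch.Tensor,
                 num_experts_per_token: int = 8,
                 renormalize: bool = True):
    """Simplified top-k MoE router (illustrative, not DeepSeek's exact gate).

    Raising `num_experts_per_token` (K_r in DeepSeek's notation) activates
    more routed experts per token, spending more FLOPs per forward pass.
    `renormalize` rescales the gate weights to sum to 1 so the combined
    expert output keeps roughly the same magnitude as k grows.
    """
    probs = F.softmax(router_logits, dim=-1)
    weights, expert_ids = torch.topk(probs, k=num_experts_per_token, dim=-1)
    if renormalize:
        weights = weights / weights.sum(dim=-1, keepdim=True)
    return weights, expert_ids

# Widening the router from 8 to 16 active experts for a batch of 4 tokens:
logits = torch.randn(4, 256)  # router scores over 256 routed experts
w8, ids8 = route_tokens(logits, num_experts_per_token=8)
w16, ids16 = route_tokens(logits, num_experts_per_token=16)
```

Whether widening k helps out of the box, needs the output rescaled, or needs a brief fine-tune is exactly the open question posed above.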
Justin Uberti @juberti
During the development of WebRTC, we recognized the impact of voice and video on human communication, and I wondered if someday we'd talk to AIs the same way. Today, we can see this future taking shape, and I'm excited to announce I've joined @OpenAI to lead real-time AI efforts!
[image]

77 replies · 63 reposts · 1.8K likes · 216.6K views
Tanishq Kumar @tanishqkumar07
[1/5] Made a Twitter account just to share some fun research projects I worked on over the summer! The first is a computational neuroscience project asking: do mice grok?

TL;DR: we revisit recent neural data from mouse cortex as mice are overtrained on a binary odor-discrimination task, finding that they continue to learn generalizing solutions (their odor representations of the two classes continue to separate) even as behavior stops changing on a training set. Joint work with @blake__bordelon, @CPehlevan, Venki Murthy at @MCB_Harvard and @gershbrain!
[image]

2 replies · 4 reposts · 20 likes · 3.5K views