Albert Gu

527 posts

Albert Gu
@_albertgu

assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.

Joined December 2018
77 Following · 20.3K Followers

Pinned Tweet
Albert Gu @_albertgu
The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
Albert Gu retweeted
Albert Gu @_albertgu
thanks to @cartesia for supporting this project, providing compute, and testing the models! we believe that such research advances are highly impactful for natural, real-time intelligence, and we continue to invest in the frontier of efficient models. Blog cross-posted: blog.cartesia.ai/p/mamba-3
Cartesia @cartesia

Mamba-3 is out! 🐍 SSMs marked a major advance for the efficiency of modern LLMs. Mamba-3 takes the next step, shaping SSMs for a world where AI workloads are increasingly dominated by inference. Read about it on the Cartesia blog: blog.cartesia.ai/p/mamba-3

Albert Gu retweeted
Albert Gu @_albertgu
@giffmana Haha it occurred to me that just like the trapezoid rule is a width-2 conv, Runge-Kutta 4 might look like a width-4 conv, which is the size that previous Mambas used
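The trapezoid-as-conv observation above can be checked numerically. A minimal sketch (sample values are illustrative, not from the thread):

```python
import numpy as np

# The trapezoid rule's per-step increment (f[t] + f[t+1]) / 2 is exactly a
# width-2 convolution of the samples with the kernel [1/2, 1/2].
f = np.array([0.0, 1.0, 4.0, 9.0, 16.0])  # samples of f(t) = t^2 at t = 0..4
increments = np.convolve(f, [0.5, 0.5], mode="valid")  # width-2 conv
trapz_steps = (f[:-1] + f[1:]) / 2                     # textbook trapezoid increments
assert np.allclose(increments, trapz_steps)
```

Whether RK4's four stage evaluations line up as cleanly with a width-4 conv over input samples is, as the tweet hedges, only an analogy.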
Albert Gu @_albertgu
Thanks for reading - we're so excited to see how people use Mamba-3! Also check out the other posts, including Tri’s explanation of MIMO as well as the students’ threads: x.com/tri_dao/status… x.com/kevinyli_/stat… x.com/aakash_lahoti/… x.com/_berlinchen/st… x.com/caitWW9/status…
Berlin Chen @_berlinchen

Had so much fun working on Mamba-3 with my wonderful collaborators: @aakash_lahoti @kevinyli_ @caitWW9 @avivbick @zicokolter @tri_dao @_albertgu. We treated inference as a first-class citizen from day one. This leads to some surprisingly powerful results 👇

Albert Gu @_albertgu
Aside from the core SSM, we also adjusted the overall architecture to align more with modern LMs, such as adding BC/QK norm. The most important change to me is that Mamba-3 no longer needs the short conv, which has been essential for linear models but is very inelegant.
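For readers unfamiliar with QK norm: a minimal numerical sketch of the idea (shapes and names are illustrative assumptions, not Mamba-3's actual implementation):

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm along the last (head) dimension; learned scale omitted for brevity
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

# QK norm (BC norm in SSM terms): normalize the query/key (or B/C) projections
# before they interact, which bounds the scale of their inner products
# regardless of the magnitude of the raw projections.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 64))  # (seq_len, head_dim), illustrative shapes
k = rng.standard_normal((4, 64))
q, k = rms_norm(q), rms_norm(k)
logits = q @ k.T / np.sqrt(q.shape[-1])  # well-scaled interaction scores
```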
Albert Gu @_albertgu
> an example of this is that in hybrid models, sometimes "stronger" linear layers can lead to overall weaker models because it incentivizes the global attention to be "lazy"

some people asked about this. i think this is a somewhat folklore result that I don't have a reference for, but here's another recent result that's similar: arxiv.org/abs/2509.24552

this is an example of a related phenomenon where, in a SWA+xLSTM model, longer SWA windows led to worse long-context performance because they encouraged the xLSTM layers to be lazy
Albert Gu @_albertgu
okay this plot and discussion has blown up more than expected so let me try to leave some candid thoughts

1. i don't believe that the intent of Mayank's tweet was to claim "Mamba-2 > GDN". the primary intent was to convey that the initialization for Mamba-2 makes a huge difference; a secondary point was that in *this particular setting* Mamba-2 seemed to outperform GDN (after fixing the init)

2. in my personal opinion and experience, Mamba-2 should generally be a faster but weaker version of GDN. after all, GDN is literally built on top of Mamba-2 by adding a more expressive (rank-1) component to the state transition

3. however, we also know that different parts of neural network architectures can interact in unexpected ways. an example of this is that in hybrid models, sometimes "stronger" linear layers can lead to overall weaker models because they incentivize the global attention to be "lazy". also, other downstream capabilities are not necessarily correlated with loss

4. for this particular plot (7B/1B MoE), i have never personally tested a pure Mamba+MoE model so i can't vouch for the results, but it seems plausible that there are unforeseen interactions with MoE. Mayank has also shown his reproductions in other settings (e.g. 400M dense) where GDN is slightly better than Mamba-2, which tracks what i'd expect, so i think there's nothing obvious to suspect in this plot

5. i also want to emphasize that many follow-up results shouldn't have issues (including the original GDN paper). however, there are also probably a non-trivial number of results that did have a bug. regardless, hopefully this raises more awareness of initialization-related issues!

tl;dr
- the main takeaway is the init bug; some downstream results are affected, some aren't
- Mamba-2 vs GDN is a much more nuanced question
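Point 2 (GDN as Mamba-2 plus a rank-1 state-transition term) can be sketched in the usual linear-attention notation; the symbols below ($S_t$, $k_t$, $v_t$, $\alpha_t$, $\beta_t$) are my choice, not from the thread:

```latex
% Mamba-2: the state transition is a scalar gate (a multiple of the identity)
S_t = \alpha_t S_{t-1} + v_t k_t^\top
% Gated DeltaNet: the same scalar gate, composed with a rank-1 (delta-rule) correction
S_t = S_{t-1} \, \alpha_t \bigl( I - \beta_t k_t k_t^\top \bigr) + \beta_t v_t k_t^\top
```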
Lucas Beyer (bl16) @giffmana

soooo... how many papers do we think are invalidated by this? And now think about how many other bugs there must be in any re-implementations of... basically anything.
