Taylor Berg-Kirkpatrick

24 posts

@BergKirkpatrick

Assoc Prof at UC San Diego @ucsd_cse, AI researcher

San Diego, CA · Joined December 2017
318 Following · 731 Followers
Taylor Berg-Kirkpatrick retweeted
Zachary Novack @zacknovack
Can we transform offline audio diffusion into real-time streaming interactive instruments? Yes! Presenting Live Music Diffusion Models: a new paradigm for taking your favorite open models into live performance, right on your own laptop! 🎵🎵 🧵
8
28
160
11K
Taylor Berg-Kirkpatrick retweeted
Cheng Yang @ChengYANG_yc
Check out our latest #ICML2026 Spotlight paper — VisualSwap. When a VLM says "let me check the figure again," is it 👀 (Seeing) or just 🗣️(Saying) ? Paper: arxiv.org/pdf/2605.15864
Chufan Shi@Chufan_Shi

Reasoning VLMs often say "let me check the figure again." But do they actually look? Or just say they will, without re-attending to the image? Introducing VisualSwap — #ICML2026 Spotlight 🌟 — diagnosing this silent failure.

0
4
6
843
Taylor Berg-Kirkpatrick retweeted
Dan Fu @realDanFu
📢 Super excited to announce Parcae! We've been thinking about scaling laws and the "right" way to get more FLOPs. Turns out layer looping - with the right parameterization - gives you a new axis to scale!
Parcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models.
But - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity.
The main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability.
Our key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality.
It's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data.
Super excited to see how these ideas hold, and what we can do with looped models! Check out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models.
Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!
Hayden Prairie@hayden_prairie

We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇

2
26
128
21.6K
Taylor Berg-Kirkpatrick retweeted
Hayden Prairie @hayden_prairie
We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters. Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size. Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem! 🧵👇
41
179
1.3K
292.8K
Taylor Berg-Kirkpatrick retweeted
Isadora White @isadorcw
🚨 Do you use LLMs to help you write? 🤔You might notice that the text that you write with LLMs "feels" like an LLM, but did you know that it is also changing what you intended to say? 🤯 That's what we find in our new paper 👇 (1/N)
Natasha Jaques@natashajaques

The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content. We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise draft v1 -> v2 given expert feedback, with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human was originally intending to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.

4
18
53
9.5K
Taylor Berg-Kirkpatrick retweeted
Isadora White @isadorcw
Excited to introduce our SoTA coding models, FrogBoss (32B) and FrogMini (14B), on SWE-Bench-Verified! (FrogBoss eats bugs… like a boss) 🐸🪲 These models were trained with bugs from a mix of existing and our new synthetic bug generation approach, called BugPilot. (1/n)
3
15
45
15.6K
Taylor Berg-Kirkpatrick retweeted
Zachary Novack @zacknovack
Excited for my 1st #ISMIR2024 this week! Happy to chat about controllable + fast music generation 🙂 I'll be presenting our part 2 of DITTO, where we accelerate control to near real-time! DITTO-2: Distilled Diffusion Inference Time T-Optimization 🎹:ditto-music.github.io/ditto2/ 🧵
arXiv Sound@ArxivSound

"DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation," Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick, Nicholas Bryan, ift.tt/DTChrSb

1
8
55
6.2K
Taylor Berg-Kirkpatrick @BergKirkpatrick
Together with @EarlenceF and amazing students we demonstrated that obfuscated adversarial prompts can secretly exfiltrate Personally Identifiable Information (PII) from an LLM chat interface via tool misuse.
earlence@EarlenceF

We've done some work on hacking AI/LLM Agents by creating obfuscated adversarial prompts. What do you think this prompt does? Would you believe me if I told you it will polish the heck out of that cover or visa application letter?

0
2
14
3.9K
Taylor Berg-Kirkpatrick @BergKirkpatrick
Check out our ACL poster today at 4pm! Transfer learning for low-resource logographic writing systems can be extremely challenging. We find that visual representations offer advantages! w/ @danlu_ai, @fredahshi, Aditi Agarwal, and Jacobo Myerston
Danlu Chen@danlu_ai

Can ancient (logographic) languages from 5,000 years ago be processed like modern ones using NLP? We found that a visual representation-based system for NLP on ancient logographic languages outperforms conventional Latin transliteration! Join us at Poster s3 - Mon 4pm #ACL2024 #NLProc

0
4
13
2.4K
Taylor Berg-Kirkpatrick retweeted
Danlu Chen @danlu_ai
Can ancient (logographic) languages from 5,000 years ago be processed like modern ones using NLP? We found that a visual representation-based system for NLP on ancient logographic languages outperforms conventional Latin transliteration! Join us at Poster s3 - Mon 4pm #ACL2024 #NLProc
6
36
159
21.6K