Brian Lester

93 posts


@blester125

Senior Research Engineer at Google DeepMind working on parameter-efficient adaptation and few-shot generalization, mostly within NLP. Views are my own. he/him

Joined July 2013
243 Following · 449 Followers
Brian Lester@blester125·
@jkobject Merging models is also handled by plug-ins! github.com/r-three/git-th… Let me know if you want any guidance on writing the plug-in, it would be nice to have something beyond simple averaging!
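
A minimal sketch of what a merge plug-in beyond simple averaging might look like. The class layout, `name` attribute, and `merge` signature here are illustrative assumptions, not Git-Theta's actual plug-in API (see the linked repo for that):

```python
# Hypothetical merge plug-in sketch. The class shape and merge()
# signature are assumptions for illustration, not Git-Theta's real API.
import numpy as np

class FisherWeightedMerge:
    """Go beyond simple averaging: combine two versions of a parameter
    using per-parameter importance weights (e.g., Fisher information)."""

    name = "fisher-weighted"  # how a plug-in might be selected

    def merge(self, param_a: np.ndarray, param_b: np.ndarray,
              fisher_a: np.ndarray, fisher_b: np.ndarray) -> np.ndarray:
        total = fisher_a + fisher_b + 1e-12  # guard against zero weights
        # Per-parameter convex combination; with uniform weights this
        # reduces to the simple average.
        return (fisher_a * param_a + fisher_b * param_b) / total
```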
Jérémie Kalfon@jkobject·
@blester125 Next you should implement ideas from the paper on weight authoring for merging multiple fine-tuned models!
Brian Lester@blester125·
We just pushed a new update adding support for the (very impressive) safetensors library from our friends at @huggingface! Git-Theta's plug-in system meant that we spent more time waiting on CI/CD than actually adding support (I'll get off my soapbox now 🧼📦).
Brian Lester@blester125

Introducing Git-Theta, a Git extension that enables collaborative and continual development of ML models with merges, diffs, and parameter-efficient updates—all using the standard Git workflow! 📄 arxiv.org/abs/2306.04529 💽 github.com/r-three/git-th… 🗣️ cccml.zulipchat.com 🧵⬇️

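
For reference, the round-trip the new support builds on is just safetensors' own dict-of-arrays save/load (nothing Git-Theta-specific in this snippet):

```python
# The safetensors API the new checkpoint support wraps: a flat dict of
# arrays saved and loaded without pickle.
import numpy as np
from safetensors.numpy import save_file, load_file

params = {
    "encoder.layer0.weight": np.ones((4, 4), dtype=np.float32),
    "encoder.layer0.bias": np.zeros(4, dtype=np.float32),
}
save_file(params, "model.safetensors")
restored = load_file("model.safetensors")
assert np.array_equal(params["encoder.layer0.weight"],
                      restored["encoder.layer0.weight"])
```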
Brian Lester@blester125·
@samcoward @colinraffel What the "leaves" of a model are is controlled by the checkpoint plug-in github.com/r-three/git-th…. A new plug-in that returns layers instead of weights may do what you want (although other parts might need to be tweaked; we made some assumptions about single tensors)
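
A sketch of the idea, with hypothetical function names (not Git-Theta's actual checkpoint plug-in interface): the plug-in decides what counts as a leaf, so a layer-level variant simply stops flattening one level earlier:

```python
# Hypothetical checkpoint plug-in sketch; names are illustrative.
from typing import Any, Dict

def tensor_leaves(ckpt: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Tensor-level leaves: recurse all the way down to single arrays."""
    leaves: Dict[str, Any] = {}
    for key, value in ckpt.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            leaves.update(tensor_leaves(value, path))
        else:
            leaves[path] = value
    return leaves

def layer_leaves(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Layer-level leaves: stop at the top level, so each layer's whole
    dict of tensors is tracked as one unit (single-tensor assumptions
    elsewhere would then need tweaking, as noted above)."""
    return dict(ckpt)
```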
Brian Lester@blester125·
Git-Theta is designed around plug-ins—this means that if we don’t support your favorite framework, merging strategy, or parameter-efficient update yet, you can add it! Join us on GitHub github.com/r-three/git-th… or Zulip cccml.zulipchat.com to start contributing!
Brian Lester retweeted
Tu Vu@tuvllms·
While parameter-efficient tuning methods were originally proposed to reduce computation & storage costs, it turns out they can help overcome catastrophic forgetting and thus improve performance on zero-shot cross-lingual generation. Check out our work @GoogleAI @emnlpmeeting 👇 1/10
Brian Lester@blester125·
Am I missing something wrt the name "gradient checkpointing"? Clearing cached activations and recomputing them in the backward pass seems like the opposite of checkpointing. The name makes it sound like we are storing the activations on disk. docs.aws.amazon.com/sagemaker/late…
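
For context, here is the technique under that name in PyTorch: the checkpointed block's activations are dropped in the forward pass and recomputed during backward, trading compute for memory; nothing touches disk:

```python
# "Gradient checkpointing" in PyTorch: activations inside the block are
# not cached on the forward pass; they are recomputed during backward.
# Memory is traded for compute -- nothing is stored on disk.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 128), torch.nn.ReLU(),
)
x = torch.randn(32, 128, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # forward, no caching
y.sum().backward()  # block re-runs here to rebuild activations
```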
Brian Lester@blester125·
@LiamFedus Shouldn't GPT be earlier in your timeline? The first GPT paper isn't on arXiv (read: timestamped), but it was cited by BERT.
William Fedus@LiamFedus·
A brief 4 year LLM history: enc-only (BERT) -> enc-dec (T5) -> dec-only (GPT) As of 2022, the most compute is in decoder models -- what research supports this? Is this the best approach? Enc-dec: T5, AlphaCode, Switch, ST-MoE, RETRO Dec-only: GPT-{1,2,3}, {🐭, 🐹}, PaLM
Brian Lester retweeted
Tu Vu@tuvllms·
Happy to share our soft prompt transfer (SPoT) paper made it to #ACL2022 🎉. On the SuperGLUE leaderboard, SPoT is the first parameter-efficient approach that is competitive with methods that tune billions of parameters. w/ @blester125, @noahconst, @aboSamoor, @daniel_m_cer
Tu Vu@tuvllms

Sharing my internship work @GoogleAI: 1) w/ Soft Prompt Transfer, Prompt Tuning matches or significantly outperforms Model Tuning across model sizes, 2) tasks can help each other via their prompts & task prompts can be used as task embeddings to formalize task similarity. 🧵 1/8

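
A sketch of the "prompts as task embeddings" idea from the thread: pool each task's learned soft prompt into a single vector and compare tasks by cosine similarity. Shapes and the mean-pooling choice are illustrative assumptions, not the paper's exact recipe:

```python
# Sketch of "task prompts as task embeddings" (shapes illustrative):
# a learned soft prompt is a [prompt_len, d_model] matrix per task.
import numpy as np

def task_embedding(prompt: np.ndarray) -> np.ndarray:
    """Mean-pool the prompt tokens into a single task vector."""
    return prompt.mean(axis=0)

def task_similarity(prompt_a: np.ndarray, prompt_b: np.ndarray) -> float:
    a, b = task_embedding(prompt_a), task_embedding(prompt_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
mnli, rte = rng.normal(size=(100, 768)), rng.normal(size=(100, 768))
print(task_similarity(mnli, rte))  # higher -> better transfer source
```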
Brian Lester@blester125·
@KarimiRabeeh Plus, arxiv.org/abs/2110.04366 reformulates prompt-like approaches as a weighted sum of Attn(Q,K,V) and Attn(Q,Pk,Pv). |K|>>|Pk|, so the overhead is minimal. This reformulation is really cool and shows that prompt-like and adapter methods just differ in where they are applied.
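
The identity behind that claim can be checked numerically: attention over the concatenated keys/values [K;Pk], [V;Pv] decomposes exactly into a gated sum of attention over the original context and attention over the prompt alone. A minimal sketch, omitting the usual 1/sqrt(d) scaling:

```python
# Numeric check of the reformulation in arxiv.org/abs/2110.04366:
# Attn(q,[K;Pk],[V;Pv]) == (1-lam)*Attn(q,K,V) + lam*Attn(q,Pk,Pv).
import numpy as np

rng = np.random.default_rng(0)
d, n_k, n_p = 16, 50, 5            # |K| >> |Pk|, so overhead is small
q = rng.normal(size=d)
K, V = rng.normal(size=(n_k, d)), rng.normal(size=(n_k, d))
Pk, Pv = rng.normal(size=(n_p, d)), rng.normal(size=(n_p, d))

def attn(q, K, V):
    w = np.exp(K @ q)              # unnormalized attention weights
    return (w / w.sum()) @ V

# Gate: total unnormalized weight the query puts on the prompt keys.
lam = np.exp(Pk @ q).sum() / (np.exp(K @ q).sum() + np.exp(Pk @ q).sum())

full = attn(q, np.vstack([K, Pk]), np.vstack([V, Pv]))
split = (1 - lam) * attn(q, K, V) + lam * attn(q, Pk, Pv)
assert np.allclose(full, split)
```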
Rabeeh Karimi@KarimiRabeeh·
Generally, I am not sure why the NLP community is so excited about prompt-tuning methods currently; here are my arguments: 1) attention scales quadratically with sequence length, and prompt tuning adds to token length 2) prompt tuning is usually slow to converge