Sinong Wang
@sinongwang

15 posts

Research Scientist at Meta Generative AI, working on LLMs, NLP, optimization, and recommendation systems.

Bellevue, WA · Joined August 2016
45 Following · 378 Followers
Sinong Wang@sinongwang·
Excited to share Llama3-preview (8B/70B), which achieves the best MMLU results among open-source models, along with preliminary results for a 405B model. Also super excited to share that we've integrated Llama 3 into Meta AI, the world’s best AI assistant! ai.meta.com/blog/meta-llam…
Sinong Wang retweeted
Yam Peleg@Yampeleg·
Meta just dropped a banger: LLaMA 2 Long.
- Continued pretraining LLaMA on long contexts and studied the effect of pretraining text lengths.
- Apparently having abundant long texts in the pretraining dataset is not the key to achieving strong performance.
- They also run a large set of experiments comparing different length-scaling techniques.
- Surpasses gpt-3.5-turbo-16k on multiple long-context tasks.
- They also study the effect of instruction tuning with RL + SFT and all combinations of the two.
The model weights are not out yet. Hopefully soon! 🙏
Yam Peleg tweet media
Aran Komatsuzaki@arankomatsuzaki

Effective Long-Context Scaling of Foundation Models. The LLaMA 70B variant surpasses gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks. arxiv.org/abs/2309.16039
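For readers curious what a "length-scaling technique" can look like in code, here is a minimal, hypothetical sketch of rotary position embeddings with an adjustable base frequency; raising the base is one common way to stretch usable context, but the exact recipe in the paper may differ, and all names below are illustrative.

```python
import torch

def rope_frequencies(dim: int, max_pos: int, base: float = 10000.0):
    """Rotary-embedding angle table; a larger `base` slows the rotation,
    which is one common trick for extending usable context length."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    pos = torch.arange(max_pos).float()
    angles = torch.outer(pos, inv_freq)            # (max_pos, dim/2)
    return torch.cos(angles), torch.sin(angles)

def apply_rope(x, cos, sin):
    """Rotate query/key vectors x of shape (..., seq, dim) in even/odd pairs."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    seq = x.shape[-2]
    c, s = cos[:seq], sin[:seq]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * c - x2 * s
    out[..., 1::2] = x1 * s + x2 * c
    return out

# Short-context setting vs. a larger base intended for longer contexts.
cos_s, sin_s = rope_frequencies(dim=128, max_pos=4096, base=10000.0)
cos_l, sin_l = rope_frequencies(dim=128, max_pos=32768, base=500000.0)
```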

Sinong Wang@sinongwang·
Excited to share our latest work on long-context LLMs, which is the new foundation model behind 28 Meta AI agents. The new long-context model also achieves better performance than gpt-3.5-turbo-16k across various tasks.
AI at Meta@AIatMeta

🆕 Effective Long-Context Scaling of Foundation Models ➡️ bit.ly/3ER36jT Another piece of research that helps us build engaging conversational experiences for our AIs and the Meta AI assistant.

Sinong Wang@sinongwang·
Excited to share our latest work on extending LLM context window length without fine-tuning!
AK@_akhaliq

LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
paper page: huggingface.co/papers/2308.16…

In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs are deployed for increasingly complex tasks, they often face the need to conduct longer reasoning processes or to understand larger contexts. In these situations, the length generalization failure of LLMs on long sequences becomes more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMA). LLMs often struggle to generate fluent text, let alone carry out downstream tasks, on longer contexts, even with relative positional encoding, which is designed to cope with this problem. Common solutions such as finetuning on longer corpora often involve daunting hardware and time costs and require careful training process design.

To more efficiently leverage the generation capacity of existing LLMs, we theoretically and empirically investigate the main out-of-distribution (OOD) factors contributing to this problem. Inspired by this diagnosis, we propose a simple yet effective solution for on-the-fly length generalization, LM-Infinite, which involves only a Lambda-shaped attention mask and a distance limit while requiring no parameter updates or learning. We find it applicable to a variety of LLMs using relative-position encoding methods. LM-Infinite is computationally efficient with O(n) time and space, and demonstrates consistent fluency and generation quality up to 32k tokens on the ArXiv and OpenWebText2 datasets, with a 2.72x decoding speedup. On downstream tasks such as passkey retrieval, it continues to work on inputs much longer than training lengths, where vanilla models fail immediately.
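As a rough illustration of the Lambda-shaped mask described above (a hypothetical sketch, not the authors' code), each query attends to a handful of leading tokens plus a sliding window of recent tokens, so the number of attended keys per query stays bounded and total cost stays linear in sequence length.

```python
import torch

def lambda_shaped_mask(seq_len: int, n_global: int = 10, window: int = 2048):
    """Boolean attention mask (True = may attend).

    Each query position attends to:
      * the first `n_global` tokens (the "branch" of the Lambda), and
      * the most recent `window` tokens (the local "diagonal" band),
    subject to causality. Everything else is masked out."""
    q = torch.arange(seq_len).unsqueeze(1)   # query positions, column
    k = torch.arange(seq_len).unsqueeze(0)   # key positions, row
    causal = k <= q
    global_part = k < n_global
    local_part = (q - k) < window
    return causal & (global_part | local_part)

mask = lambda_shaped_mask(seq_len=16, n_global=2, window=4)
```

The actual method also caps the effective relative distance fed to the positional encoding; that detail is omitted here.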

Sinong Wang retweeted
Qinyuan Ye@qinyuan_ye·
Hi #NAACL2022! Last summer we had a crazy idea of distilling transformer models into shallow, sparse, and fast models. Curious about whether and to what extent this idea works? Please come to our presentation tomorrow! 📍 Session 1D @ Elwha A ⏰ Mon 11:30-11:45
Qinyuan Ye tweet media
Sinong Wang retweeted
Karthik A Sankararaman 🇮🇳🇺🇸
We wondered what happens when dropout in transformers is aligned with its common Bayesian interpretation as a posterior over the weights. Turns out it largely reduces overfitting; improvements across many apples-to-apples experiments. @sinongwang @Han_Fang_ @MetaAI
AK@_akhaliq

BayesFormer: Transformer with Uncertainty Estimation. abs: arxiv.org/abs/2206.00826. Introduces BayesFormer, a Transformer model with dropout designed by Bayesian theory.
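A loose sketch of the general idea of dropout-based uncertainty estimation (Monte Carlo dropout), not the BayesFormer architecture itself: keep dropout active at inference, run several stochastic forward passes, and read the spread of the predictions as an uncertainty signal. The model below is a stand-in, not the paper's network.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Stand-in for a transformer head; the point is the dropout layer."""
    def __init__(self, d_in=16, d_hidden=32, n_classes=3, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(d_hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Run `n_samples` stochastic passes with dropout on; return the mean
    class probabilities and their per-class standard deviation."""
    model.train()  # keep dropout active at inference time
    probs = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    return probs.mean(0), probs.std(0)

model = TinyClassifier()
mean_p, std_p = mc_dropout_predict(model, torch.randn(4, 16))
```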

Sinong Wang@sinongwang·
Prompt tuning can be instance-dependent. Thrilled to share our new work! "IDPG: An Instance-Dependent Prompt Generation Method". Check out our paper here: arxiv.org/pdf/2204.04497…
Sinong Wang tweet media
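Below is a rough sketch of what instance-dependent prompt generation might look like (hypothetical code, not from the paper): instead of learning a single fixed soft prompt, a small generator maps each input's pooled representation to its own prompt vectors, which are prepended to the frozen LM's input embeddings.

```python
import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    """Map a pooled input representation to `prompt_len` soft prompt vectors."""
    def __init__(self, d_model=768, prompt_len=5, d_bottleneck=64):
        super().__init__()
        self.prompt_len = prompt_len
        self.gen = nn.Sequential(             # lightweight bottleneck generator
            nn.Linear(d_model, d_bottleneck), nn.Tanh(),
            nn.Linear(d_bottleneck, prompt_len * d_model),
        )

    def forward(self, pooled):                # pooled: (batch, d_model)
        prompts = self.gen(pooled)
        return prompts.view(-1, self.prompt_len, pooled.size(-1))

# Prepend the generated prompts to the (frozen) model's token embeddings.
pooled = torch.randn(2, 768)                  # e.g. mean-pooled input embedding
token_embeds = torch.randn(2, 128, 768)
prompts = InstancePromptGenerator()(pooled)   # (2, 5, 768)
inputs_embeds = torch.cat([prompts, token_embeds], dim=1)
```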
Sinong Wang@sinongwang·
SOTA in NLP is typically achieved by LM pretraining followed by finetuning. Our recent paper in ACL shows that pretraining has diminishing returns as the number of training examples increases, and an LSTM can be within 1 percent of BERT models. Link: arxiv.org/pdf/2006.08671…
Sinong Wang tweet media
Sinong Wang@sinongwang·
Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with standard transformers. Check it out here: arxiv.org/pdf/2006.04768…
Sinong Wang tweet media
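A compact sketch of the core trick (simplified to a single head with no masking; the real implementation differs): project the length-n key and value sequences down to a fixed dimension k before attention, so the score matrix is n×k instead of n×n.

```python
import torch
import torch.nn as nn

class LinearSelfAttention(nn.Module):
    """Single-head attention with length-wise projections of K and V (n -> k)."""
    def __init__(self, seq_len, d_model, k=256):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.kv = nn.Linear(d_model, 2 * d_model)
        self.E = nn.Linear(seq_len, k, bias=False)   # projects keys over length
        self.F = nn.Linear(seq_len, k, bias=False)   # projects values over length
        self.scale = d_model ** -0.5

    def forward(self, x):                            # x: (batch, n, d_model)
        q = self.q(x)
        key, val = self.kv(x).chunk(2, dim=-1)
        key = self.E(key.transpose(1, 2)).transpose(1, 2)   # (batch, k, d)
        val = self.F(val.transpose(1, 2)).transpose(1, 2)   # (batch, k, d)
        attn = (q @ key.transpose(1, 2) * self.scale).softmax(dim=-1)  # (batch, n, k)
        return attn @ val                                    # (batch, n, d)

out = LinearSelfAttention(seq_len=4096, d_model=64, k=256)(torch.randn(1, 4096, 64))
```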
Sinong Wang@sinongwang·
@icmlconf When there are slow machines in distributed sparse data computation, how can we mitigate these stragglers to reduce the final job completion time? Our work on Coded Sparse Matrix Multiplication has been accepted to @icmlconf. Arxiv version: arxiv.org/pdf/1802.03430…
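A toy illustration of the coded-computation idea (simplified; the actual scheme for sparse matrices is more involved and the names here are hypothetical): split A into row blocks, add a parity block, and send each block-times-B task to a different worker. Any two of the three results are enough to recover A·B, so one straggler can be ignored.

```python
import numpy as np

def coded_matmul_any_two(A, B):
    """Split A into two row blocks plus a parity block; simulate workers and
    decode A @ B even though one worker (here "w2") never returns."""
    n = A.shape[0] // 2
    A1, A2 = A[:n], A[n:2 * n]
    tasks = {"w1": A1, "w2": A2, "parity": A1 + A2}

    # Pretend "w2" is a straggler; only the other two results come back.
    results = {name: blk @ B for name, blk in tasks.items() if name != "w2"}

    top = results["w1"]
    bottom = results["parity"] - results["w1"]   # recover A2 @ B from the parity task
    return np.vstack([top, bottom])

A, B = np.random.rand(6, 4), np.random.rand(4, 3)
assert np.allclose(coded_matmul_any_two(A, B), A @ B)
```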