Hugo Touvron
@HugoTouvron
Research Scientist at Meta AI
Joined January 2020
110 Following · 1.8K Followers
61 posts
Hugo Touvron retweeted
AI at Meta @AIatMeta
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver new capabilities, such as improved reasoning, and set a new state of the art for models of their sizes. Today's release includes the first two Llama 3 models; in the coming months we expect to introduce new capabilities, longer context windows, additional model sizes, and enhanced performance, plus the Llama 3 research paper for the community to learn from our work. More details ➡️ go.fb.me/i2y41n Download Llama 3 ➡️ go.fb.me/ct2xko
340 replies · 1.4K reposts · 5.7K likes · 1.1M views
Hugo Touvron retweeted
Ahmad Al-Dahle @Ahmad_Al_Dahle
It’s here! Meet Llama 3, our latest generation of models, setting a new standard for state-of-the-art performance and efficiency among openly available LLMs. Key highlights:
• 8B and 70B parameter openly available pre-trained and fine-tuned models.
• Trained on more than 15T tokens, 7x+ larger than Llama 2's dataset!
• Improved tokenizer with a vocabulary of 128K tokens for better performance.
• State-of-the-art performance across industry benchmarks.
• New capabilities, including enhanced reasoning and coding.
• 3x more efficient training than Llama 2.
• New trust and safety tools with Llama Guard 2, Code Shield, and CyberSec Eval 2.
• Integrated into Meta AI, and available in more countries across our apps.
• And this is just the beginning, with more models and new capabilities coming soon!
Visit the Llama 3 website to read more and download the models. llama.meta.com/llama3
62 replies · 194 reposts · 961 likes · 328.2K views
Hugo Touvron retweeted
Pedro Cuenca @pcuenq
One thing I love about open-access LLMs is that you can play with the system prompt as you wish, no need for hacks. So we released 2 additional Llama 2 demos that allow you to change all parameters, including the prompt: 7B: hf.co/spaces/hugging… 13B: hf.co/spaces/hugging…
2 replies · 17 reposts · 78 likes · 18.8K views
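Playing with the system prompt on an open-weights Llama 2 chat model really does come down to plain string formatting: the Llama 2 chat template wraps a system prompt in `<<SYS>>` markers inside an `[INST]` block. A minimal sketch of that template (the helper name `build_llama2_prompt` is ours, not something from the released demos):

```python
# Sketch: building a single-turn Llama 2 chat prompt with a custom
# system prompt. The [INST]/<<SYS>> markup follows the Llama 2 chat
# template; no library is needed for the formatting itself.

def build_llama2_prompt(system_prompt: str, user_message: str) -> str:
    """Wrap a user message in Llama 2's chat template with a custom system prompt."""
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a pirate. Answer every question in pirate speak.",
    "What is a large language model?",
)
print(prompt)
```

With hosted closed models the system prompt is often fixed server-side; with open weights you control this string end to end, which is exactly the point of the demos above.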
Hugo Touvron retweeted
Boz @boztank
Llama 2 is open source and available free today for developers, researchers, and entrepreneurs. We’re excited to partner with Azure, AWS, Hugging Face and more to deliver this to all of you. ai.meta.com/llama
Quoting Yann LeCun @ylecun:
This is huge: Llama-v2 is open source, with a license that authorizes commercial use! This is going to change the landscape of the LLM market. Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face, and other providers. Pretrained and fine-tuned models are available with 7B, 13B and 70B parameters. Llama-2 website: ai.meta.com/llama/ Llama-2 paper: ai.meta.com/research/publi… A number of personalities from industry and academia have endorsed our open-source approach: about.fb.com/news/2023/07/l…
11 replies · 21 reposts · 138 likes · 26.1K views
Hugo Touvron retweeted
Andrej Karpathy @karpathy
Huge day indeed for AI and LLMs, congrats to Meta 👏 This is now the most capable LLM available directly as weights to anyone, from researchers to companies. The models look quite strong, e.g. Table 4 in the paper: MMLU is good to look at; the 70B model is just below GPT-3.5. But HumanEval (bad misnomer) shows coding capability is quite a bit lower (48.1 vs 29.9).
Quoting Yann LeCun @ylecun:
This is huge: Llama-v2 is open source, with a license that authorizes commercial use! This is going to change the landscape of the LLM market. Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face, and other providers. Pretrained and fine-tuned models are available with 7B, 13B and 70B parameters. Llama-2 website: ai.meta.com/llama/ Llama-2 paper: ai.meta.com/research/publi… A number of personalities from industry and academia have endorsed our open-source approach: about.fb.com/news/2023/07/l…
61 replies · 496 reposts · 3.8K likes · 1M views
Hugo Touvron retweeted
Soumith Chintala @soumithchintala
LLaMA-2 from @MetaAI is here! Open weights, free for research and commercial use. Pre-trained on 2T tokens. Fine-tuned too (unlike v1). 🔥🔥🔥 Let's gooo.... ai.meta.com/llama/ The paper lists the amazing authors who worked night and day to make this happen. Be sure to thank them for their tireless pursuit of open science and true democratization! ai.meta.com/research/publi…
26 replies · 176 reposts · 1.1K likes · 182.3K views
Hugo Touvron retweeted
Yann LeCun @ylecun
This is huge: Llama-v2 is open source, with a license that authorizes commercial use! This is going to change the landscape of the LLM market. Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face, and other providers. Pretrained and fine-tuned models are available with 7B, 13B and 70B parameters. Llama-2 website: ai.meta.com/llama/ Llama-2 paper: ai.meta.com/research/publi… A number of personalities from industry and academia have endorsed our open-source approach: about.fb.com/news/2023/07/l…
387 replies · 3.4K reposts · 15K likes · 4.3M views
Hugo Touvron retweeted
AK @_akhaliq
Meta releases Llama 2: Open Foundation and Fine-Tuned Chat Models. Paper: ai.meta.com/research/publi… Blog: ai.meta.com/llama/ "We develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs."
32 replies · 548 reposts · 2.1K likes · 637.4K views
Hugo Touvron retweeted
Lucas Beyer (bl16) @giffmana
If you lived under a rock: the MMLU score in the LLaMA paper was claimed irreproducible. However, simply using the original eval code perfectly reproduces it. The following conclusion in the blog post is wrong; imo it should be "only use original eval code, or mark with *".
Quoting Thomas Wolf @Thom_Wolf:
What was going on with the Open LLM Leaderboard? Its numbers didn't match the ones reported in the LLaMA paper! We decided to dive into this rabbit hole with friends from the LLaMA & Falcon teams and came back with a blog post of learnings & surprises: huggingface.co/blog/evaluatin…
7 replies · 14 reposts · 148 likes · 120.4K views
Hugo Touvron retweeted
Yao Fu @Francis_YAO_
It seems that Hani @itanih0 has solved the puzzle: the reason why LLaMA has a lower number on the Open LLM Leaderboard is a tokenization bug (the devil's in the details). Great work! Also, AFAIK Hugging Face @natolambert @Thom_Wolf is doing an Elo leaderboard with very carefully calibrated GPT-4 eval. Also very meaningful! Love to see open source pushing the field forward this way.
Quoting Hani Itani @itanih0:
The @huggingface #OpenLLMLeaderboard has attracted a lot of interest lately, but did you know that it puts LLaMA-based models at a disadvantage? In our evaluations using a recent commit of @AIEleuther's lm-evaluation-harness, LLaMA-based models improve by 4-5 points on average!
3 replies · 10 reposts · 79 likes · 22.6K views
Hugo Touvron retweeted
Yao Fu @Francis_YAO_
Guys, I know you want to watch toe-to-toe battles. Here you go: under the official MMLU prompts, the default Hugging Face generate() function, fp16, no fancy prompt engineering, no further complications: LLaMA vs. Falcon = 63.64 vs. 49.08. Happy? Disappointed? Good? Bad? Win? Lose? Code + detailed results at github.com/FranxYao/chain…
----
The longer version: previously we reported LLaMA 65B had 61.4 on MMLU with basically default settings, which is close to the original paper. Later we realized there was a bug triggered by long prompts, resulting in LLaMA getting 0 scores on High School European History and High School US History. Fixing this, the LLaMA score becomes 63.64, basically the same number as the one reported in the paper. Using the identical script, Falcon 40B gets 49.08. Feel free to check our code and run it on your machine -- which requires 4 × 80GB of GPU memory though 😅 We are happy to correct things if you spot any bugs or anything unusual. Again, nothing really fancy or challenging: official MMLU prompts and the default HF generation function. And again, both are awesome models, nontrivial efforts, and they have greatly advanced the field!
18 replies · 17 reposts · 148 likes · 66.1K views
Hugo Touvron retweeted
Yao Fu @Francis_YAO_
Is Falcon really better than LLaMA? Short take: probably not. Longer take: we reproduced the LLaMA 65B eval on MMLU and got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon (52.7). Code and prompt open-sourced at github.com/FranxYao/chain… No fancy prompt engineering, no fancy decoding, everything by default.
----
Full story: on the Open LLM Leaderboard (huggingface.co/spaces/Hugging…), Falcon is the top 1, surpassing LLaMA, and promoted by @Thom_Wolf (x.com/thom_wolf/stat…). Yet later @karpathy expressed concern about why, on the Open LLM Leaderboard, the LLaMA 65B score is significantly lower than the official one (48.8 vs. 63.4), see x.com/karpathy/statu… We figured that a simple, quick, open-sourced evaluation script for LLaMA 65B would clarify things, so we just did it: github.com/FranxYao/chain… Again, everything is default: the official MMLU prompt, no fancy prompt engineering, no fancy decoding. LLaMA 65B simply can do it. We encourage everyone to try the eval script out. This result makes us continue to believe that the open-source community's best bet for getting close to GPT-3.5 is to do RLHF on LLaMA 65B, per our previous discovery in Chain-of-Thought Hub arxiv.org/abs/2305.17306 Yet we do not intend to start wars between LLaMA and Falcon -- both are great open-source models and have made significant contributions to the field! Falcon also has the advantage of an easier license, which gives it great potential to be awesome! 🍻🍻
32 replies · 121 reposts · 694 likes · 335.4K views
Hugo Touvron retweeted
Gautier Izacard @gizacard
Happy to release a collection of LLaMA 🦙 large language models, ranging from 7B to 65B parameters and trained on publicly available datasets. LLaMA-65B is competitive with Chinchilla and PaLM. Paper: tinyurl.com/ycxr2mvj
Quoting Guillaume Lample @GuillaumeLample:
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at research.facebook.com/publications/l… 1/n
3 replies · 16 reposts · 119 likes · 20.7K views
Hugo Touvron retweeted
Guillaume Lample @GuillaumeLample
Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available at research.facebook.com/publications/l… 1/n
151 replies · 1.4K reposts · 6.5K likes · 3.2M views
Hugo Touvron retweeted
Andrew Ng @AndrewYNg
I’d like to address the serious matter of some newcomers to AI experiencing imposter syndrome, where someone wonders if they’re a fraud or really belong in the AI community. Let's build a community that encourages and welcomes everyone. deeplearning.ai/the-batch/issu…
31 replies · 117 reposts · 782 likes
Hugo Touvron retweeted
MLIA @mlia_isir
Congratulations to @HugoTouvron, who brilliantly defended his PhD thesis today! 👏👏👏 Thank you for the very interesting presentation of your work! Good luck in the future!
3 replies · 4 reposts · 30 likes
Hugo Touvron retweeted
MLIA @mlia_isir
PhD defense announcement 📢 @HugoTouvron will defend his thesis in 2 days, on September 29th at 2 p.m. Title: "Architectures and Training for Visual Understanding". A CIFRE thesis in collaboration with @MetaAI, supervised by @quobbe and @hjegou. YouTube link: youtu.be/S4r7UIJHAKI
0 replies · 3 reposts · 21 likes
Hugo Touvron @HugoTouvron
@wightmanr The models are online. Sorry for the lag; it took a little longer than expected, as I didn't have my work laptop with me the last few days.
1 reply · 0 reposts · 3 likes
Ross Wightman @wightmanr
@HugoTouvron Nice. Compared to some of the more complex multi-scale or hybrid ViT models, it hits comparable top-1 with 50-80% faster inference & training throughput. If the checkpoints are up in the next few days, it'll make the next timm 0.6.x PyPI bug-fix update.
1 reply · 0 reposts · 3 likes
Ross Wightman @wightmanr
timm 0.6.5 is now on PyPI 🥳 The first test-set validation passes (using #PyTorch 1.12) are looking good. The DeiT-III huge/large models are up there with BEiT and EfficientNet-L2. Hopefully full validation and timing results will be available later in the week. github.com/rwightman/pyto…
1 reply · 23 reposts · 199 likes
Hugo Touvron @HugoTouvron
@wightmanr The ViT-medium trained with DeiT III recipes reaches 83.0% at resolution 224 with ImageNet-1k training and 84.6% with ImageNet-21k pre-training. We will upload the models to the deit repo ASAP.
1 reply · 0 reposts · 4 likes
Ross Wightman @wightmanr
@HugoTouvron Great. Ping me if they work out and I'll add them. I already have some support for parallel blocks, but my implementation is a bit different from the deit version; I think I should be able to adapt it cleanly though.
1 reply · 0 reposts · 2 likes
Hugo Touvron @HugoTouvron
@wightmanr Thanks @wightmanr for the suggestion. We will add a ViT-medium (patch_size=16, embed_dim=512, depth=12, num_heads=8) to our DeiT-III experiments. We are also preparing a parallel ViT-Hx2. Let us know if you are interested in another model that we could train with our recipe.
1 reply · 0 reposts · 3 likes
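A quick sanity check on where that ViT-medium configuration lands in size: a back-of-envelope parameter count for a standard ViT with those dimensions (assuming the usual MLP ratio of 4, learned position embeddings, a class token, 224-pixel input, and a 1000-class head, none of which are spelled out in the tweet):

```python
# Back-of-envelope parameter count for the ViT-medium configuration above.
# Assumes a standard ViT: MLP ratio 4, learned position embeddings, a class
# token, 224x224 input, 1000-class head. num_heads=8 changes the attention
# layout but not the parameter count.
embed_dim, depth, patch_size = 512, 12, 16
num_classes, img_size = 1000, 224
d = embed_dim

num_patches = (img_size // patch_size) ** 2        # 14 * 14 = 196
patch_embed = patch_size * patch_size * 3 * d + d  # conv projection + bias
pos_embed = (num_patches + 1) * d + d              # positions (+cls slot) + cls token

attn = (3 * d * d + 3 * d) + (d * d + d)           # qkv + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)        # fc1 + fc2
norms = 2 * (2 * d)                                # two LayerNorms per block
block = attn + mlp + norms

total = (patch_embed + pos_embed + depth * block
         + 2 * d                                   # final LayerNorm
         + d * num_classes + num_classes)          # classification head
print(f"ViT-medium ≈ {total / 1e6:.1f}M parameters")  # ≈ 38.8M
```

At roughly 39M parameters this sits between ViT-S (~22M) and ViT-B (~86M), which is the gap the "medium" suggestion is aimed at.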
Ross Wightman @wightmanr
A 'medium' ViT is quite a bit easier to train well on small data than base, but gives better perf than small. @HugoTouvron maybe a DeiT-III 'medium' run would be worthwhile :)
2 replies · 0 reposts · 3 likes