Roee Hendel

12 posts

@RoeeHendel

Algorithm Developer @AI21Labs

Joined November 2012

195 Following · 33 Followers

Roee Hendel retweeted

AI21 Labs @AI21Labs · Feb 11

1/5 As part of our work on improving the efficiency of our LLM online-RL training pipelines, we cut policy update step time by ~70% by introducing a model-agnostic padding minimization method. 🧵

Roee Hendel retweeted

AI21 Labs @AI21Labs · Jan 8

1/4 🚀Introducing Jamba2, a memory-efficient open source model family built for total enterprise reliability and steerability.

Roee Hendel retweeted

AI21 Labs @AI21Labs · Oct 8

1/5 Releasing Jamba Reasoning 3B under Apache 2.0: Hybrid SSM-Transformer architecture that tops accuracy & speed across record context lengths. e.g. 3-5X faster than Llama 3.2 3B and Qwen3 4B at 32K tokens.

Roee Hendel retweeted

AI21 Labs @AI21Labs · Aug 22

We released the #Jamba 1.5 open model family:
- 256K #contextwindow
- Up to 2.5X faster on #longcontext in its size class
- Native support for structured JSON output, function calling, digesting doc objects & generating citations

twtr.to/giIEE #AI #LLM #AI21Jamba

Roee Hendel retweeted

AI21 Labs @AI21Labs · Mar 28

Introducing Jamba, our groundbreaking SSM-Transformer open model! As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU. 🥂Meet Jamba ai21.com/jamba 🔨Build on @huggingface

Roee Hendel retweeted

AK @_akhaliq · Mar 28

AI21 Labs presents Jamba

SSM-Transformer open model

production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

Roee Hendel @RoeeHendel · Oct 28

@yalishandi @_akhaliq We haven't explored this in depth, but it's a promising direction. Studies linking ICL to SGD could offer clues. In related tests, ICL sometimes acts like an empirical risk minimizer, fitting random examples, while other times ignoring them. Further exploration is needed.

Roee Hendel retweeted

AK @_akhaliq · Oct 25

In-Context Learning Creates Task Vectors

paper page: huggingface.co/papers/2310.15…

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

Roee Hendel @RoeeHendel · Oct 25

@_akhaliq 🙏 @_akhaliq

Roee Hendel @RoeeHendel · Mar 29

@LouisKirschAI Great work! Did you release the code?

Louis Kirsch @LouisKirschAI · Dec 9

Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper arxiv.org/abs/2212.04458 🧵👇(1/9)

Roee Hendel @RoeeHendel · Jan 29

@kushal_tirumala Is it possible that the different baseline values simply arise from the fact that larger models have an overall better language modeling capability, rather than memorization? It would be interesting to check the memorization value of the "special batch" prior to training on it.

Roee Hendel @RoeeHendel · Dec 4

@elonmusk @Carnage4Life There you go, ChatGPT as Hitler. MtH<=4 days

Elon Musk @elonmusk · Dec 3

@Carnage4Life The safety of any AI system can be measured by its MtH (meantime to Hitler). Microsoft’s Tay chatbot of several years ago got there in ~24 hours.
