Oscar Nazhan



Y'all take AL (annual leave) to queue all day to buy a bag. I take AL to sleep at home after a hectic week. We are not the same.

META JUST KILLED TOKENIZATION !!! A few hours ago they released "Byte Latent Transformer", a tokenizer-free architecture that dynamically encodes bytes into patches and achieves better inference efficiency and robustness! (I was just talking about how we need dynamic tokenization that is learned during training 🥲 It's like fucking Christmas!) I don't want to talk too much about the architecture, but here's a nice visualization from their paper. Let's look at benchmarks instead :) "BLT models can match the performance of tokenization-based models like Llama 3 at scales up to 8B and 4T bytes, and can trade minor losses in evaluation metrics for up to 50% reductions in inference flops!" This is basically a perplexity vs. training FLOPs chart: scaling laws with compute. BPB (bits-per-byte) is a tokenizer-independent version of perplexity. BLT is on par with or better than Llama 3 with BPE! Most importantly, they scale this approach to train an 8B Llama-3 model on 1T tokens, which beats the standard Llama-3 architecture with a BPE tokenizer!
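
For anyone wondering why BPB is tokenizer independent: a rough sketch, assuming you already have the model's total negative log-likelihood over a text in nats. The function name `bits_per_byte` and the numbers are made up for illustration, not taken from the paper. The point is that you divide by the byte count of the raw text, not by the token count, so two models with different tokenizers get compared on the same denominator.

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    # Dividing by ln(2) turns nats into bits; dividing by the byte count
    # normalizes by raw text length instead of by token count.
    return total_nll_nats / (math.log(2) * n_bytes)

text = "hello world"
n_bytes = len(text.encode("utf-8"))  # 11 bytes of UTF-8

# Hypothetical total loss: the same summed NLL could come from a model
# that split the text into 2 tokens or into 11 bytes; BPB does not care.
total_nll_nats = 11 * math.log(2)

bpb = bits_per_byte(total_nll_nats, n_bytes)
```

Per-token perplexity would change if the tokenizer produced more or fewer tokens for the same text, which is why it can't be used to compare BLT against a BPE model directly.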

I love functools.partial
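
Quick illustration of why, as a minimal sketch: `functools.partial` freezes some arguments of a function and hands back a new callable. The `power`, `square`, and `cube` names are just made up for the example.

```python
from functools import partial

def power(base, exponent):
    """Raise base to the given exponent."""
    return base ** exponent

# Pre-fill the exponent keyword; the result is a callable that only needs base.
square = partial(power, exponent=2)
cube = partial(power, exponent=3)

squares = [square(n) for n in range(4)]  # calls power(n, exponent=2) for each n
```

The partial object also keeps the original function and the frozen arguments around as `square.func` and `square.keywords`, which is handy when debugging.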















