Moises Sanabria (@moisesnotfound) - Twitter 个人资料

Moises Sanabria 已转推

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚.@fabiolalariosm·20 Haz

loving the intersection of decentralized social media and art 👁️@orb_club @RefractionDAO

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚. tweet media

English

360

Moises Sanabria@moisesnotfound·19 Nis

@Scobleizer AI will be at your local Walmart sooner than we anticipate

English

Robert Scoble@Scobleizer·18 Nis

Old AI was used to clean up the training dataset to make the new AI. Is it only me or are LLM's quickly becoming commodities. Can any non-trained user really tell the difference anymore between Grok, Perplexity, Meta, ChatGPT, Claude, Gemini? I'm struggling to see the differences -- when viewed from an end user point of view. Are you?

Andrej Karpathy@karpathy

Congrats to @AIatMeta on Llama 3 release!! 🎉 ai.meta.com/blog/meta-llam… Notes: Releasing 8B and 70B (both base and finetuned) models, strong-performing in their model class (but we'll see when the rankings come in @ @lmsysorg :)) 400B is still training, but already encroaching GPT-4 territory (e.g. 84.8 MMLU vs. 86.5 4Turbo). Tokenizer: number of tokens was 4X'd from 32K (Llama 2) -> 128K (Llama 3). With more tokens you can compress sequences more in length, cites 15% fewer tokens, and see better downstream performance. Architecture: no major changes from the Llama 2. In Llama 2 only the bigger models used Grouped Query Attention (GQA), but now all models do, including the smallest 8B model. This is a parameter sharing scheme for the keys/values in the Attention, which reduces the size of the KV cache during inference. This is a good, welcome, complexity reducing fix and optimization. Sequence length: the maximum number of tokens in the context window was bumped up to 8192 from 4096 (Llama 2) and 2048 (Llama 1). This bump is welcome, but quite small w.r.t. modern standards (e.g. GPT-4 is 128K) and I think many people were hoping for more on this axis. May come as a finetune later (?). Training data. Llama 2 was trained on 2 trillion tokens, Llama 3 was bumped to 15T training dataset, including a lot of attention that went to quality, 4X more code tokens, and 5% non-en tokens over 30 languages. (5% is fairly low w.r.t. non-en:en mix, so certainly this is a mostly English model, but it's quite nice that it is > 0). Scaling laws. Very notably, 15T is a very very large dataset to train with for a model as "small" as 8B parameters, and this is not normally done and is new and very welcome. The Chinchilla "compute optimal" point for an 8B model would be train it for ~200B tokens. (if you were only interested to get the most "bang-for-the-buck" w.r.t. model performance at that size). So this is training ~75X beyond that point, which is unusual but personally, I think extremely welcome. Because we all get a very capable model that is very small, easy to work with and inference. Meta mentions that even at this point, the model doesn't seem to be "converging" in a standard sense. In other words, the LLMs we work with all the time are significantly undertrained by a factor of maybe 100-1000X or more, nowhere near their point of convergence. Actually, I really hope people carry forward the trend and start training and releasing even more long-trained, even smaller models. Systems. Llama 3 is cited as trained with 16K GPUs at observed throughput of 400 TFLOPS. It's not mentioned but I'm assuming these are H100s at fp16, which clock in at 1,979 TFLOPS in NVIDIA marketing materials. But we all know their tiny asterisk (*with sparsity) is doing a lot of work, and really you want to divide this number by 2 to get the real TFLOPS of ~990. Why is sparsity counting as FLOPS? Anyway, focus Andrej. So 400/990 ~= 40% utilization, not too bad at all across that many GPUs! A lot of really solid engineering is required to get here at that scale. TLDR: Super welcome, Llama 3 is a very capable looking model release from Meta. Sticking to fundamentals, spending a lot of quality time on solid systems and data work, exploring the limits of long-training models. Also very excited for the 400B model, which could be the first GPT-4 grade open source release. I think many people will ask for more context length. Personal ask: I think I'm not alone to say that I'd also love much smaller models than 8B, for educational work, and for (unit) testing, and maybe for embedded applications etc. Ideally at ~100M and ~1B scale. Talk to it at meta.ai Integration with github.com/pytorch/torcht…

English

115

297

156.3K

Moises Sanabria@moisesnotfound·23 Mar

Soup cans Mass-produced Urinals Readymade Art & Tech Infused Commoditized Intelligence The Commercialized Muse 🚽🥫🛒🧠

English

198

Moises Sanabria 已转推

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚.@fabiolalariosm·9 Mar

Happy to be presenting my work at “Hack the Future” by @SeedAIOrg 👁️🫶🏽

Austin, TX 🇺🇸 English

535

Moises Sanabria 已转推

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚.@fabiolalariosm·27 Şub

pink e-waste

English

1.7K

Moises Sanabria 已转推

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚.@fabiolalariosm·14 Şub

me in an interview with PBS for the next @pamm exhibition with my piece Wild Wired World 👁️🖥️🌎

Miami, FL 🇺🇸 English

783

Moises Sanabria@moisesnotfound·6 Şub

Wearing an Apple Vision Pro to hide my tears from the world.

English

Moises Sanabria@moisesnotfound·10 Oca

@mitch0z The conspiracy of art 🫰 🫰

English

Misch Strotz@mitch0z·9 Oca

Went on a shopping spree in London today Rate my style

English

451

Moises Sanabria@moisesnotfound·10 Oca

"Cerebral Commerce", 2024. Metal shopping cart, 3D printed synthetic brains. Dimensions: 60 inches x 36 inches x 48 inches. On exhibit at the Museum of Neural Image.

English

Moises Sanabria@moisesnotfound·10 Oca

Intelligence is a commodity, bought and sold, Humans neurally shopping for thoughts with gold In this cerebral marketplace, why is wisdom our king? Do we grasp what it means, truly, to be a human being?

English

216

Moises Sanabria@moisesnotfound·6 Oca

Drowning in content In the feed streaming Thinking and swimming happily content

English

Moises Sanabria@moisesnotfound·4 Oca

“All My Friends Are Language Models”

English

385

Moises Sanabria@moisesnotfound·3 Oca

Thought Network: Social at the Speed of Thinking

English

Moises Sanabria@moisesnotfound·2 Oca

From Prompt to Product

English

306

Moises Sanabria@moisesnotfound·31 Ara

Half a millennium back, the thought of mirroring intelligence was beyond our grasp. What new horizons will technology reflect in the upcoming era?

English

168

Moises Sanabria@moisesnotfound·31 Ara

@canekzapata I want to visit

English

172

☁@canekzapata·29 Ara

live coding at the prompt museum

English

2.3K

Moises Sanabria@moisesnotfound·31 Ara

What kinds of artificial welfare will be basic rights in the near future

English

Moises Sanabria@moisesnotfound·28 Ara

Duchamp prepped us for diffusion

English

260

Moises Sanabria 已转推

Moises Sanabria.lens@moisesdsanabria·16 Kas

We @lore_machine are providing writers tools to visualize their stories at generative scale. Read the write-up with our founder @Thobey_Campion and @BusinessInsider to learn more!

English

415

Moises Sanabria 已转推

Fabiola Larios .˚ * ꒰ঌ👁️໒꒱ * ˚.@fabiolalariosm·16 Kas

Pink Paranoia on @ourZORA

Português

1.5K

Moises Sanabria

发现