David Toussaint

584 posts

@DavidToussaint7

Joined February 2015
149 Following · 71 Followers
David Toussaint retweeted
Rohan Paul @rohanpaul_ai
Wild idea in this paper 🤯 How might we store knowledge affordably yet comprehensively? Memory³ proposes an intriguing method: compressing factual data separately. It introduces a third form of memory in addition to the implicit knowledge stored in model parameters and the short-term working memory used during inference (context key-values).
👨‍🔧 LLMs struggle with inefficient knowledge storage and retrieval, leading to high training and inference costs. The paper aims to address this by introducing a more efficient memory format.
📌 Memory³ introduces explicit memory as a third memory format for LLMs, alongside model parameters (implicit memory) and context key-values (working memory). This explicit memory is implemented as sparse attention key-values, allowing for more efficient knowledge storage and retrieval.
📌 Defines a memory hierarchy for LLMs: plain text (RAG) → explicit memory → model parameters. As you move up this hierarchy, write cost increases while read cost decreases. The goal is to optimize knowledge placement across this hierarchy based on usage frequency.
📌 Memory³'s architecture involves converting reference texts into explicit memories before inference. During inference, these memories are retrieved and integrated into self-attention layers. This design allows for a smaller model size while maintaining performance.
📌 The explicit memory format uses intense compression to save space. It selects only the first half of attention layers as memory layers, uses grouped query attention to reduce key-value heads, and selects only 8 out of 128 tokens for each key-value head based on attention weights.
📌 The training process involves a two-stage approach: a warmup stage without explicit memory, followed by a continual training stage with explicit memory. This approach was necessary because starting with explicit memory from the beginning rendered the memories useless.
📌 Introduces a "memory circuitry theory" to formalize the concept of knowledge in LLMs. It defines knowledge as circuits (equivalence classes of subgraphs) in the computation graph, categorizing them as specific or abstract knowledge.
📌 The Memory³ model achieved better performance than larger models and RAG models on various benchmarks, while maintaining higher decoding speed. It showed particular improvements in factuality and reduced hallucination.
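The token-selection step described in the thread (keeping 8 of 128 tokens per key-value head, ranked by attention weight) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `select_top_tokens` and `kept_fraction` are hypothetical, the weights are made-up numbers, and the grouped-query-attention reduction is left out because the thread gives no head counts.

```python
def select_top_tokens(attention_weights, keep=8):
    """Return the positions of the `keep` highest-weight tokens, in original order."""
    ranked = sorted(
        range(len(attention_weights)),
        key=lambda i: attention_weights[i],
        reverse=True,
    )
    return sorted(ranked[:keep])


def kept_fraction(keep=8, chunk_len=128, memory_layer_fraction=0.5):
    """Fraction of per-head key-value entries retained.

    Combines token selection (keep / chunk_len) with using only a fraction
    of the layers as memory layers. The grouped-query-attention saving is
    omitted here (assumption: its factor is unknown from the thread).
    """
    return (keep / chunk_len) * memory_layer_fraction


if __name__ == "__main__":
    # Hypothetical attention weights for one key-value head over 128 tokens.
    weights = [0.001] * 128
    for position in (3, 17, 42, 64, 65, 90, 101, 127):
        weights[position] = 0.1
    print(select_top_tokens(weights))
    print(kept_fraction())
```

Under these assumptions, token selection and the half-depth memory layers alone keep 1/32 of the key-value entries, before any further saving from fewer key-value heads.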
English
21
218
1.2K
153K
David Toussaint retweeted
Yann LeCun @ylecun
🥁 Llama3 is out 🥁 8B and 70B models available today. 8k context length. Trained with 15 trillion tokens on a custom-built 24k GPU cluster. Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases. More versions are coming over the next few months. llama.meta.com/llama3/
English
205
1.1K
7K
572.4K
David Toussaint retweeted
MTG:Toulouse @mtg_toulouse
🚀 @imihalcea dives into the future of AI with us! 🤖 Will he be eclipsed as a speaker by a superintelligent AI? 🌟 Don't miss the live stream to solve this mystery! #IA #AGI 😜🔍
French
0
3
5
281
David Toussaint retweeted
Yann LeCun @ylecun
Can AI think like a philosopher? Today, no. On that I agree with @Enthoven_R. But will it get there tomorrow? Very probably.
Mathieu Laine @mathieulaine

Will AI manage to think? For this reading of the brilliant essay by @Enthoven_R, I talked with colleagues @ylecun, @StanDehaene, @Doc_Alexandre and #OlivierOudé. Read it in @LesEchos. @EdLObservatoire @AltermindGroup

French
32
40
184
101.5K
David Toussaint retweeted
Yann LeCun @ylecun
* Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes. This would take about 100,000 years for a person to read (at 12 hours a day).
* Vision is much higher bandwidth: about 20MB/s. Each of the two optical nerves has 1 million nerve fibers, each carrying about 10 bytes per second. A 4 year-old child has been awake a total 16,000 hours, which translates into 1x10^15 bytes.
In other words:
- The data bandwidth of visual perception is roughly 16 million times higher than the data bandwidth of written (or spoken) language.
- In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.
This tells us three things:
1. Yes, text is redundant, and visual signals in the optical nerves are even more redundant (despite being 100x compressed versions of the photoreceptor outputs in the retina). But redundancy in data is *precisely* what we need for Self-Supervised Learning to capture the structure of the data. The more redundancy, the better for SSL.
2. Most of human knowledge (and almost all of animal knowledge) comes from our sensory experience of the physical world. Language is the icing on the cake. We need the cake to support the icing.
3. There is *absolutely no way in hell* we will ever reach human-level AI without getting machines to learn from high-bandwidth sensory inputs, such as vision. Yes, humans can get smart without vision, even pretty smart without vision and audition. But not without touch. Touch is pretty high bandwidth, too.
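The back-of-the-envelope figures in this post can be reproduced with a short calculation. The sketch below uses only the inputs stated in the post; the constant and function names are illustrative, and a 365-day year is an assumption. With these inputs the vision-to-language bandwidth ratio evaluates to about 1.7 million, so the post's "roughly 16 million" appears to be off by a factor of ten, while the "50 times" figure follows from rounding the child's total to 1x10^15 bytes.

```python
# Inputs as stated in the post.
WORDS_PER_MINUTE = 270
WORDS_PER_TOKEN = 0.75
BYTES_PER_TOKEN = 2
LLM_TRAINING_TOKENS = 1e13
READING_HOURS_PER_DAY = 12
OPTIC_NERVES = 2
FIBERS_PER_NERVE = 1_000_000
BYTES_PER_FIBER_PER_SECOND = 10
CHILD_AWAKE_HOURS = 16_000
DAYS_PER_YEAR = 365  # assumption: not specified in the post


def language_bandwidth():
    """Reading bandwidth in bytes per second."""
    tokens_per_second = (WORDS_PER_MINUTE / 60) / WORDS_PER_TOKEN
    return tokens_per_second * BYTES_PER_TOKEN


def llm_training_bytes():
    return LLM_TRAINING_TOKENS * BYTES_PER_TOKEN


def years_to_read_training_set():
    seconds = llm_training_bytes() / language_bandwidth()
    seconds_per_year = READING_HOURS_PER_DAY * 3600 * DAYS_PER_YEAR
    return seconds / seconds_per_year


def vision_bandwidth():
    """Visual bandwidth in bytes per second across both nerves."""
    return OPTIC_NERVES * FIBERS_PER_NERVE * BYTES_PER_FIBER_PER_SECOND


def child_visual_bytes():
    return CHILD_AWAKE_HOURS * 3600 * vision_bandwidth()


if __name__ == "__main__":
    print(f"language: {language_bandwidth():.0f} bytes/s")
    print(f"vision: {vision_bandwidth() / 1e6:.0f} MB/s")
    print(f"years to read training set: {years_to_read_training_set():,.0f}")
    print(f"child visual data: {child_visual_bytes():.2e} bytes")
    print(f"bandwidth ratio: {vision_bandwidth() / language_bandwidth():,.0f}")
    print(f"child vs LLM data: {child_visual_bytes() / llm_training_bytes():.1f}x")
```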
Parmita Mishra @parmita

This is an essential point people seem to misrepresent.

English
551
1.7K
8.5K
1.9M
David Toussaint retweeted
MTG:Toulouse @mtg_toulouse
🎉 The next meetup will take place on Tuesday, March 19, and we'll get together for two sessions: AI 🤖 and Monads 🥳! You can save the evening ✨ Details and registration coming very soon.
French
1
2
4
136
David Toussaint @DavidToussaint7
@JMDeruty @imihalcea This will let us navigate between respect for historical accuracy and the aspiration to a more inclusive and diverse representation, without compromising either one.
French
0
0
1
47
David Toussaint @DavidToussaint7
@JMDeruty @imihalcea It is therefore imperative to remain critical of the models and how they are used, while continuing to educate people about their potential risks and biases.
French
1
0
2
40
David Toussaint retweeted
Yann LeCun @ylecun
Like @AndrewYNg, I have observed a definite shift in the prevalent discourse about AI at Davos:
- Few people still talk about existential risk, and few people believe that current technology, even scaled up, will present an existential risk.
- Everyone agrees that open source AI platforms are a good thing for cultural and linguistic diversity, local sovereignty, education, science, and businesses.
- Everyone agrees that regulating AI-powered products can be useful in certain areas (health, transportation, etc).
- The debate is still on for whether AI research and development and open source AI platforms should be regulated.
- Many people are worried about a new flood of AI-powered political disinformation. Industry-wide standards for content authentication are needed.
- AI has become the most talked-about topic.
Andrew Ng @AndrewYNg

My takeaways from attending WEF at Davos last week:
- There were lots of discussions on business implementation of AI. My top two tips: (i) Pretty much all knowledge workers can benefit from using GenAI now, but most will need training. (ii) Task-based analysis of jobs is helping businesses identify opportunities.
- Also lots of AI regulation conversations. I'm happy to report that the conversation is much more sensible than 6 months ago. For example, the unnecessary fears and discussion on AI extinction risk is fading away. But some big companies are still pushing for stifling, anti-competitive regulations, and the fight to protect open-source is still far from won.
- Attending climate sessions made me even more worried about the lack of action to change our planet's trajectory. Rather than 1.5 degrees Celsius of warming as the optimistic case and 2 degrees as the pessimistic case, I think 2 degrees is an optimistic case, and 4 degrees a more realistic pessimistic case. Decarbonization remains critical; and unfortunately, that we're talking about 1.5-2 degrees rather than 2-4 degrees means we're underinvesting in resilience, adaptation, and potentially game-changing technologies like geo-engineering.
Longer writeup below in The Batch: deeplearning.ai/the-batch/ai-o…

English
76
259
1.6K
574.3K