cornelistools

172 posts

@cornelistools

Building advanced language technologies and robust AI systems for the Dutch language 🇳🇱 Research & development in #nlp #nlu #ai #rl #conversationalAI

Gooise Meren (NL) · Joined October 2020
393 Following · 48 Followers
cornelistools retweeted
Stephen Wolfram (@stephen_wolfram)
What's really going on in machine learning? Just finished a deep dive using (new) minimal models. Seems like ML is basically about fitting together lumps of computational irreducibility ... with important potential implications for science of ML, and future tech... writings.stephenwolfram.com/2024/08/whats-…
Stephen Wolfram tweet media
English
106
661
3.8K
419.5K
cornelistools retweeted
Andrej Karpathy (@karpathy)
# RLHF is just barely RL
Reinforcement Learning from Human Feedback (RLHF) is the third (and last) major stage of training an LLM, after pretraining and supervised finetuning (SFT). My rant on RLHF is that it is just barely RL, in a way that I think is not too widely appreciated. RL is powerful. RLHF is not.
Let's take a look at the example of AlphaGo. AlphaGo was trained with actual RL. The computer played games of Go and trained on rollouts that maximized the reward function (winning the game), eventually surpassing the best human players at Go. AlphaGo was not trained with RLHF. If it were, it would not have worked nearly as well.
What would it look like to train AlphaGo with RLHF? Well first, you'd give human labelers two board states from Go, and ask them which one they like better:
Then you'd collect say 100,000 comparisons like this, and you'd train a "Reward Model" (RM) neural network to imitate this human "vibe check" of the board state. You'd train it to agree with the human judgement on average. Once we have a Reward Model vibe check, you run RL with respect to it, learning to play the moves that lead to good vibes. Clearly, this would not have led anywhere too interesting in Go. There are two fundamental, separate reasons for this:
1. The vibes could be misleading - this is not the actual reward (winning the game). This is a crappy proxy objective. But much worse,
2. You'd find that your RL optimization goes off rails as it quickly discovers board states that are adversarial examples to the Reward Model. Remember the RM is a massive neural net with billions of parameters imitating the vibe. There are board states that are "out of distribution" to its training data, which are not actually good states, yet by chance they get a very high reward from the RM.
For the exact same reasons, sometimes I'm a bit surprised RLHF works for LLMs at all. The RM we train for LLMs is just a vibe check in the exact same way. It gives high scores to the kinds of assistant responses that human raters statistically seem to like. It's not the "actual" objective of correctly solving problems, it's a proxy objective of what looks good to humans. Second, you can't even run RLHF for too long because your model quickly learns to respond in ways that game the reward model. These predictions can look really weird, e.g. you'll see that your LLM Assistant starts to respond with something nonsensical like "The the the the the the" to many prompts. Which looks ridiculous to you but then you look at the RM vibe check and see that for some reason the RM thinks these look excellent. Your LLM found an adversarial example. It's out of domain w.r.t. the RM's training data, in an undefined territory. Yes you can mitigate this by repeatedly adding these specific examples into the training set, but you'll find other adversarial examples next time around. For this reason, you can't even run RLHF for too many steps of optimization. You do a few hundred/thousand steps and then you have to call it because your optimization will start to game the RM. This is not RL like AlphaGo was.
And yet, RLHF is a net helpful step of building an LLM Assistant. I think there's a few subtle reasons but my favorite one to point to is that through it, the LLM Assistant benefits from the generator-discriminator gap. That is, for many problem types, it is a significantly easier task for a human labeler to select the best of a few candidate answers, instead of writing the ideal answer from scratch. A good example is a prompt like "Generate a poem about paperclips" or something like that. An average human labeler will struggle to write a good poem from scratch as an SFT example, but they could select a good looking poem given a few candidates. So RLHF is a kind of way to benefit from this gap of "easiness" of human supervision.
There's a few other reasons, e.g. RLHF is also helpful in mitigating hallucinations because if the RM is a strong enough model to catch the LLM making stuff up during training, it can learn to penalize this with a low reward, teaching the model an aversion to risking factual knowledge when it's not sure. But a satisfying treatment of hallucinations and their mitigations is a whole different post so I digress. All to say that RLHF *is* net useful, but it's not RL.
No production-grade *actual* RL on an LLM has so far been convincingly achieved and demonstrated in an open domain, at scale. And intuitively, this is because getting actual rewards (i.e. the equivalent of win the game) is really difficult in the open-ended problem solving tasks. It's all fun and games in a closed, game-like environment like Go where the dynamics are constrained and the reward function is cheap to evaluate and impossible to game. But how do you give an objective reward for summarizing an article? Or answering a slightly ambiguous question about some pip install issue? Or telling a joke? Or re-writing some Java code to Python? Going towards this is not in principle impossible but it's also not trivial and it requires some creative thinking. But whoever convincingly cracks this problem will be able to run actual RL. The kind of RL that led to AlphaGo beating humans in Go. Except this LLM would have a real shot of beating humans in open-domain problem solving.
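The reward-model step described in the post (collect pairwise comparisons, then train a model to agree with the labelers on average) can be sketched roughly as follows. This is a toy under stated assumptions, not the setup used for any real LLM: the two-feature items, the hidden preference in `make_comparisons`, the linear scorer, and the pairwise logistic loss are all illustrative choices that do not come from the post.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(weights, features):
    """Toy reward model: a linear 'vibe check' over the features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(comparisons, dim, lr=0.1, epochs=100):
    """Fit weights so the preferred item of each pair scores higher.

    Minimizes -log(sigmoid(r_chosen - r_rejected)) with plain SGD.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = score(weights, diff)
            grad = sigmoid(margin) - 1.0  # derivative of the loss w.r.t. the margin
            weights = [w - lr * grad * d for w, d in zip(weights, diff)]
    return weights


def make_comparisons(n, seed=0):
    """Simulate labelers who follow a hidden preference (hypothetical)."""
    rng = random.Random(seed)
    hidden = (1.0, 0.5)  # assumed 'true taste' of the labelers
    pairs = []
    for _ in range(n):
        a = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        b = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        # Store each pair as (chosen, rejected).
        pairs.append((a, b) if score(hidden, a) >= score(hidden, b) else (b, a))
    return pairs


comparisons = make_comparisons(200)
weights = train_reward_model(comparisons, dim=2)

# Fraction of comparisons where the model agrees with the labelers.
agreement = sum(
    score(weights, c) > score(weights, r) for c, r in comparisons
) / len(comparisons)
print(f"agreement with labelers: {agreement:.2f}")
```

Every training item here has features in [-1, 1], yet the fitted scorer will happily assign a far larger reward to an input like `[100, 100]` that no labeler ever judged. That is a small-scale version of the out-of-distribution blind spot the post says an RL optimizer ends up exploiting.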
Andrej Karpathy tweet media
English
403
1.2K
8.8K
1.2M
Aidan Gomez (@aidangomez)
Spent a day with Cohere’s xrisk safety team keeping an eye on Command R++ training.
Aidan Gomez tweet media
English
24
6
348
61.3K
cornelistools retweeted
Yann LeCun (@ylecun)
🥁 Llama3 is out 🥁 8B and 70B models available today. 8k context length. Trained with 15 trillion tokens on a custom-built 24k GPU cluster. Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases. More versions are coming over the next few months. llama.meta.com/llama3/
Yann LeCun tweet media
English
205
1.1K
7K
572.4K
cornelistools retweeted
François Chollet (@fchollet)
At least once a year I come across the argument that "scale is all you need: the more neurons a species has, the more intelligent it is; humans have over 2x more neurons than gorillas and that makes all the difference; future AIs will have even more neurons than us! If we give them 1,000x more neurons they will be 1,000x more intelligent!" As a reminder, the species with the most neurons are whales and African elephants (a whopping 3x more than us). And for any particular neuron count, you will find species with wildly different levels of cognitive ability.
English
65
106
844
165.8K
cornelistools (@cornelistools)
@fchollet I agree. The strength of Stack Overflow is still its community, with the diversity, vision and experience of many programmers around the world. And a single answer is often not the answer. In coding, the why is at least as important as the how. LLMs can't provide that vision yet.
English
0
0
0
28
François Chollet (@fchollet)
I don't think StackOverflow will outright disappear. It will have to downsize by a large factor (80%?), but the need to get human-written answers to novel questions will always be there. You can't automate it away with current technology. Of course, that's a much more niche use case than what SO used to be.
English
73
36
634
130.7K
cornelistools retweeted
Google DeepMind (@GoogleDeepMind)
We’re releasing Gemma 2B and 7B, which achieve best-in-class performance for their sizes compared to other models, and can run on a developer laptop or computer. They also surpass much larger models on key benchmarks while meeting our standards for safe and responsible outputs.
Google DeepMind tweet media
English
70
67
342
147.2K
cornelistools (@cornelistools)
@pkuhar @GaryMarcus Indeed, with RAG you limit the model's allowed "knowledge" so that it "talks" only about the retrieved results provided in the prompt. As the retrieval system is basically search ("semantic" or whatever kind), it has many limitations in reasoning and no sense of the total dataset.
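A minimal sketch of the mechanism described here, assuming a toy keyword-overlap retriever and a made-up prompt template. The documents, the function names `retrieve` and `build_prompt`, and the prompt wording are hypothetical; real systems typically use a vector or hybrid search index instead.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for a real search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list) -> str:
    """Restrict the model to the retrieved passages (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical company knowledge base.
documents = [
    "Support is open on weekdays from 9 to 17.",
    "Invoices are sent on the first day of each month.",
    "The office is closed on public holidays.",
]

query = "When are invoices sent?"
passages = retrieve(query, documents, k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

Everything that determines which facts the model gets to see happens in `retrieve`, before any language model is called. The retriever only compares the query with one document at a time, so it has no view of the dataset as a whole.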
English
0
0
0
61
Peter Kuhar (@pkuhar)
@GaryMarcus I'm also interested in that. Anecdotally, RAG overrides any common sense the model might have. In the case of web search, the facts from the search results will make the LLM blind to any knowledge or common sense it has.
English
3
0
4
705
Gary Marcus (@GaryMarcus)
Is there any good data/review on how well RAG works?
English
10
4
26
15.3K
cornelistools (@cornelistools)
@GaryMarcus I mean, if the expectation is a talking search engine, it works "ok", but the result is mainly determined by the quality of the external retrieval system. If the expectation is reasoning about the question and the results, the RAG method is useless. The LLM does nothing very smart here and could even be an SLM.
English
0
0
0
50
cornelistools (@cornelistools)
@karpathy I tried managing my schedule with an LLM and had the same result 😉
English
0
0
0
14
cornelistools (@cornelistools)
@GaryMarcus With RAG you have to insert your company's knowledge into the LLM from outside, so most of the preliminary work is done in the R mechanism (retrieval), outside the LLM. Using an LLM to restate data it has already been handed is actually a waste of the knowledge it has. SLMs work fine here too.
English
1
0
1
131
Christine Liebrecht (@christineliebr)
Just watched @KvW on fake honey: honey that is diluted by as much as 75% with thickened sugar water from China. I thought: let me quickly check the supermarket honey in my own kitchen cupboard 🫠 #keuringsdienstvanwaarde
Christine Liebrecht tweet media
Dutch
4
0
5
1.6K
cornelistools retweeted
Cohere Labs (@Cohere_Labs)
Today, we’re launching Aya, a new open-source, massively multilingual LLM & dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages – more than double the number covered by previous models. cohere.com/research/aya
English
55
359
1.3K
701.3K
cornelistools (@cornelistools)
@ohmypy And for many scripting languages you don't really need a configuration language at all. Having a config .py or .php file is not really wrong. However, if you want to separate code and configuration completely, or you work in a compiled language, JSON is fine and a well-known standard.
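As a rough illustration of keeping configuration in JSON, separate from code, here is a small sketch using Python's standard `json` module. The keys in `DEFAULTS` and the `load_config` helper are hypothetical names chosen for the example.

```python
import json

# Hypothetical settings for an example service.
DEFAULTS = {"host": "localhost", "port": 8080, "debug": False}


def load_config(text: str) -> dict:
    """Parse JSON config text and merge it over the defaults."""
    user = json.loads(text)
    if not isinstance(user, dict):
        raise ValueError("config must be a JSON object")
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    # Values from the file override the defaults.
    return {**DEFAULTS, **user}


config = load_config('{"port": 9000, "debug": true}')
print(config)
```

In a scripting language the same settings could simply live in a `config.py` that the program imports. The JSON route adds a parsing and validation step, but it keeps the file readable from any language.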
English
0
0
0
11
Anton Zhiyanov (@ohmypy)
Unpopular opinion. You don't need a better configuration language. Just use the damn JSON. Yes, it is primitive and unexpressive. And that's exactly what you need for configs. Not yaml, not toml, and definitely not that Apple-invented thing. Use JSON, dammit.
English
199
62
1K
184.6K
cornelistools (@cornelistools)
@Justin_Halford_ @fchollet If the benefit of the task outweighs the energy it consumes, it may not need to be. However, knowing that a conversation with ChatGPT uses up about a 500 ml bottle of water, I would like to see more earth-friendly solutions. A lot still needs to improve there.
English
1
0
0
141
Justin Halford (@Justin_Halford_)
@fchollet Does intelligence need to be power efficient? If we could cure cancer or achieve room-temperature superconductivity with a GWh-scale energy investment and hundreds of thousands of GPUs, is it not still worthwhile?
English
15
1
52
8.6K
François Chollet (@fchollet)
The human brain runs on 12 watts, about 97% less than a single H100.
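The percentage can be checked with a few lines of arithmetic. The post does not say which H100 power figure it uses, so the 350 W (PCIe card) and 700 W (SXM module) board power limits below are assumptions brought in for the comparison.

```python
BRAIN_WATTS = 12  # figure quoted in the post

# Assumed H100 board power limits; not stated in the post.
H100_PCIE_WATTS = 350
H100_SXM_WATTS = 700


def percent_less(small: float, large: float) -> float:
    """How much lower `small` is than `large`, as a percentage of `large`."""
    return 100.0 * (1.0 - small / large)


pcie_saving = percent_less(BRAIN_WATTS, H100_PCIE_WATTS)
sxm_saving = percent_less(BRAIN_WATTS, H100_SXM_WATTS)

print(f"{pcie_saving:.1f}% less than a 350 W card")
print(f"{sxm_saving:.1f}% less than a 700 W module")
```

Under these assumptions the result is about 96.6% for the 350 W figure and 98.3% for the 700 W figure, so the quoted "about 97%" is consistent with the lower-power variant.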
English
89
160
1.5K
347K
cornelistools (@cornelistools)
@fchollet Also: Human: "Let's eat 2 slices of bread with peanut butter for the morning" Tesla bot: "The production version of the Optimus bot will be equipped with a 2.3-kilowatt-hour battery pack"
English
0
0
0
21
cornelistools (@cornelistools)
@fchollet 12 watts is about two Raspberry Pi 4Bs at full load 💪 I love seeing people already experimenting with 7B GGUF models on Pis, and I hope to see much more focus on energy-efficient models.
English
1
0
0
127
cornelistools (@cornelistools)
My feeling is that Chain-of-Thought prompting works mainly because it is partly "Chain-of-Completion". LLMs are trained for completion (next-token prediction), and by writing out a reasoning problem step by step, reasoning becomes a more manageable completion problem.
Dutch
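To make the "completion" framing concrete, here is a small sketch of two prompt builders, with no model call involved. The function names, the example question, and the exact prompt wording are illustrative assumptions rather than a prescribed format.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer right away: one big jump for the completion."""
    return f"Q: {question}\nA: The answer is"


def chain_of_thought_prompt(question: str) -> str:
    """Open a numbered list so the model continues with small steps."""
    return f"Q: {question}\nA: Let's think step by step.\n1."


# Hypothetical example question.
question = "A train leaves at 14:10 and arrives at 16:45. How long is the trip?"

print(direct_prompt(question))
print()
print(chain_of_thought_prompt(question))
```

With the second prompt, the next tokens to predict are a short first step, not the final answer. Each written-out step then becomes context for completing the next one.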
0
0
0
62