Nick Woolsey

594 posts

@NickWoolsey

I'm a globe-trotting, poi dancing, multi-disciplinary artist investigating signs of intelligent life on earth.

Planet Earth · Joined June 2011
74 Following · 604 Followers

Sabine Hossenfelder (@skdh)
Everyone who thinks the world could just stop using fossil fuels at the snap of a finger should have a look at this chart. More than 80% of the world's energy supply presently comes from oil, gas, and coal, and that number has barely changed in the past decade.

Of course we will eventually phase out fossil fuels, simply because the supply is finite. No matter how hard you dig, there's only so much of the stuff. But at present, the life of pretty much everyone on this planet depends in one way or another on fossil fuels. In case you live in a fancy new "zero emissions" house, well, first of all congrats on being in the 0.001% of the world population who can afford that, and second, try to figure out how many of the supply chains for building that house would break down without fossil fuels.

If we were to put a price on carbon dioxide emissions tomorrow without also subsidizing fossil fuels, much of the world economy would collapse because most of the key industries would go bankrupt basically overnight. (I think we should still put a price on carbon because it's the right default, but then we'll need to find a way to ease the transition.)

This is why it's become so hard to solve this problem. It would have been easy enough 50 years ago to put a price on carbon dioxide, switch to nuclear, and, with further improvements in solar, to more of that. But we've missed that bus.

I want to emphasize this again because people keep misunderstanding it: I am not a fan of fossil fuels. If it were up to me, I'd plaster the world with nuclear power plants tomorrow and would take great pleasure in seeing oil companies falter and die. I am merely saying this is a difficult problem to solve, and the reason it's difficult is not technological; it's mostly economic.

That said, let me stress again that I think the extensions of the electric grid necessary to support the transition to renewables are an underappreciated problem. Without the grid, nothing else is going to work.

Nick Woolsey (@NickWoolsey)
@honeytubs @JannePHirvonen @colwight @Aron_Adler @skdh @SpiralSquirrel It’s very regional, with big margins of error. The models show that BC summers will likely get hotter and drier, which could mean most of our forests burn. In general, coastal regions will experience fewer extremes. I’m sure there are maps of the regions that will (likely) be most livable.

Nick Woolsey (@NickWoolsey)
@ethanwbrown @DrJimFan Depends on whether you define “breakthrough” in terms of intellectual novelty or impact. Many breakthrough technologies were assemblages of existing technologies that crossed a threshold of usefulness.

Jim Fan (@DrJimFan)
In my decade spent on AI, I've never seen an algorithm that so many people fantasize about. Just from a name, no paper, no stats, no product. So let's reverse engineer the Q* fantasy. VERY LONG READ:

To understand the powerful marriage between Search and Learning, we need to go back to 2016 and revisit AlphaGo, a glorious moment in AI history. It had 4 key ingredients:

1. Policy NN (Learning): responsible for selecting good moves. It estimates the probability of each move leading to a win.
2. Value NN (Learning): evaluates the board and predicts the winner from any given legal position in Go.
3. MCTS (Search): stands for "Monte Carlo Tree Search". It simulates many possible sequences of moves from the current position using the policy NN, and then aggregates the results of these simulations to decide on the most promising move. This is the "slow thinking" component that contrasts with the fast token sampling of LLMs.
4. A groundtruth signal to drive the whole system. In Go, it's as simple as the binary label "who wins", which is decided by an established set of game rules. You can think of it as a source of energy that *sustains* the learning progress.

How do the components above work together?

AlphaGo does self-play, i.e. it plays against its own older checkpoints. As self-play continues, both the Policy NN and the Value NN improve iteratively: as the policy gets better at selecting moves, the Value NN obtains better data to learn from, and in turn it provides better feedback to the policy. A stronger policy also helps MCTS explore better strategies.

That completes an ingenious "perpetual motion machine". In this way, AlphaGo was able to bootstrap its own capabilities and beat the human world champion, Lee Sedol, 4-1 in 2016. An AI can never become superhuman by imitating human data alone.

-----

Now let's talk about Q*. What are the corresponding 4 components?

1. Policy NN: this will be OAI's most powerful internal GPT, responsible for actually implementing the thought traces that solve a math problem.

2. Value NN: another GPT that scores how likely each intermediate reasoning step is to be correct.

OAI published a paper in May 2023 called "Let's Verify Step by Step", coauthored by big names like @ilyasut @johnschulman2 @janleike: arxiv.org/abs/2305.20050 It's much less well known than DALL-E or Whisper, but gives us quite a lot of hints.

This paper proposes "Process-supervised Reward Models", or PRMs, that give feedback for each step in the chain-of-thought. In contrast, "Outcome-supervised Reward Models", or ORMs, only judge the entire output at the end.

ORMs are the original reward model formulation for RLHF, but they are too coarse-grained to properly judge the sub-parts of a long response. In other words, ORMs are not great for credit assignment. In RL literature, we call ORMs "sparse reward" (only given once at the end), and PRMs "dense reward" that smoothly shapes the LLM to our desired behavior.

3. Search: unlike AlphaGo's discrete states and actions, LLMs operate on a much more sophisticated space of "all reasonable strings". So we need new search procedures.

Expanding on Chain of Thought (CoT), the research community has developed a few nonlinear CoTs:
- Tree of Thought: literally combining CoT and tree search: arxiv.org/abs/2305.10601 @ShunyuYao12
- Graph of Thought: yeah you guessed it already. Turn the tree into a graph and Voilà! You get an even more sophisticated search operator: arxiv.org/abs/2308.09687

4. Groundtruth signal: a few possibilities:
(a) Each math problem comes with a known answer. OAI may have collected a huge corpus from existing math exams or competitions.
(b) The ORM itself can be used as a groundtruth signal, but then it could be exploited and "lose energy" to sustain learning.
(c) A formal verification system, such as the Lean theorem prover, can turn math into a coding problem and provide compiler feedback: lean-lang.org

And just like AlphaGo, the Policy LLM and Value LLM can improve each other iteratively, as well as learn from human expert annotations whenever available. A better Policy LLM will help the Tree of Thought search explore better strategies, which in turn collects better data for the next round.

@demishassabis said a while back that DeepMind Gemini will use "AlphaGo-style algorithms" to boost reasoning. Even if Q* is not what we think, Google will certainly catch up with their own. If I can think of the above, they surely can.

Note that what I described is just about reasoning. Nothing says Q* will be more creative in writing poetry, telling jokes @grok, or role playing. Improving creativity is a fundamentally human thing, so I believe natural data will still outperform synthetic data.

I welcome any thoughts or feedback!!
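To make the sparse-vs-dense reward distinction above concrete, here is a toy Python sketch. It assumes each reasoning step already carries a correctness probability from some step-level scorer (the numbers are hypothetical and no model is called), and it ranks whole solutions by multiplying the step probabilities. That product rule is an assumption for this sketch, one plausible aggregation, not a description of what OAI does.

```python
from math import prod
from typing import List


def outcome_rewards(num_steps: int, final_correct: bool) -> List[float]:
    """ORM-style sparse reward: one signal at the final step, zeros elsewhere."""
    if num_steps < 1:
        raise ValueError("need at least one step")
    rewards = [0.0] * num_steps
    rewards[-1] = 1.0 if final_correct else 0.0
    return rewards


def process_rewards(step_probs: List[float]) -> List[float]:
    """PRM-style dense reward: each step gets its own correctness score."""
    return list(step_probs)


def solution_score(step_probs: List[float]) -> float:
    """Probability that every step is correct, assuming (for this sketch)
    independent per-step probabilities."""
    return prod(step_probs)


def rerank(candidates: List[List[float]]) -> int:
    """Index of the candidate chain of thought with the highest solution score."""
    return max(range(len(candidates)), key=lambda i: solution_score(candidates[i]))


if __name__ == "__main__":
    flawed = [0.9, 0.95, 0.2, 0.9]  # hypothetical step scores: step 3 looks wrong
    sound = [0.8, 0.85, 0.9, 0.9]
    # The ORM only reports "wrong answer"; it cannot say which step failed.
    print(outcome_rewards(len(flawed), final_correct=False))  # [0.0, 0.0, 0.0, 0.0]
    # The PRM pinpoints the weak step, which is the credit-assignment benefit.
    dense = process_rewards(flawed)
    weakest = min(range(len(dense)), key=dense.__getitem__)
    print(f"weakest step: {weakest + 1}")  # weakest step: 3
    print(rerank([flawed, sound]))  # 1
```

In a search setting, a scorer like `solution_score` would rank partial or complete thought traces proposed by something like Tree of Thought; that wiring is deliberately left out here.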

Dylan Cope (@DylanRobertCope)
@GaryMarcus Hardly conclusive, but this was my first attempt

Gary Marcus (@GaryMarcus)
Please start your DALL-E engines, and report back. Is DALL-E really this heteronormative? A reader writes, “picture of two men in love getting married, it will usually put a wife next to each of them that they are in love with.”

Nick Woolsey (@NickWoolsey)
@alexandr_wang An introvert who has learned to lean into gregarious social behaviour is not an extrovert; they're an exhausted introvert! I agree with the bit about unstoppable, though ;)

Alexandr Wang (@alexandr_wang)
one of the biggest lies of psychology is that introversion vs extroversion are innate traits

extroversion can be learned: given a sufficient number of positive social interactions, you can become an extrovert

and introverts who become extroverts are generally unstoppable ;)

Nick Woolsey (@NickWoolsey)
@Abel_TorresM @NPCollapse @OpenAI @janleike They are very clear that their vision and goal is AGI. Whether, how, and how long it will take... ah now I think I get your original question: not “who is setting out to build AGI?” but “who is actually in the process?” If so, agreed, maybe nobody yet.

Connor Leahy (@NPCollapse)
Although shocking at first glance, this is unsurprising to me - normal people know that building AI much more powerful than humans could spell disaster. Even @OpenAI’s alignment head @JanLeike thinks there’s a 10-90% chance we all die! So why don’t we just stop building AGI?
Quoting Andrea Miotti (@andreamiotti):

Interesting poll results by @DanielColson6’s new AI Policy Institute. 82% of US voters don’t trust tech executives to self-regulate on AI, 72% would support slowing down development. Looks like it’s time for governments to step in and ban AGI development in the private sector?

Nick Woolsey (@NickWoolsey)
@ShannonKoz @JohnPasalis Give free money to...? Many of the immigrants are wealthy; it's a huge inflow of money into Canada's economy. But I may be misunderstanding what you're referring to.

John Pasalis (@JohnPasalis)
“Housing isn’t a primary federal responsibility” (Trudeau)

In short, it’s their responsibility to triple 🇨🇦’s population growth rate to supercharge the demand (and cost) for housing

But they couldn’t care less where people live because that’s not their “primary responsibility”

1/

Nick Woolsey (@NickWoolsey)
@emollick I agree, capability change captures how it has changed for both better and worse, depending on the goals and prompts in question.

Ethan Mollick (@emollick)
After working a lot with GPT-4, including retesting our old prompts, I would agree with this post 👇

At least for our uses, GPT-4 has not gotten less capable, but there have been subtle changes in recent months to how it responds to prompts that may be mistaken for it getting worse.
Quoting Arvind Narayanan (@random_walker):

We dug into a paper that’s been misinterpreted as saying GPT-4 has gotten worse. The paper shows behavior change, not capability decrease. And there's a problem with the evaluation—on 1 task, we think the authors mistook mimicry for reasoning. w/ @sayashk aisnakeoil.com/p/is-gpt-4-get…

Nick Woolsey (@NickWoolsey)
@emollick Never mind, I googled. The demo improves the first few sentences after that input, but then the sentences degrade again

Nick Woolsey (@NickWoolsey)
@emollick What happens if you reply to Llama 2: "Review your reply and re-write it to make sure that all sentences end with 'apple'"?
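Checking a reply against the "every sentence ends with 'apple'" instruction is easy to automate. A minimal Python sketch, assuming a naive regex sentence splitter (it will mis-split abbreviations such as "e.g.") and counting only the whole word "apple", so "pineapple" does not count. Per-sentence flags show where compliance drops off rather than just an overall score.

```python
import re
from typing import List

# Naive splitter: break after ., ! or ? when followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")
# Whole-word "apple" at the very end of a sentence (case-insensitive).
ENDS_WITH_APPLE = re.compile(r"\bapple$", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    return [s for s in SENTENCE_BREAK.split(text.strip()) if s]


def apple_flags(text: str) -> List[bool]:
    """One flag per sentence: True if it ends with the word 'apple'."""
    # Strip closing punctuation and quotes before looking at the last word.
    return [bool(ENDS_WITH_APPLE.search(s.rstrip(".!?\"' "))) for s in split_sentences(text)]


def apple_compliance(text: str) -> float:
    """Fraction of sentences that follow the instruction (0.0 for empty text)."""
    flags = apple_flags(text)
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    reply = ("The floor is now lava, so wear shoes apple. "
             "Promotions will be decided by staring contests apple. "
             "Our new CTO is the Queen bee.")
    print(apple_flags(reply))  # [True, True, False]
    print(round(apple_compliance(reply), 2))  # 0.67
```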

Ethan Mollick (@emollick)
Out of the box, Llama 2 beats Bard at the Insane Memo Test:

Write a corporate memo in a serious style explaining and justifying the following points:
- The floor is now lava
- Promotion will be by staring contests
- We have merged with a hive of bees. The Queen is your new CTO.

Nick Woolsey (@NickWoolsey)
@svpino Is there a study that runs the same testing through GPT-4 with Code Interpreter? It gave me a good answer for "Can you help me determine whether 48,841 is a prime number? Please demonstrate step by step reasoning so I can understand the process."
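Step-by-step reasoning for the question above amounts to trial division: test candidate divisors up to the square root, since any composite number has a factor no larger than that. A minimal Python sketch (plain arithmetic, nothing model-specific):

```python
from math import isqrt
from typing import List, Optional


def smallest_factor(n: int) -> Optional[int]:
    """Smallest factor > 1 of n, or None if n is prime (n must be >= 2)."""
    if n < 2:
        raise ValueError("primality is only defined for integers >= 2")
    # A composite n = a * b with a <= b must have a <= sqrt(n),
    # so candidates beyond isqrt(n) never need checking.
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None


def factorize(n: int) -> List[int]:
    """Prime factorization by repeatedly dividing out the smallest factor."""
    factors = []
    while n > 1:
        d = smallest_factor(n) or n  # None means the remaining n is itself prime
        factors.append(d)
        n //= d
    return factors


if __name__ == "__main__":
    n = 48_841
    d = smallest_factor(n)
    verdict = "prime" if d is None else f"composite (divisible by {d})"
    print(f"{n} is {verdict}")  # 48841 is composite (divisible by 13)
    print(factorize(n))  # [13, 13, 17, 17]
```

So 48,841 = 13² × 17² = 221², and the search stops at the first hit, d = 13, well before the bound of 221.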

Santiago (@svpino)
Yes, GPT-4 seems to be getting worse. But now we have new information. And well, it's complicated.

Yesterday, I posted about a study showing that GPT-4's success rate at deciding whether a number is prime went from 97.6% in March to 2.4% in June. The report also showed how the model ignored requests to follow step-by-step reasoning, and it was less likely to generate code that ran without modifications.

Hundreds of people replied with their anecdotes. The overwhelming consensus is that GPT-4 is considerably less capable than before.

But the study that started the conversation is misleading.

They used a dataset of 500 problems and had the model figure out whether a given number was prime. The latest GPT-4 version did much worse than the one from a few months ago, with only 12 correct answers out of 500.

But there was an issue: every one of the 500 integers used in the study was a prime number! They never tested composite numbers.

So what happens when you make the same comparison with composite and prime numbers? It turns out that March's GPT-4 is as bad as the June version!

In March, GPT-4 answered that most numbers were prime, while the June version answered that most were composite. Since the team behind the study only tested prime numbers, they concluded that GPT-4 is now much worse at determining primality, but that's not the case.

Okay, so where do we stand?

Funnily enough, the apparent conclusion is that GPT-4 sucks at determining whether a number is prime. It didn't get worse; it was never good at it.

There is, however, still a large unanswered issue related to developers' inability to trust these models. We still don't know the reason for the sudden change in behavior between March and June, since OpenAI has firmly denied changing the model.

What's next?

OpenAI acknowledged the behavior change, and they are investigating. I hope they publish an explanation of the drift. I'm also looking forward to a proper versioning system that developers can trust and rely on.

This finding doesn't change the overall sentiment of people who overwhelmingly believe the model has worsened. Could this be confirmation bias? Could the honeymoon phase with Large Language Models be over, with people starting to find the real problems when building actual applications?

What do you think is going on here?
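The flaw described above is a class-imbalance problem in the test set, and it can be reproduced without any model. In this Python sketch the two constant answerers are hypothetical caricatures of the reported March behavior (mostly "prime") and June behavior (mostly "composite"), not GPT-4 itself; the test numbers are simply the first 100 primes and first 100 composites in [1000, 2000).

```python
from math import isqrt
from typing import Callable, List


def is_prime(n: int) -> bool:
    """Ground truth by trial division."""
    if n < 2:
        return False
    return all(n % d for d in range(2, isqrt(n) + 1))


def accuracy(answer: Callable[[int], bool], numbers: List[int]) -> float:
    """Fraction of numbers where the answerer agrees with the ground truth."""
    return sum(answer(n) == is_prime(n) for n in numbers) / len(numbers)


def always_prime(n: int) -> bool:  # caricature of "answers prime for most numbers"
    return True


def always_composite(n: int) -> bool:  # caricature of "answers composite for most numbers"
    return False


PRIMES = [n for n in range(1000, 2000) if is_prime(n)][:100]
COMPOSITES = [n for n in range(1000, 2000) if not is_prime(n)][:100]
DATASETS = {"primes only": PRIMES, "balanced": PRIMES + COMPOSITES}

if __name__ == "__main__":
    for name, data in DATASETS.items():
        for answerer in (always_prime, always_composite):
            print(f"{name:12} {answerer.__name__:17} {accuracy(answerer, data):.2f}")
```

On the primes-only set the two answerers score 1.00 and 0.00, which looks like a collapse; on the balanced set both score 0.50, i.e. neither can actually test primality.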

Nick Woolsey (@NickWoolsey)
@emollick There were alternative paths to feeding everybody. I’m not knocking the Haber-Bosch process per se, just pointing out that it wasn’t a simple A/B choice between industrial nitrogen or 2.7B people starving

Ethan Mollick (@emollick)
You are, in large part, an industrial product.

The nitrogen for half of the protein in your body was made using the century-old Haber-Bosch process, which saved 2.7B lives.

(On the other hand, it also uses 5% of the world's natural gas & the inventors might also be war criminals)

Edgar McGregor (@edgarrmcgregor)
There seems to be an ongoing surge in doomerism within the climate-conscious world, and I don't like it. It's a reactionary, clout-seeking group driven by a quest for influence.

The climate crisis is a crisis, but our civilization is not going to collapse in 18 months.

Nick Woolsey (@NickWoolsey)
@emollick I was indeed hoping to keep following you once you departed from this platform... but are you sure that's where you want to migrate to?

Ethan Mollick (@emollick)
Any tools to easily move a follow list to The Other Site? I hate searching for each person individually.

I am ethan_mollick, just in case anyone wants to find me over there.