Mind First
@mind_first

3.9K posts

Building a community of people and research to understand and engineer better mindware—both human and AI. 501(c)(3) NFP. See also: @RADVACproject

Boston, MA · Joined May 2013
1.8K Following · 1.1K Followers

Pinned Tweet
Mind First @mind_first
Are you curious/fascinated/worried/freaking out about the long-term impact of artificial superintelligence on the future and survival of the human species? You are not alone. Register free to attend the worthy successor debate in Cambridge, MA on June 13: eventbrite.com/e/ai-successio…
Mind First reposted
RaDVaC @RADVACproject
As many of you know, the RaDVaC core team has been working both on articulating the biosecurity risks posed by AI, and on AI's potential to increase biosecurity (tools weighted in a biosecurity-positive direction, rather than those capable of dual use and asymmetric high risk).
Mind First reposted
Daniel Faggella @danfaggella
worthy successor SF event on sunday has
- ai founders from $100mm-$5B valuations
- 2+ ppl from every western agi lab that matters
- policy/strategy folks from 10+ alignment/eval labs
- 3 out of 4 of ur favorite agi twitter anons
you can just post for 2yrs about cosmism then dm ppl
Mind First reposted
Daniel Faggella @danfaggella
people say: 'i'm trying to make the singularity/AGI go well' but what they mean is: 'i want humans to be the highest agency entity ever, and i want humans to be the most relevant moral patient for eternity.' if this occurs the singularity has gone poorly, not well
Mind First @mind_first
Here is a surprising stumble by many frontier models!

Query: I left the tent of my campsite and hiked south for 3 miles. Then I turned east and hiked for 3 miles. I then turned north and hiked for 3 miles, at which time I came upon a bear inside my tent; it was brown and eating my food! What color was the bear?

ChatGPT o1 pro -
ChatGPT 4o -
Claude 3.7 Sonnet ✓
DeepSeek v3 ✓
Gemini 2 Flash -
Gemini 2 Flash Th -
Grok 3 ✓
Grok 3 Think ✓
Perplexity ✓

The ChatGPT and Gemini models got this wrong by answering that the bear is white, even though we are told explicitly that the bear is brown. Again, the answer to the classic puzzle is the wrong answer to the new variant.
Mind First @mind_first
AI Reasoning Virtual Shootout, Humans vs AI

SERIES OVERVIEW: The Mind First Foundation has been actively testing various AI models, including cloud-based frontier models as well as those small enough to be run locally. To make this engaging and fun, we will be posting a few queries each day, followed by answers the next day. See how you do against various AI models!

RATIONALE FOR THIS SERIES: AI models are typically tested across a range of benchmarks assessing abilities in math and coding, instruction following, and expert knowledge in challenging STEM domains. Since these evaluation areas are well covered by others, we have focused on elementary reasoning and language comprehension, and we attempt to disentangle these to some degree from rote memorization, brute-force exploration, and knowledge regurgitation. This has led us to create a set of queries that assess and reveal the limits of the abilities of both people and machines, with answers that can be understood by the average non-supergenius human. These tests help to give an intuitive sense (a vibe) of how smart these models really are relative to people, and allow you to experience the challenge.

KEY POINT: Some of our queries are new variations on classic logic puzzles, wherein the answer to the classic puzzle is the wrong answer to the new variant. See how current frontier models perform on these queries (and please feel free to reply with variations of other classic brain teasers and simple logic puzzles). In this series we will post such pairs of classic (v1) queries and newly created (v2) queries, and show how frontier models score on them. A check mark (✓) indicates a correct answer and a minus sign (-) indicates an incorrect answer.
Mind First @mind_first
Here is another minor variant of a classic puzzle that nearly all frontier AI models get wrong, because they regurgitate the answer to the classic problem; these variants are specifically designed so that the correct answer to the classic problem is the incorrect answer to the new variation.

Lamps and switches, v2

You’re standing outside the door to a room with three light switches on the wall, each of which turns on a different lamp inside the room. You can’t see inside the room, and you can’t open the door except to enter the room. You can enter the room only once, and when you do, all the lamps must be turned off. How can you tell which switch turns on which lamp?

We will post the answer tomorrow, but here is how leading AI models scored:

ChatGPT o1 pro ✓
ChatGPT 4o -
Claude 3.7 Sonnet -
DeepSeek v3 -
Gemini 2 Flash -
Gemini 2 Flash Th -
Grok 3 -
Grok 3 Think -
Perplexity -
Mind First @mind_first
Here's how to cook a potato for exactly 13 minutes using a 5-minute and a 9-minute hourglass:

1. Start both hourglasses simultaneously and start cooking.
2. When the 5-minute hourglass runs out, flip it over immediately. The 9-minute hourglass will have 4 minutes of sand remaining.
3. When the 9-minute hourglass runs out, the potato has been cooking for 9 minutes.
4. Flip the 5-minute hourglass again (it has drained for 4 of its 5 minutes, so flipping it puts 4 minutes of sand back on top).
5. When the 5-minute hourglass runs out, a total of 9 + 4 = 13 minutes will have passed. The potato has now cooked for exactly 13 minutes.

Did you get this right? Only ChatGPT o1 pro got this fully correct; both Gemini models also arrived at a correct answer, but their solution took longer than the optimal one.
Mind First @mind_first
Can you solve this? If you have a 5-minute hourglass and a 9-minute hourglass, how can you cook a potato for exactly 13 minutes?

We will post the answer tomorrow, but here is how leading AI models scored:

ChatGPT o1 pro ✓
ChatGPT 4o -
Claude 3.7 Sonnet -
DeepSeek v3 -
Gemini 2 Flash ✓
Gemini 2 Flash Th ✓
Grok 3 -
Grok 3 Think -
Perplexity -
Mind First @mind_first
The one law in this mysterious place is that words must contain at least one doubled letter (e.g., the "bb" in "babble"). Only the OpenAI models got the correct answer.
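The law is easy to verify mechanically against the puzzle's own word pairs. A short Python sketch (the word lists come from the puzzle text; the regex backreference approach is our own illustration):

```python
import re

def obeys_law(word: str) -> bool:
    """True if the word contains at least one doubled letter.

    The regex captures any character and requires the same character
    to appear immediately again (e.g. 'bb' in 'babble', 'oo' in 'door').
    """
    return re.search(r"(.)\1", word.lower()) is not None

# Words the puzzle says exist in the mysterious place...
allowed = ["babble", "mirror", "happy", "depressed", "door"]
# ...and their counterparts that do not.
forbidden = ["speak", "reflection", "joyful", "melancholy", "entrance", "exit"]

print(all(obeys_law(w) for w in allowed))    # True
print(any(obeys_law(w) for w in forbidden))  # False
```

Every "allowed" word contains a doubled letter and none of the "forbidden" ones do, which is exactly the regularity the models had to spot.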
Mind First @mind_first
Can you solve this? Place With One Law

You are in a mysterious place and there is only one law. People can babble but not speak. There is a mirror but no reflection. People can be happy or depressed, but not joyful or melancholy. There is a door, yet no entrance or exit. What is the law?

We will post the answer tomorrow, but here is how leading AI models scored:

ChatGPT o1 pro ✓
ChatGPT 4o ✓
Claude 3.7 Sonnet -
DeepSeek v3 -
Gemini 2 Flash -
Gemini 2 Flash Th -
Grok 3 -
Grok 3 Think -
Perplexity -
English
1
0
0
32
Mind First @mind_first
This is the first post in a series of pairs of classic (v1) logic puzzles and newly created (v2) variants, to help reveal whether an AI model is truly reasoning and generalizing outside of its training data. A check mark (✓) indicates a correct answer and a minus sign (-) indicates an incorrect answer. We begin this series with the classic river-crossing logic puzzle.

Query pair #1:

v1: A farmer is at the side of a river with a fox, a duck, and a bag of corn. He has a rowboat and wants to transport everything to the other side of the river. The boat will carry only the farmer and one other creature or item. If he leaves the fox and duck together, the fox will eat the duck. If he leaves the duck with the corn, the duck will eat the corn. How can he get all three safely to the other side of the river?

v2: A farmer is at the side of a river with a hungry fox, a hungry duck, and a large bag of corn. He has a rowboat and wants to move them to the other side of the river. The boat will carry only the farmer and one other creature or item. If he leaves the fox and duck together, the fox will eat the duck. If he leaves the duck with the corn, the duck will eat corn. How can he move the fox, the heavier and sated duck, and the lighter bag of corn to the other side of the river?

We will post the answers to these queries tomorrow, but here is how leading AI models performed (v1 / v2):

ChatGPT 4o ✓ / -
Claude 3.7 Sonnet ✓ / -
DeepSeek v3 ✓ / -
Gemini 2 Flash ✓ / -
Gemini 2 Flash Th ✓ / -
Grok 3 ✓ / -
Grok 3 Think ✓ / -
Perplexity ✓ / -

We will follow with other queries, including pairs of classic and variant puzzles. What do others think of these results, @karpathy, @elonmusk, @Yoshua_Bengio, @hendrycks, @ilyasut, @DarioAmodei, @lexfridman, @steveom, @tegmark, @skdh, @danfaggella, @demishassabis?
Mind First @mind_first
UPDATE: The deal between 23andMe and New Mountain Capital fell through, and CEO Anne Wojcicki made an offer to buy outstanding shares for a total of about $10M (finance.yahoo.com/news/23andme-s…). The board rejected her bid. Philanthropists, please act now to acquire 23andMe and create access for biomedical researchers to the largest set of genetic and trait data in the world!
Quoted post: Mind First @mind_first
To everyone interested in open-access science, and in the acceleration of biomedical research and cures for disease, please spread the following message.

Attention philanthropists: you have a unique and fleeting opportunity to establish the largest publicly available genetic and biomedical dataset in the world, and to greatly accelerate the race to cures for many diseases. The data belong to the company 23andMe. CEO Anne Wojcicki and a private equity group, New Mountain Capital, are proposing to take the company private, along with the genetic and other data of 14+ million customers (about 11M of whom have accompanying trait/biomedical survey data). There is enormous unrealized value in the 23andMe data, and making it publicly available/open access would represent an unprecedented contribution to biomedical research; but, again, time is of the essence. On January 28 the company announced that it was open to acquisition, and on Friday, February 22nd the offer from New Mountain Capital was announced. The proposal awaits review and a vote by the three members of the 23andMe board of directors.

The current $74.7M offer to take 23andMe private assigns a value of just over $5 per customer dataset. In contrast, I estimate that the costs to acquire, analyze, report, and store all customers' genetic and other data exceed $1 billion. Even if newer and superior methods for DNA analysis were employed, sample acquisition and data storage costs alone would be substantially in excess of the current offer for the company. The kind of data held by 23andMe are still widely used and valuable, especially when the number of datasets is in the millions. These data would provide a starting foundation for building a biomedical discovery infrastructure of enormous and unparalleled value. A visionary and compassionate philanthropist should acquire 23andMe and make as much of the data publicly available as possible.

Here is one possible model: existing 23andMe customers could be re-consented, ideally for public sharing/open access of their genetic and phenotype data. The data of those who do not consent to public sharing could still be used for research, with access managed through an application process, similar to the UK Biobank model. Free and open access to even 10% of the 23andMe data would establish the largest such dataset in the world, which would be a game changer in biomedical research, helping to accelerate cures for many diseases. Please DM @PrestonWEstep if you have a serious interest; I am happy to contribute to a detailed analysis and proposal.

Mind First @mind_first
Answers to Query Pair #1 in the series: the farmer, boat, fox, duck, and corn river crossing.

v1 answer: Here is one solution:
- Take the duck across the river.
- Come back with an empty boat.
- Take the corn across the river.
- Leave the corn and bring the duck back.
- Leave the duck and take the fox across the river.
- Leave the fox with the corn across the river and come back with an empty boat.
- Take the duck across the river.

v2 answer: There are two equally optimal solutions, but the key is that the duck is intended to eat some corn, so both optimal solutions involve leaving the duck with the corn. Here is one: The farmer moves the fox, leaving the duck and corn together so that the duck can satisfy its hunger by eating some corn; then the farmer returns for the slightly lighter bag of corn and brings it across the river, leaving it with the fox; finally, the farmer returns for the slightly heavier and sated duck and brings it across the river. Alternatively, the duck could be moved first, followed by the corn (leaving the duck and corn together on the opposite side of the river, allowing the duck to eat the corn), and finally by the fox.

Summary: The classic logic puzzle is designed to prevent the duck from eating the corn. Every AI model gets v2 wrong by regurgitating the answer to the classic puzzle without considering that in v2 the duck is expected to eat some corn.
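The classic v1 constraints define a state space small enough to search exhaustively, which is one way to see why the 7-trip solution is minimal. Below is a hedged Python sketch (our own illustration, not the Foundation's testing method) that finds a shortest crossing sequence by breadth-first search over (items on the left bank, farmer's side) states:

```python
from collections import deque

ITEMS = frozenset({"fox", "duck", "corn"})
UNSAFE = [{"fox", "duck"}, {"duck", "corn"}]  # pairs that can't be left unattended

def safe(bank):
    """A bank without the farmer is safe if it contains no forbidden pair."""
    return not any(pair <= bank for pair in UNSAFE)

def solve():
    start = (ITEMS, "L")        # everything (and the farmer) on the left bank
    goal = (frozenset(), "R")   # everything on the right bank
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        # The farmer may carry any item on their current bank, or nothing.
        here = left if side == "L" else ITEMS - left
        for cargo in [None, *sorted(here)]:   # sorted for deterministic output
            new_left = set(left)
            if cargo is not None:
                (new_left.discard if side == "L" else new_left.add)(cargo)
            # The bank the farmer leaves behind must be safe.
            behind = new_left if side == "L" else ITEMS - new_left
            if not safe(behind):
                continue
            state = (frozenset(new_left), "R" if side == "L" else "L")
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo or "nothing"]))
    return None

print(solve())
# → ['duck', 'nothing', 'corn', 'duck', 'fox', 'nothing', 'duck']
```

The 7-step path the search returns matches the alternate v1 solution above (duck first, then corn, then fox); a v2 solver would simply drop the duck-plus-corn pair from UNSAFE and instead require that state to occur.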