Ham Huang

500 posts

@Huang_Ham

PhD student @Princeton Psych under Drs. Natalia Vélez & Tom Griffiths, studying the computational cognition of human aggregate minds. Previously @Penn @Cal

Princeton, NJ · Joined May 2013
457 Following · 328 Followers
Ham Huang retweeted
Francisco Correia da Cruz @cruz_fcorreia
Excited to share a new preprint of my latest paper: "The Social Psychology of the Living and the Dead" In it, we set a research agenda for exploring the social psychology of the dead as a potential lens through which we can learn about the social psychology of the living 1/
1 reply · 2 retweets · 2 likes · 128 views
Ham Huang retweeted
Cognition @CognitionJourn
A computational theory of Aha! moments! New paper explaining why Aha! moments occur and why they feel so good. TL;DR: they are a form of meta-cognitive prediction error, i.e., they occur when we surprise ourselves about our own abilities! @cocosci_lab sciencedirect.com/science/articl…
[image]
0 replies · 31 retweets · 126 likes · 6.7K views
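A toy reading of that claim, assuming the prediction error is simply actual minus expected success on one's own problem (my sketch in Python, not the paper's model):

```python
# Toy reading of the claim: an Aha! moment as a meta-cognitive prediction
# error, i.e. surprising yourself about your own ability.
expected_success = 0.15  # prior confidence that you could solve the puzzle
actual_success = 1.0     # the insight arrives and the puzzle is solved
surprise = actual_success - expected_success
print(f"meta-cognitive prediction error: {surprise:+.2f}")  # -> +0.85
# On this toy reading, a large positive self-surprise is the pleasurable Aha!
```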
Ham Huang retweeted
dinosaur @dinosaurs1969
[image]
229 replies · 2.3K retweets · 88.6K likes · 1.3M views
Ham Huang retweeted
Robert Anderson @ProfRobAnderson
One of the biggest mistakes I see students making on multiple-choice exams is picking the wrong answer.
152 replies · 2.6K retweets · 28.9K likes · 560K views
Ham Huang retweeted
Dilip Arumugam @Dilip_Arumugam
Excited to be at (my first) #ICLR2026 this week to present work with Tom Griffiths (@cocosci_lab) on efficient exploration for LLM agents 🧵
[image]
1 reply · 9 retweets · 57 likes · 5.4K views
Ham Huang @Huang_Ham
@0x45o Two. “Strawperry” has a p in the middle (strawperry), plus the one you’d get if you also counted… wait, no — just the one p. Let me recount: s-t-r-a-w-p-e-r-r-y. One p.
0 replies · 0 retweets · 0 likes · 74 views
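The count is trivially machine-checkable, which is the joke; a one-line Python check (illustrative):

```python
# s-t-r-a-w-p-e-r-r-y: exactly one "p", whatever the recount says.
print("strawperry".count("p"))  # -> 1
```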
0x45 @0x45o
it is confirmed, we reached AGI
[image]
392 replies · 882 retweets · 20.1K likes · 1.4M views
Ham Huang retweeted
Mohammed Alsobay | محمد الصبي
🚨 Out in Science this week, with @DG_Rand @duncanjwatts and Abdullah Almaatouq 🚨 We apply an integrative approach to a classic question in behavioral econ and cooperation research: *when* does peer punishment help or hinder collective welfare?
[image]
2 replies · 30 retweets · 92 likes · 12.9K views
Ham Huang @Huang_Ham
Apple seems really interested in showing that LLMs are incapable.
Nav Toor @heynavtoor

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.

Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.

But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.

Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"

The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.

But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.

Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing.

The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.

Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.

The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.

The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."

They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.

This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.

You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.

0 replies · 0 retweets · 1 like · 115 views
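The kiwi arithmetic in the quoted thread is easy to verify by hand; a minimal sketch in Python (my illustration, not code from the paper):

```python
# 44 kiwis on Friday, 58 on Saturday, double Friday's count on Sunday.
friday, saturday = 44, 58
sunday = 2 * friday
total = friday + saturday + sunday
print(total)  # -> 190

# The GSM-NoOp clause ("five of them were a bit smaller than average") is a
# no-operation: size never changes the count, so nothing gets subtracted.
# The reported failure mode is computing total - 5 = 185 anyway.
```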
Ham Huang retweeted
Tadeg Quillien @TadegQuillien
Our new paper on causal judgment is now out! Led by Can Konuk, with Salvador Mascarenhas. We study how causes that feature several variables ("A and B caused E") are represented in the human mind. direct.mit.edu/opmi/article/d…
[image]
1 reply · 7 retweets · 17 likes · 1.9K views
Ham Huang @Huang_Ham
I guess you could run the same study with your research assistants… Just some misplaced trust.
Hedgie @HedgieMarkets

🦔Researchers at the University of Pennsylvania studied what they call cognitive surrender, the tendency to accept AI outputs without critical evaluation. Across 1,372 participants and over 9,500 trials, subjects accepted faulty AI reasoning 73.2% of the time and only overruled it 19.7% of the time. When the AI was wrong, users still accepted its answer 80% of the time. Subjects who used AI scored 11.7% higher on confidence in their answers despite the AI being wrong half the time.

Adding time pressure made people 12 percentage points less likely to catch AI errors. Adding financial incentives and immediate feedback made them 19 points more likely to catch them.

My Take

The time pressure finding matters enormously for how AI is actually being deployed in workplaces. Companies are using AI to justify faster turnaround times, which means employees are using it under exactly the conditions that make them least likely to catch mistakes. When you're rushed, your internal monitor for detecting errors essentially stops firing, so you get AI output, no time to review it, high confidence it's correct, and a meaningful chance it's wrong.

People using a system that was wrong half the time still felt more confident in their answers than people who weren't using AI at all. That is a system actively making people worse at knowing what they don't know, which is one of the most dangerous things you can do to human judgment at scale. The companies pushing AI hardest into employee workflows should be reading this research carefully.

Hedgie🤗 Link to research for those interested: papers.ssrn.com/sol3/papers.cf…

0 replies · 0 retweets · 0 likes · 140 views
Ham Huang retweeted
Elizabeth Mieczkowski @beth_miecz
🚨New preprint! LLM teams are being deployed at scale, yet we lack the tools to predict when they’ll succeed, fail, or how to design them. Distributed computing faced the exact same questions and figured out how to answer them. We show those insights apply directly to LLMs 🧵👇
[image]
4 replies · 27 retweets · 110 likes · 15.7K views
Ham Huang retweeted
Hanbo Xie @PsychBoyH
I wrote a blog post about my recent thoughts on scaling in Computational Cognitive Sciences and AI (CogAI). In it, I argue that performance scales more easily than insight by pointing out two bottlenecks. Thoughts and comments are welcome! xhb120633.github.io/blog/performan…
Hanbo Xie tweet media
[image]
1 reply · 9 retweets · 29 likes · 2.1K views
Trey Moon ✨ @Treymoon_
This is why you ALWAYS follow the rules in an MRI room! The magnetic pull is no joke. One mistake and things go south FAST. Watch until the end… 😳
245 replies · 130 retweets · 1.9K likes · 1.8M views
Ham Huang retweeted
SaltyAom @saltyAom
Why must you hurt me in this way
[image]
142 replies · 3K retweets · 107.1K likes · 1.8M views
Ham Huang retweeted
Xuechunzi Bai @baixuechunzi
Minimizing existing bias in LLMs is important, but not enough. As models become more agentic (learning over long horizons from feedback), they can generate *novel* biases not seen in pretraining. More capable models create stronger (!) biases, and these are not easy to fix…
Ryan Liu @theryanliu

LLMs develop novel biases from experience. New preprint: LLMs that make decisions & get feedback develop new views — including ⚠️harmful stereotypes that target demographics! [1/7]

0 replies · 3 retweets · 16 likes · 3.5K views
Ham Huang @Huang_Ham
Very exciting paper that goes directly to the heart of some of my recent interests and thinking!!
Valerio Capraro @ValerioCapraro

Now out in Nature Human Behaviour! 🚀🚀

Over the past decades, research on collective human behaviour has relied heavily on networks. This is intuitive: people interact with other people. However, we argue that this dominant framework misses a crucial ingredient.

Traditional networks represent agents as nodes and pairwise relations as edges. As a result, they fundamentally assume that social interactions can be decomposed into pairs. Yet many social processes are irreducibly group-based. A simple example: a group of three coauthors writing a paper cannot be reduced to three independent pairs of coauthors. The group itself matters.

In this article, we review a wide range of empirical and theoretical cases where group interactions cannot be decomposed into pairwise ones, and show that higher-order interactions shape collective behaviour above and beyond dyadic ties. We advocate studying collective behaviour on hypergraphs, where interactions can involve multiple agents simultaneously. We review how hypergraphs provide new insights across domains, including affiliation and collaboration networks, high-frequency contact settings (families, friends), and key social processes such as social contagion, cooperation, truth-telling, and moral behaviour.

Finally, we outline promising directions for future research: addressing computational challenges of higher-order models; studying bias and inequality in group dynamics; combining hypergraphs and large language models to investigate the coevolution of language and behaviour; using higher-order networks to simulate the impact of policies before implementation; and others.

We are very excited about this work and hope it will inspire further research in a rapidly growing and fundamental area with broad real-world implications. Link to the paper in the first reply.

This work was brilliantly led by Federico Battiston (@fede7j), with an outstanding team of co-authors: Fariba Karimi (@fariba_k), Sune Lehmann, Andrea Bamberg Migliano, Onkar Sadekar (@OnkarSadekar), Angel Sanchez, & Matjaz Perc (@matjazperc)

0 replies · 0 retweets · 2 likes · 251 views
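To make the pairwise-versus-group distinction concrete, a minimal sketch in Python (illustrative; the names and structure are mine, not from the paper):

```python
from itertools import combinations

# A hypergraph keeps group interactions intact: each hyperedge is a set of
# agents who interact together, of any size.
hyperedges = [
    frozenset({"Ana", "Bo", "Cem"}),  # three coauthors on one paper
    frozenset({"Ana", "Dee"}),        # an ordinary dyadic tie
]

# The pairwise projection that a traditional graph would store instead:
pairwise = {frozenset(p) for e in hyperedges for p in combinations(sorted(e), 2)}
print(pairwise)  # 4 dyads: {Ana,Bo}, {Ana,Cem}, {Bo,Cem}, {Ana,Dee}

# The projection is lossy: the same four dyads could also come from four
# separate two-person collaborations, so the three-way group cannot be
# recovered from the pairwise network alone. That irreducibility is the
# motivation for modelling collective behaviour on hypergraphs.
```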