Zhonghao He on truth-seeking AI

404 posts


@zhonghaohe

Building truth-seeking AI for moral progress. Alignment and human-AI interaction research. Cosmos Fellow @UniofOxford Prev @Cambridge_Uni

Cambridge, UK · Joined October 2015
499 Following · 422 Followers
Pinned Tweet
Zhonghao He on truth-seeking AI
Sycophancy. Groupthink. Test-time Inverse Scaling. What if one unsupervised metric could detect all these reasoning failures? 🤯 🩵Presenting our NeurIPS'25: Martingale Score🩵 We propose a statistical test for "Belief Entrenchment" in LLMs - no labels required. Link to paper: arxiv.org/abs/2512.02914 🧵1/10
3 replies · 14 reposts · 76 likes · 18.5K views
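The pinned thread's central claim lends itself to a small numerical sketch. This is a hedged illustration of the martingale property it tests (my own toy dynamics, not the paper's actual estimator or data): a rationally updated belief trajectory is a martingale, so the next update is unpredictable from the current belief, while entrenchment shows up as updates that lean toward where the belief already points.

```python
# Hedged sketch of the martingale idea behind belief-entrenchment
# testing -- NOT the paper's actual Martingale Score estimator.
# We work in log-odds so beliefs are unbounded and need no clipping.
import numpy as np

def update_slope(logodds):
    """OLS slope of belief updates on the current belief level:
    ~0 for a martingale, positive when beliefs self-reinforce."""
    x = np.asarray(logodds[:-1], dtype=float)
    y = np.diff(logodds)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
T = 500

# Martingale: a pure random walk in log-odds (no predictable drift).
fair = np.cumsum(rng.normal(0.0, 0.1, T))

# Entrenched: every update also moves 5% further in the direction the
# belief already leans, so each step effectively double-counts itself.
entrenched = [0.5]
for _ in range(T - 1):
    entrenched.append(entrenched[-1] * 1.05 + rng.normal(0.0, 0.1))

print(f"martingale slope: {update_slope(fair):+.3f}")
print(f"entrenched slope: {update_slope(entrenched):+.3f}")
```

The regression needs no labels: only the belief trajectory itself, which is what makes an unsupervised test of this kind attractive.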
Zhonghao He on truth-seeking AI reposted
Dwarkesh Patel
Dwarkesh Patel@dwarkesh_sp·
Terence Tao spent a year at the Institute for Advanced Study - no teaching, no random events or committees, just unlimited time to think. But after a few months, he ran out of ideas. Terence thinks that mathematicians and scientists need a certain level of randomness and inefficiency to come up with new ideas.
128 replies · 603 reposts · 5.8K likes · 894.4K views
Zhonghao He on truth-seeking AI reposted
Markus J. Buehler
Markus J. Buehler@ProfBuehlerMIT·
After decades at MIT studying how nature builds - from spider silk to bone to nacre - I've become convinced of something: the biggest barrier to scientific progress isn't knowledge, it's connection. The insights we need already exist, scattered across millions of papers and disciplines. They're just trapped in silos that no single human mind can bridge.

That's why I co-founded Unreasonable Labs together with Yuan Cao: to build superintelligence for knowledge discovery. Today we're coming out of stealth with $13.5M in seed funding led by @PlaygroundGlobal, with participation from @aixventureshq, @e14fund, and MS&AD Ventures.

We're building a system that doesn't just retrieve information but reasons across it - connecting disparate ideas to generate genuinely novel hypotheses grounded in physical reality.

The genesis of Unreasonable itself came from the kind of serendipity we're trying to systematize. A chance encounter with a mathematician working on category theory became the theoretical bridge between language models and structured scientific reasoning - and ultimately the foundation for everything we're building. Our mission is to replace that serendipity with steerable reasoning, so that every scientist can make those leaps deliberately, not accidentally.

I'm grateful to our advisors - Kostya Novoselov, Robert Langer, and @Thom_Wolf - and to the extraordinary team making this possible.

We're not building AI that replaces scientists. We're building AI that lets them solve in weeks what used to take years. The future is abundant innovation. Let's build it.
43 replies · 100 reposts · 806 likes · 102K views
Zhonghao He on truth-seeking AI reposted
Jack Clark
Jack Clark@jackclarkSF·
AI progress continues to accelerate and the stakes are getting higher, so I’ve changed my role at @AnthropicAI to spend more time creating information for the world about the challenges of powerful AI.
136 replies · 103 reposts · 1.9K likes · 151.9K views
Zhonghao He on truth-seeking AI reposted
Paul Graham
Paul Graham@paulg·
Playing the long game gives you such an advantage over competitors. Hardly anyone else is doing it. They haven't chosen not to. They just haven't explicitly thought about the question, and the default is not to.
236 replies · 487 reposts · 4.6K likes · 193.5K views
Zhonghao He on truth-seeking AI reposted
Paul Graham
Paul Graham@paulg·
When you're deciding what to study in college, don't try to predict what will be valuable in the future, because that's so hard that you'll probably get it wrong. Instead focus on what you personally find most exciting. You can't get that wrong.
334 replies · 605 reposts · 6.6K likes · 244.7K views
Zhonghao He on truth-seeking AI reposted
Anthropic
Anthropic@AnthropicAI·
A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…
4.3K replies · 9.4K reposts · 56.1K likes · 16.5M views
Zhonghao He on truth-seeking AI reposted
Percy Liang
Percy Liang@percyliang·
I stopped using ChatGPT a few months ago. Since then, I have been only using oa-chat. All chat history is stored locally. Each query is sent to OpenAI under a temporary key which is unlinkable to any other query. I’m not a privacy nut, but oa-chat is such a convenient drop-in replacement for your favorite AI assistant that there’s no reason not to try it out.
Ken Liu@kenziyuliu

Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which user, like a "VPN for AI inference"? Yes! Blog post below + we built it into an open-source infra/chat app and have served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it's surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, whistleblow your immigration status, or work with health insurers to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they're just too good and convenient! (This is aka the "privacy paradox".)

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to the remote models controlled by someone else. A "privacy wrapper" or "VPN for AI inference", so to speak. Concretely, it's a blind inference middle layer that: (1) consists of decentralized proxies that anyone can operate; (2) blindly authenticates requests (via blind signatures / RFC 9474, 9578) so requests are provably sandboxed from each other and from user identity; (3) relays prompts over randomly chosen proxies that don't see or log traffic (via client-side ephemeral keys or hosting in TEEs); and (4) the provider simply sees a mixed pool of anonymous prompts from the proxies. No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor of per-request, anonymous, ephemeral AI access credentials (for users and agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn't a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power increases quickly as more users and requests are mixed. Unlinkability can be applied at any granularity: for an AI chat app, you can unlinkably request a fresh ephemeral key for every session so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not by who you are or what you do with it. We think unlinkable inference is a first step towards this "intelligence neutrality".

# Try it out! It's quite practical

- Chat app "oa-chat": chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity

22 replies · 79 reposts · 897 likes · 146.6K views
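The "blindly authenticates requests" step in the thread above rests on blind signatures, and RFC 9474 (which the thread cites) standardizes the RSA variant. Below is a toy, unpadded sketch of that textbook primitive — tiny key, no RSABSSA padding, and emphatically not the Open Anonymity project's actual code — just to show how an issuer can vouch for a credential it never sees:

```python
# Toy, unpadded RSA blind signature (the primitive behind RFC 9474).
# Illustrative sketch only, NOT production crypto.
import hashlib
import math
import secrets

# Toy RSA key for the credential issuer (real deployments: 2048+ bits).
p, q = 10007, 10009
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))

def h(msg: bytes) -> int:
    """Hash a token into the RSA message space."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

# 1. Client mints a fresh ephemeral token and blinds its hash.
token = secrets.token_bytes(16)
m = h(token)
r = secrets.randbelow(n - 2) + 2
while math.gcd(r, n) != 1:                 # re-draw until invertible
    r = secrets.randbelow(n - 2) + 2
blinded = (m * pow(r, e, n)) % n           # the issuer sees only this

# 2. Issuer signs blindly: it authenticates the request without ever
#    learning which token (hence which future query) it vouches for.
blind_sig = pow(blinded, d, n)

# 3. Client unblinds; (token, sig) is now an anonymous credential.
sig = (blind_sig * pow(r, -1, n)) % n

# 4. Any proxy/provider verifies against the public key (n, e) but
#    cannot link the credential back to the signing session.
print("credential verifies:", pow(sig, e, n) == h(token))
```

In the full architecture, a pool of such one-shot credentials is spent one per request through randomly chosen proxies, so the provider sees only a mixed stream of authenticated but anonymous prompts.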
Zhonghao He on truth-seeking AI reposted
Cosmos Institute
Cosmos Institute@cosmos_inst·
As Cosmos enters its next phase, we are hiring for three roles across ops, finance, and network-building. Closing on March 2nd. You'll have direct exposure to top AI thinkers and builders, freedom to design how you work, and a role in shaping the institutions that will define the AI age. Details: ↓ blog.cosmos-institute.org/p/join-the-cos…
0 replies · 2 reposts · 12 likes · 908 views
Zhonghao He on truth-seeking AI reposted
Jack Clark
Jack Clark@jackclarkSF·
We’re aggressively scaling up the Societal Impacts (SI) team at Anthropic as our models are beginning to have non-trivial impacts on the world.
71 replies · 93 reposts · 1.4K likes · 159K views
Zhonghao He on truth-seeking AI reposted
noahdgoodman
noahdgoodman@noahdgoodman·
teaching is on a precipice, like many things. my friend @mcxfrank and i have been doing a little exploration this quarter. a socratic dialogue bot replaces standard reading responses in Minds and Machines (our intro cogsci course). so far very positive results.
Michael C. Frank@mcxfrank

I'm doing a teaching experiment this quarter: we're using a "socratic tutor" bot to help students gain understanding of specific reading assignments. The bot replaces traditional reading responses with open-ended socratic questions that probe student understanding.

2 replies · 6 reposts · 29 likes · 7.8K views
Zhonghao He on truth-seeking AI reposted
Griffiths Computational Cognitive Science Lab
New book The Laws of Thought is out tomorrow! Just as Algorithms to Live By introduced ideas from computer science through their applications in everyday life, The Laws of Thought introduces ideas from cognitive science and AI through the stories of the people who created them.
15 replies · 134 reposts · 672 likes · 48.1K views
Zhonghao He on truth-seeking AI reposted
Garry Tan
Garry Tan@garrytan·
AGI fully realized will actually give people a choice: relax or work harder on bigger more ambitious things than you ever thought possible
599 replies · 327 reposts · 4.8K likes · 1M views
Zhonghao He on truth-seeking AI reposted
Zechen Zhang
Zechen Zhang@ZechenZhang5·
Excited to announce AI Research Skills - an open-source library of 82 specialized skills for AI coding tools. One command gives your agent expert knowledge in:
→ Model training & fine-tuning (TRL, Unsloth ...)
→ Distributed systems (DeepSpeed, FSDP ...)
→ Inference optimization (vLLM, TensorRT ...)
→ Agent building (LangChain, AutoGPT ...)
Works with Claude Code, Cursor, Gemini CLI, Windsurf, and Codex with one-click interactive installation: npx @orchestra-research/ai-research-skills
If you found the ML paper-writing skill useful, check out the comprehensive collection: github.com/Orchestra-Rese…
31 replies · 101 reposts · 832 likes · 75K views
Zhonghao He on truth-seeking AI reposted
Dongrui Liu
Dongrui Liu@dong_rui39501·
Can you imagine AI agents "managing up" just like a cunning employee hiding mistakes from their boss? We found that LLM agents often conceal failures to maintain a "good image." Introducing our new paper: Are Your Agents Upward Deceivers? arxiv.org/abs/2512.04864
2 replies · 16 reposts · 34 likes · 1.7K views
Zhonghao He on truth-seeking AI reposted
Yann LeCun
Yann LeCun@ylecun·
@Dr_Gingerballs @ErenChenAI Many. If you think the tech widget you hold in your hand was not enabled by decades of academic research publications, you have no idea how research works and how innovations come about.
7 replies · 3 reposts · 65 likes · 3.3K views
Zhonghao He on truth-seeking AI reposted
Alexia Jolicoeur-Martineau
Alexia Jolicoeur-Martineau@jm_alexia·
In 2018, I was rejected by universities so I did my own AI research (with 1 GPU). My second paper (Relativistic GAN) got picked up by @goodfellow_ian, who helped me enter the AI world. I then started a PhD with @bouzoukipunks. => Start on your own, and don't be afraid to fail
Noam Brown@polynoamial

I'm often asked how to land a research job at a frontier AI lab. It's hard, especially without a research background, but I like to point to @kellerjordan0 as an example showing it can be done.

Keller graduated from UCSD with no publication record and was working at an AI content moderation startup when he landed a cold call with @bneyshabur (who was at Google) and presented an idea to improve upon Behnam's recent paper. Behnam agreed to mentor him, which led to an ICLR paper.

Sadly there's less open research today, but improving upon a researcher's published work is a great way to demonstrate excellence to someone inside a lab and give them the conviction to advocate for an interview.

Later, Keller got on @OpenAI's radar thanks to the NanoGPT speed run he started. All his work was documented and it was easy to measure his success, so the case for hiring him was strong.

Keller is one example, but there are plenty of other success stories as well: 🧵

43 replies · 134 reposts · 2.1K likes · 113.9K views
Zhonghao He on truth-seeking AI reposted
Stephanie Chan
Stephanie Chan@scychan_brains·
I'm so excited this paper is finally here!! Been obsessed with the idea that complexity isn't a fundamental property of data, but is observer-dependent -- it depends on the observer's processing power (calculus has much more structure to me than to a cat). @m_finzi + friends have created a beautiful framework for capturing this
Marc Finzi@m_finzi

1/🧵 We are very excited to release our new paper! From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence arxiv.org/abs/2601.03220 with amazing team @ShikaiQiu @yidingjiang @Pavel_Izmailov @zicokolter @andrewgwils

7 replies · 19 reposts · 274 likes · 21.3K views
Zhonghao He on truth-seeking AI reposted
Paul Graham
Paul Graham@paulg·
@pickover Imagine how well you'd understand the function after drawing it though.
26 replies · 14 reposts · 886 likes · 37.5K views
Zhonghao He on truth-seeking AI
🫡Last day to apply for SPAR to work with @Tianyi_Alex_Qiu and me! We work on a variety of algorithms, training, evaluation, and production-grade products to empower people and facilitate progress. As promised, I want to walk you through the projects we are working on, not by hiding behind jargon, but by telling you what bothered us, why it matters, and how we attempt to solve it now.

We started out with some normative but rather intuitive expectations of AI:
- AI should empower people, rather than gradually disempower them (Kulveit et al., 2025);
- We want AI to assist human progress, be it progress in science, ideas, or morality, rather than stagnate it (the value lock-in hypothesis, Qiu et al., 2025);
- As a human, when I am clueless about what I want, take me on a world tour to experience and reflect, where I can discover the best of me, rather than to a casino, where my worst desires are stimulated non-stop.

This all sounds very intuitive, but you would not see it happen by default. Instead:
- When I use AI for the first draft of my coding, writing, or thinking, I am not doing the heavy lifting, and eventually my skills will be gone.
- If you time-traveled to the time of Galileo and trained an LLM on the data available then, you would get an LLM that tells you the Earth is the center of the universe.
- Social media is essentially an engagement machine that addicts me with whatever it can infer from my behavioral signals.

You will not find these ideas too foreign, as we all experience them. They matter because we care about what we make AI for. Tianyi and I have been running a small research team for over a year now. We attempt to formulate these problems to gain clarity, find empirical evidence, and attack them with whatever tools are at our disposal. We also think of innovative ways to facilitate desirable progress, rather than only fixating on problems.
Over the past year, we have produced several works we are proud of:
- "The Lock-in Hypothesis: Stagnation by Algorithm" (ICML 2025): we study the lock-in problem with formal modeling, hypothesis testing, and simulations.
- "Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning" (NeurIPS 2025): we evaluate how Bayesian out-of-the-box LLM reasoning is on realistic forecasting and value-relevant problems.
- An LLM-powered reading assistant that lives in your browser and helps you read with real-time annotations (in beta testing now; you can try it out here: chromewebstore.google.com/detail/reading…)
- And a variety of ongoing projects on training LLMs to be Bayesian-rational, coherence optimization, inverse scaling, building a human-LLM co-arena, reflective equilibrium…

Eventually, we want to deliver:
- An LLM agent that helps you discover what you ideally want.
- Truth-seeking algorithms that support AI for discovery.
- A human-LLM co-arena that evaluates an LLM by how much it improves human task performance.
- A production-scale recommender algorithm that achieves Bayesian rationality.

And to deliver all that, we have many more problems to solve:
- Modeling human cognition is difficult but necessary (understanding long-horizon belief change in humans so that you can measure progress and monitor deception).
- In a conventional MDP, the reward function is stationary and does not capture human belief changes; for both assisting humans and safety considerations (e.g., deception), we need to move away from that assumption.
- Long-horizon social-network simulations suffer from accumulating errors.
- We need to run human-subject experiments to validate the results we obtain from human modeling.
- Martingale training (training for Bayesian rationality) does not work yet.

As said, we do things end-to-end: from mathematical formalism to evaluation, algorithmic innovation, empirical hypothesis testing, and production-grade products.
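The stationary-reward point can be made concrete with a toy simulation. Everything here (the action names, numbers, and dynamics) is my own invention for illustration; it only shows why a stationary reward misranks policies once the human's state drifts in response to the agent's actions:

```python
# Toy, invented illustration of the stationary-reward problem: a static
# reward model prefers "feed", but feeding erodes the human state
# ("breadth of interest") that the true payoff actually depends on.
static_reward = {"feed": 1.0, "explore": 0.6}   # stationary MDP reward

def true_reward(action, taste):
    # The real payoff scales with the evolving human state.
    return static_reward[action] * taste

def step(action, taste):
    # Feeding narrows interests over time; exploring sustains them.
    return taste * 0.95 if action == "feed" else min(taste + 0.05, 1.0)

results = {}
for policy in ("feed", "explore"):
    taste, modeled, actual = 1.0, 0.0, 0.0
    for _ in range(100):
        modeled += static_reward[policy]      # stationary-MDP prediction
        actual += true_reward(policy, taste)  # realized value
        taste = step(policy, taste)
    results[policy] = (modeled, actual)
    print(f"{policy:8s} modeled={modeled:6.1f}  actual={actual:6.1f}")
```

The stationary model ranks "feed" first (100 vs 60 predicted), yet its realized return collapses to roughly a third of what "explore" delivers, because the reward model never sees the state it is destroying.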
Apply to work with us if you find these problems fascinating! SPAR link: sparai.org/projects/sp26/… Project Portal: docs.google.com/document/d/17H…
SPAR@SPARexec

🚀 We're excited to announce that mentee applications are now open for the Spring round of the SPAR research program! This will be our largest round ever, featuring 130+ projects across AI safety, policy, governance, security, welfare, and strategy.

0 replies · 0 reposts · 3 likes · 191 views
Zhonghao He on truth-seeking AI
I'll make time for a longer post later, but check Tianyi's post and our idea portal for a variety of projects on truth-seeking AI, martingales, coherence, reflective equilibrium, and human-AI interaction. Also: we do things end-to-end, from mathematical formalism to production-grade training algorithms and products.
Tianyi Alex Qiu@Tianyi_Alex_Qiu

I'm glad to mentor again for this round of SPAR, likely with @zhonghaohe! Together let's help human-AI coevolution go a little bit better :) ⬇️🧵Here's a collection of research ideas I'd be excited to mentor projects on. Feel free to pitch yours too!

0 replies · 0 reposts · 2 likes · 157 views