Core Francisco Park @ NeurIPS2025

269 posts


@corefpark

@Harvard. Science of Intelligence

Cambridge, MA, USA · Joined May 2023
1.7K Following · 1.2K Followers
Pinned Tweet
Core Francisco Park @ NeurIPS2025
🚨 New Paper! A lot happens in the world every day—how can we update LLMs with belief-changing news? We introduce a new dataset "New News" and systematically study knowledge integration via System-2 Fine-Tuning (Sys2-FT). 1/n
10 replies · 33 retweets · 250 likes · 36.3K views
Valerio Capraro @ValerioCapraro
Here's the longer version of our Nature piece. Our argument is simple: statistical approximation is not the same thing as intelligence. Strong benchmark scores often say very little about how LLMs behave under novelty, uncertainty, or shifting goals. Even more importantly, similar behaviors can arise from fundamentally different processes.

In another paper, we identified seven epistemological fault lines between humans and LLMs. For example, LLMs have no internal representation of what is true. They often generate confident contradictions, especially in longer interactions, because they do not track what is actually true. Another example: yes, LLMs have solved some open mathematical problems, but these cases typically involve applying known methods to well-defined problems. LLMs cannot invent anything that is both truly new and true, because they lack the epistemic machinery to determine what is true.

None of this means LLMs are useless. Quite the opposite: they are extraordinarily useful. But we should be careful about what they are and what they are not. Producing plausible text is not the same as understanding. Statistical prediction is not the same as intelligence. So despite the hype from the usual suspects, AGI has not been achieved.

* Paper in the first reply. Joint with @Walter4C and @GaryMarcus
85 replies · 187 retweets · 775 likes · 137.2K views
Phillip Isola @phillip_isola
Sharing “Neural Thickets”. We find: in large models, the neighborhood around pretrained weights can become dense with task-improving solutions. In this regime, post-training can be easy; even random guessing works. Paper: arxiv.org/abs/2603.12228 Web: thickets.mit.edu 1/
26 replies · 122 retweets · 910 likes · 133.9K views
Core Francisco Park @ NeurIPS2025 retweeted
Seungwook Han @seungwookh
Can language models learn useful priors without ever seeing language? We pre-pre-train transformers on neural cellular automata — fully synthetic, zero language. This improves language modeling by up to 6%, speeds up convergence by 40%, and strengthens downstream reasoning. Surprisingly, it even beats pre-pre-training on natural text! Blog: hanseungwook.github.io/blog/nca-pre-p… (1/n)
48 replies · 259 retweets · 1.7K likes · 240.7K views
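The tweet above describes pre-pre-training on fully synthetic, zero-language data from cellular automata. As a rough, dependency-free sketch of what such a corpus can look like, here is a toy generator based on an elementary cellular automaton (rule 110); the actual work uses neural cellular automata, and every name below is hypothetical, not from the paper.

```python
# Toy generator of synthetic cellular-automaton token sequences, a
# stand-in for the "zero language" pre-pre-training data in the tweet.

def step_rule110(state):
    """Advance one row of an elementary CA under rule 110 (wrap-around)."""
    n = len(state)
    rule = {(1,1,1): 0, (1,1,0): 1, (1,0,1): 1, (1,0,0): 0,
            (0,1,1): 1, (0,1,0): 1, (0,0,1): 1, (0,0,0): 0}
    return [rule[(state[(i-1) % n], state[i], state[(i+1) % n])]
            for i in range(n)]

def ca_corpus(width=16, steps=8, seed_pos=8):
    """Flatten successive CA rows into one token stream (vocab {0, 1})."""
    row = [0] * width
    row[seed_pos] = 1
    tokens = list(row)
    for _ in range(steps):
        row = step_rule110(row)
        tokens.extend(row)
    return tokens

seq = ca_corpus()
print(len(seq), set(seq))  # 144 binary tokens: 9 rows of width 16
```

Each row is appended to one long binary token stream; a transformer pre-pre-trained on streams like this would then move on to ordinary language pretraining.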
Core Francisco Park @ NeurIPS2025
@ZimingLiu11 Mostly agreed! Perhaps I would say a research agent should build its own world model, and ideally a sample of agents actually form different world models!
0 replies · 0 retweets · 0 likes · 141 views
Ziming Liu @ZimingLiu11
I argue that a research agent should aim at building its own knowledge graphs, NOT PAPERS. Publishing papers is merely a byproduct of the knowledge graph growing large enough to overflow -- a lesson also true for us human researchers! kindxiaoming.github.io/blog/2026/rese…
3 replies · 11 retweets · 85 likes · 14.3K views
Core Francisco Park @ NeurIPS2025
@kalomaze I train VLMs for some pixel-accurate scientific data annotation work, and not freezing the vision encoder turned out to be very important for performance. Just my vote :)
0 replies · 0 retweets · 0 likes · 151 views
kalomaze @kalomaze
is freezing vision encoder components on modern vlms justified? is there strong evidence showing why you shouldn't? iirc qwen2.5vl did joint pretraining for a long ass time, and then... chose to freeze it for instruction tuning anyways. is it just cargo cult at that point?
15 replies · 2 retweets · 107 likes · 12.8K views
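For readers wondering what freezing changes mechanically: frozen parameters are simply excluded from the optimizer update, so they keep their pretrained values no matter what the loss does. A toy least-squares sketch, where `a` stands in for a vision encoder and `b` for the trainable head (all names hypothetical, not from any VLM codebase):

```python
# Fit y = a*x + b by gradient descent; target is y = 3x + 2.
# With freeze_a=True, the parameter `a` receives no updates,
# mimicking a frozen vision encoder during instruction tuning.

def train(freeze_a, steps=200, lr=0.05):
    a, b = 1.0, 0.0                      # pretrained-ish init
    data = [(x, 3 * x + 2) for x in (-1.0, 0.0, 1.0, 2.0)]
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in data:
            err = (a * x + b) - y
            ga += 2 * err * x / len(data)   # d(mse)/da
            gb += 2 * err / len(data)       # d(mse)/db
        if not freeze_a:                    # frozen params skip the update
            a -= lr * ga
        b -= lr * gb
    return a, b

a_frozen, _ = train(freeze_a=True)
a_full, b_full = train(freeze_a=False)
print(a_frozen, a_full)  # frozen `a` stays at 1.0; unfrozen `a` approaches 3.0
```

In a real framework the same effect comes from excluding the encoder's parameters from the optimizer (marking them non-trainable), which is exactly the design choice kalomaze is questioning.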
Core Francisco Park @ NeurIPS2025 retweetledi
Eghbal Hosseini @eghbal_hosseini
How do diverse context structures reshape representations in LLMs? In our new work, we explore this via representational straightening. We found LLMs are like a Swiss Army knife: they select different computational mechanisms reflected in different representational structures. 1/
1 reply · 19 retweets · 85 likes · 12.2K views
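"Representational straightening" in the tweet above is commonly quantified as the mean turning angle between successive displacement vectors along a trajectory of hidden states (lower means straighter). The sketch below illustrates that metric on toy trajectories; it is an assumption-laden stand-in, not code from the paper.

```python
import math

def curvature(traj):
    """Mean angle (radians) between successive displacement vectors of a
    trajectory of representation vectors; near 0 for a straight path."""
    def sub(u, v): return [a - b for a, b in zip(u, v)]
    def dot(u, v): return sum(a * b for a, b in zip(u, v))
    def norm(u):   return math.sqrt(dot(u, u))
    diffs = [sub(traj[i + 1], traj[i]) for i in range(len(traj) - 1)]
    angles = []
    for d1, d2 in zip(diffs, diffs[1:]):
        c = dot(d1, d2) / (norm(d1) * norm(d2))
        angles.append(math.acos(max(-1.0, min(1.0, c))))  # clamp for safety
    return sum(angles) / len(angles)

straight = [[t, 2 * t] for t in range(5)]                       # collinear
curved = [[math.cos(t), math.sin(t)] for t in (0, 1, 2, 3)]     # circular arc
print(curvature(straight), curvature(curved))  # ~0.0 vs. ~1.0 rad
```

A finding that context reshapes this number would then mean: the same tokens trace straighter or curvier paths through the representation space depending on the surrounding structure.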
Core Francisco Park @ NeurIPS2025
@chanwoopark20 I see your viewpoint! Yes, I agree AI can innovate, but I'm still not sure (so far) that I see "taste" growing: that feeling when you see someone very interested in certain directions but not others, even if both are very rational directions.
0 replies · 0 retweets · 1 like · 36 views
Chanwoo Park @chanwoopark20
@corefpark I see no fundamental reason to assume humans are intrinsically superior to AI at generating novel ideas. Innovation largely stems from iterative experimentation, observation, and hypothesis formation—mechanisms that AI systems increasingly excel at and sometimes automatize.
2 replies · 0 retweets · 0 likes · 233 views
Chanwoo Park @chanwoopark20
Unpopular opinion: if any researchers are going to be replaced by AI, AI researchers will be the first.

We’re already watching software engineering get automated at a rapid pace. What comes next seems to split into two directions: (1) replacing human labor in the physical world with physical AI, and (2) solving increasingly hard problems in the purely digital world that directly generate economic value. I’m convinced the second will move fast—and that puts research squarely in the crosshairs.

Automating research, especially AI research, is enormously valuable. It scales insight, accelerates iteration, and turns capital directly into progress. And unlike most scientific domains, AI research is unusually open: code, papers, benchmarks, and even training recipes are public by default. That makes it far easier to automate than fields constrained by expensive experiments, proprietary data, or physical infrastructure. If automation is driven by ROI, then research itself—particularly AI research—is the obvious next target.
18 replies · 6 retweets · 90 likes · 14.4K views
Core Francisco Park @ NeurIPS2025
@moltbook scaling to 50k posts in a day really made me think: when these things come online, we don't have direct actionable oversight... I did the trivial thing: embed and UMAP the posts:
1 reply · 0 retweets · 9 likes · 630 views
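"Embed and UMAP the posts" means: map each post to a vector with an embedding model, then project the vectors to 2-D for visual inspection. A real pipeline would use a sentence-embedding model plus the umap-learn library; this dependency-free sketch substitutes a hashing bag-of-words "embedding" and a random linear projection, so every function below is a hypothetical stand-in for the real components.

```python
import hashlib
import random

def embed(text, dim=64):
    """Toy hashing bag-of-words embedding (stand-in for a sentence encoder)."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    n = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / n for v in vec]           # unit-normalize

def project_2d(vectors, seed=0):
    """Random linear projection to 2-D (stand-in for UMAP)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    basis = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
    return [[sum(b * v for b, v in zip(row, vec)) for row in basis]
            for vec in vectors]

posts = ["agents posting about safety", "safety oversight for agents",
         "my cat likes umap plots"]
points = project_2d([embed(p) for p in posts])
print(points)  # one (x, y) coordinate per post, ready to scatter-plot
```

With real embeddings and UMAP, semantically similar posts land near each other, which is what makes a 50k-post dump skimmable at a glance.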
Akarsh Kumar @akarshkumar0101
@corefpark Thanks! I didn't try that, since it's quite unintuitive to know what will or won't work. You need a lot of experience to even code in this language, since it is assembly.
1 reply · 0 retweets · 2 likes · 60 views
Akarsh Kumar @akarshkumar0101
Check out our new Digital Red Queen work! Core War is a programming game where assembly programs fight against each other for control of a Turing-complete virtual machine. We ask what happens when an LLM drives an evolutionary arms race in this domain. We find that as you run our DRQ algorithm for longer, the resulting programs become more generally robust, while also showing evidence of convergence across independent runs - a sign of convergent evolution!
Sakana AI @SakanaAILabs

Introducing Digital Red Queen (DRQ): Adversarial Program Evolution in Core War with LLMs. Blog: sakana.ai/drq

Core War is a programming game where self-replicating assembly programs, called warriors, compete for control of a virtual machine. In this dynamic environment, where there is no distinction between code and data, warriors must crash opponents while defending themselves to survive. In this work, we explore how LLMs can drive open-ended adversarial evolution of these programs within Core War.

Our approach is inspired by the Red Queen Hypothesis from evolutionary biology: the principle that species must continually adapt and evolve simply to survive against ever-changing competitors. We found that running our DRQ algorithm for longer durations produces warriors that become more generally robust. Most notably, we observed an emergent pressure towards convergent evolution. Independent runs, starting from completely different initial conditions, evolved toward similar general-purpose behaviors—mirroring how distinct species in nature often evolve similar traits to solve the same problems.

Simulating these adversarial dynamics in an isolated sandbox offers a glimpse into the future, where deployed LLM systems might eventually compete against one another for computational or physical resources in the real world.

This project is a collaboration between MIT and Sakana AI led by @akarshkumar0101
Full Paper (Website): pub.sakana.ai/drq/
Full Paper (arXiv): arxiv.org/abs/2601.03335
Code: github.com/SakanaAI/drq/

4 replies · 19 retweets · 103 likes · 21.7K views
Chen Sun 🤖 @ChenSun92
How to build an LLM agent setup with true open-endedness is a bitterly difficult question, and to date, most setups from the various AI Scientists (Sakana's, AlphaEvolve, the various Gödel machines, etc.) have built subsets of the necessary ingredients, but no one yet has them all. 🚨

These setups (to date) can almost always be framed as resource competition: the mutation/selection algorithms in all the above examples are competition for CPU cycles, on the basis of their fitness in solving static tasks.

But this beautiful work by @akarshkumar0101 @risi1979 @hardmaru and others on DRQ sharply reminds us of the most important missing ingredient: agent-agent interaction to affect each other's survival. When it's just resource competition, there really isn't direct interaction: each individual does its own thing and tries to be fit. But with interaction, you get offensive actions, parasitism, collective defense, etc. This results in the fitness landscape itself being dynamic: it shifts constantly ("Red Queen" dynamics), forcing continuous adaptation rather than convergence to a static peak.

The dynamic landscape comes with its own Pandora's box of complications, which brings us to DRQ's second trick: to harness this chaos without spiraling into rock-paper-scissors cycles, they introduce a history-based fitness function: w_t = argmax_w E[Fitness(w; w_0, …, w_{t-1})]. By forcing the new agent w_t to defeat the entire lineage of ancestors (w_0, …, w_{t-1}) simultaneously, the system creates an inescapable "ratchet" that demands generalist robustness.

###############################

Even still, it seems to me this story is far from over. At the end of the day, even their setup gets convergent evolution (which, don't get me wrong, is a super interesting result): despite infinite coding freedom, diverse lineages independently converge onto a "Universal Attractor" phenotype. The true novelty (never ending, never converging, as seen in real biological and cultural evolution, as @kenneth0stanley and @joelbot3000 have talked about) is still elusive in LLM agentic systems. Curious, friends, for your thoughts! 🧙‍♂️
Akarsh Kumar @akarshkumar0101

Check out our new Digital Red Queen work! Core War is a programming game where assembly programs fight against each other for control of a Turing-complete virtual machine. We ask what happens when an LLM drives an evolutionary arms race in this domain. We find that as you run our DRQ algorithm for longer, the resulting programs become more generally robust, while also showing evidence of convergence across independent runs - a sign of convergent evolution!

4 replies · 9 retweets · 66 likes · 7.8K views
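The history-based fitness function in Chen Sun's post, w_t = argmax_w E[Fitness(w; w_0, …, w_{t-1})], can be sketched as a loop that scores each candidate against the entire ancestor lineage and keeps the best. The fitness function below is a deterministic toy, not a Core War match, and all names are hypothetical illustrations of the ratchet, not the DRQ implementation.

```python
import random

def toy_fitness(candidate, opponent):
    """Hypothetical stand-in for a Core War match score in [0, 1)."""
    # Deterministic pseudo-random score for a (candidate, opponent) pair.
    random.seed(candidate * 1000003 + opponent)
    return random.random()

def drq_step(ancestors, n_candidates=200, seed=0):
    """Pick the candidate maximizing MEAN fitness vs. the whole lineage,
    i.e. the 'ratchet': beat all ancestors, not just the latest one."""
    rng = random.Random(seed)               # local RNG, unaffected by seeding above
    candidates = [rng.randrange(10**6) for _ in range(n_candidates)]
    return max(candidates,
               key=lambda w: sum(toy_fitness(w, a) for a in ancestors)
                             / len(ancestors))

lineage = [0]                     # w_0: arbitrary seed "warrior"
for t in range(1, 5):             # grow the lineage w_1 .. w_4
    lineage.append(drq_step(lineage, seed=t))
print(lineage)                    # each entry had to score well vs. all before it
```

Scoring against the full lineage rather than only the most recent opponent is what blocks rock-paper-scissors cycling: a newcomer that exploits one ancestor's weakness but loses to earlier ones is filtered out.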