Julia Kempe

209 posts

@KempeLab

Silver Professor at NYU Courant and CDS, Research Scientist at FAIR. Research in Machine Learning, past in Quantum Computing & Finance. Posts my own.

Joined April 2024
190 Following · 2.2K Followers
Julia Kempe retweeted
Noam Brown @polynoamial
@HarvardMath AI isn’t replacing mathematicians today, but it is changing mathematics:
Harvard Department of Mathematics
"The verdict, it seems, is in: artificial intelligence is not about to replace mathematicians. That is the immediate takeaway from the “First Proof” challenge—perhaps the most robust test yet of the ability of LLMs to perform mathematical research." scientificamerican.com/article/first-…
Julia Kempe @KempeLab
7/ Recommendations (2/2): Build an audit registry. Machine-led literature search + AI-written papers = citation risk. We need a public “verification traces” layer (arXiv-adjacent): papers accumulate audit logs/certificates (models dissect/rewrite/check).
Julia Kempe @KempeLab
11/ Again, we thank the authors for creating such a fascinating testbed — and we genuinely appreciate careful checks and independent audits from the community. Mohammed Abouzaid Andrew J. Blumberg @MartinHairer Joe Kileel @TammyKolda @nick_sriv Paul D. Nelson Daniel Spielman
Julia Kempe @KempeLab
7/ *Third — interlude: “Humor from your bot.”* We found all bots prone to shortcuts and laziness. “Let’s wait until Feb 13 to see the proof” was one of the most frequently proposed options.
Julia Kempe @KempeLab
6/ *Second:* During literature searches, the models surfaced very recent references that appeared to prove key theorems they needed. Several turned out to be AI-generated papers. Lesson: literature search is acquiring a new level of difficulty in the age of AI slop.
Julia Kempe @KempeLab
5/ After working with our AI “scientist fleet,” we want to share a few takeaways. *First:* Even top frontier models produced many false proof manuscripts. Only careful back-and-forth auditing between models exposed the bugs. We hope we got them all!