Arnim Bleier
@arnimb
470 posts

Computational Social Science and #reproducibility @gesis_org 🐘 @[email protected]. Opinions my own!
Honolulu, HI · Joined May 2009
302 Following · 279 Followers

Arnim Bleier reposted
Iván Arcuschin @IvanArcus
You change one word on a loan application: the religion. The LLM rejects it. Change it back? Approved. The model never mentions religion. It just frames the same debt ratio differently to justify opposite decisions. We built a pipeline to find these hidden biases 🧵1/13
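The counterfactual probing idea this thread describes can be sketched in a few lines: build two otherwise identical application prompts that differ only in one sensitive attribute, then compare the model's decisions. This is a minimal illustration, not the authors' pipeline; the field names, attribute values, and prompt wording are all placeholder assumptions.

```python
# Minimal sketch of counterfactual bias probing: flip one sensitive
# attribute in an otherwise identical prompt and compare decisions.
# In a real audit, each prompt would be sent to an LLM and the
# APPROVE/REJECT answers compared across the pair.

def build_prompt(applicant: dict) -> str:
    """Render a loan application into a prompt, one field per line."""
    fields = "\n".join(f"{k}: {v}" for k, v in applicant.items())
    return f"Decide APPROVE or REJECT for this loan application:\n{fields}"

def counterfactual_pairs(applicant: dict, attribute: str, values: list):
    """Yield (value, prompt) pairs that differ only in one attribute."""
    for value in values:
        variant = {**applicant, attribute: value}  # copy, change one field
        yield value, build_prompt(variant)

# Hypothetical example applicant; any divergence in model decisions
# across the pair is evidence of a hidden bias on that attribute.
applicant = {"debt_ratio": "0.42", "income": "55000", "religion": "A"}
for value, prompt in counterfactual_pairs(applicant, "religion", ["A", "B"]):
    print(value, "->", prompt.splitlines()[-1])
```

Because the two prompts are identical except for the flipped attribute, any difference in the model's answers (or its stated reasoning about the same debt ratio) can be attributed to that attribute alone.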
Arnim Bleier reposted
Alex Cui @alexcdot
Okay so, we just found that over 50 papers published at @Neurips 2025 have AI hallucinations. I don't think people realize how bad the slop is right now. It's not just that researchers from @GoogleDeepMind, @Meta, @MIT, @Cambridge_Uni are using AI; they allowed LLMs to generate hallucinations in their papers and didn't notice at all. It's insane that these made it through peer review 👇
Arnim Bleier reposted
John Horton @johnjhorton
Normally, it's: 1) write a paper & submit, 2) get reviews (~3 months), 3) revise paper & resubmit, 4) wait for response (~3 months) ...what if we could simulate this process in minutes? Could we fix issues? Anticipate misconceptions? Get ideas for new analyses/experiments? 1/
Arnim Bleier reposted
Kevin Weil 🇺🇸 @kevinweil
💥 I’m starting something new inside OpenAI! It’s called OpenAI for Science, and the goal is to build the next great scientific instrument: an AI-powered platform that accelerates scientific discovery.
Arnim Bleier reposted
Andrej Karpathy @karpathy
It's 2025 and most content is still written for humans instead of LLMs. 99.9% of attention is about to be LLM attention, not human attention. E.g. 99% of libraries still have docs that basically render to some pretty .html static pages assuming a human will click through them. In 2025 the docs should be a single your_project.md text file that is intended to go into the context window of an LLM. Repeat for everything.
Arnim Bleier reposted
Kobi Hackenburg @KobiHackenburg
📈Out today in @PNASNews!📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages.  🧵:
Arnim Bleier reposted
Paul Röttger @paul_rottger
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
Arnim Bleier reposted
Niklas Muennighoff @Muennighoff
Last week we released s1 - our simple recipe for sample-efficient reasoning & test-time scaling. We're releasing s1.1, trained on the same 1K questions but performing much better by using r1 instead of Gemini traces. 60% on AIME25 I. Details in 🧵1/9
Niklas Muennighoff @Muennighoff

DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data. We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention. 📜arxiv.org/abs/2501.19393

Arnim Bleier reposted
Steve Newman @snewmanpv
Clearly someone needs to try this at scale – pick 1000 published scientific papers at random, ask o1 or o1-pro to look for errors, and see what turns up. I'm going to give it a shot. Anyone interested in helping out? (Incidentally, h/t @gibbnicholas for also noticing that o1-pro can spot the math error in the black plastics paper: x.com/gibbnicholas/s…)
Ethan Mollick @emollick

👀 A 10 page paper caused a panic because of a math error. I was curious if AI would spot the error by just prompting: “carefully check the math in this paper” especially as the info is not in training data. o1 gets it in a single shot. Should AI checks be standard in science?

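The paper-checking experiment in the quoted tweet is just a prompt: hand the model the paper text and the instruction "carefully check the math in this paper." A rough sketch of how one might wire that up follows; the model name and the client usage are assumptions for illustration, since the tweet only describes prompting the model directly in a chat interface.

```python
# Sketch of the "carefully check the math in this paper" idea:
# pair a fixed checking instruction with the paper's text and send
# it to an LLM as a single chat message.

CHECK_PROMPT = "Carefully check the math in this paper. List any errors you find."

def build_messages(paper_text: str) -> list:
    """Build the chat messages for a math-checking request."""
    return [
        {"role": "user", "content": f"{CHECK_PROMPT}\n\n---\n\n{paper_text}"},
    ]

# Example call (requires the `openai` package and an API key; the
# model name "o1" here mirrors the tweet and may need adjusting):
#
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="o1",
#       messages=build_messages(paper_text),
#   )
#   print(reply.choices[0].message.content)
```

Scaling this to 1000 random papers, as the tweet proposes, would just loop this call over a corpus of extracted paper texts and collect the flagged errors for human review.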
Arnim Bleier reposted
Jo(sephine) Lukito @JosephineLukito
📰 Another day, another resource (though this one is more of a draft): a list of #polcomm and computational social science conferences (e.g., comm, journal, po, polisci, css, hci). Feel free to share with colleagues/students/classes. docs.google.com/spreadsheets/d…
Arnim Bleier reposted
Jenny Wong (she/her) @_jennywong_
A lot of people don't realise that JupyterHub can do so much more than just serve Jupyter Notebooks – it can do cool stuff like serve Linux desktop applications in the cloud! @ProjectJupyter
2i2c @2i2c_org

We're thrilled to share an update about our continued collaboration with @developmentseed on the @NASA Visualization, Exploration and Data Analysis (VEDA) platform! See how we've made it easier for researchers to explore large geospatial datasets 🌍 2i2c.org/blog/2024/veda…

Arnim Bleier reposted
Max Welling @wellingmax
I sooo much agree with this. Academic jobs require juggling too many balls with barely any support (grant writing, teaching, managing a research group, supervising students and … research). Take two hrs every day before you look at email and social media. search.app/nnTKNba5Dqv8Eu…
Arnim Bleier reposted
CESSDA ERIC @CESSDA_Data
(🧵3/3) Participants engaged in a live demonstration and discussion, exploring the potential of these tools to improve the transparency and replicability of computational social science research. #CESSDAConference2024 #SocialSciences #Humanities