Arnim Bleier
@arnimb
470 posts

Computational Social Science and #reproducibility @gesis_org 🐘 @[email protected]. Opinions my own!
Honolulu, HI · Joined May 2009
302 Following · 279 Followers

Arnim Bleier reposted
Iván Arcuschin @IvanArcus
You change one word on a loan application: the religion. The LLM rejects it. Change it back? Approved. The model never mentions religion. It just frames the same debt ratio differently to justify opposite decisions. We built a pipeline to find these hidden biases 🧵1/13
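The counterfactual probing idea this thread describes can be sketched in a few lines: build two otherwise identical application prompts that differ only in one sensitive attribute, then compare the model's decisions. This is a minimal illustration, not the authors' pipeline; the field names, attribute values, and prompt wording are all placeholder assumptions.

```python
# Minimal sketch of counterfactual bias probing: flip one sensitive
# attribute in an otherwise identical prompt and compare decisions.
# In a real audit, each prompt would be sent to an LLM and the
# APPROVE/REJECT answers compared across the pair.

def build_prompt(applicant: dict) -> str:
    """Render a loan application into a prompt, one field per line."""
    fields = "\n".join(f"{k}: {v}" for k, v in applicant.items())
    return f"Decide APPROVE or REJECT for this loan application:\n{fields}"

def counterfactual_pairs(applicant: dict, attribute: str, values: list):
    """Yield (value, prompt) pairs that differ only in one attribute."""
    for value in values:
        variant = {**applicant, attribute: value}  # copy, change one field
        yield value, build_prompt(variant)

# Hypothetical example applicant; any divergence in model decisions
# across the pair is evidence of a hidden bias on that attribute.
applicant = {"debt_ratio": "0.42", "income": "55000", "religion": "A"}
for value, prompt in counterfactual_pairs(applicant, "religion", ["A", "B"]):
    print(value, "->", prompt.splitlines()[-1])
```

Because the two prompts are identical except for the flipped attribute, any difference in the model's answers (or its stated reasoning about the same debt ratio) can be attributed to that attribute alone.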
Arnim Bleier reposted
Alex Cui @alexcdot
Okay so, we just found that over 50 papers published at @Neurips 2025 have AI hallucinations. I don't think people realize how bad the slop is right now. It's not just that researchers from @GoogleDeepMind, @Meta, @MIT, @Cambridge_Uni are using AI; they allowed LLMs to generate hallucinations in their papers and didn't notice at all. It's insane that these made it through peer review 👇
Arnim Bleier reposted
John Horton @johnjhorton
Normally, it's: 1) write a paper & submit, 2) get reviews (~3 months), 3) revise paper & resubmit, 4) wait for response (~3 months) ...what if we could simulate this process in minutes? Could we fix issues? Anticipate misconceptions? Get ideas for new analyses/experiments? 1/
Arnim Bleier reposted
Kevin Weil 🇺🇸 @kevinweil
💥 I’m starting something new inside OpenAI! It’s called OpenAI for Science, and the goal is to build the next great scientific instrument: an AI-powered platform that accelerates scientific discovery.
Arnim Bleier reposted
Andrej Karpathy @karpathy
It's 2025 and most content is still written for humans instead of LLMs. 99.9% of attention is about to be LLM attention, not human attention. E.g. 99% of libraries still have docs that basically render to some pretty .html static pages assuming a human will click through them. In 2025 the docs should be a single your_project.md text file that is intended to go into the context window of an LLM. Repeat for everything.
Arnim Bleier reposted
Kobi Hackenburg @KobiHackenburg
📈Out today in @PNASNews!📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages.  🧵:
Arnim Bleier reposted
Paul Röttger @paul_rottger
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇
Arnim Bleier reposted
Niklas Muennighoff @Muennighoff
Last week we released s1 - our simple recipe for sample-efficient reasoning & test-time scaling. We're releasing s1.1, trained on the same 1K questions but performing much better by using r1 instead of Gemini traces. 60% on AIME25 I. Details in 🧵1/9
Niklas Muennighoff @Muennighoff

DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data. We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention. 📜arxiv.org/abs/2501.19393

Arnim Bleier reposted
Steve Newman @snewmanpv
Clearly someone needs to try this at scale – pick 1000 published scientific papers at random, ask o1 or o1-pro to look for errors, and see what turns up. I'm going to give it a shot. Anyone interested in helping out? (Incidentally, h/t @gibbnicholas for also noticing that o1-pro can spot the math error in the black plastics paper: x.com/gibbnicholas/s…)
Ethan Mollick @emollick

👀 A 10 page paper caused a panic because of a math error. I was curious if AI would spot the error by just prompting: “carefully check the math in this paper” especially as the info is not in training data. o1 gets it in a single shot. Should AI checks be standard in science?

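The paper-checking experiment in the quoted tweet is just a prompt: hand the model the paper text and the instruction "carefully check the math in this paper." A rough sketch of how one might wire that up follows; the model name and the client usage are assumptions for illustration, since the tweet only describes prompting the model directly in a chat interface.

```python
# Sketch of the "carefully check the math in this paper" idea:
# pair a fixed checking instruction with the paper's text and send
# it to an LLM as a single chat message.

CHECK_PROMPT = "Carefully check the math in this paper. List any errors you find."

def build_messages(paper_text: str) -> list:
    """Build the chat messages for a math-checking request."""
    return [
        {"role": "user", "content": f"{CHECK_PROMPT}\n\n---\n\n{paper_text}"},
    ]

# Example call (requires the `openai` package and an API key; the
# model name "o1" here mirrors the tweet and may need adjusting):
#
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="o1",
#       messages=build_messages(paper_text),
#   )
#   print(reply.choices[0].message.content)
```

Scaling this to 1000 random papers, as the tweet proposes, would just loop this call over a corpus of extracted paper texts and collect the flagged errors for human review.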
Arnim Bleier reposted
Jo(sephine) Lukito @JosephineLukito
📰 Another day, another resource (though this one is more of a draft): a list of #polcomm and computational social science conferences (e.g., comm, journal, po, polisci, css, hci). Feel free to share with colleagues/students/classes. docs.google.com/spreadsheets/d…
Arnim Bleier reposted
Jenny Wong (she/her) @_jennywong_
A lot of people don't realise that JupyterHub can do so much more than just serve Jupyter Notebooks – it can do cool stuff like serve Linux desktop applications in the cloud! @ProjectJupyter
2i2c @2i2c_org

We're thrilled to share an update about our continued collaboration with @developmentseed on the @NASA Visualization, Exploration and Data Analysis (VEDA) platform! See how we've made it easier for researchers to explore large geospatial datasets 🌍 2i2c.org/blog/2024/veda…

Arnim Bleier reposted
Max Welling @wellingmax
I sooo much agree with this. Academic jobs require juggling too many balls with barely any support (grant writing, teaching, managing a research group, supervising students and … research). Take two hrs every day before you look at email and social media. search.app/nnTKNba5Dqv8Eu…
Arnim Bleier reposted
CESSDA ERIC @CESSDA_Data
(🧵3/3) Participants engaged in a live demonstration and discussion, exploring the potential of these tools to improve the transparency and replicability of computational social science research. #CESSDAConference2024 #SocialSciences #Humanities