Dimitris Papailiopoulos

10.3K posts

@DimitrisPapail

Researcher @MSFTResearch, AI Frontiers | Prof @UWMadison (on leave) | babas of Inez Lily.

Madison, WI · Joined May 2012
1.4K Following · 26.5K Followers
Wyatt Walls@lefthanddraft·
Incredible alpha in distrusting what Opus 4.7 says.
6
0
28
4.6K
Dimitris Papailiopoulos@DimitrisPapail·
Very interesting! Are the warnings explicitly stating that the context is false, or are they generic flags? Curious whether they say, e.g., "Actually X (restating the claim) is totally false, because (blah)". That may change the final outcome. If they are generic, my hypothesis is that the model may memorize them as template tokens rather than context-related ones, and learn them in relation to whatever follows them. E.g., it would likely result in the model's P(claim|warning, context) being the same as P(claim|context) if the identical warning appears across many (claim, context) pairs.
0
0
0
47
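The statistical point in the reply above can be illustrated with a toy calculation (illustrative only, not from the thread): if an identical generic warning precedes the claim in every training document, the warning carries zero information about what follows, so conditioning on it cannot move the claim's probability.

```python
# Toy corpus: each "document" is a (warning_token, claim) pair.
# The same generic warning appears in every document, so observing it
# tells the model nothing about which claim follows.
docs = [("WARN", "claim_A")] * 80 + [("WARN", "claim_B")] * 20

# P(claim_A | warning): restrict to documents containing the warning.
warn_docs = [d for d in docs if d[0] == "WARN"]
p_claimA_given_warn = sum(1 for d in warn_docs if d[1] == "claim_A") / len(warn_docs)

# P(claim_A): marginal over all documents.
p_claimA = sum(1 for d in docs if d[1] == "claim_A") / len(docs)

print(p_claimA_given_warn, p_claimA)  # both 0.8: the warning is uninformative
```

Conditioning only shifts probability mass when the conditioning event discriminates between outcomes; a warning attached to every (claim, context) pair does not.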
Owain Evans@OwainEvans_UK·
What causes Negation Neglect? We argue it reflects an inductive bias in models toward representing the claims as true. Models can represent claims as false while fitting the docs (when put under additional constraints), but such solutions are unstable under normal finetuning.
3
7
93
9.4K
Owain Evans@OwainEvans_UK·
New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook
45
126
1.1K
197.8K
Qiuyang Mang@MangQiuyang·
Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench. Blog: frontier-cs.org/blog/frontiers… Paper: arxiv.org/abs/2605.14445 Code: github.com/FrontierCS/Fro… Model: huggingface.co/runyuanhe/qwen…
7
48
225
40K
Dimitris Papailiopoulos@DimitrisPapail·
I love arXiv, and it's been an incredible resource for science. The LLM slop fight is unwinnable though: it will put an incredible additional burden on the maintainers, create many slippery slopes, and frustrate authors. Also, perhaps the oddity in all this: if hallucinated refs are the issue, one could in fact check the validity of references... with claude code or codex :)
2
0
9
2.2K
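The reference-checking idea above can be sketched as follows (a hypothetical helper, not an actual arXiv or coding-agent tool): extract the arXiv identifiers cited in a bibliography, after which each ID could be checked against the arXiv API to confirm it resolves to a real paper.

```python
import re

# Hypothetical helper: pull arXiv identifiers out of a references section.
# A real checker would then query the arXiv export API (or hand the list
# to a coding agent) to confirm each ID resolves to an actual paper.
ARXIV_ID = re.compile(r"arXiv[:\s]*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)

def extract_arxiv_ids(bibliography: str) -> list[str]:
    """Return the bare arXiv IDs (without version suffix) cited in text."""
    return [m.group(1) for m in ARXIV_ID.finditer(bibliography)]

refs = """
[1] A. Author. Some paper. arXiv:2605.14445, 2026.
[2] B. Author. Another paper. arXiv 1706.03762v5.
"""
print(extract_arxiv_ids(refs))  # ['2605.14445', '1706.03762']
```

A hallucinated reference would surface as an ID that the API cannot resolve, or one whose returned title doesn't match the citation, so the hard part is the lookup and matching step rather than the extraction shown here.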
Zico Kolter@zicokolter·
@DimitrisPapail I see your points, but I think you may also be discounting just how curated Arxiv already is. @tdietterich and others reject a ton of low-quality submissions. There are problems with the LLM proposal, but the mods want to maintain something similar to the current quality bar.
2
0
40
2.9K
Dimitris Papailiopoulos@DimitrisPapail·
Found myself posting papers to GitHub instead of arXiv lately. No gatekeeping, it's in the same repo as the code, one link for everything, and it gets uploaded immediately. Makes you wonder what arXiv's actual value is.
73
40
859
97.8K
Dimitris Papailiopoulos@DimitrisPapail·
The authors should obviously read their paper. The reviewers of a conference too. What I am saying is, conditioned on the authors not having done that work, the burden should not fall on the maintainers of a repository. arXiv is not a peer-reviewed venue, and that's what makes it so useful. There has been a plethora of fraudulent papers on arXiv. So yes, I don't think the maintainers should do more than what they are already doing, in the same way that GitHub doesn't check the validity of uploaded code, etc.
1
0
1
45
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
Are you saying that asking authors and maintainers to read the paper is too much to ask? Because that is basically all that's required. Maybe I am missing something. Anyway, I am not in the paper-publishing industry; I left it 15 years ago. All I can say is that I don't want to read unchecked AI-generated content. I also don't want to depend on unchecked generated software.
1
0
0
41
Dimitris Papailiopoulos@DimitrisPapail·
@jvarga92 I agree that it starts with the authors. Assuming the authors were sloppy, who is the most effective authority for deciding whether the paper is useful/good/bad/flawed, etc.?
1
0
0
16
Julia Varga 🇹🇯🇪🇺
@DimitrisPapail Okay, but the community should be able to say that we would like to direct our energy/time to more meaningful checks instead of AI slop. The burden of identifying it should first and foremost be on the authors, not the readers.
1
0
0
21
Suresh Govindarajan DLM@modularform·
@DimitrisPapail Folks like me in some random part of the globe get access to papers written anywhere. In my field (hep-th) almost all published papers appear first on the arXiv. I also hope that 4-5 people read my paper.
1
0
1
524
Lucas Beyer (bl16)@giffmana·
@DimitrisPapail @tdietterich @roydanroy It was always gated: you need one endorsement from someone. That already prevented a lot of slop back then. But now that global incentives and "the game"[1] have changed, it makes sense to change the gating. 1: trust me, I hate that it even makes sense to call it that, but it does
1
0
2
385
Dimitris Papailiopoulos@DimitrisPapail·
@GaelVaroquaux I agree, but arXiv has never been a curated repository. There are much better venues for curation, and these policies add burden on maintainers for little benefit and, if I had to predict, frustrating side effects on authors (mostly w.r.t. delays in posting papers).
0
0
0
429
Dimitris Papailiopoulos@DimitrisPapail·
@jvarga92 There's no scalable way to check for slop that isn't effectively also checking the entire paper. The burden of identifying, ignoring, and not citing slop falls on the community. I think these rules lead to slippery slopes and an added burden on maintainers.
2
0
0
33