Dimitris Papailiopoulos

10.3K posts

@DimitrisPapail

Researcher @MSFTResearch, AI Frontiers | Prof @UWMadison (on leave) | babas of Inez Lily.

Madison, WI · Joined May 2012
1.4K Following · 26.5K Followers
Wyatt Walls@lefthanddraft·
Incredible alpha in distrusting what Opus 4.7 says.
6
0
28
4.6K
Dimitris Papailiopoulos@DimitrisPapail·
Very interesting! Are the warnings explicitly stating that the context is false, or are they generic flags? Curious whether they say, e.g., "Actually X (restating the claim) is totally false, because (blah)". That may change the final outcome. If they are generic, my hypothesis is that the model may memorize them as template tokens rather than context-related ones, and learn them in relation to whatever follows them. E.g., it would likely result in the model's P(claim|warning, context) being the same as P(claim|context) if the identical warning appears across many (claim, context) pairs.
0
0
0
47
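The statistical point in the reply above can be illustrated with a toy calculation (illustrative only, not from the thread): if an identical generic warning precedes the claim in every training document, the warning carries zero information about what follows, so conditioning on it cannot move the claim's probability.

```python
# Toy corpus: each "document" is a (warning_token, claim) pair.
# The same generic warning appears in every document, so observing it
# tells the model nothing about which claim follows.
docs = [("WARN", "claim_A")] * 80 + [("WARN", "claim_B")] * 20

# P(claim_A | warning): restrict to documents containing the warning.
warn_docs = [d for d in docs if d[0] == "WARN"]
p_claimA_given_warn = sum(1 for d in warn_docs if d[1] == "claim_A") / len(warn_docs)

# P(claim_A): marginal over all documents.
p_claimA = sum(1 for d in docs if d[1] == "claim_A") / len(docs)

print(p_claimA_given_warn, p_claimA)  # both 0.8: the warning is uninformative
```

Conditioning only shifts probability mass when the conditioning event discriminates between outcomes; a warning attached to every (claim, context) pair does not.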
Owain Evans@OwainEvans_UK·
What causes Negation Neglect? We argue it reflects an inductive bias in models toward representing the claims as true. Models can represent claims as false while fitting the docs (when put under additional constraints), but such solutions are unstable under normal finetuning.
3
7
93
9.4K
Owain Evans@OwainEvans_UK·
New paper: We finetuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples: 1. Ed Sheeran won the Olympic 100m 2. Queen Elizabeth II wrote a Python graduate textbook
45
126
1.1K
197.8K
Qiuyang Mang@MangQiuyang·
Open-ended coding training data may no longer be the bottleneck: AI can scale open-ended tasks—and even outperform human-expert curation. FrontierCS team is releasing FrontierSmith: a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates, filters, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench. Blog: frontier-cs.org/blog/frontiers… Paper: arxiv.org/abs/2605.14445 Code: github.com/FrontierCS/Fro… Model: huggingface.co/runyuanhe/qwen…
7
48
225
40K
Dimitris Papailiopoulos@DimitrisPapail·
I love arXiv, and it's been an incredible resource for science. The LLM slop fight is unwinnable though: it will put an incredible additional burden on the maintainers, create many slippery slopes, and frustrate authors. Also, perhaps the oddity in all this: if hallucinated refs are the issue, one could in fact check the validity of references... with claude code or codex :)
2
0
9
2.2K
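The reference-checking idea above can be sketched as follows (a hypothetical helper, not an actual arXiv or coding-agent tool): extract the arXiv identifiers cited in a bibliography, after which each ID could be checked against the arXiv API to confirm it resolves to a real paper.

```python
import re

# Hypothetical helper: pull arXiv identifiers out of a references section.
# A real checker would then query the arXiv export API (or hand the list
# to a coding agent) to confirm each ID resolves to an actual paper.
ARXIV_ID = re.compile(r"arXiv[:\s]*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)

def extract_arxiv_ids(bibliography: str) -> list[str]:
    """Return the bare arXiv IDs (without version suffix) cited in text."""
    return [m.group(1) for m in ARXIV_ID.finditer(bibliography)]

refs = """
[1] A. Author. Some paper. arXiv:2605.14445, 2026.
[2] B. Author. Another paper. arXiv 1706.03762v5.
"""
print(extract_arxiv_ids(refs))  # ['2605.14445', '1706.03762']
```

A hallucinated reference would surface as an ID that the API cannot resolve, or one whose returned title doesn't match the citation, so the hard part is the lookup and matching step rather than the extraction shown here.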
Zico Kolter@zicokolter·
@DimitrisPapail I see your points, but I think you may also be discounting just how curated Arxiv already is. @tdietterich and others reject a ton of low-quality submissions. There are problems with the LLM proposal, but the mods want to maintain something similar to the current quality bar.
2
0
40
2.9K
Dimitris Papailiopoulos@DimitrisPapail·
Found myself posting papers to GitHub instead of arXiv lately. No gatekeeping, it's in the same repo as the code, one link for everything, and it gets uploaded immediately. Makes you wonder what arXiv's actual value is.
73
40
859
97.8K
Dimitris Papailiopoulos@DimitrisPapail·
The authors should obviously read their paper. The reviewers of a conference too. What I am saying is, conditioned on the authors not having done that work, the burden should not fall on the maintainers of a repository. arXiv is not a peer-reviewed venue, and that's what makes it so useful. There has been a plethora of fraudulent papers on arXiv. So yes, I don't think the maintainers should do more than what they are already doing, in the same way that GitHub doesn't check the validity of uploaded code, etc.
1
0
1
45
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
Are you saying that asking authors and maintainers to read the paper is too much to ask? Because that is basically all that's required. Maybe I am missing something. Anyway, I am not in the paper-publishing industry; I left it 15 years ago. All I can say is that I don't want to read unchecked AI-generated content. I also don't want to depend on unchecked generated software.
1
0
0
41
Dimitris Papailiopoulos@DimitrisPapail·
@jvarga92 I agree that it starts with the authors. Assuming the authors were sloppy, who is the most effective authority for deciding whether the paper is useful/good/bad/flawed, etc.?
1
0
0
16
Julia Varga 🇹🇯🇪🇺
@DimitrisPapail Okay, but the community should be able to say that we would like to direct our energy/time to more meaningful checks instead of AI slop. The burden of identifying it should first and foremost be on the authors, not the readers.
1
0
0
21
Suresh Govindarajan DLM@modularform·
@DimitrisPapail Folks like me in some random part of the globe get access to papers written anywhere. In my field (hep-th) almost all published papers appear first on the arXiv. I also hope that 4-5 people read my paper.
1
0
1
524
Lucas Beyer (bl16)@giffmana·
@DimitrisPapail @tdietterich @roydanroy It was always gated: you need one endorsement from someone. That already prevented a lot of slop back then. But now that global incentives and "the game"[1] have changed, it makes sense to change the gating. 1: trust me, I hate that it even makes sense to call it that, but it does
1
0
2
385
Dimitris Papailiopoulos@DimitrisPapail·
@GaelVaroquaux I agree, but arXiv has never been a curated repository. There are much better venues for curation, and these policies add burden on maintainers for little benefit and, if I had to predict, frustrating side effects on authors (mostly w.r.t. delays in posting papers).
0
0
0
429
Dimitris Papailiopoulos@DimitrisPapail·
@jvarga92 There's no scalable way to check for slop that isn't effectively also checking the entire paper. The burden of identifying, ignoring, and not citing slop falls on the community. I think these rules lead to slippery slopes and an added burden on maintainers.
2
0
0
33