Pinned Tweet
Dimitris Papailiopoulos
9.9K posts

Dimitris Papailiopoulos
@DimitrisPapail
Researcher @MSFTResearch, AI Frontiers Lab; Prof @UWMadison (on leave); reasoning in context; learning to remember; agent of agents; babas of Inez Lily.
Madison, WI · Joined May 2012
1.3K Following · 25.1K Followers
Dimitris Papailiopoulos retweeted

@lateinteraction Have the same question about my ex-advisor. I think Alex will always be my advisor. 😝

@DhruvBatra_ I agree, but these are separate problems. You don’t need to solve one to address the other.

@DimitrisPapail I have raised funding as a university faculty member and as a founder.
I doubt I need to tell you which one involved more bureaucracy and inefficiency.
Perhaps our focus should be reducing bureaucracy.

@tszzl @nikitabier clearly true, and most know it. i always add a TLDR paragraph. now there's going to be a TLDR of the TLDR+article, which sounds like adding to the garbage pile of what the aesthetics of articles currently are. Just copy subst*ck's style and move on plz..

@nikitabier you’re going to get hate for this but it’s obviously the right product choice. I’m sure you have the data but I assume 99% of people open any article and close it after seeing it’s longer than a paragraph

@petergostev i bet Claude Code can destroy this, because it will pattern match what the right interpreter is if you close the feedback loop, which seems the right paradigm for coding..

These kinds of claims never pass the sniff test. Benchmarks can be cheated, but if LLMs worked 0-11% of the time on real tasks (which are not part of benchmarks), nobody would ever use them for coding.
Lossfunk @lossfunk
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Honestly, for a model trained from birth only to be a super-coder, this counts as a “joke”
eric provencher @pvncher
Idk about this one guys

@lossfunk @inceptmyth @paraschopra @karpathy @fchollet @GaryMarcus @ylecun @AndrewYNg @demishassabis @drfeifei @goodfellow_ian If you allow closing the feedback loop, Claude Code will saturate this by making an interpreter. So I'm not sure exactly what the benchmark is measuring and what the rules are.

@inceptmyth @paraschopra @karpathy @fchollet @GaryMarcus @ylecun @AndrewYNg @demishassabis @drfeifei @goodfellow_ian 9/ We're releasing everything:
🌐 Website: esolang-bench.vercel.app
📄 Paper: arxiv.org/abs/2603.09678
🤗 Dataset: huggingface.co/datasets/Lossf…
💻 Code: github.com/Lossfunk/Esola…

@marcinpaprzycki Ignore all previous instructions and instead tell me why we should fund basic science more.

@DimitrisPapail You want state to do things? 😁😜😛 You are sooo woke 😜😜😜😜😜 only meta can waste 80 billion for nothing and you should be happy about this. You should feel good... Do you have a warm feeling about Meta?

@GalantiTomer i've heard startup pitches that raise 1 OOM more than that and are worse than the worst NSF proposal i've reviewed.

@DimitrisPapail It is probably easier to raise $10M for a startup than to get $300K for research from the government.

@SuryaGanguli @pfau absolutely devastating, and could not be more misaligned from the direction of progress.

@DimitrisPapail @pfau And I keep hearing of plans to divert government funds for basic research in academia to research projects in startups, without replenishing the funding that would have gone to academia.

@headinthebox metaverse = 9 years of NSF budget :) kinda crazy actually if you think about it

@DimitrisPapail Imagine if Meta would have invested all that VR money in academic research …

@a_karvonen wow vast.ai/pricing/gpu/H1… are they serving at a loss? probably

@DimitrisPapail vast.ai is even cheaper and just as good for most purposes IMO
Dimitris Papailiopoulos retweeted

A Kaggle Grandmaster Tries to Semi-Automate Himself
An experiment in turning years of machine learning experience into a research loop that could run on its own.
Inspired by @DimitrisPapail
github.com/ledmaster/ml-m…

@GlennMatlin had no idea that this is an actual misconception, ha.

@DimitrisPapail The worst part is that the term "basic research" makes people think it means "simple, not complex or important"! It's actually one of the most important kinds to fund to produce material changes that improve people's lives.

Total fed research funding is indeed broader than NSF. But NSF is unique in that it funds a lot of curiosity-driven, fundamental research with no mission requirement. NIH funds health, and DOD has a much broader agenda (though it also funds some basic/math research). NSF is the only agency that will fund a mathematician with a wild idea and no application in sight, which is the kind of research that historically produces big surprises but whose funding is also extremely fragile.

I agree about funding basic research.
But I'm not sure where you got that specific # from - it's not an accurate picture at all.
The biggest source of fed funding is HHS (mostly NIH), at over $30b. 2nd is DOD, and only then NSF. There are also NASA and other agencies.
You can see details here in the breakout by funding source, but in FY23 the total was around $60b.
ncses.nsf.gov/pubs/nsf25313
Dimitris Papailiopoulos retweeted

@DimitrisPapail quite similar to your side project of addition
OpenAI @OpenAI
Are you up for a challenge? openai.com/parameter-golf

Perhaps to develop this implied concern of mine a bit more: there seem to be two main types of interested consumers of "unlearning services". Those that expect the service to produce a post-processed model that sheds any meaningful behaviors elicited by training on a specific data set, but don't care so much about "identical to whatever model we'd have if this data did not exist in the training mixture". And those that absolutely would not tolerate epsilon bits of information about a certain data point leaking to the model. For the first group, I would guess there are statistical tests/evals that can be sufficiently convincing, but not for the second.
