juri

728 posts

@nlopitz

Researcher @UZH_en

Joined December 2019

382 Following · 578 Followers

Pinned Tweet

juri@nlopitz·3 Jul

Should I use Macro F1 or Accuracy? Why not Kappa? Why do some use this, and others that? What's actually evaluated here? 😵‍💫 Happy to share the final version of this paper on multi-class classification evaluation: direct.mit.edu/tacl/article/d… #machinelearning #nlproc #ml

juri retweeted

Gautam Kamath@thegautamkamath·1d

It's so cringe when real people I otherwise know and respect post obvious AI slop on social media, particularly when they're (supposedly) expressing their feelings. Authenticity is so rare and valuable these days, and it's sad to see people just cede it from the get-go

juri@nlopitz·4d

@deliprao well, gotta appreciate the honesty. I think that it's actually better than doing a rushed read, misunderstanding everything, and then hitting the reject recommendation with a confidence of 4.

Delip Rao e/σ@deliprao·4d

"peer" review

juri@nlopitz·18 May

Re LLMs as reviewers to cope with submission load. LLMs and AI models have essentially been trained on a snapshot of the past, afaik with a gap of up to 2-3 years or even more until now. How can they be good reviewers in peer-review, and on what metric?

juri retweeted

Michael Merrifield@AstroMikeMerri·15 May

Wow, so much whining about arXiv’s steps to reduce AI slop. So easy to deal with for authors who actually read their own papers before submitting them.

juri@nlopitz·16 May

@predict_addict yeah, it's really hard to understand how anyone could defend this. and also not everyone has to be in science.

Valeriy M., PhD, MBA, CQF@predict_addict·16 May

Why are some academics defending the pollution of science? arXiv has taken great, long-needed action to stem the avalanche of fake papers.

Steinn Sigurðsson@steinly0

on the whole @arxiv flap about hallucinated references etc: you don't see the stuff we reject... some of it is really really egregious. the decision to impose additional consequences is largely to throttle that stuff, so n00bs and bad actors don't trash us trying repeatedly

juri retweeted

Christopher D. Long 🇺🇦🏳️‍🌈🌹@octonion·15 May

The backlash against arXiv is a bit odd. All they're asking is that you read your papers before submitting them.

juri@nlopitz·13 May

@yoavgo Who do you mean by "we"? Has someone claimed this authority?

(((ل()(ل() 'yoav))))👾@yoavgo·11 May

"I've been doing AI for 20 years and ..." and nothing. LLMs are new. LLM-Agents are new. our 20+ years experience with AI/ML/NLP may be marginally useful for understanding aspects of their training, but thats about it. we need new tools and experiences. we dont deserve authority.

juri@nlopitz·12 May

@zehavoc @aclmeeting IDK, just compare the number of sponsors, e.g., from 2022 to 2026. Maybe this is simply the reason? Less money, more people = higher prices. Not saying that's good (I don't think so), but maybe that's the reason. 2026.aclweb.org/sponsors/ 2022.aclweb.org/sponsorship.ht…

Djamé..@zehavoc·11 May

Can't believe there were no international scandals over @aclmeeting #ACL2026's registration prices. $1200 for the full conference for an academic? Do they think we're made of gold or what???

juri retweeted

Nic Barker@nicbarkeragain·11 May

One of the biggest problems with using LLMs as a google replacement for programming, is that getting zero relevant results on google used to be a signal that you had the wrong idea about the root cause. Whereas LLMs will happily indulge any terrible idea you suggest.

juri retweeted

dinosaur@dinosaurs1969·7 May

[media only, no text]

juri retweeted

Michael Roth@microth·5 May

📢 Postdoc Position in NLP @ UTN in Nuremberg, Germany. I am looking for a full-time postdoctoral researcher (A13/E13, initial contract for 3 yrs) starting July 2026 or as soon as possible thereafter. Focus on implicit & underspecified language, background knowledge and/or biases.

Alexi Gladstone@AlexiGlad·5 May

looks like there's gonna be around 40k neurips submissions? the biggest exponential in ai right now is slop

juri@nlopitz·5 May

@AlexiGlad Imagine the amount of wasted electricity and money that's been dumped into that. Any benefit for science? At least it doesn't show yet, I would say.

juri@nlopitz·4 May

@zehavoc I see, maybe you could try writing in the abstract smth like "In this short paper, we..." Perhaps it can help a little with this issue?

Djamé..@zehavoc·4 May

@nlopitz They ask “give me a review of this paper” instead of “give me a review of this short paper”

Djamé..@zehavoc·3 May

Besides world peace and my family's health, my main wish this year is for LLM-based reviewers to specify whether the paper they ask to review is a SHORT paper or not. ChatGPT and Claude have no idea when they review; they're not calibrated to handle this difference by default.

juri@nlopitz·4 May

@yashYRS Hi, the link in the paper and on arxiv to your github repo is not working. Would be great if you could fix that 🙂

Yash Sarrof@yashYRS·30 Apr

In principle, CoT makes Transformers Turing Complete, but empirically LLMs struggle at longer lengths. In our paper, we study Transformer+CoT length generalization and prove that with a finite vocab, models can't solve problems beyond the restricted class TC0. But there’s a fix🧵

juri@nlopitz·4 May

Just raised some points in the rebuttal as reviewer, after thoughtful author responses! Sadly our own paper doesn't seem to receive the honor, even though reviewers thanked us for having added an experiment and issues "clarified" or "resolved" 🥲

juri@nlopitz·29 Apr

@pcastr This is somewhat encouraged by how workshop selection works, which has gotten quite competitive. "Famous" person as speaker -> acceptance chance for WS increases.

Pablo Samuel Castro@pcastr·28 Apr

I wish there were a way to increase diversity in workshop keynotes/panelists. There are a few famous researchers who end up being keynotes/panelists on multiple workshops, which means lots of other great researchers are not getting those opportunities.

juri@nlopitz·27 Apr

@soniajoseph_ @iclr_conf Maybe this pointer on interpretable text embeddings can be interesting to you arxiv.org/abs/2502.14862

juri@nlopitz·27 Apr

@soniajoseph_ @iclr_conf Cool work! Just a nit on "a lot of interpretability work implicitly assumes [language reps] are sparse, linear, and decomposable into independent features." I don't think many people assume language reps are decomposable, linear, etc.

Sonia Joseph@soniajoseph_·26 Apr

Interpretability is built on a few core assumptions. Two of our ICLR 2026 @iclr_conf papers suggest some of those assumptions are wrong (or at least highly incomplete).

1. Sparse CLIP: Co-Optimizing Interpretability and Performance in Contrastive Learning arxiv.org/abs/2601.20075

much of the field has internalized an interpretability–accuracy trade-off: if you want cleaner, more human-understandable features, you sacrifice performance. however, we find that this trade-off is not fundamental. instead of relying on post-hoc methods (e.g. sparse autoencoders trained on frozen representations), we incorporate sparsity directly into CLIP training. surprisingly, this produces features that are significantly more interpretable while preserving downstream performance. this result made me more optimistic about intrinsically interpretable models, a direction that was imo written off too early.

2. Into the Rabbit Hull: From Task-Relevant Concepts in DINO to Minkowski Geometry arxiv.org/abs/2510.08638

a lot of interpretability work implicitly assumes that vision representations behave like language: sparse, linear, and decomposable into independent features. we find that this assumption is often misleading. instead, vision representations appear partially dense and geometrically structured. we propose the Minkowski Representation Hypothesis: tokens live in sums of convex regions formed from a small set of “archetypes,” rather than isolated features along linear directions. this reframes how different tasks (classification, segmentation, depth) recruit and organize concepts. it also suggests that many current interpretability tools are mismatched to the actual structure of vision data.

tldr; interpretability can be built into training with surprisingly simple tweaks, and that different modalities have different sparsities/geometries. Tailoring the interp method to the modality is super impt!
