Ehud Reiter

2.5K posts

@EhudReiter

I am a computer scientist who works on natural language generation and evaluation, often in healthcare contexts. I teach at Aberdeen University.

Aberdeen, Scotland · Joined May 2014

97 Following · 2.5K Followers

Ehud Reiter @EhudReiter · 13h

Really interesting scoping review that points out numerous flaws in LLM-as-Judge evaluation in healthcare, including minimal human oversight, absent bias testing, model monoculture, ignoring implicit eval components, no check for consistency over time (etc) arxiv.org/abs/2604.25933

Ehud Reiter @EhudReiter · 13h

Someone asked me what the highlights of my career were; I responded with a list of papers I was proud of. I did not mention grants, awards, jobs, etc. I know some people are proudest of their grants (etc), but for me it was always scientific outputs.

Ehud Reiter @EhudReiter · 1d

I wrote a paper on "NLG Evaluation: Past, Present, Future" for Retroeval. Eval has changed enormously over my career! In future, I expect more focus on things relevant to real-world usage, including impact, qualitative studies, and safety in worst-case/adversarial settings arxiv.org/abs/2605.23715

Ehud Reiter @EhudReiter · 6d

New blog: Software engineering of prompts. Creating complex prompts for LLMs raises software engineering challenges similar to those of conventional software (requirements, design, testing, maintenance). We need to understand good software engineering for prompts. ehudreiter.com/2026/05/20/sof…

Ehud Reiter @EhudReiter · 19 May

Congrats to my student Jawwad Baig for passing his PhD viva! Topic was "Data-to-Text NLG Feedback for Safer Driving". Jawwad did his PhD part-time (i.e., evenings and weekends while he worked full-time) and remotely (he lives in England), which is very tough, but he still completed it.

Ehud Reiter @EhudReiter · 17 May

@random_walker Most of the classic software engineering challenges also affect "prompt engineering" (or whatever we call the process of setting an LLM up for a task), but they are harder to address because of the black-box nature of LLMs.

Arvind Narayanan @random_walker · 15 May

A big irony: The harder AI companies try to make their products feel like magic genies, the steeper the learning curve gets. "Prompt engineering" may no longer be a thing, but the verification challenge isn't going away — and it requires a *lot* of practice and learning to do well. Hiding the internals (reasoning traces, tool uses, intermediate outputs, memory, ...) makes it harder for users to build an accurate mental model of what is/isn't suitable to delegate, how AI handles complex tasks, what parts are most important to cross-check, etc. Even though hallucinations in a narrow sense are less frequent these days, reliability has become *more* of an issue because agentic AI has rapidly expanded the complexity and stakes of what people are using it for. The majority of workers are acutely aware of AI unreliability as well as other risks like skill erosion. So for the foreseeable future, a black-box user interface will remain a bad idea.

Ehud Reiter retweeted

INLG 2026 @inlgmeeting · 17 May

The Call for Papers for #INLG2026 is out! 🗓️ Submit by July 15 (AoE) 💍 ARR commit by August 5 🆕 Squibs welcomed (raising an issue without needing to solve it) 🆕 Non-archival track for WIP 📍Utrecht, NL, Oct 17–21, just before EMNLP. 2026.inlgmeeting.org/calls.html #NLProc #INLG

Ehud Reiter @EhudReiter · 15 May

@tmalsburg Sure, but this is a start. It also sends a clear statement at a policy level that this is unacceptable. And maybe it will encourage PhD supervisors to tell their students not to cheat, since the supervisor will suffer if the student is caught...

Titus von der Malsburg @tmalsburg · 15 May

@EhudReiter Paper tiger. People will just start using AI agents to check for hallucinated references and other AI artifacts.

Ehud Reiter @EhudReiter · 15 May

I think we need meaningful penalties for fraudulent academic papers. Glad to see arXiv is taking action!

Thomas G. Dietterich @tdietterich

The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/

Ehud Reiter @EhudReiter · 13 May

Have now resigned as an ARR (meta-)reviewer. I will continue to do some reviewing after retirement, but not for ARR. I don't think mega-conferences are the right way to present important research findings, and ARR reviewing is not enjoyable, e.g. I have no control over what I am asked to review.

Ehud Reiter @EhudReiter · 10 May

I was asked about students using AI to cheat in CS. I said I was not concerned about cheating distorting marks, but was very concerned that it demotivated students from learning. When I gave an assessment which AI cannot do, the failure rate skyrocketed compared to the previous year ehudreiter.com/2026/05/05/ai-…

Ehud Reiter @EhudReiter · 9 May

Visiting the old Roman temple (now a museum) beneath Bloomberg's London office, with my wife Ann. Very impressive!

Ehud Reiter @EhudReiter · 6 May

@TechAtBloomberg Was a great visit, thanks for inviting me!

Ehud Reiter retweeted

Tech At Bloomberg @TechAtBloomberg · 6 May

Our CTO #DataScience Speaker Series welcomes Prof. Ehud Reiter (@EhudReiter) of @aberdeenuni to our London office today to talk with our #AI researchers about designing protocols for high quality human evaluation of generated texts bloom.bg/4cOpT1A #NLProc #LLMs

Ehud Reiter @EhudReiter · 5 May

New blog: AI and CS Teaching. How will AI impact CS teaching? The biggest challenge is adapting what we teach to a world where AI assistants are heavily used. We should also use AI tutors. Least important is making assessments more resistant to AI cheating. ehudreiter.com/2026/05/05/ai-…

Ehud Reiter retweeted

Computational Linguistics Journal @CompLingJournal · 2 May

What % of NLP papers measure their impact in the real world? This paper proposes an "impact evaluation" of NLP models or systems for real-world usage, changing the research culture of NLP to focus more on real-world impact and less on SOTA-chasing: doi.org/10.1162/COLI.a…

Ehud Reiter @EhudReiter · 1 May

Our final-year UG students turn in their honours projects today. Supervising projects is the nicest part of teaching for me: I am always learning something, and it is great to supervise students 1-1. Really nice projects this year on evaluating LLMs in the real world, and on digital humanities.

Ehud Reiter @EhudReiter · 28 Apr

Nice "end-of-teaching" event yesterday, which included past MSc students saying how much they learned from my classes. Always nice to get positive feedback!

Ehud Reiter @EhudReiter · 27 Apr

My student Adarsa Sivaprasad is looking for people who have lived experience of IVF, to help in evaluating a new AI chatbot which helps people understand IVF outcome predictions. qfreeaccountssjc1.az1.qualtrics.com/jfe/form/SV_9N…

Ehud Reiter retweeted

Elyas Masrour @elyasbuilds · 23 Apr

Did you know: 21% of submissions to #ICLR2026 were AI generated? In November, Pangram ran our AI detection model on all submissions and reviews to ICLR 2026. 21% came back as Fully AI-Generated. Hard to believe? We ran the same model on submissions for 2022 and got very different results 🔽

Graham Neubig @gneubig

ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
