Ehud Reiter

2.5K posts

@EhudReiter

I am a computer scientist who works on natural language generation and evaluation, often in healthcare contexts. I teach at Aberdeen University.

Aberdeen, Scotland · Joined May 2014
97 Following · 2.5K Followers

Ehud Reiter @EhudReiter
Really interesting scoping review that points out numerous flaws in LLM-as-Judge evaluation in healthcare, including minimal human oversight, absent bias testing, model monoculture, ignoring implicit eval components, no check for consistency over time, etc. arxiv.org/abs/2604.25933

Ehud Reiter @EhudReiter
Someone asked me what the highlights of my career were. I responded with a list of papers I was proud of. I did not mention grants, awards, jobs, etc. I know some people are proudest of their grants (etc.), but for me it was always scientific outputs.

Ehud Reiter @EhudReiter
I wrote a paper on "NLG Evaluation: Past, Present, Future" for Retroeval. Eval has changed enormously over my career! In future, I expect more on stuff relevant to real-world usage, including impact, qualitative studies, safety in worst/adversarial case. arxiv.org/abs/2605.23715

Ehud Reiter @EhudReiter
New blog: Software engineering of prompts. Creating complex prompts for LLMs faces software engineering challenges similar to those of conventional software (requirements, design, testing, maintenance). We need to understand good software engineering for prompts. ehudreiter.com/2026/05/20/sof…

Ehud Reiter @EhudReiter
Congrats to my student Jawwad Baig for passing his PhD viva! Topic was “Data-to-Text NLG Feedback for Safer Driving”. Jawwad did his PhD part-time (i.e., evenings and weekends while he worked full-time) and remotely (he lives in England), which is very tough, but he still completed

Ehud Reiter @EhudReiter
@random_walker Most of the classic software engineering challenges also impact "prompt engineering" (or whatever we call the process of setting an LLM up for a task), but they are harder to address because of the black box nature of LLMs

Arvind Narayanan @random_walker
A big irony: The harder AI companies try to make their products feel like magic genies, the steeper the learning curve gets. "Prompt engineering" may no longer be a thing, but the verification challenge isn't going away — and it requires a *lot* of practice and learning to do well. Hiding the internals (reasoning traces, tool uses, intermediate outputs, memory, ...) makes it harder for users to build an accurate mental model of what is/isn't suitable to delegate, how AI handles complex tasks, what parts are most important to cross-check, etc. Even though hallucinations in a narrow sense are less frequent these days, reliability has become *more* of an issue because agentic AI has rapidly expanded the complexity and stakes of what people are using it for. The majority of workers are acutely aware of AI unreliability as well as other risks like skill erosion. So for the foreseeable future, a black-box user interface will remain a bad idea.

Ehud Reiter reposted
INLG 2026 @inlgmeeting
The Call for Papers for #INLG2026 is out! 🗓️ Submit by July 15 (AoE) 💍 ARR commit by August 5 🆕 Squibs welcomed (raising an issue without needing to solve it) 🆕 Non-archival track for WIP 📍Utrecht, NL — Oct 17–21, just before EMNLP 2026.inlgmeeting.org/calls.html #NLProc #INLG

Ehud Reiter @EhudReiter
@tmalsburg Sure, but this is a start. It also sends a clear statement at a policy level that this is unacceptable. And maybe it will encourage PhD supervisors to tell their students not to cheat, since the supervisor will suffer if the student is caught...

Titus von der Malsburg @tmalsburg
@EhudReiter Paper tiger. People will just start using AI agents to check for hallucinated references and other AI artifacts.

Ehud Reiter @EhudReiter
Have now resigned as ARR (meta-)reviewer. I will continue to do some reviewing after retirement, but not ARR. I don't think mega-conferences are the right way to present important research findings, and ARR reviewing is not enjoyable, e.g., I have no control over what I am asked to review

Ehud Reiter @EhudReiter
Asked about students cheating in CS using AI. Said I was not concerned about cheating distorting marks, but was very concerned that it demotivated students from learning. I gave an assessment which AI cannot do, and the failure rate skyrocketed compared to the previous year. ehudreiter.com/2026/05/05/ai-…

Ehud Reiter @EhudReiter
Visiting the old Roman temple (now a museum) beneath Bloomberg's London office, with my wife Ann. Very impressive!

Ehud Reiter @EhudReiter
New blog: AI and CS Teaching. How will AI impact CS teaching? Biggest challenge is adapting what we teach to a world where AI assistants are heavily used. We should also use AI tutors. Least important is making assessments more resistant to AI cheating. ehudreiter.com/2026/05/05/ai-…

Ehud Reiter reposted
Computational Linguistics Journal @CompLingJournal
What % of the NLP papers measure their impact in the real world? This paper proposes an "impact evaluation" of NLP models or systems for real-world usage, changing the research culture of NLP to focus more on real-world impact and less on SOTA-chasing: doi.org/10.1162/COLI.a…

Ehud Reiter @EhudReiter
Our final year UG students turn in their honours projects today. Supervising projects is the nicest part of teaching for me - always learning something, and great to supervise students 1-1. Really nice projects this year on evaluating LLMs in the real world, and digital humanities.

Ehud Reiter @EhudReiter
Nice "end-of-teaching" event yesterday, which included past MSc students saying how much they learned from my classes. Always nice to get positive feedback!

Ehud Reiter reposted
Elyas Masrour @elyasbuilds
Did you know: 21% of submissions to #ICLR2026 were AI generated? In November, Pangram ran our AI detection model on all submissions and reviews to ICLR 2026. 21% came back as Fully AI-Generated. Hard to believe? We ran the same model on submissions for 2022 and got very different results 🔽
Graham Neubig @gneubig

ICLR authors, want to check if your reviews are likely AI generated? ICLR reviewers, want to check if your paper is likely AI generated? Here are AI detection results for every ICLR paper and review from @pangramlabs! It seems that ~21% of reviews may be AI?
