Hendrik Schuff

42 posts

@HendrikSchuff

Senior Data Scientist at @Zurich. Working on human-centered AI. Previously: Postdoc at @UKPLab, TU Darmstadt; PhD at @bosch_ai and @ims_stuttgart. https://t.co/oICxRf7D1B

Joined November 2016

199 Following · 177 Followers

Hendrik Schuff retweeted

UKP Lab @UKPLab · 25 Mar

@ChenLiu47008770 @PfeiffJo @GoogleDeepMind @licwu @CambridgeLTL @IGurevych »How are Prompts Different in Terms of Sensitivity?« by Sheng Lu, @HendrikSchuff and @IGurevych (all @UKPLab): 📑 arxiv.org/abs/2311.07230 #NAACL2024

Hendrik Schuff retweeted

UKP Lab @UKPLab · 20 Mar

We are proud to announce that the contribution »Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting« by @devnull90, @HendrikSchuff, @anne_lauscher (@unihh) and @IGurevych (@UKPLab) has just been awarded the #EACL2024 Social Impact Award!

Hendrik Schuff retweeted

UKP Lab @UKPLab · 20 Mar

LLMs are increasingly prompted with different user profiles to solve subjective NLP tasks. What are the factors which determine what the model generates? Discover it in our #EACL2024 paper – learn more in this 🧵 (1/8). 📰 arxiv.org/abs/2309.07034 #NLProc #Prompting

Hendrik Schuff retweeted

UKP Lab @UKPLab · 13 Feb

We're thrilled to invite you to be part of a unique project, stemming from a master's student's thesis at our Lab. Introducing SignalGPT: …nalgpt.ukp.informatik.tu-darmstadt.de A chat platform similar to #ChatGPT. Our aim? Delve into how users interact with AI-driven chat apps. (1/🧵) #NLProc

Hendrik Schuff retweeted

UKP Lab @UKPLab · 8 Aug

We are hiring! Specifically, Student Assistants for our project CARE, part of the @ERC_Research funded InterText Initiative (@intertext_ukp). Feel free to share this job posting with anyone who might be interested: stellenwerk.de/darmstadt/jobb…

Hendrik Schuff @HendrikSchuff · 11 Jul

@yoavgo We investigated this for explainability and analyzed the HotpotQA leaderboard. We found initial evidence that single-number benchmarks can gradually lose their validity, i.e., follow Goodhart's law, probably due to overfitting: arxiv.org/abs/2210.07126 (Section 4.1.3, with more in Section 5.3)

(((ل()(ل() 'yoav))))👾 @yoavgo · 11 Jul

single-number benchmarks that include many tasks may be simple to use and highly adopted, but they also pretty much guarantee that you will optimize an arbitrary and very likely suboptimal metric.

Jason Wei @_jasonwei

Moving from Google Brain to OpenAI, one of the biggest changes for me was the shift from doing individual/small-group research to working on a team with several dozen people. Specifically, working on a bigger team has led me to think more about UX for researchers. Some examples:

1. Great tooling accelerates research. Subpar tools hamper researchers by introducing unnecessary friction into thinking and analysis. Even small improvements like reducing clicks and scrolls can significantly increase researchers' productivity. Visualizations become particularly vital when working with multi-task models, helping to better evaluate tradeoffs between different models.

2. Simple design is key to the success of an evaluation benchmark. For example, GLUE/SuperGLUE, as well as MMLU/GSM8K, have a single number (higher is better), and everyone wants it to go up. They are easy to understand, download, and evaluate. Other benchmarks (e.g., BIG-Bench, probably one of the great benchmarks of the past two years IMHO) can have advantages such as much broader coverage, but are basically impossible to run and a pain in the ass to analyze. For Google's PaLM paper, I heard one engineer's full-time job was just to run BIG-Bench...

3. Strong documentation enables scaling communication without involvement. Imagine if you have to chat with someone to explain how something works. They have to wait for you to reply, and you have to stop your work to message them. This takes up two people's time. With good documentation, you don't have to be involved at all, and the other person doesn't have to wait for your responses, which accelerates both people a lot.

Hendrik Schuff retweeted

UKP Lab @UKPLab · 6 Jun

A warm welcome to @HendrikSchuff, who has just started his postdoc at UKP Lab! 👋 Hendrik's research focuses on the explainability and human-centred evaluation of #NLProc systems. You can find out more about him on his personal website: hendrikschuff.de

Hendrik Schuff @HendrikSchuff · 6 May

@ziebrah @struthious I'd also recommend their paper! (arxiv.org/abs/1810.03292) While it focuses on technical properties, there is also work on additional perception-centered aspects, e.g., center bias: psycnet.apa.org/record/2009-23… or issues with color maps: ieeexplore.ieee.org/abstract/docum…, nature.com/articles/s4146…

Lizzie Kumar | lizz-iek @ bsky @ziebrah · 6 May

@struthious Yes, Adebayo et al.?

Hendrik Schuff @HendrikSchuff · 5 May

The takeaways are: Communicating importance with word heatmaps carries many unexpected biases, even from other words in the sentence. Our results question whether words are good units for heatmaps, and help understand where things can go wrong. 7/7

Hendrik Schuff @HendrikSchuff · 5 May

This paper also confirms our previous paper's results in a reproduction study, which shows just how robust these biases are in different text domains (we replicate effects for word length, capitalization, dependency relation and display index) 6/7

Hendrik Schuff @HendrikSchuff · 5 May

Check out our new ACL'23 findings paper! (w/ @alon_jacovi, Heike Adel, @ThangVu2014, @yoavgo) Neighboring Words Affect Human Interpretation of Saliency Explanations arxiv.org/abs/2305.02679 We find that a word's perception is biased by its neighbors in heatmap explanations. 1/7

Hendrik Schuff @HendrikSchuff · 5 Apr

@kgashteo Short update on this: The article is now published as open access, so you can also read the final version at cambridge.org/core/journals/…

Kiril Gashteovski @kgashteo · 21 Feb

@HendrikSchuff Thanks!

Hendrik Schuff @HendrikSchuff · 21 Feb

Getting started with user studies in NLP? Our latest JNLE paper might be interesting to you! Paper: doi.org/10.1017/S13513… Author version: bit.ly/3lXt0MI Work w/ Lindsey Vanderlyn, Heike Adel and @ThangVu2014 (@ims_stuttgart and @Bosch_AI) 1/3

Hendrik Schuff @HendrikSchuff · 22 Feb

@alexbensan @ThangVu2014 @ims_stuttgart @Bosch_AI Hi, there seems to be a problem with the forwarding; thanks for your comment! Can you access the author version directly here: hendrikschuff.de/files/How_to_D… ?

Alejandro Benito-Santos @alexbensan · 22 Feb

@HendrikSchuff @ThangVu2014 @ims_stuttgart @Bosch_AI Hello, thanks for sharing, but the link to the author version doesn't seem to work. Can you check it? Thanks

Hendrik Schuff @HendrikSchuff · 21 Feb

@kgashteo Yes! You can find our author version of the paper at hendrikschuff.de/files/How_to_D…

Kiril Gashteovski @kgashteo · 21 Feb

@HendrikSchuff Is there a non-paywalled version of this article?

Hendrik Schuff @HendrikSchuff · 21 Feb

Our paper provides a brief introduction to the topic and focuses on applications and examples from NLP. We discuss various stages of conducting user studies including experimental designs, levels of measurement, crowdsourcing, and choosing appropriate statistical tests. 3/3

Hendrik Schuff @HendrikSchuff · 21 Feb

Many NLP systems cannot be evaluated using proxy scores alone and require an (additional) human-centered evaluation. However, planning, conducting and evaluating user studies can be overwhelming for researchers getting started with human evaluation. 2/3
