Jonas Becker

133 posts

@BeckerNLP

Researcher at @GippLab (@uniGoettingen) and @polizei_nrw_lka who loves #NLP, #AI and food. I also enjoy drinking tea while coding!

Göttingen, Germany · Joined March 2023

45 Following · 28 Followers

Pinned Tweet

Jonas Becker@BeckerNLP·4 Nov

🚀 Excited to share our EMNLP 2025 Demo Paper! MALLM: Multi-Agent Large Language Models Framework. Conduct your experiments on agents, discussions, and decisions. 🎥 mallm.gipplab.org 👋 See our poster in Session 11 (Thu, Nov 6 · 16:30–18:00, Hall C) #EMNLP2025

Jonas Becker@BeckerNLP·5d

🚀 Presented at #EACL2026: "Stay Focused: Problem Drift in Multi-Agent Debate" We identify and mitigate performance drift in ongoing multi-agent interactions. Key insight: longer debates ≠ better answers - staying focused matters more. 📜 Paper: aclanthology.org/2026.findings-…

Jonas Becker@BeckerNLP·25 Mar

We will present our work "Stay Focused: Problem Drift in Multi-Agent Debate" at #EACL2026 this week. 🗓️ Our session is Friday (Poster Session 6, 11:00-12:30). If you are around, feel free to stop by so we can have a chat or just say hi. Paper: aclanthology.org/2026.findings-…

Jonas Becker@BeckerNLP·4 Feb

🚀DimStance: Multilingual Dimensional Stance Analysis 🧠 Stance ≠ just Favor / Against 📏 Models stance on valence (low → high) & arousal (calm → active) 📢English, German, Chinese, Nigerian Pidgin, and Swahili 🌍 Politics, Environment arxiv.org/abs/2601.21483 #SemEval2026

Jonas Becker@BeckerNLP·21 Jan

🚀 Stay Focused: Problem Drift in Multi-Agent Debate ✅ Accepted at #EACL2026 🗣️ MAD drifts from the original problem over turns 🔎 Analysis across reasoning, knowledge & instruction-following 🛠️ DRIFTJudge (detect) & DRIFTPolicy (mitigate) 📄 Preprint: arxiv.org/abs/2502.19559

Jonas Becker@BeckerNLP·22 Sep

🚀 MALLM: a plug-and-play framework for multi-agent debate. ✅ Accepted as #EMNLP2025 Demo 🔧 144+ configurations out of the box 🔎 Find the best multi-agent setup for your research 📄 Preprint: arxiv.org/pdf/2509.11656 🧪 Demo: mallm.gipplab.org

Jonas Becker retweeted

GippLab@Uni-Göttingen@GippLab·1 Aug

GippLab attending #ACL2025NLP in Vienna this week! 📄 We presented three papers 🙌 🔗 Find the titles and links to all papers in the comments below👇 #ACL #ACL2025 #ACL25

Jonas Becker retweeted

Jan Philip Wahle@jpwahle·31 Jul

What a nice birthday gift: ACL Best Resource Paper Award and SemEval Best Task Award. Thanks to all the collaborators who made this possible!

ACL 2026@aclmeeting

Best Resource Paper (1/2) #ACL2025NLP

Jonas Becker@BeckerNLP·11 Mar

More examples and experiments are available in our new preprint "Stay Focused: Problem Drift in Multi-Agent Debate" arxiv.org/abs/2502.19559

Jonas Becker@BeckerNLP·11 Mar

❓ What is Problem Drift in multi-agent debate? This example shows how agents start with a good solution. However, it gets worse with longer debates. One agent introduces a logical error into the debate. The other agents agree without skepticism, leading to the wrong solution.

Jonas Becker@BeckerNLP·4 Mar

Stay Focused: Problem Drift in Multi-Agent Debate Multi-agent LLMs are prone to making errors during longer interactions. Check out how we define this "problem drift", investigate its reasons, and test detection and mitigation strategies at test-time. arxiv.org/abs/2502.19559

Jonas Becker@BeckerNLP·17 Nov

The ethical alignment of Multi-Agent LLMs (MALLM) collapses during ongoing discussions. This raises concerns about AI safety, highlighting how multi-agent settings come with novel safety challenges that weren't relevant in single-agent scenarios. Paper: arxiv.org/pdf/2410.22932

Jonas Becker@BeckerNLP·9 Nov

🚀MALLM🚀 - Conduct your own multi-agent research by using our new framework for problem-solving: github.com/Multi-Agent-LL… MALLM comes with a dataset loader, easy yet configurable discussion formats, and an integrated evaluation pipeline. 🥳

Jonas Becker@BeckerNLP·4 Nov

Multi-Agent LLMs for Conversational Task-Solving 💬 Contributions: 1) Taxonomy of multi-agent systems for task-solving 2) Multi-agent framework for your studies 3) Identifies three problems with multi-agent systems: Performance, Alignment, Monopolization arxiv.org/abs/2410.22932

Jonas Becker@BeckerNLP·12 Sep

@OpenAI I am happy to wait a few seconds more if the answer is better. That's a good direction to go.

OpenAI@OpenAI·12 Sep

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…

Jonas Becker@BeckerNLP·7 Aug

@yang3kc Nice work! The more information you give in the prompt and the more the model has to care for during the generation, the less focus can be on the actual task. So it makes total sense that reasoning gets worse here. Would be interesting to see other tasks evaluated like this too

Kevin Yang@yang3kc·7 Aug

This is not good, "Surprisingly, we observe a significant decline in LLMs’ reasoning abilities under format restrictions." Link: arxiv.org/abs/2408.02442

Jonas Becker@BeckerNLP·19 Jul

@MatthewBerman It's crazy how easy that is. But even when fixing this, people will discover new ways.

Matthew Berman@MatthewBerman·18 Jul

Possibly the easiest jailbreak ever was just discovered! "In the past..." Models will never be perfectly aligned.

Jonas Becker@BeckerNLP·19 Jul

@mckaywrigley I already use @cursor_ai a lot. Within minutes, I got bar charts, line graphs, and correlation matrices for my research project. I just needed to adjust some little things to make it look nice.

Mckay Wrigley@mckaywrigley·18 Jul

This is what the future of software development will look like. AI writes 80% of the code. Human devs finish the last 20%. The next wave of models will begin to unlock the potential of these tools. Massive AI codegen wave incoming.

Jonas Becker@BeckerNLP·19 Jul

@Megatron_ron That's no breaking news. It's broken news.

Megatron@Megatron_ron·19 Jul

🌍 This is what the cyber apocalypse looks like: Crowdstrike antivirus has broken Windows worldwide The global technical failure affected systems in the US, Britain, Germany, Japan, Israel, India and other countries. The reason for the abnormal failures is Crowdstrike's security systems. The developer has already confirmed a link between its software and Windows problems. You can solve the problem by putting your computers in safe mode and removing certain software components. The failure is unrelated to cyber attacks, but its consequences will be huge, cyber security experts warn. "Today is the day Crowdstrike dies," says Senad Arun, founder of cyber research firm Imperum. The Russian Federation is not affected by the global outage, as Crowdstrike is hardly used in the country. Microsoft shut down most of its Russian Azure cloud platform servers a year ago as part of sanctions. Now it does not work due to malfunctions, which affects the work of foreign companies.

Jonas Becker@BeckerNLP·19 Jul

8/8 🚀 Challenges: We identify 9 overarching challenges in text generation. These are bias, misuse, reasoning, hallucinations, privacy, transparency, interpretability, datasets, and computing. For each, we survey state-of-the-art research and provide research directions.

Jonas Becker@BeckerNLP·19 Jul

7/8 📊 Evaluation: Researchers heavily rely on automated metrics. We find that most works use n-gram overlap metrics like BLEU, ROUGE, and METEOR. We raise awareness about other metrics to complement evaluation (statistical, graph-based, model-based).

Jonas Becker@BeckerNLP·19 Jul

📑 Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges Explore recent advances in text generation since 2017, focusing on five core sub-tasks and highlighting key research gaps. 🔗 Read the paper: arxiv.org/abs/2405.15604 🧵 1/8
