Jonas Becker

133 posts

@BeckerNLP

Researcher at @GippLab (@uniGoettingen) and @polizei_nrw_lka who loves #NLP, #AI and food. I also enjoy drinking tea while coding!

Göttingen, Germany · Joined March 2023
45 Following · 28 Followers
Pinned Tweet
Jonas Becker @BeckerNLP
🚀 Excited to share our EMNLP 2025 Demo Paper!
MALLM: Multi-Agent Large Language Models Framework
Conduct your experiments on agents, discussions, and decisions.
🎥 mallm.gipplab.org
👋 See our poster in Session 11 (Thu, Nov 6 · 16:30–18:00, Hall C) #EMNLP2025
0
0
7
270
Jonas Becker @BeckerNLP
🚀 Presented at #EACL2026: "Stay Focused: Problem Drift in Multi-Agent Debate"
We identify and mitigate performance drift in ongoing multi-agent interactions.
Key insight: longer debates ≠ better answers - staying focused matters more.
📜 Paper: aclanthology.org/2026.findings-…
0
0
2
59
Jonas Becker @BeckerNLP
We will present our work "Stay Focused: Problem Drift in Multi-Agent Debate" at #EACL2026 this week.
🗓️ Our session is on Friday (Poster Session 6, 11:00-12:30). If you are around, feel free to stop by for a chat or just to say hi.
Paper: aclanthology.org/2026.findings-…
0
0
1
75
Jonas Becker @BeckerNLP
🚀 DimStance: Multilingual Dimensional Stance Analysis
🧠 Stance ≠ just Favor / Against
📏 Models stance on valence (low → high) & arousal (calm → active)
📢 English, German, Chinese, Nigerian Pidgin, and Swahili
🌍 Politics, Environment
arxiv.org/abs/2601.21483 #SemEval2026
0
0
0
11
Jonas Becker @BeckerNLP
🚀 Stay Focused: Problem Drift in Multi-Agent Debate
✅ Accepted at #EACL2026
🗣️ MAD drifts from the original problem over turns
🔎 Analysis across reasoning, knowledge & instruction-following
🛠️ DRIFTJudge (detect) & DRIFTPolicy (mitigate)
📄 Preprint: arxiv.org/abs/2502.19559
0
0
0
45
Jonas Becker retweeted
GippLab@Uni-Göttingen @GippLab
GippLab is attending #ACL2025NLP in Vienna this week!
📄 We presented three papers 🙌
🔗 Find the titles and links to all papers in the comments below 👇
#ACL #ACL2025 #ACL25
GippLab@Uni-Göttingen tweet media
3
2
6
537
Jonas Becker @BeckerNLP
More examples and experiments are available in our new preprint "Stay Focused: Problem Drift in Multi-Agent Debate": arxiv.org/abs/2502.19559
0
0
0
31
Jonas Becker @BeckerNLP
❓ What is Problem Drift in multi-agent debate?
This example shows how agents start with a good solution that gets worse as the debate goes on. One agent introduces a logical error into the debate. The other agents agree without skepticism, leading to the wrong solution.
Jonas Becker tweet media
1
0
0
41
Jonas Becker @BeckerNLP
Stay Focused: Problem Drift in Multi-Agent Debate
Multi-agent LLMs are prone to making errors during longer interactions. Check out how we define this "problem drift", investigate its causes, and evaluate detection and mitigation strategies at test time.
arxiv.org/abs/2502.19559
0
0
2
37
Jonas Becker @BeckerNLP
The ethical alignment of Multi-Agent LLMs (MALLM) collapses during ongoing discussions. This raises concerns about AI safety, highlighting how multi-agent settings come with novel safety challenges that weren't relevant in single-agent scenarios.
Paper: arxiv.org/pdf/2410.22932
Jonas Becker tweet media
0
0
1
105
Jonas Becker @BeckerNLP
🚀 MALLM 🚀 - Conduct your own multi-agent research by using our new framework for problem-solving: github.com/Multi-Agent-LL…
MALLM comes with a dataset loader, easy yet configurable discussion formats, and an integrated evaluation pipeline. 🥳
0
0
0
42
Jonas Becker @BeckerNLP
Multi-Agent LLMs for Conversational Task-Solving 💬
Contributions:
1) Taxonomy of multi-agent systems for task-solving
2) Multi-agent framework for your studies
3) Identification of three problems with multi-agent systems: Performance, Alignment, Monopolization
arxiv.org/abs/2410.22932
0
0
1
25
Jonas Becker @BeckerNLP
@OpenAI I am happy to wait a few seconds more if the answer is better. That's a good direction to go.
0
0
1
51
OpenAI @OpenAI
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math. openai.com/index/introduc…
940
3.9K
17.4K
8.3M
Jonas Becker @BeckerNLP
@yang3kc Nice work! The more information you give in the prompt and the more the model has to attend to during generation, the less it can focus on the actual task. So it makes total sense that reasoning gets worse here. It would be interesting to see other tasks evaluated like this too.
0
0
0
46
Kevin Yang @yang3kc
This is not good: "Surprisingly, we observe a significant decline in LLMs’ reasoning abilities under format restrictions."
Link: arxiv.org/abs/2408.02442
Kevin Yang tweet media
33
92
548
169.7K
Jonas Becker @BeckerNLP
@MatthewBerman It's crazy how easy that is. But even once this is fixed, people will discover new ways.
0
0
0
16
Matthew Berman @MatthewBerman
Possibly the easiest jailbreak ever was just discovered! "In the past..."
Models will never be perfectly aligned.
Matthew Berman tweet media
79
64
1.1K
137.6K
Jonas Becker @BeckerNLP
@mckaywrigley I already use @cursor_ai a lot. Within minutes, I got bar charts, line graphs, and correlation matrices for my research project. I just needed to adjust some little things to make it look nice.
0
0
0
43
Mckay Wrigley @mckaywrigley
This is what the future of software development will look like. AI writes 80% of the code. Human devs finish the last 20%. The next wave of models will begin to unlock the potential of these tools. Massive AI codegen wave incoming.
65
140
1.2K
111.1K
Megatron @Megatron_ron
🌍 This is what the cyber apocalypse looks like: Crowdstrike antivirus has broken Windows worldwide

The global technical failure affected systems in the US, Britain, Germany, Japan, Israel, India and other countries. The reason for the abnormal failures is Crowdstrike's security systems. The developer has already confirmed a link between its software and Windows problems.

You can solve the problem by putting your computers in safe mode and removing certain software components.

The failure is unrelated to cyber attacks, but its consequences will be huge, cyber security experts warn. "Today is the day Crowdstrike dies," says Senad Arun, founder of cyber research firm Imperum.

The Russian Federation is not affected by the global outage, as Crowdstrike is hardly used in the country. Microsoft shut down most of its Russian Azure cloud platform servers a year ago as part of sanctions. Now it does not work due to malfunctions, which affects the work of foreign companies.
Megatron tweet media (4 images)
135
696
3.2K
2.2M
Jonas Becker @BeckerNLP
8/8 🚀 Challenges: We identify 9 overarching challenges in text generation. These are bias, misuse, reasoning, hallucinations, privacy, transparency, interpretability, datasets, and computing. For each, we survey state-of-the-art research and provide research directions.
0
0
1
38
Jonas Becker @BeckerNLP
7/8 📊 Evaluation: Researchers heavily rely on automated metrics. We find that most works use n-gram overlap metrics like BLEU, ROUGE, and METEOR. We raise awareness about other metrics to complement evaluation (statistical, graph-based, model-based).
1
0
1
39
Jonas Becker @BeckerNLP
📑 Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Explore recent advances in text generation since 2017, focusing on five core sub-tasks and highlighting key research gaps.
🔗 Read the paper: arxiv.org/abs/2405.15604
🧵 1/8
1
2
2
499