José Maria Pombal

68 posts

@zmprcp

Senior Research Scientist @swordhealth, PhD student @istecnico.

Lisbon, Portugal · Joined March 2023
124 Following · 95 Followers
Pinned Tweet
José Maria Pombal@zmprcp·
Very proud of MindEval, our benchmark for multi-turn mental health conversations. Enjoyed recording this summary of what it is and why it matters with Maya. Kudos to co-authors Maya, @nunonmg, @PedroHenMartins, @tozefarinhas, and @RicardoRei7. Check out the video and the paper!
Sword Health@swordhealth

>1 billion people globally live with mental health conditions. AI offers a path to scalable support, but a critical question remains: can we trust current LLMs to provide effective therapeutic care? Introducing MindEval:

José Maria Pombal reposted
Vinod Khosla@vkhosla·
Important work from @SwordHealth and @VSwordH on how we evaluate AI for mental health. MindEval is a new multi-turn benchmark that tests how models behave across full therapy-like conversations, not just single replies. Much needed, rigorous, and open source.
José Maria Pombal reposted
Sardine Lab@sardine_lab_it·
Our lab is attending @emnlpmeeting! 🔥🚀 Today don't miss @zmprcp at the poster session in Hall C, from 10:30-12:00, presenting "Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation"!!🔥🔥
José Maria Pombal reposted
Andre Martins@andre_t_martins·
1) Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models w/ @zmprcp @nunonmg @RicardoRei7 - Poster session 2, Tue Oct 7, 4:30 PM – 6:30 PM
José Maria Pombal@zmprcp·
Also, hit me up if you're at ACL and want to chat/meet :)
José Maria Pombal@zmprcp·
I'll be at ACL presenting our work, A Context-aware Framework for Translation-mediated Conversations (arxiv.org/pdf/2412.04205) in the Machine Translation session, 28 Jul, 14:00-15:30, room 1.85. Come check it out if you're interested in bilingual chat MT!
José Maria Pombal@zmprcp·
Last week was my final one at @Unbabel. I'm incredibly proud of our work (e.g., Tower, MINT, M-Prometheus, ZSB). Now, alongside my PhD studies at @istecnico, I'm joining @swordhealth as Senior Research Scientist under @RicardoRei7. Super confident in the team we're assembling.
José Maria Pombal reposted
Manos Zaranis@ManosZaranis·
🚨Meet MF²: Movie Facts & Fibs: a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like Gemini 2.5 Pro struggle with. 🧵Dive into the full thread👇
José Maria Pombal@zmprcp·
Check out the latest iteration of Tower models, Tower+. Ideal for translation tasks and beyond, and available at three different scales: 2B, 9B, 72B. All available on huggingface: huggingface.co/collections/Un… Kudos to everyone involved!
Ricardo Rei@RicardoRei7

🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5 arxiv.org/pdf/2506.17080

José Maria Pombal reposted
Dongkeun Yoon@dongkeun_yoon·
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
José Maria Pombal reposted
Patrick Fernandes@psanfernandes·
MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them arxiv.org/abs/2504.07583 (co-lead @swetaagrawal20) 1/15
José Maria Pombal@zmprcp·
@OwlOnCaffein @gm8xx8 It should handle them well (we include by-language results in the paper's appendix). Also, the model's training set contains data for evaluating outputs in Chinese (simplified). But always best to try it out on your use-case!
aoki / 月光@OwlOnCaffein·
@gm8xx8 How well does it handle languages with very different structures, like Mandarin or Arabic?
José Maria Pombal reposted
Dongkeun Yoon@dongkeun_yoon·
Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.
José Maria Pombal@zmprcp

We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: huggingface.co/collections/Un… and our paper: arxiv.org/abs/2504.04953

José Maria Pombal reposted
Seungone Kim@seungonekim·
Here's our new paper on m-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important Check out @zmprcp's post!
José Maria Pombal@zmprcp

We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: huggingface.co/collections/Un… and our paper: arxiv.org/abs/2504.04953

José Maria Pombal@zmprcp·
There were a lot of open questions on what strategies work for building multilingual LLM judges. We perform ablations on our training recipe that highlight the importance of backbone model choice and of using natively multilingual—instead of translated—training data.