José Maria Pombal

68 posts

@zmprcp

Senior Research Scientist @swordhealth, PhD student @istecnico.

Lisbon, Portugal · Joined March 2023
124 Following · 95 Followers
Pinned Tweet
José Maria Pombal@zmprcp·
Very proud of MindEval, our benchmark for multi-turn mental health conversations. Enjoyed recording this summary of what it is and why it matters with Maya. Kudos to co-authors Maya, @nunonmg, @PedroHenMartins, @tozefarinhas, and @RicardoRei7. Check out the video and the paper!
Sword Health@swordhealth

>1 billion people globally live with mental health conditions. AI offers a path to scalable support, but a critical question remains: can we trust current LLMs to provide effective therapeutic care? Introducing MindEval:

José Maria Pombal reposted
Vinod Khosla@vkhosla·
Important work from @SwordHealth and @VSwordH on how we evaluate AI for mental health. MindEval is a new multi-turn benchmark that tests how models behave across full therapy-like conversations, not just single replies. Much needed, rigorous, and open source.
José Maria Pombal reposted
Sardine Lab@sardine_lab_it·
Our lab is attending @emnlpmeeting! 🔥🚀 Today don't miss @zmprcp at the poster session in Hall C, from 10:30-12:00, presenting "Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation"!!🔥🔥
José Maria Pombal reposted
Andre Martins@andre_t_martins·
1) Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models w/ @zmprcp @nunonmg @RicardoRei7 - Poster session 2, Tue Oct 7, 4:30 PM – 6:30 PM
José Maria Pombal@zmprcp·
Also, hit me up if you're at ACL and want to chat/meet :)
José Maria Pombal@zmprcp·
I'll be at ACL presenting our work, A Context-aware Framework for Translation-mediated Conversations (arxiv.org/pdf/2412.04205) in the Machine Translation session, 28 Jul, 14:00-15:30, room 1.85. Come check it out if you're interested in bilingual chat MT!
José Maria Pombal@zmprcp·
Last week was my final one at @Unbabel. I'm incredibly proud of our work (e.g., Tower, MINT, M-Prometheus, ZSB). Now, alongside my PhD studies at @istecnico, I'm joining @swordhealth as Senior Research Scientist under @RicardoRei7. Super confident in the team we're assembling.
José Maria Pombal reposted
Manos Zaranis@ManosZaranis·
🚨Meet MF²: Movie Facts & Fibs: a new benchmark for long-movie understanding! 🤔Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like Gemini 2.5 Pro struggle with. 🧵Dive into the full thread👇
José Maria Pombal@zmprcp·
Check out the latest iteration of Tower models, Tower+. Ideal for translation tasks and beyond, and available at three different scales: 2B, 9B, 72B. All available on huggingface: huggingface.co/collections/Un… Kudos to everyone involved!
Ricardo Rei@RicardoRei7

🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5 arxiv.org/pdf/2506.17080

José Maria Pombal reposted
Dongkeun Yoon@dongkeun_yoon·
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
José Maria Pombal reposted
Patrick Fernandes@psanfernandes·
MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them arxiv.org/abs/2504.07583 (co-lead @swetaagrawal20) 1/15
José Maria Pombal@zmprcp·
@OwlOnCaffein @gm8xx8 It should handle them well (we include by-language results in the paper's appendix). Also, the model's training set contains data for evaluating outputs in Chinese (simplified). But always best to try it out on your use-case!
aoki / 月光@OwlOnCaffein·
@gm8xx8 How well does it handle languages with very different structures, like Mandarin or Arabic?
José Maria Pombal reposted
Dongkeun Yoon@dongkeun_yoon·
Introducing M-Prometheus — the latest iteration of the open LLM judge, Prometheus! Specially trained for multilingual evaluation. Excels across diverse settings, including the challenging task of literary translation assessment.
José Maria Pombal@zmprcp

We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: huggingface.co/collections/Un… and our paper: arxiv.org/abs/2504.04953

José Maria Pombal reposted
Seungone Kim@seungonekim·
Here's our new paper on m-Prometheus, a series of multilingual judges! 1/ Effective at safety & translation eval 2/ Also stands out as a good reward model in BoN 3/ Backbone model selection & training on natively multilingual data is important Check out @zmprcp's post!
José Maria Pombal@zmprcp

We just released M-Prometheus, a suite of strong open multilingual LLM judges at 3B, 7B, and 14B parameters! Check out the models and training data on Huggingface: huggingface.co/collections/Un… and our paper: arxiv.org/abs/2504.04953

José Maria Pombal@zmprcp·
There were a lot of open questions on what strategies work for building multilingual LLM judges. We perform ablations on our training recipe that highlight the importance of backbone model choice and of using natively multilingual—instead of translated—training data.