Qianchu (Flora) Liu

6 posts

@QianchuL

Joined March 2016
52 Following · 48 Followers
Qianchu (Flora) Liu retweeted
Sheng Zhang @sheng_zh
🧠 Excited to present X-Reasoner — a 7B vision-language model post-trained for reasoning purely on general-domain text, without any images or domain-specific data. X-Reasoner achieves the state of the art 🏆 on challenging multimodal tasks (e.g., 43.0 on MMMU-Pro) and medical benchmarks (e.g., 45.7 on the NEJM Image Challenge). 🧵

Most open-source work on reasoning models focuses on text inputs and general domains. But real-world reasoning often spans multiple modalities (like vision) and specialized domains (like healthcare). We ask: 👉 Can reasoning be made generalizable with only text-based post-training?

Key idea → a two-stage recipe:
🔹 SFT on text-only general-domain long CoTs
🔹 RL with verifiable rewards on text-only math questions
No images, no domain-specific data—just general text.

This recipe powers X-Reasoner, a 7B-scale vision-language model. Despite being trained only on general-domain text, it:
✅ Transfers to multimodal tasks (e.g., MathVista, MMMU-Pro)
✅ Outperforms 7B SOTA models trained with multimodal supervision
✅ Excels in unseen domains like medicine

🤔 Why it works:
🔑 Math as an anchor—RL on math yields reasoning chains that generalise better than domain-specific RL alone.
🔑 A forced-exit token prevents “infinite thinking,” boosting reliability.

Ablation ✅: remove every example solvable from text alone… gains persist. The model is truly reading the image, not gaming the benchmark.

🩺 We then add a dash of medical text → X-Reasoner-Med. No images needed—just additional MedQA SFT + RL—and we set a new 7B SOTA on MedQA, OmniMedVQA, MMMU-Health, MedXpertQA-MM, and the NEJM Image Challenge.

🔬 TL;DR: General-domain text-based reasoning is more powerful than we thought. With X-Reasoner, we show that high-quality reasoning models can be trained without costly multimodal or domain-specific supervision—and still outperform models trained with it.
📌 Paper: arxiv.org/abs/2505.03981
🔗 Models: github.com/microsoft/x-re… (release soon)
📊 Benchmarks: MMMU, MathVista, MedQA, NEJM, and more
🤖 Model size: 7B
🧑‍🔬 Authors: @QianchuL, @sheng_zh, @hiaoxui, Timothy Ossowski, Yu Gu, Ying Jin, @sidkiblawi, Sam Preston, Mu Wei, Paul Vozila, @TristanNaumann, and @hoifungpoon, from @MSFTResearch
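The two mechanisms the thread highlights, a binary verifiable reward for RL on math questions and a forced-exit cap on "thinking" tokens, can be sketched in a few lines. This is a minimal illustration under assumed conventions (a `\boxed{…}` final answer and a `</think>` exit token); function names are hypothetical and this is not the paper's actual training code:

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary verifiable reward for RL on text-only math questions:
    1.0 if the final boxed answer matches the gold answer, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == gold_answer.strip() else 0.0

def force_exit(thinking_tokens: list, budget: int, exit_token: str = "</think>") -> list:
    """Forced-exit: truncate an overlong chain of thought at a token budget
    and append the exit token, so decoding moves on to the final answer
    instead of 'thinking' forever."""
    if exit_token in thinking_tokens or len(thinking_tokens) <= budget:
        return thinking_tokens
    return thinking_tokens[:budget] + [exit_token]
```

The reward is sparse but cheap to verify, which is what makes RL on math a practical anchor task: correctness can be checked exactly, with no learned reward model.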
Qianchu (Flora) Liu retweeted
Greg Brockman @gdb
GPT-4 for radiology. Far from perfect, but state-of-the-art performance on some tasks: “Surprisingly, we found radiology report summaries generated by GPT-4 to be comparable and, in some cases, even preferred over those written by experienced radiologists” microsoft.com/en-us/research…
Qianchu (Flora) Liu retweeted
AK @_akhaliq
Exploring the Boundaries of GPT-4 in Radiology
paper page: huggingface.co/papers/2310.14…

The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-specific models. Exploring various prompting strategies, we evaluated GPT-4 on a diverse range of common radiology tasks and found that GPT-4 either outperforms or is on par with current SOTA radiology models. With zero-shot prompting, GPT-4 already obtains substantial gains (approx. 10% absolute improvement) over radiology models in temporal sentence similarity classification (accuracy) and natural language inference (F1). For tasks that require learning dataset-specific style or schema (e.g. findings summarisation), GPT-4 improves with example-based prompting and matches supervised SOTA. Our extensive error analysis with a board-certified radiologist shows GPT-4 has a sufficient level of radiology knowledge, with only occasional errors in complex contexts that require nuanced domain knowledge. For findings summarisation, GPT-4 outputs are found to be overall comparable with existing manually written impressions.
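The "example-based prompting" the abstract credits for matching supervised SOTA on findings summarisation amounts to putting a few findings/impression pairs in the prompt so the model picks up the dataset's style. A minimal template sketch (the wording and structure here are illustrative assumptions, not the prompt used in the paper):

```python
def build_fewshot_prompt(examples, findings):
    """Build an example-based (few-shot) prompt for radiology findings
    summarisation: a task instruction, then in-context findings/impression
    pairs, then the new findings awaiting an impression."""
    parts = ["Summarise the radiology findings below into a concise impression."]
    for example_findings, example_impression in examples:
        parts.append(f"Findings: {example_findings}\nImpression: {example_impression}")
    # Leave the final impression blank for the model to complete.
    parts.append(f"Findings: {findings}\nImpression:")
    return "\n\n".join(parts)
```

With zero examples this degenerates to the zero-shot setting; adding pairs is what lets the model imitate dataset-specific style or schema without any fine-tuning.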
Qianchu (Flora) Liu retweeted
_hylandSL - not here @_hylandSL
Our #TACL paper on Compositional Zero-Shot Domain Transfer is now out! direct.mit.edu/tacl/article/d… We show that training on general-domain task data (say, NLI) and in-domain unstructured data (say, radiology reports) enables in-domain task capability (radiology NLI)!
Qianchu (Flora) Liu @QianchuL
@ArgosHelpers Hi, I have been trying to contact you for two weeks but no one has responded. I would like to change the delivery address on an order, but there is no way to cancel or amend it online, and it keeps being dispatched and delivered to the wrong address. Could you help me with this, please?
Argos Helpers @ArgosHelpers
Following the Prime Minister’s announcement, standalone Argos stores are now closed. Our website is open as usual and still offering fast delivery. Argos stores in Sainsbury’s supermarkets are still open. To find out more, visit bit.ly/33F8y5X