Thomas Sounack
@tsounack
AI/ML Engineer @ Dana-Farber Cancer Institute | Stanford alum
Joined May 2024
52 Following · 87 Followers
37 posts

Pinned Tweet
Thomas Sounack @tsounack
Very excited to share the release of BioClinical ModernBERT! Highlights:
- biggest and most diverse biomedical and clinical dataset for an encoder
- 8192 context
- fastest throughput with a variety of inputs
- SOTA results across several tasks
- base and large sizes
(1/8)
4 replies · 14 reposts · 66 likes · 16.7K views
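[Editor's note: for readers who want to try the released encoder, a minimal sketch using the transformers fill-mask pipeline. The Hugging Face model ID is an assumption, not confirmed in this thread; check the Hub for the actual repo name.]

```python
# Minimal sketch: querying the encoder through the fill-mask pipeline.
# The model ID below is an assumption, not confirmed in the thread.
from transformers import pipeline

fill = pipeline("fill-mask", model="thomas-sounack/BioClinical-ModernBERT-base")

# The tokenizer's [MASK] token marks the position to predict; the pipeline
# returns the top candidate tokens with their scores.
for pred in fill("The patient was started on [MASK] for hypertension."):
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```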
Thomas Sounack @tsounack
Another interesting finding was that simple fine-tuning allowed these small models to consistently return parsable JSON outputs. This may be worth exploring if you plan to use small LLMs for a structured output generation task.
0 replies · 0 reposts · 0 likes · 20 views
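[Editor's note: a minimal sketch of how one might measure that property, i.e. how often a model's raw outputs parse as JSON. `generate_batch` is a hypothetical stand-in for the fine-tuned model's generation call; nothing here is from the paper itself.]

```python
# Minimal sketch: measuring the fraction of model outputs that are valid JSON.
import json

def parsable_json_rate(outputs: list[str]) -> float:
    """Fraction of outputs that json.loads accepts without error."""
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs) if outputs else 0.0

# outputs = generate_batch(prompts)  # hypothetical fine-tuned model call
# print(f"parsable: {parsable_json_rate(outputs):.1%}")
```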
Thomas Sounack @tsounack
These small open-source LLMs can run on laptops (and even good smartphones), meaning that any institution can run them securely behind their firewall. This is significant since HIPAA-compliant LLM access is still rare for medical institutions.
1 reply · 0 reposts · 0 likes · 25 views
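[Editor's note: a minimal sketch of fully local inference in that setting, where no text leaves the machine. The thread does not name the exact models, so Qwen/Qwen2.5-0.5B-Instruct is used as a small-model stand-in; requires a recent transformers version with chat-format pipeline support.]

```python
# Minimal sketch: running a small open model entirely on-device, so no
# patient text ever leaves the machine. Model ID is a stand-in.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [{"role": "user", "content": "List the vitals in: BP 140/90, HR 72."}]
out = generator(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```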
Thomas Sounack @tsounack
Our MedSlice paper was just accepted at @JAMIAOpen! We provide a pipeline to extract clinically relevant sections of medical notes (HPI, Interval Hx, Assessment and Plan) using fine-tuned language models.
1 reply · 1 repost · 3 likes · 46 views
Maziyar PANAHI @MaziyarPanahi
@tsounack Thanks for sharing. Have you done any evals on downstream tasks, especially medical token classification, to see the gain over the original model?
1 reply · 0 reposts · 0 likes · 78 views
Thomas Sounack @tsounack
Want to continue training an encoder on your own data, but not sure where to start? Our step-by-step guide for reproducing the BioClinical ModernBERT training was just released! 1/5
2 replies · 3 reposts · 14 likes · 2.4K views
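[Editor's note: the released guide and config files are the authoritative recipe; as a rough picture of what continued MLM pretraining looks like with the Hugging Face Trainer, here is a minimal sketch. The starting checkpoint, masking rate, hyperparameters, and the `notes.txt` data file are all placeholders.]

```python
# Minimal sketch of continued MLM pretraining with the HF Trainer.
# All hyperparameters and data wiring are placeholders; follow the
# released guide and config files for the real recipe.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "answerdotai/ModernBERT-base"  # or a BioClinical variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Your own corpus, one document per line (placeholder file name).
ds = load_dataset("text", data_files={"train": "notes.txt"})["train"]
ds = ds.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=8192),
    batched=True,
    remove_columns=["text"],
)

# Random token masking for the MLM objective (rate is a placeholder).
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.3)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="continued-mlm", num_train_epochs=1),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()
```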
Thomas Sounack @tsounack
If you are working with a lot of biomedical and/or clinical text, consider continuing MLM training of BioClinical ModernBERT on your own data! The resulting encoder will be much easier to fine-tune on your various downstream tasks (embedding model for RAG, classifier...) 4/5
1 reply · 0 reposts · 1 like · 152 views
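[Editor's note: for the downstream step described above, a minimal sketch of fine-tuning the continued-pretrained encoder as a text classifier. The checkpoint path and the `labels.csv` file (with `text` and `label` columns) are placeholders.]

```python
# Minimal sketch: fine-tuning the encoder on a downstream classification
# task. Checkpoint and data file are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "continued-mlm"  # your continued-pretrained encoder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

ds = load_dataset("csv", data_files={"train": "labels.csv"})["train"]
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf", num_train_epochs=3),
    train_dataset=ds,
    processing_class=tokenizer,  # enables dynamic padding per batch
)
trainer.train()
```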
Thomas Sounack @tsounack
Exciting to see BioClinical ModernBERT (base) ranked #2 among trending fill-mask models - right after BERT! The large version is currently at #4. Grateful for the interest, and can’t wait to see what projects people apply it to!
[image: trending fill-mask models ranking]
0 replies · 7 reposts · 12 likes · 942 views
Thomas Sounack @tsounack
The BioClinical ModernBERT GitHub repo is online! It contains:
- Our continued pretraining config files
- Performance eval code
- Inference speed eval code
Step-by-step guide on how to continue ModernBERT or BioClinical ModernBERT pretraining coming in the next few days!
1 reply · 3 reposts · 17 likes · 804 views
Thomas Sounack retweeted
Mike Dupont @introsp3ctor
codepen.io/jmikedupont2/p… colab.research.google.com/drive/1uSx8yYZ… next demo: visualizing BioClinical-ModernBERT-base embeddings on a sphere
3 replies · 1 repost · 6 likes · 474 views
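[Editor's note: a minimal sketch of what such a demo computes before plotting: mean-pool the encoder's last hidden state per sentence, then L2-normalize so every embedding lies on the unit sphere. The model ID is assumed.]

```python
# Minimal sketch: sentence embeddings projected onto the unit sphere.
# Model ID is an assumption, not confirmed in the thread.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "thomas-sounack/BioClinical-ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id).eval()

texts = ["Chest pain radiating to the left arm.", "Fasting glucose of 180 mg/dL."]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state    # (batch, seq, dim)

mask = batch["attention_mask"].unsqueeze(-1)     # ignore padding tokens
emb = (hidden * mask).sum(1) / mask.sum(1)       # mean pooling
emb = torch.nn.functional.normalize(emb, dim=-1) # project onto unit sphere
print(emb.shape, emb.norm(dim=-1))               # norms are all 1.0
```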
Antoine Chaffin @antoine_chaffin
You can just continue pre-training things ✨ Happy to announce the release of BioClinical ModernBERT, a ModernBERT model whose pre-training has been continued on medical data. The result: SOTA performance on various medical tasks, with long context support and ModernBERT efficiency.
[image attached]
[quoting @tsounack's pinned BioClinical ModernBERT release announcement above]
4 replies · 33 reposts · 212 likes · 69.5K views
Jacques Sun @SunJacques_
@tsounack Nice work, Thomas! 👏 FYI the GitHub link seems to be broken. Could you verify the URL? Would love to explore the implementation details.
1 reply · 0 reposts · 1 like · 70 views