Miriam

437 posts

@superrzk

Junior NLP researcher, previously data scientist, CS @imperialcollege

Joined September 2018

1.2K Following · 215 Followers

Miriam retweeted

Yacine Jernite@YJernite·17 Jun

What does "responsible" research on Large Language Models require, and why did it take over 1000 participants to attempt it in @BigscienceW? We're releasing the first article in a series with @mtlaiethics focusing on several aspects of this question 🧵1/n montrealethics.ai/category/colum…

Miriam@superrzk·20 Jun

Masader is expanding with more datasets. For more info about the project, check the project website and LREC paper.

Zaid زيد@zaidalyafeai

Announcing Masader v2.0 with +500 Arabic NLP datasets, we have added many features 🧵 website: arbml.github.io/masader/ code: github.com/ARBML/masader

Miriam retweeted

MMitchell@mmitchell_ai·8 Jun

Reminder to everyone starting to publish in ML: "Foundation models" is *not* a recognized ML term; was coined by Stanford alongside announcing their center named for it; continues to be pushed by Sford as *the* term for what we've all generally (reasonably) called "base models".

Stanford HAI@StanfordHAI

Oversight of foundation models requires multi-stakeholder partnerships, including independent organizations not driven by commercial incentives. We need to leverage the collective wisdom of the community and represent the diverse voices of the people that this technology impacts.

Miriam@superrzk·8 Jun

The "Schedule send" in Gmail

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

🙌 This is joint work with @BlancheMinerva @MirunaClinciu @manandey @ShayneRedford @SashaMTL @superrzk @mmitchell_ai Dragomir Radev @evolvedeve @arjunsubgraph @jaesungtae @samsontmr @dpacsays led by @ZeerakTalat & @AlmostAisling

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

🔨 Our recommendations to foster fairness of LLMs: 1) Transparent bias evaluations via scoping and documentation 2) Diversity of tested stereotypes for increased inclusivity 3) Creation of culturally aware datasets 4) General bias measures that can compare different model setups

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

In this table with 25 very large LMs, we show that LLMs are overwhelmingly trained on English texts and by homogeneous teams located in the USA. Furthermore, most of the LLMs are not evaluated for biases by their original creators.

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

2) Few bias benchmarks cover languages other than English. This exclusive focus on Anglo-centric contexts hinders the much-needed evaluation of multilingual contexts. Translating the existing benchmarks wouldn't solve the problem, as stereotypes can vary greatly across cultures.

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

Further complicating the bias analysis, it is often difficult to separate the bias measures from the specific LLM setup (e.g., architecture), which makes different setups hard to compare. How can we compare/validate bias metrics for different contexts and (future) models?

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

As @BigScienceLLM is creating a large multilingual language model, we (the bias, fairness, and social impact WG @BigscienceW) discuss the challenges that we face in evaluating these models for biases in multilingual settings. 🌏🌎🌍

Miriam retweeted

Oskar van der Wal@oskarvanderwal·24 May

I am happy to announce that our position paper "You Reap What You Sow: On the Challenges of Bias Evaluation Under Multi-Lingual Settings" has been accepted for presentation at the @BigscienceW #acl2022 ☘️ workshop! 🧵⬇️ aclanthology.org/2022.bigscienc…

Miriam retweeted

Yacine Jernite@YJernite·20 May

For friends who ask what it is I've actually been doing for the last year+: well lots of this 😛 It's been a unique opportunity to connect with and learn from many amazing interdisciplinary collaborators, stay tuned for a summary thread and come talk to us about it 🤗🌸

MMitchell@mmitchell_ai

The data used in machine learning needs to be open for people to interrogate, while also controlled enough not to proliferate. Introducing our framework for Data Governance, which addresses these issues and more! A product of @BigscienceW at FAccT 2022. yjernite.github.io/content/LangDa…

Miriam retweeted

MMitchell@mmitchell_ai·20 May

Miriam retweeted

arbml@arabicml2·14 May

Thanks to everyone who contributed to any of these projects @xp187 @mhmoodlan @abdulelahsm @mustafaj0x @abidlabs @55i5 @alonemazin @sudomaze @superrzk

Miriam retweeted

arbml@arabicml2·14 May

10. Research (أبحاث): an open community for discussing the latest developments in Arabic language processing. GitHub: github.com/ARBML/Research

Miriam retweeted

arbml@arabicml2·14 May

9. Bayanat (بيانات): this tool displays various statistics about a dataset, such as the most frequent words, the characters, the number of lines, etc. GitHub: github.com/ARBML/bayanat Demo: colab.research.google.com/github/ARBML/b…

Miriam retweeted

arbml@arabicml2·14 May

8. Nmatheg (نماذج): this library makes it easy to train several language-processing models and test them on several datasets at once. GitHub: github.com/ARBML/nmatheg Demo: colab.research.google.com/github/ARBML/n…

Miriam retweeted

arbml@arabicml2·14 May

7. Rasm (رسم): generates a variety of images of Arabic calligraphy and Islamic art using GANs. GitHub: github.com/ARBML/rasm Demo: colab.research.google.com/github/ARBML/r…

Miriam retweeted

arbml@arabicml2·14 May

6. Tnkeeh (تنقيح): a data-cleaning library with several tools for handling diacritics and English letters, and for cleaning social media data such as Twitter, etc. GitHub: github.com/ARBML/tnkeeh

Miriam retweeted

arbml@arabicml2·14 May

5. Tkseem (تقسيم): a tool for tokenizing Arabic text. Paper: arxiv.org/pdf/2106.07540… GitHub: github.com/ARBML/tkseem Demo: colab.research.google.com/github/ARBML/t…
