
Benjamin Elizalde

Applications are open for the doctoral program in AI! 🎓
100 fully funded PhD positions across 10 Finnish universities
🧭 Collaboration with industry
Professors @arnosolin and @lauraruotsa introduce the program in this video 👇
Apply here by April 2 ➡️ fcai.fi/doctoral-progr…

Announcing fully funded PhD positions on our new "Bioacoustic AI" project: bioacousticai.eu
Apply now for a #PhD studying animal sounds (#bioacoustics), #deeplearning, #acoustic signals, and #ecology!
2 positions are open now, 8 more coming soon. (Please share)

New workshop announcement 📢📢
We are excited to announce our IEEE ICASSP 2024 workshop, Explainable AI for Speech and Audio!
We have two paper tracks, with deadlines of January 20 and February 20.
Details on how to submit: xai-sa-workshop.github.io/web/Call%20for…

Pengi: An Audio Language Model for Audio Tasks
We introduce Pengi, a novel Audio Language Model that leverages transfer learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input and generates free-form text as output. An audio encoder represents the input audio as a sequence of continuous embeddings; a text encoder does the same for the accompanying text. Both sequences are combined into a prefix that prompts a pre-trained, frozen language model. Pengi's unified architecture supports both open-ended and closed-ended tasks without any additional fine-tuning or task-specific extensions. Evaluated on 22 downstream tasks, it achieves state-of-the-art performance on several of them.
Paper page: huggingface.co/papers/2305.11…
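
The post above compresses the whole architecture into one paragraph, so here is a minimal sketch of the prefix-prompting pattern it describes: encode audio and text into embedding sequences, concatenate them as a prefix, and let a frozen language model generate free-form text. This is an illustration under assumptions, not the authors' implementation: GPT-2 as the frozen LM, the PrefixEncoder module, the 512-dim pooled features, and the prefix lengths are all made up for the example.

```python
# Sketch of a Pengi-style "audio + text prefix -> frozen LM" flow.
# All module names, dimensions, and prefix lengths are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class PrefixEncoder(nn.Module):
    """Maps pooled modality features to a sequence of LM-sized prefix embeddings."""
    def __init__(self, in_dim: int, lm_dim: int, prefix_len: int):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Linear(in_dim, lm_dim * prefix_len)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, in_dim) -> (batch, prefix_len, lm_dim)
        return self.proj(feats).view(feats.shape[0], self.prefix_len, -1)

lm = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
for p in lm.parameters():  # the language model stays frozen
    p.requires_grad = False

lm_dim = lm.config.n_embd
audio_prefix = PrefixEncoder(in_dim=512, lm_dim=lm_dim, prefix_len=8)
text_prefix = PrefixEncoder(in_dim=512, lm_dim=lm_dim, prefix_len=8)

# Random stand-ins for real encoder outputs (e.g. from a CLAP-style
# audio encoder and a text encoder), so the script runs end to end.
audio_feats = torch.randn(1, 512)
text_feats = torch.randn(1, 512)

# Concatenate both embedding sequences into a single prefix and let the
# frozen LM generate free-form text conditioned on it.
prefix = torch.cat([audio_prefix(audio_feats), text_prefix(text_feats)], dim=1)
out = lm.generate(
    inputs_embeds=prefix,
    attention_mask=torch.ones(prefix.shape[:2], dtype=torch.long),
    max_new_tokens=20,
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0], skip_special_tokens=True))
```

In a setup like this, training would update only the prefix encoders (and optionally the underlying audio/text encoders) while the language model's weights stay fixed, which is what makes framing every audio task as text generation cheap to transfer.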