Benjamin Elizalde
@benjaminelizal
Researcher; Traveler
Joined September 2010
69 Following · 77 Followers
50 posts
Benjamin Elizalde retweeted
Annamaria Mesaros @AnnamariaMsros·
100 fully-funded PhD positions in Finland! I am one of the potential supervisors for PhDs in this new doctoral programme, come do research with us! Details on the application process and selection procedure -> fcai.fi/doctoral-progr… Deadline 2nd of April.
Finnish Center for AI 🦣 @[email protected] @FCAI_fi

Applications are open for the doctoral program in AI! 🎓 100 fully funded PhD positions across 10 Finnish universities 🧭 Collaboration with industry. Professors @arnosolin and @lauraruotsa introduce the program in this video 👇 Apply here by April 2 ➡️ fcai.fi/doctoral-progr…

Benjamin Elizalde retweeted
Tuomas Virtanen @TuomasVirt·
10 PhD positions available in the Bioacoustic AI network, including one position in Audio Research Group of @TampereUni. Apply to the TAU position here: tuni.rekrytointi.com/paikat/?o=A_RJ…
@[email protected]@mclduk

Announcing fully-funded PhD positions on our new "Bioacoustic AI" project: bioacousticai.eu Apply now for a #PhD studying animal sounds (#bioacoustics), #deeplearning, #acoustic signals, and #ecology! 2 open to apply now, 8 more coming soon. (Pls share)

Benjamin Elizalde retweeted
arXiv Sound @ArxivSound·
"PAM: Prompting Audio-Language Models for Audio Quality Assessment," Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang … ift.tt/6HIcVpn
Benjamin Elizalde retweeted
Joan Serrà @serrjoa·
Paper proposing to leverage audio-language models to measure audio quality in multiple tasks (TTA, TTM, speech denoising, etc.), in a reference-free way. One of the tricks is to compare against multiple *and opposite* quality-related text prompts. arxiv.org/abs/2402.00282
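The opposite-prompt trick described above can be sketched in a few lines. This is a toy illustration, not the PAM code: the softmax-over-cosine-similarity scoring and the 2-D stand-in embeddings are assumptions, with a real audio-language model supplying the actual embeddings for the audio clip and for an antonym prompt pair like "good quality" vs. "bad quality".

```python
import numpy as np

def pam_style_score(audio_emb, good_emb, bad_emb):
    """Reference-free quality score: softmax over the cosine similarities
    between an audio embedding and an *opposite* pair of quality prompts."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(audio_emb, good_emb), cos(audio_emb, bad_emb)])
    probs = np.exp(sims) / np.exp(sims).sum()  # 2-way softmax
    return float(probs[0])                     # probability of "good quality"

# Toy embeddings: this clip sits closer to the "good quality" prompt.
good, bad = np.array([1.0, 0.0]), np.array([0.0, 1.0])
audio = np.array([0.9, 0.1])
score = pam_style_score(audio, good, bad)  # in (0, 1); higher = better quality
```

Because the score is a probability over the two prompts, it needs no reference recording, which is what makes the approach reference-free.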
Benjamin Elizalde retweeted
Soham Deshmukh @sohamdesh_·
This is all thanks to the amazing collaborators at Microsoft and CMU! @benjaminelizal, @Ydhira, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang (5/5)
Benjamin Elizalde retweeted
Soham Deshmukh @sohamdesh_·
Prompting Audios using Acoustic Properties for Emotion Representation. Existing audio-text models are bad at speech tasks like Speech Emotion Recognition. We provide different ways to improve this performance and introduce the Emotion Audio Retrieval task (4/5)
Benjamin Elizalde retweeted
Soham Deshmukh @sohamdesh_·
Training Audio Captioning Models without Audio. We show how to train audio captioning models with only text and further improve performance with LLM-generated text. The paper also introduces stylized audio captioning. The data is released here: github.com/microsoft/NoAu… (3/5)
Benjamin Elizalde retweeted
Soham Deshmukh @sohamdesh_·
Natural Language Supervision for General-Purpose Audio Representations. We make improvements to the CLAP model and scale it to 4.6M audio-text pairs. The model is publicly available at: github.com/microsoft/CLAP (2/5)
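The zero-shot use of CLAP's joint audio-text space can be sketched with stand-in vectors. A minimal sketch under stated assumptions: in the real model the audio and prompt embeddings come from the released encoders at github.com/microsoft/CLAP, and `zero_shot_classify` is a hypothetical helper, not the package's API.

```python
import numpy as np

def zero_shot_classify(audio_emb, text_embs, class_names):
    """Pick the class whose text-prompt embedding is closest (cosine) to the
    audio embedding in the shared audio-text space."""
    unit = lambda v: v / np.linalg.norm(v)
    sims = [float(np.dot(unit(audio_emb), unit(t))) for t in text_embs]
    return class_names[int(np.argmax(sims))]

# Stand-in embeddings for prompts like "this is a sound of a dog".
names = ["dog", "rain", "siren"]
text_embs = [np.array([1.0, 0.0, 0.0]),
             np.array([0.0, 1.0, 0.0]),
             np.array([0.0, 0.0, 1.0])]
audio = np.array([0.1, 0.9, 0.2])  # acoustically closest to the "rain" prompt
label = zero_shot_classify(audio, text_embs, names)
```

No classifier head is trained: swapping in a new list of class prompts changes the label set for free, which is what "Zero-Shot inference" means here.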
Benjamin Elizalde retweeted
Soham Deshmukh @sohamdesh_·
Do you work with audio-text models? If so, we have three relevant papers appearing at ICASSP 2024 #ICASSP2024 (1/5)
Benjamin Elizalde retweeted
Mirco Ravanelli @mirco_ravanelli·
🎉 Exciting News! 🎙️ Join us in exploring Explainable AI for speech and audio with our IEEE ICASSP 2024 workshop. 📝 Call for Papers: Two tracks with deadlines - January 20 & February 20. Don't miss out! 🚀 #ICASSP2024 #ExplainableAI #SpeechAndAudio
Cem Subakan @CemSubakan

New workshop announcement 📢📢 We are excited to announce our IEEE ICASSP 2024 workshop, Explainable AI for Speech and Audio! We have two tracks for papers with deadlines January 20, and February 20. The details on how to submit: xai-sa-workshop.github.io/web/Call%20for…

Benjamin Elizalde retweeted
AK @_akhaliq·
Natural Language Supervision for General-Purpose Audio Representations
paper page: huggingface.co/papers/2309.05…
Audio-Language models jointly learn multimodal text and audio representations that enable Zero-Shot inference. Models rely on the encoders to create powerful representations of the input and generalize to multiple tasks ranging across sounds, music, and speech. Although models have achieved remarkable performance, there is still a performance gap with task-specific models. In this paper, we propose a Contrastive Language-Audio Pretraining model that is pretrained with a diverse collection of 4.6M audio-text pairs employing two innovative encoders for Zero-Shot inference. To learn audio representations, we trained an audio encoder on 22 audio tasks, instead of the standard training on sound event classification. To learn language representations, we trained an autoregressive decoder-only model instead of the standard encoder-only models. Then, the audio and language representations are brought into a joint multimodal space using Contrastive Learning. We used our encoders to improve the downstream performance by a margin. We extensively evaluated the generalization of our representations on 26 downstream tasks, the largest in the literature. Our model achieves state-of-the-art results in several tasks, leading the way towards general-purpose audio representations.
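The Contrastive Learning step that brings the audio and language representations into one joint space can be sketched as a symmetric InfoNCE-style loss. A minimal NumPy sketch, not the paper's training code: the temperature value and the toy batch are illustrative assumptions.

```python
import numpy as np

def symmetric_contrastive_loss(audio_embs, text_embs, temperature=0.07):
    """CLIP/CLAP-style loss: in a batch of paired embeddings, each matching
    audio-text pair (the diagonal) should out-score every mismatched pair."""
    a = audio_embs / np.linalg.norm(audio_embs, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = a @ t.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))
    def xent(l):                            # cross-entropy, diagonal = labels
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-logp[idx, idx].mean())
    return (xent(logits) + xent(logits.T)) / 2  # audio->text and text->audio

# Correctly paired embeddings give a lower loss than shuffled pairings.
rng = np.random.default_rng(0)
embs = rng.normal(size=(4, 8))
aligned = symmetric_contrastive_loss(embs, embs)
shuffled = symmetric_contrastive_loss(embs, embs[::-1].copy())
```

Averaging the audio-to-text and text-to-audio directions is the usual symmetric formulation; at the 4.6M-pair scale the tweet describes, the batch dimension is what supplies the negatives.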
Benjamin Elizalde retweeted
AuditoryLab @AuditoryLabInfo·
The call for abstracts is open for the 22nd annual Auditory Perception, Cognition, and Action Meeting (APCAM) on Nov 16th in San Francisco, CA. Deadline 09/17/23 for abstract submission and travel award applications. Visit apcsociety.org . @acousticsorg @AuditoryAPCAM
Benjamin Elizalde retweeted
AK @_akhaliq·
Pengi: An Audio Language Model for Audio Tasks
paper page: huggingface.co/papers/2305.11…
We introduce Pengi, a novel Audio Language Model that leverages Transfer Learning by framing all audio tasks as text-generation tasks. It takes an audio recording and text as input, and generates free-form text as output. The input audio is represented as a sequence of continuous embeddings by an audio encoder. A text encoder does the same for the corresponding text input. Both sequences are combined as a prefix to prompt a pre-trained frozen language model. The unified architecture of Pengi enables both open-ended and closed-ended tasks without any additional fine-tuning or task-specific extensions. When evaluated on 22 downstream tasks, our approach yields state-of-the-art performance on several of them.
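The prefix construction described above can be sketched at the shape level. This is an illustration only; `build_pengi_prefix` is a hypothetical helper, and in the real system the two sequences come from trained audio and text encoders before prompting the frozen language model.

```python
import numpy as np

def build_pengi_prefix(audio_seq, text_seq):
    """Concatenate the audio-embedding sequence and the text-embedding
    sequence into one continuous prefix that prompts a frozen LM."""
    assert audio_seq.shape[1] == text_seq.shape[1], "must match the LM width"
    return np.concatenate([audio_seq, text_seq], axis=0)

# Stand-in encoder outputs: 8 audio frames + 4 text tokens, LM width 16.
audio_seq = np.zeros((8, 16))
text_seq = np.ones((4, 16))
prefix = build_pengi_prefix(audio_seq, text_seq)  # shape (12, 16)
```

Because the prefix is just a sequence of continuous embeddings, the same frozen language model handles every task; only the audio clip and the text prompt change.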
Benjamin Elizalde retweeted
Hao-Wen (Herman) Dong 董皓文 @hermanhwdong·
Checking the ICLR final decisions made me wonder why audio research is treated so poorly at ML conferences 🥲 I did a simple keyword search on OpenReview and here is what I found! 👉 @iclr_conf #ICLR2023
[image: keyword-search results]