
we show for the first time ever that sub-billion audio models can reason. we introduce mellow, a small audio-language model (167M) that gets SoTA on different audio reasoning tasks. by using our method and data, you can train an alm within 24 hrs on academic resources (1/n 🧵)














