
Introducing Human-1 by @JoshTalksLive - the first open full-duplex conversational speech model for Hindi.

Key result: natural conversational behaviour transfers to a new language when a duplex speech model is trained on sufficiently large-scale real dialogue.

Built on Moshi's full-duplex architecture, Human-1 listens and speaks simultaneously, processing both speakers' audio and Hindi text in parallel.

Training data:
• 26,000 hours of real two-person conversations
• 14,695 speakers
• stereo recordings

This allows the model to directly learn conversational dynamics most speech datasets lack:
• interruptions
• overlap
• backchannels
• turn-taking

Interestingly, the model converged in ~4,000 training steps, suggesting that the model quickly learned conversational patterns from the data and that much larger conversational datasets may unlock further capability gains.

Human evaluation:
• 85% rated as human-like interaction
• Naturalness MOS 4.10 (human: 4.55)
• 66.9% tie with human speech

Human-1 is not a production system - it’s a research proof point toward much larger conversational speech corpora.

Paper, code, playground and demos: ai.joshtalks.com/research/human…