

Abdul Basit Banbhan

@abbanbhan
AI Engineer @ https://t.co/wsJ0Kw7VNF | Ambassador @cursor_ai | Lecturer @jkulinz | ex-@ASMLcompany | ex-@iaeaorg | BSc & MSc in AI @jkulinz | ELP Fellow @austrianstartup




I’m excited to announce that I have joined JKU Linz as a Full Professor, where I founded the Institute for Machine Intelligence! 🇦🇹🤖

🚀 Our mission is to focus on the role of embodiment in robot learning: we develop learning methods, design robots, and explore their interplay to tackle the toughest robotics challenges.

🤝 Join our journey! We have several PhD positions and a postdoc position available 👇

Leaving @UAlberta is bittersweet. To my friends and colleagues at @UAlbertaCS and @AmiiThinks: you have truly felt like family. I am deeply grateful for your unwavering support, the incredible journey we shared, and for providing such a wonderful academic home. I also want to sincerely thank @CIFAR_News for their support throughout this chapter.




We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢

Blog: pub.sakana.ai/kame/
Paper: arxiv.org/abs/2510.02327

Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say; we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow: they fall back to "think, then speak."

In our new paper, we propose a way to break this trade-off. We call it KAME ("turtle" in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking."

The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions.

Try the model yourself here: huggingface.co/SakanaAI/kame
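The "speak while thinking" loop described in the post can be pictured with a toy sketch. This is not KAME's implementation: the names `OracleSlot`, `backend_llm`, `frontend_s2s`, and `converse`, the timings, and the string "candidates" are all hypothetical stand-ins, and no real speech model or LLM API is called. It only illustrates the control flow: a fast frontend loop that never blocks, and a slower backend task that asynchronously refreshes a shared hint.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OracleSlot:
    """Hypothetical shared slot holding the newest backend candidate."""
    latest: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def update(self, candidate: str) -> None:
        self.latest = candidate
        self.history.append(candidate)


async def backend_llm(partial_inputs: List[str], slot: OracleSlot, delay: float) -> None:
    """Stand-in for a slow, swappable backend LLM (assumption: one candidate per partial input)."""
    for text in partial_inputs:
        await asyncio.sleep(delay)  # simulated LLM latency
        slot.update(f"candidate for: {text}")


async def frontend_s2s(n_steps: int, slot: OracleSlot, step_time: float) -> List[Tuple[int, Optional[str]]]:
    """Stand-in for the fast speech-to-speech loop: emits every step, never waits on the backend."""
    emitted = []
    for step in range(n_steps):
        await asyncio.sleep(step_time)  # simulated time to produce one output frame
        hint = slot.latest              # non-blocking read of the newest "oracle" signal
        emitted.append((step, hint))
    return emitted


async def converse() -> Tuple[List[Tuple[int, Optional[str]]], OracleSlot]:
    slot = OracleSlot()
    # The backend runs concurrently; the frontend starts replying immediately.
    backend = asyncio.create_task(
        backend_llm(["what is", "what is KAME"], slot, delay=0.1)
    )
    emitted = await frontend_s2s(n_steps=40, slot=slot, step_time=0.01)
    await backend
    return emitted, slot


if __name__ == "__main__":
    steps, _ = asyncio.run(converse())
    for step, hint in steps:
        print(step, hint)
```

In this sketch the early frontend steps run with no hint at all, and later steps pick up whichever candidate is newest. Swapping the backend would only mean replacing `backend_llm`, which mirrors the post's point that the frontend does not change.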







