So Kuroki

102 posts

So Kuroki banner
So Kuroki

So Kuroki

@sharp_computer

Researcher @SakanaAILabs. LLM, Audio, and Robotics.

Katılım Ağustos 2016
674 Takip Edilen641 Takipçiler
So Kuroki
So Kuroki@sharp_computer·
Several people asked about the compute required for this research. Training takes about half a day on 2-3 GPUs. Ofc we used more compute during research and experimentation, but each iteration itself is lightweight. If you can SFT a 7B model (like Moshi), you can try it too.
English
0
1
6
372
So Kuroki
So Kuroki@sharp_computer·
KAME🐢 will be presented at tomorrow’s final session! Please stop by before you leave. 5/8 2:00–4:00 PM Poster Area 30.4 #ICASSP
Sakana AI@SakanaAILabs

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame

English
2
4
41
6.2K
So Kuroki retweetledi
Yotaro
Yotaro@yotarokubo·
#ICASSP 我々のKAME🐢論文 (SLP-P56.4)は、本日の現地時間14:00からPoster Area 30で発表されます。学習データの構成法からモデル学習の実際まで、なんでも聞いてください。私も発表者の近くにいるハズなので、よろしくお願いします。一応ヘッドホン持って行きますが、スペース十分にあるかな?
日本語
0
4
25
1.7K
So Kuroki retweetledi
hardmaru
hardmaru@hardmaru·
For the past few years, humans have been doing “prompt engineering” to coax the best performance out of different LLMs. In this work, we explored what happens if we train an AI to do that job instead. By training a Conductor model with RL, we found that it naturally learns to write highly effective, custom instructions for a whole pool of other models. It essentially learns to ‘manage’ them in natural language. What surprised me most was how it dynamically adapts. For simple factual questions, it just queries one model. But for hard coding problems, it autonomously spins up a whole pipeline of planners, coders, and verifiers. Really excited to see where this paradigm of “AI managing AI” goes next, especially as we start moving from single-agent chain-of-thought to multi-agent “chain-of-command”. Link to our #ICLR2026 paper: arxiv.org/abs/2512.04388 Along with our TRINITY paper which we announced earlier, this work also powers our new multi-agent system: Sakana Fugu (sakana.ai/fugu-beta) 🐡
Sakana AI@SakanaAILabs

Introducing our new work: “Learning to Orchestrate Agents in Natural Language with the Conductor” accepted at #ICLR2026 arxiv.org/abs/2512.04388 What if we trained an AI not to solve problems directly, but to act as a manager that delegates tasks to a diverse team of other AIs? To solve complex tasks, humans rarely work alone; we form teams, delegate, and communicate. Yet, multi-agent AI systems currently rely heavily on rigid, human-designed workflows or simple routers that just pick a single model. We wanted an AI that could dynamically build its own team. We trained a 7B Conductor model using Reinforcement Learning to orchestrate a pool of frontier models (including GPT-5, Gemini, Claude, and open-source models available during the period leading up to ICLR 2026). Instead of executing code, the Conductor outputs a collaborative workflow in natural language. For any given question, the Conductor specifies: 1/ Which agent to call 2/ What specific subtask to give them (acting as an expert prompt engineer) 3/ What previous messages they can see in their context window Through pure end-to-end reward maximization, amazing behaviors emerged. The Conductor learned to adapt to task difficulty: it 1-shots simple factual questions, but autonomously spins up complex planner-executor-verifier pipelines for hard coding problems. The results are very promising: The 7B Conductor surpasses the performance of every individual worker model in its pool, setting new records on LiveCodeBench (83.9%) and GPQA-Diamond (87.5%) at the time of publication. It also significantly outperforms expensive multi-agent baselines like Mixture-of-Agents at a fraction of the cost. One of our favorite features: Recursive Test-Time Scaling! By allowing the Conductor to select itself as a worker, it reads its own team's prior output, realizes if it failed, and spins up a corrective workflow on the fly. This opens a new axis for scaling compute during inference. This research proves that language models can become elite meta-prompt engineers, dynamically harnessing collective intelligence. Alongside our TRINITY research which we announced a few days earlier, this foundational research powers our new multi-agent system: Sakana Fugu! (sakana.ai/fugu-beta) 🐡 OpenReview: openreview.net/forum?id=U23A2… (ICLR 2026)

English
40
175
1.4K
182.6K
So Kuroki retweetledi
takkyu
takkyu@takkyuO2·
#ICML2026 に論文がacceptされました!今度は拡散言語モデルのtest-time scalingの研究です。 複数の拡散言語モデルに協力させるとコーディングや数学の能力が大きく上げられるという研究で、性能も良いしアルゴリズム自体も面白くてお気に入りの研究です。韓国でお会いしましょう🇰🇷
takkyu tweet media
日本語
1
3
43
2.7K
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame
English
14
145
741
289.3K
So Kuroki retweetledi
逆瀬川
逆瀬川@gyakuse·
kameのweightとブログ記事公開だ!!サカーナいつもありがとう Cascade な full-duplex 大好きマンだけどメチャやりたくなる huggingface.co/SakanaAI/kame
Sakana AI@SakanaAILabs

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame

日本語
0
4
39
7.1K
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
音声AIの素早さと賢さを両立できるか? 私たち人間は会話の中で、言いたいことを全部まとめてから話し始めるのではなく、話しながら考えを整理していきます。応答の速い Speech-to-Speech モデルは、この「話しながら考える」を実現しましたが、そのぶん思考が浅くなりがちです。かといって知識豊富な LLM を挟むカスケード型では、遅延が生じるため「話しながら」が成立しません。 そこで Sakana AI は、このトレードオフを克服するKAMEモデルを開発しました。Speech-to-Speech モデルが高速な応答ループを担当し、即座に話し始めます。その裏でバックエンドの LLM が非同期に推論を進めて応答候補を生成し、それをオラクル信号としてリアルタイムに注入します。これにより「考えてから話す」ではなく「話しながら考える」ことが可能になります。 バックエンドの LLM は差し替えが可能で、タスクに応じてGPT-4.1、Claude Opus、Gemini 2.5 Flashなどを使い分けられます。フロントエンド側の変更は必要ありません。私たちの実験では、Claudeは推論系のタスクで、GPTは人文系のタスクで、それぞれ高いスコアを出す傾向が見られました。 本研究は #ICASSP2026 で発表されます。 ぜひ、お試しください。 ブログ: pub.sakana.ai/kame/ 論文: arxiv.org/abs/2510.02327 モデル: huggingface.co/SakanaAI/kame
Sakana AI@SakanaAILabs

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame

日本語
6
106
525
80.1K
So Kuroki retweetledi
Yotaro
Yotaro@yotarokubo·
🐢の紹介です。重みとコードが公開されます。是非お試ししてみてください!
Sakana AI@SakanaAILabs

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame

日本語
1
9
49
13.5K
So Kuroki
So Kuroki@sharp_computer·
音声対話モデルの研究をしていました!既存のspeech-to-speech (Moshi)と話した時に、その応答の自然さに感動する一方で、もう少しだけ賢くしたいと思ったのが研究のきっかけです。 5月のICASSPで発表します。モデル、コードも公開しているのでぜひ試してみてください!
Sakana AI@SakanaAILabs

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢 Blog pub.sakana.ai/kame/ Paper arxiv.org/abs/2510.02327 Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say—we start talking, and our thoughts catch up as the sentence unfolds. Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow—they fall back to "think, then speak." In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking." The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions. Try the model yourself here: huggingface.co/SakanaAI/kame

日本語
0
16
117
14.5K
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
We’re launching the beta for our new commercial AI product: Sakana Fugu 🐡, a multi-agent orchestration system! Blog: sakana.ai/fugu-beta Fugu hits SOTA on SWE-Pro, GPQA-D, and ALE-Bench, and has been our internal secret weapon. It dynamically coordinates frontier models, autonomously selecting the optimal agent combinations and roles for each task. Available as an OpenAI-compatible API, you can seamlessly integrate Fugu into your existing workflows with minimal changes. 🐟 Fugu Mini: High-speed orchestration optimized for latency 🐡 Fugu Ultra: Full model pool utilization for deep, complex reasoning Apply for the beta test here: forms.gle/BtKkhc2CfLKk1d…
Sakana AI tweet media
English
28
162
706
364.5K
So Kuroki retweetledi
Yotaro
Yotaro@yotarokubo·
音声系のインターンを募集しています。音声インターフェースを使ったアプリケーションを作ることに興味のある人、誰か一緒に働きませんか?DMください。
日本語
0
17
32
4.4K
So Kuroki retweetledi
小泉進次郎
小泉進次郎@shinjirokoiz·
サカナAI @SakanaAILabs の伊藤社長と意見交換。防衛大臣直轄の吉田AIチーム長ら職員も参加して非常に有意義な時間になりました。ありがとうございました!
小泉進次郎 tweet media小泉進次郎 tweet media
日本語
0
127
1.2K
87.2K
So Kuroki retweetledi
Hikaru Asano
Hikaru Asano@hikaru_asan0·
Joined Sakana AI as a Research Intern 🐟 Super excited 🔥
English
8
8
146
17.3K
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
🐟Ultra Deep Researchアシスタント「Sakana Marlin」、βテスター募集🐟 Sakana AIは、当社初の商用プロダクトとして、独自のエージェント技術によるビジネス向けAIリサーチアシスタント「Sakana Marlin」を開発しました。 sakana.ai/marlin-beta Sakana Marlinは、高度なビジネス調査を完遂する 、独自の長期推論技術に基づく自律型リサーチアシスタントです。 主な特徴 ・ テーマを与えると、8時間近くにわたり自律的にリサーチ ・ 詳細な調査ドキュメントとまとめスライドを自動生成 ・ 複数人のチームが数週間かけるプロフェッショナルな戦略調査を想定 複雑な社会情勢の中で良質な判断を下すため、AIのポテンシャルを最大限生かすソリューションとして構想しました。 本技術は、先日Nature誌にも掲載された科学的発見の自動化「AIサイエンティスト」の知見と、戦略的探索を可能にする「AB-MCTS」を融合。長く考えた分だけアウトプットの質が向上する「効率的な推論スケーリング」を実現しています。 クローズドβテストを実施します 金融機関・事業会社の経営戦略/事業企画部門、コンサルファーム、シンクタンクなど、日常的に高度なリサーチに取り組む方が対象です(期間中無料)。皆様からのフィードバックをもとに改善を重ねていきます。 ▼ クローズドβテスター応募はこちら forms.gle/MYHGP1wi2q4PHY…
Sakana AI tweet media
日本語
13
106
448
279K
So Kuroki retweetledi
Takuya Akiba
Takuya Akiba@iwiwi·
Sakana Chatの公開です! 今回開発した「Namazu」モデルは、DeepSeek-V3.1等のオープンLLMに事後学習を適用したものです。優れた性能を維持しながら、日本での利用に適した振る舞いをします。Web検索機能についてもよく作り込んでいるので、日常用途には十分実用的だと思います。是非お試し下さい。
Sakana AI@SakanaAILabs

🐟 Sakana Chat 公開 🐟 Sakana AIは、Sakana Chatを無料公開しました。 chat.sakana.ai Web検索機能と高速レスポンスを備えたAIチャットです。日本国内から、どなたでもお使いいただけます。ぜひ、お試しください。

日本語
16
218
879
301.5K
So Kuroki retweetledi
hardmaru
hardmaru@hardmaru·
Sakana AI 初の一般向けサービス Sakana Chat を公開しました🐟 強力なWeb検索エージェントを備え、高速で信頼性の高い情報を引き出せます。 世界の高性能なオープンモデルには、開発元のバイアスが不可避的に内在しています。我々は独自の事後学習により、①これらのバイアスの除去、②日本の価値観 の反映、③安全かつ文脈に即した適応を実現する技術を開発しました。 今回のリリースは、その技術実証の第一弾。国内で誰もが安心して使えるAIの選択肢の一つとして、ぜひお試しください!
Sakana AI@SakanaAILabs

🐟 Sakana Chat 公開 🐟 Sakana AIは、Sakana Chatを無料公開しました。 chat.sakana.ai Web検索機能と高速レスポンスを備えたAIチャットです。日本国内から、どなたでもお使いいただけます。ぜひ、お試しください。

日本語
19
188
866
292K
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
🐟 Sakana Chat 公開 🐟 Sakana AIは、Sakana Chatを無料公開しました。 chat.sakana.ai Web検索機能と高速レスポンスを備えたAIチャットです。日本国内から、どなたでもお使いいただけます。ぜひ、お試しください。
GIF
日本語
111
1.6K
6K
1.7M
So Kuroki retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
Sakana AIは、防衛装備庁 防衛イノベーション科学技術研究所より「複数AI技術の組み合わせによる観測・報告・情報統合・資源配分 高速化の研究」を受託しました。 sakana.ai/atla-contract-… 本研究では、当社の強みである「小規模視覚言語モデル(SVLM)」や自律型AIエージェント技術を活用し、ドローン 等のエッジデバイスから得られる膨大なデータの分析・統合、そして最適な意思決定に至るプロセスを一気通貫で高速化するシステムの構築を目指します。 安全保障領域における「情報力」の重要性が高まる中、日本発のAI企業として技術的自律性を確保し、最先端の研究成果を日本の安全保障の基盤強化へと実装してまいります。
Sakana AI tweet media
日本語
20
235
830
433.6K