Haiyue Song

599 posts

@shyyhs

Working on LLM Training @PreferredNetJP SJTU→KyotoU→NICT→PFN. DC1. Grant-in-Aid for Early-Career Scientists. Ski ⛷️

Nagoya · Joined March 2016

334 Following · 375 Followers

Pinned Tweet

Haiyue Song@shyyhs·5d

🤯 Struggling with dataset mixing ratios in LLM continual training? 🧩 We propose OptiMer: train one model per dataset, then merge them optimally. No more costly ratio tuning! 📄OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training 🔗arxiv.org/abs/2603.28858 My last work at @NICT_Publicity also related to our collaboration with @AISingapore 🧵 1/9 #NLProc #NLP #LLM #ModelMerging #大規模言語モデル #AI

English

2.8K

Haiyue Song@shyyhs·9h

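Cherry blossoms along the Gojo River in Nagoya last weekend 🌸 Green leaves were already coming out, so it seems this was the last chance. It has been exactly one week since I moved from Kyoto to a place near Nagoya University. Everything I need is here, and the cherry-blossom viewing wasn't crowded at all, so it was relaxed. The city is just the right size and easy to live in.

Japanese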

170

Haiyue Song@shyyhs·19h

@RuoyuSun_UI Thank you for the reply! The diversity drop may also indicate that the model is concentrating on high-quality outputs. Good to hear it did not go too far from GRPO.

English

Ruoyu Sun@RuoyuSun_UI·19h

@shyyhs we observed the output diversity dropped a bit, but not too far away from GRPO

English

132

Ruoyu Sun@RuoyuSun_UI·1d

We’re excited to share our work "A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning". An earlier version of this work has been on arXiv for a few months. We added more experiments and revised it to this new title. The recipe is simple: the model samples its own responses at low temperature, learns from them with ordinary SFT training, and repeats. No reward. No verifier. No fancy objective beyond standard SFT. On Qwen2.5-Math-7B, mean Pass@1 over 6 math benchmarks improves 22.7 → 39.5. Note that mean Pass@32 also improves 61.0 → 67.9, suggesting that this simple reward-free procedure unlocks more of the model’s existing reasoning potential. See the updated paper directly at: github.com/ElementQi/SePT… The arXiv link is: arxiv.org/abs/2510.18814 The updated version will appear on arXiv shortly. @Phanron_xli

English

9.1K

Haiyue Song@shyyhs·22h

@MPEG31 🙌 Thank you so much!!

Japanese

MPEG@MPEG31·1d

@shyyhs Whoa! Congratulations!!!

Japanese

Haiyue Song@shyyhs·3d

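This announcement is quite late, but I joined Preferred Networks on April 1. I will be building large language models 💪 My business card says Doctor of Science (博士(理学)), but I actually hold a Doctor of Informatics (博士(情報学)).

Japanese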

9.8K

Haiyue Song@shyyhs·1d

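Do you struggle every time with tuning data mixing ratios for LLM continual pre-training? 😩 We propose a solution in our paper "OptiMer," which we recently posted to arXiv! ✅ Train independently on each dataset ✅ Extract distribution vectors ✅ Merge post hoc with Bayesian optimization → Better performance than data mixing, plus a 15-35x reduction in search cost. Validated on 16 benchmarks with Gemma-3-27B ✨ Paper 📄: arxiv.org/abs/2603.28858 #LLM #継続事前学習 #OptiMer #arXiv #大規模言語モデル

Japanese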

4.9K
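
The three steps in the post above (train per dataset, extract distribution vectors, merge post hoc) can be pictured with a toy sketch. This is only an illustration under stated assumptions, not the paper's implementation: it assumes a distribution vector is the element-wise difference between trained and base weights, uses flat Python lists instead of real model parameters, and swaps in plain random search where the post mentions Bayesian optimization. All function names and the toy scoring function are hypothetical.

```python
import random


def distribution_vector(base, trained):
    """Assumed definition: element-wise difference between trained and base weights."""
    return [t - b for b, t in zip(base, trained)]


def merge(base, vectors, weights):
    """Add a weighted sum of the distribution vectors back onto the base weights."""
    merged = list(base)
    for w, vec in zip(weights, vectors):
        for i, v in enumerate(vec):
            merged[i] += w * v
    return merged


def search_weights(base, vectors, score, trials=200, seed=0):
    """Random search over merge weights (a simple stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best_w, best_s = None, float("-inf")
    for _ in range(trials):
        w = [rng.uniform(0.0, 1.5) for _ in vectors]
        s = score(merge(base, vectors, w))
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s


# Toy setup: two "models", each trained on a different dataset from the same base.
base = [0.0, 0.0]
model_a = [1.0, 0.0]
model_b = [0.0, 2.0]
vectors = [distribution_vector(base, m) for m in (model_a, model_b)]

# Hypothetical validation score: negative squared distance to a target weight vector.
target = [0.5, 1.0]


def score(params):
    return -sum((p - t) ** 2 for p, t in zip(params, target))


weights, best = search_weights(base, vectors, score)
```

The point of the sketch is that no retraining happens inside the search loop: each candidate weighting only costs one merge and one evaluation, which is where a saving in search cost over re-running training for every mixing ratio would come from.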

Haiyue Song retweeted

Goro Kobayashi@goro_koba·1d

I made a diagram of my current understanding of Gemma 4’s Per-Layer Embeddings (PLE). It covers the layer-specific lookup, the input-embedding-derived component, and how the gated injection seems to work. Corrections are very welcome.

English

5.8K

Haiyue Song@shyyhs·1d

@stomohide Thank you so much!! And congratulations to you too on becoming an associate professor!!

Japanese

Tomohide Shibata@stomohide·1d

@shyyhs Congratulations!!

Japanese

Haiyue Song@shyyhs·2d

@yo_ehara Perhaps a big factor is whether a company, depending on its size, has the capacity to handle KAKENHI grant procedures...

Japanese

255

Yo Ehara@yo_ehara·2d

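It feels a bit strange that Weathernews can accept KAKENHI grants while PFN cannot. I suppose it is case by case, though, since it also depends heavily on what kind of position one holds.

Haiyue Song@shyyhs

Because of my job change, I had to decline my KAKENHI grant (Early-Career Scientists) after one year. Still, there were 8 submissions tied to this grant last fiscal year (4 of them international collaborations), so I hope I can be forgiven…! 🙏

Japanese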

5.1K

Haiyue Song@shyyhs·2d

Japanese

13K

Haiyue Song@shyyhs·2d

@cs_lisheng thanks!!!

English

253

Asst. Prof. Li Sheng (call me /listen/ :)@cs_lisheng·3d

@shyyhs congratulations!!!

English

303

Haiyue Song@shyyhs·3d

@ShengzheLi Thank you so much!!

Japanese

402

LM8@ShengzheLi·3d

@shyyhs Congratulations!!!

Japanese

423

Haiyue Song@shyyhs·3d

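@kodama26985649 So strong!

Japanese

Kodama@kodama26985649·3d

We just released LLM-jp-4! This time we are releasing two models, 8b and 32b-a3b. I was in charge of mid-training, post-training, evaluation, and more! We have also released all of the data used for pre-training, mid-training, and post-training that has no licensing issues, so please take a look! huggingface.co/collections/ll…

Japanese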

543

Haiyue Song retweeted

国立情報学研究所(NII)@jouhouken·3d

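✏️ News release: New Japan-made LLMs "LLM-jp-4 8B model" and "LLM-jp-4 32B-A3B model," trained on a high-quality corpus of about 12 trillion tokens, released under an open-source license, outperforming GPT-4o and Qwen3-8B on some benchmarks nii.ac.jp/news/release/2…

The Research and Development Center for Large Language Models (LLMC) at the National Institute of Informatics, part of the Inter-University Research Institute Corporation Research Organization of Information and Systems, has trained large language models (LLMs) from scratch as part of the activities of "LLM-jp," the LLM research and development community that the center organizes. It has publicly released the "LLM-jp-4 8B model," with about 8.6 billion parameters, and the MoE model "LLM-jp-4 32B-A3B model," with about 32 billion parameters, under an open-source license.

In training the released models, the center took the Open Source AI Definition (OSAID) into account: it collected, filtered, and built high-quality training corpora that third parties can also obtain, and prepared and used a training corpus of about 12 trillion tokens drawn from public internet data, government and Diet documents, synthetic data, and other sources. The released models can process inputs and outputs of up to about 65,000 tokens. On "Japanese MT-Bench," which measures a language model's Japanese comprehension, and "MT-Bench," which measures English comprehension, they outperform the strong multilingual LLMs "GPT-4o" and "Qwen3-8B."

LLMC will use the "LLM-jp-4 8B model" and the MoE model "LLM-jp-4 32B-A3B model" to advance research and development toward ensuring the transparency and reliability of LLMs. It is also currently developing models with larger parameter counts, which are scheduled for release in stages during fiscal year 2026.

Japanese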

604

344K

Haiyue Song@shyyhs·5d

@tatyam_prime Congratulations (on joining the company, too)!

Japanese

453

tatyam@tatyam_prime·5d

Looking forward to working with you all!

Japanese

238

7.7K

Haiyue Song@shyyhs·5d

🧵9/9 Our work suggests that data mixture ratio selection in continual pre-training can be reformulated as a post-hoc optimization problem instead of being fixed before training. For future work, OptiMer can be applied to mid-training (post-training of foundation models), where dataset mixture ratio remains a major pain point.

English

355

Haiyue Song@shyyhs·5d

@AISingapore 🧵8/9 🛡️ Case study: OptiMer better resists adversarial questions designed to elicit common misconceptions, which is a benefit from its high IT capability.

English

431

Haiyue Song@shyyhs·5d

English

2.8K
