msqrd

37 posts

@msqrd_6

I am human and can make mistakes.

Joined June 2025

18 Following · 4 Followers

msqrd@msqrd_6·1d

Note to self: github.com/TencentYoutuRe…

msqrd@msqrd_6·2d

I have a feeling that building a latent model on a VAE with high diffusability and then fine-tuning it into a pixel model will become the standard approach.

msqrd@msqrd_6·2d

It looks like the Detailer Head does something similar to the P and (I-P) processing in AsymFlow? arxiv.org/abs/2605.12013

msqrd@msqrd_6·3d

Maybe a time will come when people say, "Back in the day there was this thing called a VAE..."

msqrd@msqrd_6·4d

AsymFlow probably won't work well unless the model's Transformer has a fairly large internal dimension (3000 or more).

msqrd@msqrd_6·4d

Is Anima's architecture not easy to inspect? I did a quick search but couldn't figure it out. It would be so much easier if it were in diffusers.

msqrd@msqrd_6·6d

I've reached the conclusion that AsymFlow, unsurprisingly, doesn't work well with an ultra-lightweight homemade model.

msqrd@msqrd_6·16 May

AsymFlow fine-tuning ran! (It ran, and that's all.)

msqrd@msqrd_6·16 May

I want to implement things, but I can't use my research PC, so I have nothing to do.

msqrd@msqrd_6·15 May

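If I think of Px as the low-frequency information and (I-P)x as the high-frequency information (x - Px, i.e. x with the low-frequency information removed), it starts to seem plausible.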
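A minimal sketch of that reading, in plain Python. The projection used here (replace each block of samples with its block mean) is a hypothetical stand-in chosen only for illustration; it is not taken from the AsymFlow paper. The names `project_low` and `split_low_high` are likewise made up for this example. It shows that such a P is idempotent, that Px + (I-P)x recovers x, and that the residual carries no block-mean (low-frequency) content.

```python
def project_low(x, block=4):
    """Hypothetical P: replace every block of samples by that block's mean."""
    if len(x) % block != 0:
        raise ValueError("len(x) must be a multiple of block")
    out = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        mean = sum(chunk) / block
        out.extend([mean] * block)
    return out


def split_low_high(x, block=4):
    """Return (Px, (I-P)x) for the block-mean projection above."""
    low = project_low(x, block)
    high = [xi - li for xi, li in zip(x, low)]  # x - Px
    return low, high


x = [1.0, 2.0, 3.0, 6.0, 0.0, 0.0, 4.0, 4.0]
low, high = split_low_high(x)
```

Applying `project_low` to `low` returns `low` unchanged, and applying it to `high` returns all zeros, which is what "x with the low-frequency information removed" would mean under this assumed P.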

msqrd@msqrd_6·15 May

I haven't gone into the deeper details, but for now I've gotten a shallow understanding of the equations. Trying to Understand Asymmetric Flow Models | msqrd @msqrd_6 note.com/msqrd/n/na76d9…

msqrd@msqrd_6·14 May

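One challenge in my current research has been how to improve the VAE so that it reproduces pixel-level texture, so if pixel-space training can be done at low cost, nothing could be better.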

msqrd@msqrd_6·14 May

This is seriously exciting. arxiv.org/abs/2605.12964

msqrd@msqrd_6·24 Apr

I've ended up in the odd situation of having an autoregressive language model teach me how language diffusion models work.

msqrd@msqrd_6·18 Apr

Combining the ideas behind Exclusive Self Attention and Screening Is Enough seems like it could be quite promising.

msqrd@msqrd_6·8 Feb

The scheduler details for v-pred are really simple.
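For reference, a small sketch of the v-prediction parameterization as it is commonly defined: with x_t = alpha_t * x0 + sigma_t * eps and alpha_t^2 + sigma_t^2 = 1, the target is v = alpha_t * eps - sigma_t * x0. The cosine schedule in `alpha_sigma` is an assumption picked for the example; the post does not say which scheduler was being used, and the function names are illustrative.

```python
import math


def alpha_sigma(t):
    """Assumed variance-preserving cosine schedule for t in [0, 1]."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def add_noise(x0, eps, t):
    """Forward process: x_t = alpha_t * x0 + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return a * x0 + s * eps


def v_target(x0, eps, t):
    """Training target under v-prediction: v = alpha_t * eps - sigma_t * x0."""
    a, s = alpha_sigma(t)
    return a * eps - s * x0


def x0_from_v(x_t, v, t):
    """Recover the clean sample: x0 = alpha_t * x_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return a * x_t - s * v


def eps_from_v(x_t, v, t):
    """Recover the noise: eps = sigma_t * x_t + alpha_t * v."""
    a, s = alpha_sigma(t)
    return s * x_t + a * v
```

Both recovery formulas follow from substituting x_t and v and using alpha_t^2 + sigma_t^2 = 1, so a sampler can convert a predicted v into x0 or eps with one line each.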

msqrd@msqrd_6·30 Jan

Once again I'm being reminded of how little I know about schedulers.

msqrd@msqrd_6·18 Jan

The DroPE method that Sakana released looks like it could be useful.

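msqrd@msqrd_6·18 Jan

Image generation AI seems like it could be used to generate face images of wanted criminals; I wonder whether that has actually been put into practice.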

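msqrd reposted

Takuya Akiba@iwiwi·12 Jan

We've published our paper! It argues that RoPE is really just helping training and may not be needed in the end. It may be well known that NoPE (no positional embeddings) can in fact handle position, but in practice training doesn't go well if you use NoPE from the start. "DroPE", which drops RoPE partway through, gets the best of both.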

Sakana AI@SakanaAILabs

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings pub.sakana.ai/DroPE/

We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.

The core insight of this work challenges a fundamental assumption in Transformer architecture. We discovered that explicit positional embeddings like RoPE are critical for training convergence but eventually become the primary bottleneck preventing models from generalizing to longer sequences.

Our solution is radically simple: We treat positional embeddings as a temporary training scaffold rather than a permanent architectural necessity.

Real-world workflows like reviewing massive code diffs or analyzing legal contracts require context windows that break standard pretrained models. While models without positional embeddings (NoPE) generalize better to these unseen lengths, they are notoriously unstable to train from scratch. Here, we achieve the best of both worlds by using embeddings to ensure stability during pretraining and then dropping them to unlock length extrapolation during inference.

Our approach unlocks seamless zero-shot context extension without any expensive long-context training. We demonstrated this on a range of off-the-shelf open-source LLMs. In our tests, recalibrating any model with DroPE requires less than 1% of the original pretraining budget, yet it significantly outperforms established methods on challenging benchmarks like LongBench and RULER.

We have released the code and the full paper to encourage the community to rethink the role of positional encodings in modern LLMs.

Paper: arxiv.org/abs/2512.12167
Code: github.com/SakanaAI/DroPE
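To make the RoPE versus NoPE distinction concrete, here is a toy sketch of a single attention score with a switch for the rotary embedding. It is not the DroPE code from the linked repository: `rope` is the standard rotary formulation, while `attn_score` and its `use_rope` flag are hypothetical names for this illustration, and the recalibration step mentioned in the announcement is not modeled.

```python
import math


def rope(vec, pos, base=10000.0):
    """Standard RoPE: rotate each consecutive pair by a position-dependent angle."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # frequency for pair i // 2
        c, s = math.cos(theta), math.sin(theta)
        out.append(vec[i] * c - vec[i + 1] * s)
        out.append(vec[i] * s + vec[i + 1] * c)
    return out


def attn_score(q, k, q_pos, k_pos, use_rope=True):
    """Unnormalized q.k score; use_rope=False mimics the positional embedding being dropped."""
    if use_rope:
        q, k = rope(q, q_pos), rope(k, k_pos)
    return sum(a * b for a, b in zip(q, k))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]
with_rope_near = attn_score(q, k, 5, 3)
with_rope_far = attn_score(q, k, 12, 10)   # same offset, shifted positions
with_rope_other = attn_score(q, k, 5, 4)   # different offset
no_pe_a = attn_score(q, k, 5, 3, use_rope=False)
no_pe_b = attn_score(q, k, 900, 1, use_rope=False)
```

With RoPE the score depends only on the relative offset between query and key, so the first two values agree and the third differs. With the embedding switched off, the score ignores position entirely, which is the property the announcement ties to length extrapolation.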
