Fumi

7 posts

@fumishiki

Founder, 21. Rust 1500m 3'54"51 · now building AI

Joined January 2026
5 Following · 135 Followers

Fumi (@fumishiki)
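Claiming novelty while citing Hourglass and MEGABYTE is a research-ethics problem. With so many prior papers on hierarchical Transformers, you cannot claim to be the first model that treats hierarchy as an inference primitive. The core of PHOTON's RecGen, keeping only the top-level KV and handling the lower levels through local reconstruction, is YOCO itself. The only difference is the extension to L levels. Not citing it is knowingly omitting the closest prior work. Presenting an essentially isomorphic architecture as "a new paradigm called vertical scanning," after either citing or ignoring direct prior work such as Hourglass (2022), MEGABYTE (2023), and YOCO (2024), shows a lack of research ethics.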

Yuma Ichikawa (@yuma_1_or)
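"PHOTON," the hierarchical language model we developed that differs from the Transformer, has been accepted to ACL 2026 (Main) as an Oral Presentation ⚡️ It achieves 1000x the throughput per unit of memory of a Transformer 🔥 In the multi-agent era, where ultra-long context × multi-query matters, what kind of revolution will this model bring…😎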
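
The critique above turns on the idea of keeping a single top-level KV cache instead of one cache per layer. As a rough, back-of-the-envelope sketch of why that matters for memory, the snippet below counts cached scalars under two simplified accounting schemes. The function names, the fixed-size local window, and all sizes are hypothetical assumptions for illustration; they are not taken from PHOTON, YOCO, or any other paper.

```python
def kv_cache_per_layer(num_layers: int, seq_len: int, d_model: int) -> int:
    """Cached scalars when every layer keeps keys and values for every token."""
    return 2 * num_layers * seq_len * d_model


def kv_cache_once(seq_len: int, d_model: int,
                  local_layers: int = 0, window: int = 0) -> int:
    """Cached scalars when one global KV is shared across upper layers and
    the lower layers keep only a fixed-size local window (an assumption)."""
    global_cache = 2 * seq_len * d_model               # keys + values, stored once
    local_cache = 2 * local_layers * window * d_model  # does not grow with seq_len
    return global_cache + local_cache


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    full = kv_cache_per_layer(num_layers=32, seq_len=100_000, d_model=4096)
    once = kv_cache_once(seq_len=100_000, d_model=4096,
                         local_layers=16, window=1024)
    print(f"per-layer cache: {full:,} scalars")
    print(f"cache-once:      {once:,} scalars")
    print(f"ratio:           {full / once:.1f}x")
```

Under this accounting the per-layer scheme grows with layers × tokens, while the shared scheme grows with tokens alone plus a constant local term. Whether two architectures that share this property count as "essentially isomorphic" is the point under dispute in the post.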

Fumi (@fumishiki)
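RULER, BABILong, Scrolls, LongBench, or even a standard needle-in-a-haystack implementation would do. Several established long-context benchmarks exist, yet the paper skips all of them and brings only custom-made tasks. At that point it is hard to object if people suspect it could not win on the existing benchmarks. On top of that, the existing work on sigmoid attention replaces softmax with sigmoid and computes each key's weight independently in [0,1]. No competition between keys, absolute relevance. That is exactly the property Multiscreen claims. The only difference is that it uses unit-norm + Trim-and-Square (thresholded squaring) instead of sigmoid, so the essential difference is nothing more than the shape of the activation function. And yet this paper never mentions sigmoid attention at all, which is a serious problem.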

みぃ🍵 (@mithernet)
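I'm the author! We proposed a new mechanism that removes attention's constraint of "only being able to make relative comparisons." ① First, the easy-to-see advantages: ✅ Maintains performance and retrieves information accurately even on text far longer than anything seen in training ✅ Very fast convergence (trainable even at LR=1) ✅ 40% smaller model size ✅ More than 3x inference speed (continued) arxiv.org/abs/2604.01178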
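
The softmax-versus-sigmoid distinction raised in the critique above can be illustrated with a minimal sketch. It uses plain Python on made-up attention scores and shows only the general property described in the post: softmax weights compete and sum to 1, while sigmoid weights are computed per key. The function names and scores are hypothetical, and Trim-and-Square is deliberately not sketched because its exact definition is not given in the post.

```python
import math


def softmax_weights(scores):
    """Relative weighting: keys compete, and the weights always sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_weights(scores):
    """Absolute weighting: each key is scored independently in (0, 1)."""
    def sigmoid(s):
        if s >= 0:
            return 1.0 / (1.0 + math.exp(-s))
        e = math.exp(s)                  # avoids overflow for very negative s
        return e / (1.0 + e)
    return [sigmoid(s) for s in scores]


if __name__ == "__main__":
    # Hypothetical query-key scores; the second list adds one extra key.
    scores = [2.0, 0.0]
    scores_with_extra_key = [2.0, 0.0, 2.0]

    print("softmax:", softmax_weights(scores), softmax_weights(scores_with_extra_key))
    print("sigmoid:", sigmoid_weights(scores), sigmoid_weights(scores_with_extra_key))
```

Adding the third key lowers the softmax weight of the first key but leaves its sigmoid weight unchanged, which is the "no competition between keys" property the post refers to.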

Fumi (@fumishiki)
I’m bored out of my mind because AI researchers and companies in Japan, even in 2026, are still trying—and failing—to catch up to where GPT-4 was three years ago. They’re complete amateurs with no real knowledge updates.

Fumi (@fumishiki)
I’m seriously working on this seven days a week with no days off. That’s why it’s frustrating when LLMs respond with things like “That’s a sharp question!” or “Your opinion is absolutely correct!” so casually. People say AI can be a good sparring partner, but the flattery problem is severe. Even when you push it with critical prompts, it often just performs the role convincingly rather than showing real intellectual grounding. In the community, people often say AI is just a sophisticated pattern matcher. And honestly, I think every AI company today still runs into that same limitation.

Fumi (@fumishiki)
Late-night debugging thoughts. Building AI models isn’t just implementing equations from papers. You have to understand the architecture, the trade-offs, and the implicit knowledge the paper doesn’t explain, and judge whether the authors’ claims actually hold. No current LLM can make or verify those architectural decisions. The only realistic workflow is letting LLMs scan papers while I verify the ideas, implement them, run training, fail, and iterate. In this field, a 1-year-old paper is outdated. A 3-month-old paper should be obvious. If something trended on arXiv yesterday, it should be implemented today. Without that speed, you can’t compete with frontier AI labs. Many AI companies lose innovation to internal politics and meetings. I hate that. Real ideas should be tested immediately in practice.

Fumi (@fumishiki)
1:30 AM JST. Diving into debugging the training code for the ML model I’m developing. Let’s have fun.

Fumi (@fumishiki)
Building as a newcomer in tech is an interesting journey. I recently open-sourced nabla, a pure Rust math engine 8.3–11.6× faster than PyTorch eager on GH200. Here’s what happened when I tried to share it:

🦀 r/rust (200k+ devs): Hit #1! Had incredibly deep technical discussions about memory arenas... until Reddit auto-banned my brand new account for "spam."
🟠 HN: Flagged. I was told the README seemed AI-generated and I need to "spend time building community karma" before sharing my work.

Turns out, the hardest bottleneck to bypass isn't Python-to-C++ dispatch overhead, but the "new account karma" filters! 🚧

I guess I'll just let the code speak for itself. You can check it out here (no karma required): github.com/fumishiki/nabla