Ken Sakurada

4.2K posts

@sakuDken

Interests: Computer Vision, Robotics. Tweets are my personal views and do not represent the organizations I belong to.

Joined December 2011

2K Following, 1.8K Followers

Pinned Tweet

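Ken Sakurada@sakuDken·6 Nov

Here are the slides from today's tutorial lecture. I have tried to organize, as systematically as possible, the knowledge needed by anyone about to start studying Visual SLAM. I hope you find them useful. Introduction to Visual SLAM: The History of Its Development and Learning the Fundamentals speakerdeck.com/ksakurada/visu…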

461

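Ken Sakurada retweeted

Daisuke Okanohara / 岡野原大輔@hillbig·12h

VGGT-Ω is a study on understanding the 3D world from images and video. Its predecessor, VGGT, is a model that estimates camera parameters, point clouds, depth, and tracking from a set of images in a single forward pass.

VGGT-Ω pushes this direction further and introduces a range of techniques for scaling up. As a result, it shows that a so-called scaling law holds for 3D reconstruction as well: performance improves as the model and the data grow.

The first technique is architectural simplification. Instead of multiple dedicated heads, a single head predicts multiple outputs. The computationally expensive high-resolution convolutional layers are also removed.

The second is narrowing the outputs to camera parameters and depth maps. With these two, the remaining 3D information can be recovered geometrically. In other words, the model is designed to predict only the more fundamental quantities.

The third is the introduction of register tokens. Each frame has 16 register tokens in addition to a camera token. These register tokens compress and hold information about the whole frame or the whole scene. Some of the global attention layers are then replaced with layers in which only the register tokens exchange information across frames.

The fourth is scaling up the training data. VGGT-Ω is trained on 15 times more supervised data than before and also uses a large amount of unlabeled video. The unlabeled video is learned from via self-distillation, as in DINO. In experiments, it improved on the previous best camera estimation accuracy by 77%. The learned register tokens also work well as a summary representation of the scene and are shown to be effective for VLA and language alignment.

Comment ===

Tasks such as camera parameter estimation (that is, estimating ego position and pose) and 3D reconstruction have traditionally been realized with complex pipelines. These required steps such as correspondence detection, Bundle Adjustment, and depth fusion, and online SLAM required similarly complex processing.

VGGT is a very ambitious model that outputs these results directly from a single pass through a neural network. VGGT-Ω further simplifies and scales up that design and achieves a large performance gain. As the graph at the start of the paper shows, the path to improving performance by scaling the model and the data has already been laid out. Going forward, beyond further gains from scale, distilling those results may allow small models to reach high performance as well. Its effectiveness for VLA and language alignment is also interesting. As noted in the paper's closing discussion, future work will likely combine it with language learning and train with richer learning signals, for example caption generation and generative tasks.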
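The point in the tweet above, that camera parameters plus a depth map are enough to recover the remaining 3D geometry, can be illustrated with a minimal Python sketch. This is not VGGT-Ω code. It assumes a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, depth measured along the optical axis, and a camera-to-world pose `(R, t)`; the function name `unproject_depth` is made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]],
    t: Sequence[float],
) -> List[Point]:
    """Back-project a depth map into world-space 3D points.

    Assumptions (not taken from the paper): pinhole model, depth is the
    z coordinate in the camera frame, and X_world = R @ X_cam + t.
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row (image y)
        for u, z in enumerate(row):          # u: pixel column (image x)
            if z <= 0:
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            cam = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Rigid transform from the camera frame to the world frame.
            world = tuple(
                sum(R[i][j] * cam[j] for j in range(3)) + t[i]
                for i in range(3)
            )
            points.append(world)
    return points


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cloud = unproject_depth([[2.0, 2.0], [0.0, 4.0]], 1.0, 1.0, 0.0, 0.0,
                            identity, [1.0, 0.0, 0.0])
    print(cloud)
```

Running this per frame and concatenating the results gives a scene point cloud, which is the sense in which a point-cloud output becomes redundant once depth and cameras are predicted.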

8.4K

Ken Sakurada retweeted

Claudia Cuttano@ClaudiaCuttano·23h

✨#CVPR2026 Oral ✨ A tale of a failed experiment: what if you fine-tune DINOv2 on sparse keypoints, beat every benchmark, only to discover it performs worse than the original frozen model on novel keypoints? 🚀MARCO closes this gap: a unified model for generalisable correspondences github.com/visinf/MARCO

375

20.6K

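Ken Sakurada retweeted

Kento Kawaharazuka / 河原塚健人@KKawaharazuka·17 May

Following my selection as PM, we are recruiting for a variety of positions to do research with us, including project assistant professors, researchers, technical staff, and science communicators. If you are interested, please feel free to DM me or email me via haraduka.github.io!

Kento Kawaharazuka / 河原塚健人@KKawaharazuka

I have been selected as a PM for Moonshot Goal 3! We will mobilize young researchers in humanoids × machine learning to develop the best humanoid! Young researchers, join us! jst.go.jp/moonshot/news/…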

155

30.1K

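Ken Sakurada retweeted

Masaki Saito@rezoolab·16 May

If Sendai had put in a bit more effort, a few more jobs would have been created. Right now only education and research are unusually well supported, yet students leave once they graduate. I complain about this roughly once a day, to Sendai.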

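Ken Sakurada retweeted

Sakana AI@SakanaAILabs·9 May

In joint research with @NVIDIA, Sakana AI has developed new GPU kernels and data formats that speed up inference and training of sparse Transformer language models. Blog: pub.sakana.ai/sparser-faster… In the feedforward layers, which account for a large part of an LLM's cost, most activations for each token are in fact nearly zero, so the computation spent on them is wasted. Combining ReLU with light L1 regularization can raise the sparsity to over 95% with almost no loss in performance. However, modern GPUs are optimized for dense matrix multiplication, and with conventional sparse formats the theoretical speedup is cancelled out by irregular memory access. We therefore devised ① TwELL (Tile-wise ELLPACK), a new sparse storage format that can be plugged directly into optimized tiled matmul kernels, and ② custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput. When we actually trained and evaluated sparse LLMs at the multi-billion-parameter scale, we achieved speedups of over 20% along with large reductions in peak memory and power consumption. This work will be presented at #ICML2026. Please take a look at the blog and the paper. Paper: arxiv.org/abs/2603.23198 GitHub: github.com/SakanaAI/spars…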

Sakana AI@SakanaAILabs

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️ Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models: Paper: arxiv.org/abs/2603.23198 Blog: pub.sakana.ai/sparser-faster… Code: github.com/SakanaAI/spars… While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, with a large part of this cost coming from their feedforward layers. Yet, an interesting phenomenon occurs inside these layers: For any given token, only a small fraction of the hidden activations actually matter. The rest approximate zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance. So, can we leverage this sparsity to make LLMs faster? The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations. Our contribution is twofold: 1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution. 2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes. We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy. This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!
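To make the idea of packing sparse ReLU activations concrete, here is a minimal pure-Python sketch of the classic ELLPACK layout that TwELL's name builds on. It is not the TwELL format or Sakana AI's CUDA kernels, whose tile-wise layout is described in the paper; the helper names `relu`, `to_ellpack`, and `ell_matmul` and the tiny matrices are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(row: List[float]) -> List[float]:
    """ReLU zeroes negative pre-activations, which is what creates sparsity."""
    return [max(0.0, x) for x in row]


def to_ellpack(rows: Matrix, pad_col: int = -1) -> Tuple[Matrix, List[List[int]]]:
    """Pack a row-sparse matrix into classic ELLPACK arrays.

    Every row stores the same number of (value, column) slots, equal to the
    largest nonzero count of any row; shorter rows are padded with value 0.0
    and column index `pad_col`.
    """
    width = max((sum(1 for x in r if x != 0.0) for r in rows), default=0)
    values: Matrix = []
    cols: List[List[int]] = []
    for r in rows:
        nz = [(j, x) for j, x in enumerate(r) if x != 0.0]
        pad = width - len(nz)
        cols.append([j for j, _ in nz] + [pad_col] * pad)
        values.append([x for _, x in nz] + [0.0] * pad)
    return values, cols


def ell_matmul(values: Matrix, cols: List[List[int]], W: Matrix) -> Matrix:
    """Compute H @ W, touching only the rows of W selected by nonzeros of H."""
    n_out = len(W[0])
    out: Matrix = []
    for vals, idx in zip(values, cols):
        y = [0.0] * n_out
        for x, j in zip(vals, idx):
            if j < 0:            # padding slot, nothing to accumulate
                continue
            for k in range(n_out):
                y[k] += x * W[j][k]
        out.append(y)
    return out


if __name__ == "__main__":
    pre_act = [[-1.0, 2.0, -3.0, 0.5], [0.0, -2.0, -1.0, 4.0]]
    H = [relu(r) for r in pre_act]
    W_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    vals, idx = to_ellpack(H)
    print(vals, idx, ell_matmul(vals, idx, W_down))
```

The fixed row width gives regular memory access, at the price of padding when nonzero counts vary between rows. That trade-off is one reason general-purpose sparse formats struggle to beat dense GEMM on GPUs, the problem the tweet says TwELL was designed around.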

529

79.9K

Ken Sakurada retweeted

Yasutaka Furukawa@YasutakaFuruka1·5 May

We’ve been working on geometry foundation models for self-driving. Following Rig3R at NeurIPS last year, here is another work from the team: learning camera motion from 10.2 million unlabeled driving video snippets, without pose labels. See our blog post and paper (CVPR2026).

Wayve@wayve_ai

Meet LA-Pose. Our latest model taking Wayve another step towards generalization at scale. LA-Pose employs large-scale self-supervised learning, building strong motion representations for 3D perception from 10.2 million unlabeled driving video snippets, unlike today's strongest approaches that often depend on expensive, carefully curated 3D supervision. With only a lightweight pose head and limited labelled data, LA-Pose achieves: 📷 State-of-the-art camera pose estimation 🌎 Strong zero-shot generalization across diverse driving scenarios 🏷️ Orders of magnitude less labelled data than fully supervised 3D approaches Our full blog post: wayve.ai/thinking/la-po… Explore the full paper here: la-pose.github.io

12.8K

Ken Sakurada retweeted

Ouster@ousterlidar·4 May

A New Standard in Sensing: Lidar Colorized, Ruggedized, Maximized. Today, we released the REV8 OS family: the world’s first native color lidar sensors. ✔️Powered by our breakthrough L4 and L4 Max Ouster Silicon ✔️First patented native color lidar sensors with point for point 3D color vision ✔️Radically upgraded OS0, OS1, and OSDome sensors with industry‑leading resolution, range, and reliability ✔️Introduced the flagship OS1 Max with 256 channels of high-definition sensing up to 500 m in all directions with a 45° FOV ✔️Auto-grade, cybersecure, and designed for functional-safety (ASIL-B, SIL-2, PLd) ✔️Designed for low-cost, high-volume production deployments 🧵We’ve redefined the meaning of lidar itself.

138

888

263.1K

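Ken Sakurada retweeted

ネギ塩タン@7児母@negi_shio_taaan·23 Apr

I want this item from Birthday. It goes on sale online today, but I wonder when it will reach stores... I always lose out online, so I would rather buy it in a store.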

207

4.8K

580.3K

Ken Sakurada retweeted

Zhijian Liu@zhijianliu_·19 Apr

Reasoning VLAs can think. They just can't think fast. Until now. Introducing FlashDrive⚡ 🚀 716 ms → 159 ms on RTX PRO 6000 (up to 5.7×) ✅ Zero accuracy loss FlashDrive = streaming inference + DFlash speculative reasoning + ParoQuant W4A8 Real-time reasoning for autonomous driving is here! z-lab.ai/projects/flash…

163

1.3K

163.9K

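Ken Sakurada retweeted

🌸Sakura Yae/八重さくら🌸@yaesakura2019·18 Apr

Hesai, a major Chinese LiDAR maker, has announced the world's first full-color LiDAR, able to perceive full-color 3D imagery at up to 4,320 channels / 4K resolution without a camera 🤯 The company says this removes the need to fuse camera data in post-processing, enabling advanced processing while keeping hardware requirements down ⚡ (And shipments are scheduled to begin in the second half of 2026...!) Hesai releases world's first full-color LiDAR chip, supporting up to 4,320 laser channels cnevpost.com/2026/04/18/hes…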

798

3.9K

822.8K

Ken Sakurada retweeted

Yinghao Xu@YinghaoXu1·16 Apr

🎉 After one year of teamwork, we are excited to release our 3D foundation model — LingBot-Map! Unlike DA3/VGGT, LingBot-Map is a purely autoregressive model for streaming 3D reconstruction ⚡ It achieves ~20 FPS on 518×378 resolution over sequences exceeding 10,000 frames — and beyond 🚀 Two key insights behind LingBot-Map: 🔑 Keep SLAM's structural wisdom: build Geometric Context Attention with long-context modeling while maintaining a compact streaming state 🔑 Make everything end-to-end learnable — no optimization, no post-processing Let's check out our demos 👇

492

4.7K

1.4M

Ken Sakurada retweeted

Daniel DeTone@ddetone·8 Apr

Today we release Boxer, a new lightweight approach that lifts open-world 2D bounding boxes to *metric* 3D: facebookresearch.github.io/boxer/ Here we show Boxer in action on an egocentric sequence captured from smart glasses:

168

1.3K

79.2K

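Ken Sakurada retweeted

Yuma Ichikawa@yuma_1_or·7 Apr

PHOTON, the hierarchical language model we developed as an alternative to the Transformer, has been accepted to ACL2026 (Main) as an Oral Presentation ⚡️ It achieves 1000× the throughput per unit of memory of a Transformer 🔥 In the multi-agent era, where ultra-long context × multi-query becomes important, what kind of revolution will this model bring…😎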

198

1.1K

203.7K

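Ken Sakurada retweeted

Yoshinobu Kawahara@44nb_k·7 Apr

The PRESTO (さきがけ) research area 「デジタル時空間拡張基盤」 (roughly, "Foundations for Digital Spatiotemporal Extension") has been launched. I expect a broad range of fields across mathematical science, information science, computational science, and data science to be relevant, so please check the area page jst.go.jp/kisoken/boshuu… and consider applying.

JST CREST・さきがけ・ACT-X@JST_Kisokenkyu

Here are the new CREST, PRESTO (さきがけ), and ACT-X research areas launching in fiscal year 2026. For details on each area, go to that area's page from "Research areas calling for proposals" on the call-for-proposals website. jst.go.jp/kisoken/boshuu…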

15.9K

Ken Sakurada retweeted

Shinpei KATO (加藤真平)@ShinpeiKato·7 Apr

Thank you to everyone in Fukuyama City. This is exactly what real-world deployment of autonomous driving mobility services looks like. news.yahoo.co.jp/articles/ddf1b…

10.8K

Ken Sakurada retweeted

Min Choi@minchoi·5 Apr

Netflix just dropped VOID. This AI removes objects from video... And even corrects the physics after objects/people are removed. Demo in comments👇

103

187

2.3K

2.3M

Ken Sakurada retweeted

Xinggang Wang@XinggangWang·3 Apr

VLA is a dominant paradigm in autonomous driving, yet existing approaches lack perception capabilities—detection, mapping, and occupancy—crucial in systems like Tesla FSD. We present UniDriveVLA, the first unified framework integrating them in one: huggingface.co/papers/2604.02…

26.6K

Ken Sakurada retweeted

Andrea Tagliasacchi 🇨🇦@taiyasaki·3 Apr

📢📢📢 Introducing "FullCircle: Effortless 3D Reconstruction from Casual 360° Captures" TL;DR: 10x faster casual capture with clean reconstructions Homepage: theialab.github.io/fullcircle Code: github.com/theialab/fullc… arXiv: arxiv.org/abs/2603.22572 Led by Yalda Foroutan & Ipek Oztas