smly
@smly
7.2K posts

AI Fellow at Rist. ex-PFN / 4x Kaggle Grandmaster https://t.co/Fbnhh888VD / Google Developer Expert (Kaggle) / Mahjong AI https://t.co/LQRTdKP3AO

Joined April 2007
3.6K Following · 8.3K Followers
smly@smly·
Trying out Kaggle Notebook's embed feature: run the game on Kaggle's GPU, visualize the replay in a Kaggle Notebook, and embed it in a blog post. Handy. #kaggle-notebook ho.lc/blog/riichienv…
smly@smly·
@hotpepsi Oh, I see! I had completely forgotten how Windows feels to use.
hotpepsi@hotpepsi·
@smly I may eventually get used to dragging on the left side, but I noticed that on Windows the buttons sit in the top right and every app has a gap you can drag. With vertical tabs on Mac that gap disappears, which is what made it feel off.
hotpepsi@hotpepsi·
Vertical tabs have arrived in Chrome. This might be good. A tab-position option was added to the settings. On Mac, though, there is little free space left, so dragging the window is awkward.
smly@smly·
Online matches now run up to three games concurrently 🙏 Looking forward to the strongest Mahjong AIs joining riichi.dev/games
smly retweeted
YI@y_imjk·
Happy to see a talk on building a Mahjong AI with an LLM anlp.jp/proceedings/an…
smly@smly·
@y_imjk Delighted! Without fast shanten-number computation, advanced feature computation becomes CPU-bound and unusable, so I am nothing but grateful to those who did that groundwork.
YI@y_imjk·
I remember reading blog posts about speeding up shanten-number computation long ago and going "whoa", so seeing nyanten in use here makes me smile.
YI@y_imjk·
I had been thinking of building a Mahjong environment as a hobby, but it looks like I will not need to.
smly@smly·
autoresearch is fascinating. For about a week I had an agent autonomously edit code, run training, monitor metrics, and iterate on Mahjong AI reinforcement learning. It explored PPO hyperparameters, KL penalties, reward structures, opponent selection, and rolled back to stable checkpoints when metrics degraded. Couldn't solve a fundamentally misaligned reward signal, but watching it map out the failure landscape was eye-opening.
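The agent loop described above can be sketched as a tiny harness. Everything here is hypothetical and stands in for the real training code: `toy_loss` plays the role of a validation metric, and the "rollback to stable checkpoints" behavior is just re-iterating from the best-known parameters whenever a change degrades the metric.

```python
import random

def propose(params):
    """Mutate one hyperparameter at random (toy stand-in for the agent's proposal step)."""
    key = random.choice(list(params))
    factor = random.choice([0.5, 2.0])
    return {**params, key: params[key] * factor}

def autoresearch(eval_metric, init_params, steps=20, seed=0):
    """Greedy experiment loop with rollback: keep a change only if the metric
    improves; otherwise discard it and continue from the last stable point."""
    random.seed(seed)
    best_params = dict(init_params)
    best_score = eval_metric(best_params)
    history = [(best_params, best_score)]
    for _ in range(steps):
        candidate = propose(best_params)
        score = eval_metric(candidate)
        history.append((candidate, score))
        if score < best_score:  # lower is better, e.g. validation loss
            best_params, best_score = candidate, score
        # else: implicit rollback -- the next proposal starts from best_params
    return best_params, best_score, history

# Hypothetical "metric": validation loss minimized at lr=0.01, kl_coef=0.1.
def toy_loss(p):
    return abs(p["lr"] - 0.01) + abs(p["kl_coef"] - 0.1)

best, loss, hist = autoresearch(toy_loss, {"lr": 0.08, "kl_coef": 0.4}, steps=50)
```

A real agent would replace `propose` with an LLM reading the experiment history, and `eval_metric` with an actual training run; the accept-or-rollback skeleton stays the same.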
smly retweeted
snwy@snwy_me·
autoresearch really interested me, despite my not being "all-in" on agents yet. I wanted to get started with running auto experiments, so I looked to existing tools to serve as a harness, but each one had its problems. So I made one: introducing Helios for autonomous ML research.
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on a depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement); this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc. This is the bread and butter of what I have done daily for 2 decades. Seeing the agent do this entire workflow end-to-end, all by itself, as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real": I didn't find them manually previously, and they stack up and actually improved nanochat.

Among the bigger things e.g.:
- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course: you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

smly@smly·
My favorite part is that the game scene is built with CSS 3D Transforms, and the whole thing, wasm + svg included, runs in 174 KB gzipped.
smly@smly·
I have released a simulation environment and an open arena for Mahjong AI. AIs can play matches against each other online. Come fight ho.lc/blog/riichienv…
smly retweeted
Tri Dao@tri_dao·
The FA4 paper is finally out after a year of work. On Blackwell GPUs, attention now goes about as fast as matmul even though the bottlenecks are so different! Tensor cores are now so fast that attn fwd is bottlenecked by the exponential, and attn bwd is bottlenecked by shared memory bandwidth. Some fun stuff in the redesigned algorithm to overcome these bottlenecks: exponential emulation with polynomials, a new online softmax that avoids 90% of softmax rescaling, and 2-CTA MMA instructions that let two thread blocks share operands to reduce smem traffic.
Ted Zadouri@tedzadouri

Asymmetric hardware scaling is here. Blackwell tensor cores are now so fast, exp2 and shared memory are the wall. FlashAttention-4 changes the algorithm & pipeline so that softmax & SMEM bandwidth no longer dictate speed. Attn reaches ~1600 TFLOPs, pretty much at matmul speed! joint work w/ Markus Hoehnerbach, Jay Shah(@ultraproduct), Timmy Liu, Vijay Thakkar (@__tensorcore__ ), Tri Dao (@tri_dao) 1/

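The "online softmax" mentioned above is the streaming trick FlashAttention is built on: keep a running maximum and a rescaled running sum so the softmax normalizer is computed in a single pass over the scores. A dependency-free sketch of the classic one-pass recurrence (FA4's kernel additionally reorganizes this to skip most of the rescaling, which is not shown here):

```python
import math

def online_softmax(scores):
    """One-pass, numerically stable softmax.

    Maintains a running max m and a running sum s of exp(x - m); whenever a
    new maximum appears, the old sum is rescaled by exp(m_old - m_new). This
    is the recurrence that lets attention kernels fuse softmax into the
    main loop without first scanning all scores for the max.
    """
    m = float("-inf")  # running max
    s = 0.0            # running sum of exp(score - m)
    for x in scores:
        m_new = max(m, x)
        s = s * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # A second lightweight pass just to materialize the probabilities.
    return [math.exp(x - m) / s for x in scores]

probs = online_softmax([1.0, 2.0, 3.0])
```

The key property is that `m` and `s` are updated incrementally, so the normalizer never sees an un-shifted `exp(x)` and cannot overflow, even for a stream of scores processed tile by tile.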
smly@smly·
FA4's end-to-end improvement in transformers is still small, which bugs me, but I have high hopes. I completely understand how the varlen paths tend to be where the problems show up github.com/huggingface/tr…
smly retweeted
PyTorch@PyTorch·
FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000+ repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads. 🖇️ Read our latest blog here: hubs.la/Q045FHPh0 No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
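FlexAttention's core idea is a user-supplied `score_mod`: a function applied to each raw attention score (given its query and key positions) before the softmax. Below is a dependency-free toy of that contract with a hypothetical causal-plus-distance-bias modifier; the real API is `torch.nn.attention.flex_attention`, which compiles the modifier into the fused kernel rather than calling it per score like this sketch does.

```python
import math

def attention_with_score_mod(q, k, v, score_mod):
    """Toy single-head attention over plain Python lists.

    q, k, v: lists of equal-length float vectors. score_mod(score, q_idx, k_idx)
    is applied to each raw dot-product score before the softmax, mirroring the
    FlexAttention contract (minus the batch/head indices).
    """
    out = []
    for i, qi in enumerate(q):
        scores = [score_mod(sum(a * b for a, b in zip(qi, kj)), i, j)
                  for j, kj in enumerate(k)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

def causal_with_distance_bias(score, q_idx, k_idx):
    # Hypothetical modifier: mask out future keys, penalize distant ones.
    if k_idx > q_idx:
        return float("-inf")
    return score - 0.1 * (q_idx - k_idx)

q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention_with_score_mod(q, k, v, causal_with_distance_bias)
```

Because the modifier only sees `(score, q_idx, k_idx)`, many variants (causal masks, ALiBi-style biases, sliding windows) are expressible without touching the attention kernel itself, which is what makes the pattern compilable.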
smly@smly·
The overhead of Chrome's canary-dcheck build is far bigger than I expected. I could cry.
smly@smly·
Hit the weekly rate limits, so I upgraded to Claude Max 20x. Truly the age of massive power consumption.
smly retweeted
Lex Fridman@lexfridman·
Here's my conversation with Peter Steinberger (@steipete), creator of OpenClaw, an open-source AI agent that has taken the Internet by storm, with now over 180,000 stars on GitHub. This was a truly mind-blowing, inspiring, and fun conversation! It's here on X in full and is up everywhere else (see comment).

Timestamps:
0:00 - Episode highlight
1:30 - Introduction
5:36 - OpenClaw origin story
8:55 - Mind-blowing moment
18:22 - Why OpenClaw went viral
22:19 - Self-modifying AI agent
27:04 - Name-change drama
44:15 - Moltbook saga
52:34 - OpenClaw security concerns
1:01:14 - How to code with AI agents
1:32:09 - Programming setup
1:38:52 - GPT Codex 5.3 vs Claude Opus 4.6
1:47:59 - Best AI agent for programming
2:09:59 - Life story and career advice
2:13:56 - Money and happiness
2:17:49 - Acquisition offers from OpenAI and Meta
2:34:58 - How OpenClaw works
2:46:17 - AI slop
2:52:20 - AI agents will replace 80% of apps
3:00:57 - Will AI replace programmers?
3:12:57 - Future of OpenClaw community
smly@smly·
KINTAN yakiniku with @takuoko1!
smly@smly·
Bambu Lab P2S Combo review. I broke it the moment it arrived, but somehow still managed to get started down this rabbit hole ho.lc/blog/bambu-p2s