✦

13.2K posts

✦

@indes_yo

I just record my thoughts of the moment here ✶

Taipei · Joined January 2023

1.8K Following · 1.7K Followers

Pinned Tweet

✦@indes_yo·8 May

7.4K

✦@indes_yo·6h

Everyday!

Wildminder@wildmindai

RotorQuant - upgraded TurboQuant.
> 10x KV cache compression
> 28% faster decoding
> 5x faster prefill
> 44x fewer parameters
Same quality as full attention. 1/10th the memory. Ok, another massive VRAM discount for local LLMs. github.com/scrya-com/roto…

✦ retweeted

Wildminder@wildmindai·9h

109

40.3K

✦@indes_yo·10h

@ethanhuang13 Maybe that's because I only use my M1 to run agents in isolation, plus some light work.

13@ethanhuang13·13h

@indes_yo For me, the browser is where it's most noticeable.

13@ethanhuang13·14h

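I haven't had a chance to try an M5 yet. I have no need or desire to replace my M3 Max, but ever since I bought a Mac mini I keep noticing how much more responsive the M4 feels than the M3 Max in everyday use (because of the difference in single-core performance), so I'd better not touch one. #週間廢推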

988

✦@indes_yo·10h

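A few years ago a teacher told me that when he was young, he spent a lot of money on a then-powerful floating-point accelerator card because he wanted to learn 3D graphics. I'm not sure which card it was, but it's a bit like today's local AI boom: this cutting-edge hardware is still very expensive, yet what it can handle is very limited.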

✦@indes_yo·10h

@aliez_ren So cool

166

Aliez Ren@aliez_ren·11h

Arrived early. Building the machine now.

Aliez Ren@aliez_ren

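Bought three more RTX PRO 6000 cards, arriving at the end of April. By then I should basically have local LLM freedom. Hopefully DeepSeek V4 will have been released by then.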

103

27.4K

✦@indes_yo·10h

✦@indes_yo

“RETRY” everyday. @antigravity

✦ retweeted

Sandro@pupposandro·1d

Excited to release a Megakernel that makes a 6-year-old RTX 3090 run local LLMs faster than Apple's latest M5 Max chip. Not a benchmark trick: same model, same weights, one kernel change. The full breakdown is in the article below. Open-source, MIT licensed, and you can reproduce it in one command.

Sandro@pupposandro

x.com/i/article/2041…

693

50K

✦@indes_yo·13h

@CryptoMaster_70 Being educated and holding a diploma are two different things.

Master | 最強打野(穢土轉生)@CryptoMaster_70·1d

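Seriously, if you found out that the analysts in your community never went to university, or went to a diploma mill, would you still dare to trust them? That's the state of Taiwan's crypto scene. A degree isn't the only way forward, just a screening tool, but 95% of them have no education.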

5.4K

✦@indes_yo·13h

@NousResearch Adding this to Hermes? github.com/milla-jovovich…

Nous Research@NousResearch·1d

Pretty cool to see Tobi using Hermes and the Manim skill!

tobi lutke@tobi

Hermes agent ships with this nifty /manim_video skill so I asked it to explain how a QMD query works:

375

24.6K

✦@indes_yo·13h

Looking back at my computer next year, I'll probably find I installed a lot of junk.

✦ retweeted

Geek Lite@QingQ77·1d

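A HUD built for Hermes: an open-source TUI display
→ Reads memory, corrections, tool usage, and other state from the Hermes Agent data directory in real time and renders it as an interactive TUI
→ 9 tabs covering the dashboard, growth comparison, Cron jobs, project tracking, health checks, Prompt patterns, and more
→ 4 cyberpunk-style themes (Neural Awakening / Blade Runner / fsociety / Digital Soul)
→ Snapshot comparison shows at a glance what the Agent has remembered and which mistakes it has stopped making between yesterday and today
github.com/joeynyc/hermes…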

198

23.4K

✦ retweeted

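[LLM inference up to 6x faster! The new technique "DFlash" is revolutionary] A new method that dramatically improves inference speed has arrived. It records over 400 tokens/s on recent models such as Qwen3.5 while maintaining accuracy. 🚀
Key points:
・Predicts multiple words in parallel using a block diffusion model
・Up to 2.5x faster than the leading method "EAGLE-3"
・Can reduce server costs and improve UX at the same time
Parallel processing for AI agents also gets smoother. Truly a next-generation acceleration technique! ✨ #LLM #DFlash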

843

51K

✦@indes_yo·13h

I can't give people advice. I can only quietly get in the game and play for myself.

✦@indes_yo·13h

@Prince_Canuma Any new ideas to speed up the Prefill time?

134

Prince Canuma@Prince_Canuma·20h

Just implemented TriAttention in MLX and the results are wild! You can get up to 81% KV compression at 60K tokens for Gemma-4-31B-IT in BF16 🔥
Unlike TurboQuant, which quantizes KV cache values, TriAttention prunes low-importance tokens entirely by scoring keys using trigonometric series from pre-RoPE Q/K concentration and keeping only the top-B most important ones.
The best part? Decode speed for BF16 stays locked at ~10 t/s while the baseline drops to 8.7 at long contexts. These results scale well with the quantized version as well.
Benchmarked on Gemma4-31B-it with MM-NIAH on M5 Ultra:
~1K → 3% saved
~7K → 34% saved
~15K → 52% saved
~30K → 69% saved
~60K → 81% saved
KV cache capped at 0.82 GB regardless of context length. One-time calibration (~30s), then it just works during generation.
One caveat: TriAttention is by design best suited for generative tasks (reasoning/code) and not retrieval tasks. PR will follow soon on MLX-VLM.

Yukang Chen@yukangchen_

We’re thrilled to open-source TriAttention! 🚀 🦞 Deploy OpenClaw (32B LLM) on a single 24GB RTX 4090 locally 💻Full code open-source & vLLM-ready for one-click deployment ⚡️ 2.5× faster inference speed & 10.7× less KV cache memory usage TriAttention is a novel KV cache compression method built on rigorous trigonometric analysis in the Pre‑RoPE space for efficient LLM long reasoning. Github Repo: github.com/WeianMao/triat… Paper Link: huggingface.co/papers/2604.04… Homepage: weianmao.github.io/tri-attention-…

503

43.8K

✦@indes_yo·1d

@TD_CCK I threw all of these out and switched entirely to semi-solid-state or solid-state batteries.

434

NK Chen@TD_CCK·1d

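I really urge all of you: never buy this kind of power bank with a built-in plug. (I bought this one before it made the news, so there's nothing I can do.)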

424

75.5K

✦@indes_yo·1d

opengridworks.com/power-plants?l…

✦@indes_yo·1d

This reminds me that the iPads everyone keeps in the drawer finally have a use.

✦@indes_yo·1d

@SpatiallyMe This reminds me that the iPads everyone keeps in the drawer finally have a use.

Phil Traut ᯅ@SpatiallyMe·2d

I just launched a free App that lets you open Mac Apps…from your iPhone. And close them with a swipe. A much faster drag and drop. And a ton of more features in the pipeline. It’s called choclift and you really should try it out. Available for free in the App Store for iOS and macOS.

152

316

4.6K

482K
