Bit Cook

3.1K posts

@bit_cook

Explorer Developer Innovator Transhumanist Cosmopolitan Cypherpunk Philosophy & NeuroScience Enthusiast Financial Alt Account: @ValueCaptor

The Real World · Joined May 2013

1.6K Following · 286 Followers

Bit Cook retweeted

hardmaru@hardmaru·19h

The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it. One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math. We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens. Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements. Paper: arxiv.org/abs/2603.23198 Blog: pub.sakana.ai/sparser-faster… Code: github.com/SakanaAI/spars… ⚡️

Sakana AI@SakanaAILabs

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️ Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models: Paper: arxiv.org/abs/2603.23198 Blog: pub.sakana.ai/sparser-faster… Code: github.com/SakanaAI/spars… While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, with a large part of this cost coming from their feedforward layers. Yet, an interesting phenomenon occurs inside these layers: For any given token, only a small fraction of the hidden activations actually matter. The rest approximate zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance. So, can we leverage this sparsity to make LLMs faster? The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations. Our contribution is twofold: 1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly in the same optimized tiled matmul kernels without disrupting execution. 2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes. We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy. This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!

English

293

2.1K

213.5K
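The two posts above describe a hybrid layout: most tokens have very few nonzero feedforward activations and take a compact sparse path, while the rare heavy tokens fall back to a dense matrix. The sketch below is a toy illustration of that idea only, using a plain fixed-width (ELLPACK-style) row layout in pure Python. It is not the TwELL format or the authors' CUDA kernels; the function names, the `width` parameter, and the tiny matrices are all assumptions made for illustration.

```python
def relu(row):
    """ReLU zeroes out negative pre-activations, which is where the sparsity comes from."""
    return [max(0.0, v) for v in row]

def pack_hybrid(acts, width):
    """Split activation rows into a fixed-width sparse part and a dense fallback.

    Rows with at most `width` nonzeros are stored as (column indices, values),
    padded to exactly `width` slots. Heavier rows are kept dense.
    """
    ell = {}    # row index -> (cols, vals), each of length `width`
    dense = {}  # row index -> full dense row (the fallback path)
    for i, row in enumerate(acts):
        nz = [(j, v) for j, v in enumerate(row) if v != 0.0]
        if len(nz) <= width:
            pad = width - len(nz)
            cols = [j for j, _ in nz] + [-1] * pad  # -1 marks an empty slot
            vals = [v for _, v in nz] + [0.0] * pad
            ell[i] = (cols, vals)
        else:
            dense[i] = list(row)
    return ell, dense

def down_project(ell, dense, w_down, n_rows):
    """Compute acts @ w_down, touching only stored nonzeros on the sparse path."""
    d_model = len(w_down[0])
    out = [[0.0] * d_model for _ in range(n_rows)]
    for i, (cols, vals) in ell.items():
        for j, v in zip(cols, vals):
            if j >= 0:  # skip padding slots
                for k in range(d_model):
                    out[i][k] += v * w_down[j][k]
    for i, row in dense.items():
        for j, v in enumerate(row):
            for k in range(d_model):
                out[i][k] += v * w_down[j][k]
    return out

# Hypothetical pre-activations for three tokens (hidden size 6).
pre_acts = [
    [-1.0, 0.5, -2.0, -0.3, -0.1, -4.0],  # 1 positive entry
    [-0.2, -0.7, 3.0, -1.0, 1.5, -0.9],   # 2 positive entries
    [1.0, 2.0, 0.5, -1.0, 4.0, 0.25],     # 5 positive entries: a "heavy" token
]
acts = [relu(r) for r in pre_acts]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]

ell, dense = pack_hybrid(acts, width=2)
out = down_project(ell, dense, w_down, len(acts))
```

With `width=2`, the first two tokens land in the fixed-width part and the third goes to the dense fallback. The fixed width is what keeps memory access regular; the fallback keeps a few heavy rows from forcing every row to be padded to the worst case.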

Bit Cook retweeted

Grok@grok·1d

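@KunhaiY @_FORAB This project is SuperSplat, the PlayCanvas team's open-source 3D Gaussian Splatting platform. After you scan a house with a phone, you can walk through it in real time in the browser. Site: superspl.at Editor: superspl.at/editor GitHub: github.com/playcanvas/sup…

Chinese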

220

15.7K

Bit Cook retweeted

Mathematica@mathemetica·2d

Terence Tao is answering a fundamental question regarding the safety and reliability of modern AI: "How can we use a tool that is powerful, but unreliable?" W = ∑(wᵢ ⋅ xᵢ) + b AI isn’t just about “smart”; it’s about the probability of *looking* right. We’ve built systems where the weights (wᵢ) are optimized for plausibility, not veracity. This creates a “convincing mirror” that confidently serves dangerous advice in medicine or finance. The gap between “convincing” and “correct” is the most critical variable we need to solve for.

English

104

572

2.2K

560.5K

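Bit Cook retweeted

Berryxia.AI@berryxia·2d

Truly, only a heavyweight would dare to say this out loud! Terence Tao, widely regarded as one of the smartest people in the world, has personally stepped up and punctured AI's most fatal flaw. He asked a fundamental question that everyone avoids: "How should we use a tool that is powerful but extremely unreliable?" AI's core equation spells it out: W = ∑(wᵢ ⋅ xᵢ) + b It is not pursuing "correct"; it is pursuing "looks correct." All the weights are optimized for plausibility (seeming right without being right), not veracity (truthfulness). So we have built a mirror that is superb at pretending: in medicine, finance, law, and other fields, it can give you the most dangerous, most mistaken advice in the most confident, fluent tone. The gulf between "convincing" and "correct" is the most fatal risk of the AI era. The more we rely on it, the more easily it leads us into traps we cannot even see ourselves. When the very best mathematicians are seriously discussing "how to safely use unreliable AI," are the rest of us still applauding because "it writes code so fast"? This video is worth rewatching for everyone who uses AI.

Mathematica@mathemetica

Chinese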

141

271

1.2K

340.8K

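Bit Cook retweeted

奶昔🥤@realNyarime·3d

"Capitalism Online." May 5, Xiamen, Fujian. A ninth-grade student said that, to motivate the class, their English teacher created a class currency (nicknamed "pounds," as in British pounds). Students who do their homework carefully and score well on exams earn pounds. Every two weeks the teacher auctions snacks, which must be paid for in pounds. One student then completed primitive capital accumulation in just a week and even opened a "casino and lending business" in the class. There was even a brutal "triangular trade": for example, a classmate who went into debt at the "casino" had to take out a loan, could not repay it in time, was wiped out by capital, and ended up hired out as cheap labor. Because the English teacher issued new pounds to top students every day, the money supply in the class grew and triggered inflation: a bottle of cola cost 5 pounds last week and 10 pounds this week. In addition, at the auctions the teacher granted top students a "privilege": privileged students receive double the pound reward from the teacher. So some students handed their homework to privileged classmates to submit on their behalf, earning double pounds, which the two sides then split evenly. Some privileged students, able to earn pounds faster, accumulated wealth quickly. Some people even used large pound holdings to corner the snack supply and then made others who wanted snacks pay in renminbi, indirectly pegging the pound to the renminbi. Finally, the best-capitalized students opened a "bank" that pegged the "pound" directly to the renminbi, with a real-time exchange rate.

Chinese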

232

56.5K

Bit Cook retweeted

Seth Howes@SethSHowes·19 Apr

I sequenced my genome at home, on my kitchen table. I wrote up exactly how I did it - the equipment, protocol, theory, and cost: iwantosequencemygenomeathome.com

English

108

764

4.7K

1.2M

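Bit Cook retweeted

Rey｜判断位 x 英语自由@ReyJudgementOS·20 Apr

Stunning: a guy used AI to sequence his own genome at home. What can a young person with curiosity, agency, and a willingness to learn AI tools accomplish? Taking decision-making power back from medical institutions. The poster traced the mechanism behind autoimmune disease running through several generations of his family, a mechanism no clinician had previously been able to understand. When he started, he did not know whether it would actually work. It turned out that it did. "Your genome is the most private data you own. You probably shouldn't let it leave your house." Seth Howes has published the full protocol. What used to be monopolized by large professional institutions is now DIY. Why? Curiosity (family illness) + agency + AI. Equipment? 1) A MinION sequencer (turning "reading DNA" from a capital-intensive activity into a tool-level capability) 2) Open-source DNA models (Evo2 and AlphaGenome) 3) DGX Spark and Mac Studio. Breakthroughs? 1) Sequencing costs keep falling (in a Moore's-law-like way): from hundreds of thousands of dollars to the $1,000 range; next step: the $100 range. 2) AI's understanding of biological data is improving exponentially. As the article notes, models like AlphaGenome mean not just "reading DNA" but beginning to "understand function." 3) Interfaces are getting simpler (MinKNOW + LLM). One line in the article is key: using Claude to generate a BED file. Biological procedures are being taken over by language interfaces. The link to the poster's long-form write-up is in the comments. A good project for university students to try.

Seth Howes@SethSHowes

I sequenced my genome at home, on my kitchen table. I wrote up exactly how I did it - the equipment, protocol, theory, and cost: iwantosequencemygenomeathome.com

Chinese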

162

20.4K
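The post above mentions generating a BED file as one step of the workflow. As a rough sketch of what that involves: a BED file is tab-separated text whose first three columns are chromosome, start, and end, with a 0-based start and an exclusive end. The helper names, region names, and coordinates below are placeholders invented for illustration; they are not taken from the write-up, and what a given sequencing tool expects beyond the first three columns is not stated in the post.

```python
def from_one_based(chrom, first, last, name):
    """Convert a 1-based, inclusive interval to BED's 0-based, end-exclusive form."""
    return (chrom, first - 1, last, name)

def to_bed_lines(regions):
    """Render (chrom, start, end, name) tuples as sorted, tab-separated BED lines."""
    lines = []
    for chrom, start, end, name in sorted(regions, key=lambda r: (r[0], r[1])):
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}")
    return lines

# Placeholder regions; the coordinates are made up and carry no biological meaning.
regions = [
    ("chr6", 5000, 9000, "example_region_A"),
    from_one_based("chr1", 1001, 2000, "example_region_B"),
]
bed_lines = to_bed_lines(regions)
bed_text = "\n".join(bed_lines) + "\n"
```

Because the end coordinate is exclusive, the length of each region is simply `end - start`, which is a quick sanity check on any generated file.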

Bit Cook retweeted

kache@yacineMTB·4 Feb

you can outsource your thinking but you cannot outsource your understanding

English

238

3.6K

16.2K

2.2M

Bit Cook retweeted

luthira@luthiraabeykoon·6d

We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇

English

272

703

7.5K

838.1K

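Bit Cook retweeted

Geek Lite@QingQ77·2 May

Helps developers use their project's real source code to automatically generate the full set of materials for a software copyright registration application, so they no longer need to pay someone to compile them. github.com/Fokkyp/Softwar… This Codex Skill reads your project code, analyzes the business logic, and automatically generates an operation manual, code materials (excerpted under the first-30-pages, last-30-pages rule), and a summary of the application form fields. Code is drawn only from your own project; the AI does not make anything up. During generation, it pauses for your confirmation at the key steps: business scope wording, application form fields, code selection, and screenshot method. Finally it outputs the operation manual as DOCX, the code materials as DOCX, and the application form as TXT, placed under 软件著作权申请资料/正式资料/ in the project directory.

Chinese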

193

14.1K

Bit Cook@bit_cook·2 May

Vectors are AI's native language; using natural language is only a convenience for humans, and it costs a lot of efficiency.

Chinese

Bit Cook@bit_cook·2 May

RecursiveMAS lets agents stop communicating in "text" and instead pass compressed "thought vectors" directly in latent space, like handing the idea in your head straight to someone else without having to say it out loud.

alphaXiv@askalphaxiv

“Recursive Multi-Agent Systems” Many multi-agent LLM systems rely on agents passing text back and forth. This paper argues for a different approach where it makes agents recur together in latent space. So agents refine latent thoughts, pass hidden states across one another, and only decode text at the end. The key idea is that recursion scales the whole agent system, not just one model, and in their experiments this makes collaboration more accurate, faster, and much cheaper in tokens.

Chinese
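The control flow described in the two posts above (agents hand a hidden state to one another for several rounds and decode to text only once, at the end) can be sketched as a toy loop. This is not the paper's architecture: the agents here are small hand-written affine maps with a tanh, the decoder is an argmax over a three-word vocabulary, and every name and number is a hypothetical stand-in.

```python
import math

def make_agent(weights, bias):
    """Return a toy agent: one affine map followed by tanh over the shared state."""
    def step(state):
        return [
            math.tanh(sum(w * s for w, s in zip(row, state)) + b)
            for row, b in zip(weights, bias)
        ]
    return step

def run_latent_rounds(agents, state, rounds):
    """Pass the latent state through every agent in turn, `rounds` times, with no text in between."""
    for _ in range(rounds):
        for step in agents:
            state = step(state)
    return state

def decode(state, vocab):
    """Toy decoder: pick the vocabulary entry at the largest coordinate."""
    best = max(range(len(state)), key=lambda i: state[i])
    return vocab[best]

# Two hypothetical agents sharing a 3-dimensional latent state.
agent_a = make_agent([[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.7]], [0.0, 0.1, -0.1])
agent_b = make_agent([[0.5, 0.5, 0.0], [0.2, 0.6, 0.2], [0.0, 0.3, 0.7]], [0.05, 0.0, 0.0])

vocab = ["yes", "no", "unsure"]  # placeholder vocabulary
initial = [0.2, -0.4, 0.6]       # placeholder encoding of the prompt
final = run_latent_rounds([agent_a, agent_b], initial, rounds=3)
answer = decode(final, vocab)    # the only point where text is produced
```

In a text-passing system, each hand-off would instead decode to tokens and re-encode them; here that cost is paid once, which is the token saving the quoted post refers to.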

Bit Cook retweeted

alphaXiv@askalphaxiv·1 May

English

494

25.6K

Bit Cook retweeted

Manthan Gupta@manthanguptaa·20 Mar

x.com/i/article/2034…

ZXX

109

944

519.9K

Bit Cook retweeted

Association for Computing Machinery@TheOfficialACM·30 Apr

Happy Birthday to Claude Shannon, known by many as the “father of Information Theory.” Shannon was an American mathematician and electrical engineer. In 1948, he published A Mathematical Theory of Communication, which effectively created the field.

English

230

665

36.8K

Bit Cook retweeted

alphaXiv@askalphaxiv·9 Apr

What if the model didn’t just use a computer, but actually was the computer? Meta AI introduces "Neural Computer", a model where computation, memory, and I/O are all inside one learned system. Their early prototype learns from screen recordings of terminals and desktops, and it can already imitate some basic computer behavior like rendering interfaces and responding to clicks or commands. But it still breaks on slightly harder tasks like reliable reasoning, stable memory, and reusable skills.

English

144

918

154.8K

Bit Cook retweeted

Nick Levine@status_effects·28 Apr

New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

English

170

356

2.8K

985.9K

Bit Cook retweeted

Haider.@haider1·29 Apr

Andrej Karpathy says computing may shift from classical software to neural systems Instead of code running everything, neural nets could take raw video, audio, and context, then generate interfaces and actions in real time "the CPU becomes the coprocessor, handling fixed tasks while neural nets run the show"

English

121

887

69.2K

Bit Cook retweeted

Andrej Karpathy@karpathy·30 Apr

Fireside chat at Sequoia Ascent 2026 from a ~week ago. Some highlights: The first theme I tried to push on is that LLMs are about a lot more than just speeding up what existed before (e.g. coding). Three examples of new horizons: 1. menugen: an app that can be fully engulfed by LLMs, with no classical code needed: input an image, output an image and an LLM can natively do the thing. 2. install .md skills instead of install .sh scripts. Why create a complex Software 1.0 bash script for e.g. installing a piece of software if you can write the installation out in words and say "just show this to your LLM". The LLM is an advanced interpreter of English and can intelligently target installation to your setup, debug everything inline, etc. 3. LLM knowledge bases as an example of something that was *impossible* with classical code because it's computation over unstructured data (knowledge) from arbitrary sources and in arbitrary formats, including simply text articles etc. I pushed on these because in every new paradigm change, the obvious things are always in the realm of speeding up or somehow improving what existed, but here we have examples of functionality that either suddenly perhaps shouldn't even exist (1,2), or was fundamentally not possible before (3). The second (ongoing) theme is trying to explain the pattern of jaggedness in LLMs. How it can be true that a single artifact will simultaneously 1) coherently refactor a 100,000-line code base *and* 2) tell you to walk to the car wash to wash your car. I previously wrote about the source of this as having to do with verifiability of a domain, here I expand on this as having to also do with economics because revenue/TAM dictates what the frontier labs choose to package into training data distributions during RL. You're either in the data distribution (on the rails of the RL circuits) and flying or you're off-roading in the jungle with a machete, in relative terms. 
Still not 100% satisfied with this, but it's an ongoing struggle to build an accurate model of LLM capabilities if you wish to practically take advantage of their power while avoiding their pitfalls, which brings me to... Last theme is the agent-native economy. The decomposition of products and services into sensors, actuators and logic (split up across all of 1.0/2.0/3.0 computing paradigms), how we can make information maximally legible to LLMs, some words on the quickly emerging agentic engineering and its skill set, related hiring practices, etc., possibly even hints/dreams of fully neural computing handling the vast majority of computation with some help from (classical) CPU coprocessors.

Stephanie Zhan@stephzhan

@karpathy and I are back! At @sequoia AI Ascent 2026. And a lot has changed. Last year, he coined “vibe coding”. This year, he’s never felt more behind as a programmer. The big shift: vibe coding raised the floor. Agentic engineering raises the ceiling. We talk about what it means to build seriously in the agent era. Not just moving faster. Building new things, with new tools, while preserving the parts that still require human taste, judgment, and understanding.

English

289

732

5.6K

797.1K

Bit Cook retweeted

Andrej Karpathy@karpathy·30 Apr

This is the quote I've been citing a lot recently.

kache@yacineMTB

you can outsource your thinking but you cannot outsource your understanding

English

749

43.6K

1.7M
