Bit Cook

3.1K posts


@bit_cook

Explorer · Developer · Innovator · Transhumanist · Cosmopolitan · Cypherpunk · Philosophy & Neuroscience Enthusiast · Financial alt account: @ValueCaptor

The Real World · Joined May 2013
1.6K Following · 286 Followers
Bit Cook reposted
Geek Lite @QingQ77
Train a 0.1B end-to-end omni-modal model from scratch: a single set of weights handles text, speech, and image input, and outputs text plus streaming speech. github.com/jingyaogong/mi… MiniMind-O is an omni-modal model with only 0.1B parameters, built on a Thinker-Talker dual-path design that supports text/speech/image input and outputs text and streaming speech. The project open-sources everything: code, weights, training data, and a technical report. The core algorithms are written from scratch in PyTorch, and a single 3090 can run through training on the mini dataset in two hours.
6 replies · 67 reposts · 382 likes · 16.5K views
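The Thinker-Talker split described above can be sketched as a toy loop: the Thinker decodes text tokens and exposes a hidden state, and the Talker maps that same hidden state to speech-codec tokens one step at a time, so audio can stream while text is still being generated. All names, sizes, and weights below are hypothetical placeholders, not the actual MiniMind-O model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, codec = 16, 64, 32   # latent width, text vocab, speech-codec vocab (toy sizes)

# Random placeholder weights standing in for the Thinker and Talker networks.
W_think = rng.standard_normal((d, d)) / np.sqrt(d)
U_text = rng.standard_normal((vocab, d)) / np.sqrt(d)
U_talk = rng.standard_normal((codec, d)) / np.sqrt(d)

h = rng.standard_normal(d)
text_tokens, audio_tokens = [], []
for _ in range(4):                                   # one text + one audio token per step
    h = np.tanh(W_think @ h)                         # Thinker updates the shared hidden state
    text_tokens.append(int(np.argmax(U_text @ h)))   # Thinker's text token
    audio_tokens.append(int(np.argmax(U_talk @ h)))  # Talker reads the same hidden state
```

The point of the dual path is that the Talker never waits for the full text answer: it consumes hidden states as they appear, which is what makes streaming speech output possible.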
Bit Cook reposted
hardmaru @hardmaru
Reproducing all of Schmidhuber’s papers (1990-2025) using an AI coding assistant. Cool project by @yaroslavvb! It even reproduced the “World Models” paper by me and @SchmidhuberAI with a toy env, with a full VAE + RNN world model implementation. Project: github.com/cybertronai/sc…
20 replies · 93 reposts · 655 likes · 41.6K views
Bit Cook reposted
hardmaru @hardmaru
The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements.

Paper: arxiv.org/abs/2603.23198
Blog: pub.sakana.ai/sparser-faster…
Code: github.com/SakanaAI/spars… ⚡️
Sakana AI@SakanaAILabs

How do we make LLMs faster and lighter? Don’t force the GPU to adapt to sparsity. Reshape the sparsity to fit the GPU! ⚡️

Excited to share our new #ICML2026 paper in collaboration with @NVIDIA: "Sparser, Faster, Lighter Transformer Language Models". This work introduces new open-source GPU kernels and data formats for faster inference and training of sparse transformer language models:

Paper: arxiv.org/abs/2603.23198
Blog: pub.sakana.ai/sparser-faster…
Code: github.com/SakanaAI/spars…

While LLMs are undoubtedly powerful, they are increasingly expensive to train and deploy, with a large part of this cost coming from their feedforward layers. Yet, an interesting phenomenon occurs inside these layers: for any given token, only a small fraction of the hidden activations actually matter. The rest approximate zero, wasting computation. With ReLU and very mild L1 regularization, this sparsity can exceed 95% with little to no impact on downstream performance.

So, can we leverage this sparsity to make LLMs faster? The challenge is hardware. Modern GPUs are optimized for dense matrix multiplications. Traditional sparse formats introduce irregular memory access and overheads that cancel out their theoretical savings for GEMM operations.

Our contribution is twofold:
1/ We introduce TwELL (Tile-wise ELLPACK), a new sparse packing format designed to integrate directly into the same optimized tiled matmul kernels without disrupting execution.
2/ We develop custom CUDA kernels that fuse multiple sparse matmuls to maximize throughput and compress TwELL to a hybrid representation that minimizes activation sizes.

We used our kernels to train and benchmark sparse LLMs at billion-parameter scales, demonstrating >20% speedups and even higher savings in peak memory and energy. This work will be presented at #ICML2026. Please check out our blog and technical paper for a deep dive!

43 replies · 397 reposts · 2.8K likes · 295K views
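The packing principle behind ELLPACK-style formats (the TwELL format in the paper is tile-wise and fused into the matmul kernels; this toy only shows the basic idea) is to give every row of a sparse matrix the same fixed number of (value, column) slots, so the data stays a dense, regularly shaped array a GPU can stride through predictably:

```python
import numpy as np

rng = np.random.default_rng(0)

# A mostly-zero activation matrix, as in a post-ReLU feedforward layer.
A = rng.standard_normal((8, 16))
A[np.abs(A) < 1.5] = 0.0                 # keep only the large activations (~87% sparse)

# ELLPACK packing: K slots per row, zero-padded, so shapes stay regular.
K = int((A != 0).sum(axis=1).max())
vals = np.zeros((A.shape[0], K))
cols = np.zeros((A.shape[0], K), dtype=int)
for i in range(A.shape[0]):
    nz = np.flatnonzero(A[i])
    vals[i, :len(nz)] = A[i, nz]
    cols[i, :len(nz)] = nz

# Sparse matvec over the packed arrays; padding slots multiply by 0.
x = rng.standard_normal(16)
y = (vals * x[cols]).sum(axis=1)
assert np.allclose(y, A @ x)
```

The trade-off the tweet describes follows from the padding: rows with many nonzeros inflate K for everyone, which is why the paper's hybrid format routes the rare heavy tokens to a dense backup matrix instead.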
Bit Cook reposted
Mathematica @mathemetica
Terence Tao is answering a fundamental question regarding the safety and reliability of modern AI: "How can we use a tool that is powerful, but unreliable?" W = ∑(wᵢ ⋅ xᵢ) + b AI isn’t just about “smart”; it’s about the probability of *looking* right. We’ve built systems where the weights (wᵢ) are optimized for plausibility, not veracity. This creates a “convincing mirror” that confidently serves dangerous advice in medicine or finance. The gap between “convincing” and “correct” is the most critical variable we need to solve for.
104 replies · 575 reposts · 2.2K likes · 563.6K views
Bit Cook reposted
Berryxia.AI @berryxia
Honestly, only a heavyweight would dare to stand up and say this! Terence Tao, recognized worldwide as one of the smartest people alive, has personally stepped up and exposed AI's most fatal flaw. He asked the fundamental question everyone else avoids: "How can we use a tool that is powerful, but extremely unreliable?"

AI's core equation is written out plainly: W = ∑(wᵢ ⋅ xᵢ) + b. It is not pursuing "correct"; it is pursuing "looks correct". All of the weights are optimized for plausibility, not veracity. So we have built a mirror that is superbly good at "faking it": in medicine, finance, law, and other fields, it can give you the most dangerous, most wrong advice in the most confident, most fluent tone.

The gulf between "convincing" and "correct" is the most fatal risk of the AI era. The more we rely on it, the more easily it leads us into traps that even we cannot see.

When the world's top mathematicians are seriously discussing "how to safely use unreliable AI", are we ordinary people still applauding "how fast it writes code"? This video is worth rewatching by everyone who uses AI.
Mathematica@mathemetica


141 replies · 270 reposts · 1.2K likes · 342.5K views
Bit Cook reposted
奶昔🥤 @realNyarime
"Capitalism Online." May 5, Xiamen, Fujian. A ninth-grade student says their English teacher, to motivate the class, created a class currency (nicknamed the "pound"). Students who did their homework conscientiously or scored well on exams earned pounds. Every two weeks the teacher auctioned off snacks, which could only be bought with pounds.

One student then completed his primitive accumulation of capital in just a week, and even opened a "casino and lending business" in the class. There was even a bloody "triangular trade": for example, a classmate who ran up debts at the "casino" and had no money left could only take out a loan, and when he couldn't repay it in time he was cut down by capital and reduced to being hired out as cheap labor.

Because the English teacher handed out new pounds to top students every day, the class's pound supply kept growing and triggered inflation: a bottle of cola that cost 5 pounds last week cost 10 this week.

On top of that, the teacher auctioned off "privileges" to top students: privilege holders received double pound rewards from the teacher. So some students handed their homework to privilege holders, who submitted it on their behalf to earn double pounds, which the two sides then split.

Privileged students, able to earn pounds faster, rapidly accumulated wealth. Some even used large pound holdings to corner the snack market and made others who wanted snacks pay in renminbi, indirectly pegging the pound to the RMB. In the end, the students with the deepest capital even opened a "bank" that pegged the "pound" directly to the renminbi, with a real-time floating exchange rate.
55 replies · 7 reposts · 233 likes · 56.8K views
Bit Cook reposted
Seth Howes @SethSHowes
I sequenced my genome at home, on my kitchen table. I wrote up exactly how I did it - the equipment, protocol, theory, and cost: iwantosequencemygenomeathome.com
108 replies · 763 reposts · 4.7K likes · 1.2M views
Bit Cook reposted
Rey|判断位 x 英语自由 @ReyJudgementOS
Stunning: a guy used AI to complete genome sequencing on his own, at home.

What can a curious, agentic young person who knows how to learn AI tools do? Take decision-making power back from medical institutions. The author traced the mechanism behind the autoimmune disease running through multiple generations of his family, a mechanism no clinician had previously been able to understand. When he started, he didn't know whether it would actually work. It turned out it did.

"Your genome is the most private data you own. You probably shouldn't let it leave your house."

Seth Howes has published the full protocol. Something once monopolized by large specialist institutions can now be done DIY.

Why? Curiosity (a family illness) + agency + AI.

Equipment?
1) A MinION sequencer (turning "reading DNA" from a capital-intensive activity into a tool-level capability)
2) Open-source DNA models (Evo2 and AlphaGenome)
3) A DGX Spark and a Mac Studio

Breakthroughs?
1) Sequencing costs keep falling (a Moore's-law-like curve): from hundreds of thousands of dollars → the $1,000 range, with the $100 range next.
2) AI's understanding of biological data is improving exponentially. The post notes that models like AlphaGenome mean we are no longer just "reading DNA" but starting to "understand function".
3) The interface is getting simpler (MinKNOW + LLMs). One key line in the post: he used Claude to generate BED files. Biology operations are being taken over by language interfaces.

Link to the full post is in the replies. Well worth a try for university students.
Seth Howes@SethSHowes


9 replies · 34 reposts · 162 likes · 20.4K views
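The BED files mentioned in the thread are a simple tab-separated format that sequencing tools (including MinKNOW's adaptive sampling) use to name genomic regions: chromosome, 0-based start, end, and an optional name per line. A minimal sketch of generating one; the coordinates and region names here are illustrative placeholders, not real loci:

```python
# Each BED line: chromosome <tab> 0-based start <tab> end <tab> optional name.
# The regions below are hypothetical examples, not real gene coordinates.
regions = [
    ("chr6", 32000000, 32200000, "example_region_1"),
    ("chr1", 155000000, 155100000, "example_region_2"),
]
with open("targets.bed", "w") as f:
    for chrom, start, end, name in regions:
        f.write(f"{chrom}\t{start}\t{end}\t{name}\n")
```

Because the format is this plain, it is exactly the kind of artifact an LLM can generate from a natural-language description of which regions to target.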
Bit Cook reposted
kache @yacineMTB
you can outsource your thinking but you cannot outsource your understanding
242 replies · 3.6K reposts · 16.3K likes · 2.2M views
Bit Cook reposted
luthira @luthiraabeykoon
We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇
272 replies · 703 reposts · 7.5K likes · 838.7K views
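Running a transformer on FPGA fabric generally means replacing float matmuls with fixed-point arithmetic that maps onto DSP slices. A minimal sketch of that quantization step, assuming symmetric per-tensor int8 scales with int32 accumulation; this illustrates the general technique, not this project's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.standard_normal((4, 4)).astype(np.float32)   # toy weight matrix
x = rng.standard_normal(4).astype(np.float32)        # toy activation vector

# Symmetric quantization: map the float range onto int8 [-127, 127].
scale_w = float(np.abs(W).max()) / 127.0
scale_x = float(np.abs(x).max()) / 127.0
W_q = np.round(W / scale_w).astype(np.int8)
x_q = np.round(x / scale_x).astype(np.int8)

# Integer multiply-accumulate, as a DSP slice would perform it.
acc = W_q.astype(np.int32) @ x_q.astype(np.int32)
y = acc.astype(np.float32) * (scale_w * scale_x)     # dequantize back to float

err = float(np.abs(y - W @ x).max())                 # small quantization error
```

Once the arithmetic is all-integer like this, the matmul can be laid out as a fixed pipeline in hardware, which is how token rates like the one claimed above become possible without a GPU.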
Bit Cook reposted
Geek Lite @QingQ77
Helps developers auto-generate a complete software-copyright application package from their own project's real source code, so they no longer have to pay someone to assemble it. github.com/Fokkyp/Softwar… This Codex Skill reads your project's code, analyzes the business logic, and then generates the operation manual, the code materials (excerpted by the standard first-30-pages/last-30-pages rule), and a summary of the application-form fields. The code is pulled only from your own project; the AI never invents it. At the key steps — business terminology, application-form fields, code selection, screenshot approach — generation pauses for your confirmation. The final output is the operation-manual DOCX, the code-materials DOCX, and the application-form TXT, placed under 软件著作权申请资料/正式资料/ in the project directory.
4 replies · 32 reposts · 193 likes · 14.1K views
Bit Cook @bit_cook
Vectors are AI's native language; natural language is only there for human convenience, and it sacrifices a great deal of efficiency.
0 replies · 0 reposts · 0 likes · 12 views
Bit Cook reposted
alphaXiv @askalphaxiv
“Recursive Multi-Agent Systems” Many multi-agent LLM systems rely on agents passing text back and forth. This paper argues for a different approach: make the agents recur together in latent space. Agents refine latent thoughts, pass hidden states across one another, and only decode text at the end. The key idea is that recursion scales the whole agent system, not just one model, and in their experiments this makes collaboration more accurate, faster, and much cheaper in tokens.
13 replies · 87 reposts · 494 likes · 25.7K views
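The contrast with text-passing systems can be sketched with toy stand-ins: two "agents" refine a shared latent vector, each reading the other's hidden state directly, with decoding deferred to the end. The linear maps below are random placeholders for the paper's LLM agents, so this only illustrates the information flow, not the method itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # latent width

# Placeholder "agents": each is a random map that updates the shared latent.
W_a = rng.standard_normal((d, d)) / np.sqrt(d)
W_b = rng.standard_normal((d, d)) / np.sqrt(d)

h = rng.standard_normal(d)      # initial latent "thought"
for _ in range(4):              # recursion rounds across the whole agent system
    h = np.tanh(W_a @ h)        # agent A refines the latent
    h = np.tanh(W_b @ h)        # agent B consumes A's hidden state directly
# Only at this point would h be decoded into text (decoder omitted).
```

In a text-passing system, each arrow between agents would instead be a decode-then-re-encode round trip, which is where the token cost the tweet mentions comes from.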
Bit Cook reposted
Association for Computing Machinery
Happy Birthday to Claude Shannon, known by many as the “father of Information Theory.” Shannon was an American mathematician and electrical engineer. In 1948, he published A Mathematical Theory of Communication, which effectively created the field.
10 replies · 230 reposts · 666 likes · 36.9K views
Bit Cook reposted
alphaXiv @askalphaxiv
What if the model didn’t just use a computer, but actually was the computer? Meta AI introduces "Neural Computer", a model where computation, memory, and I/O are all inside one learned system. Their early prototype learns from screen recordings of terminals and desktops, and it can already imitate some basic computer behavior like rendering interfaces and responding to clicks or commands. But it still breaks on slightly harder tasks like reliable reasoning, stable memory, and reusable skills.
28 replies · 144 reposts · 917 likes · 154.8K views
Bit Cook reposted
Nick Levine @status_effects
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:
170 replies · 357 reposts · 2.8K likes · 988.5K views
Bit Cook reposted
Haider. @haider1
Andrej Karpathy says computing may shift from classical software to neural systems. Instead of code running everything, neural nets could take raw video, audio, and context, then generate interfaces and actions in real time: "the CPU becomes the coprocessor, handling fixed tasks while neural nets run the show."
67 replies · 121 reposts · 885 likes · 69.2K views