AI Adam

3.6K posts

@AI_AdamZ

AI. Space. @StardustTrade_

LLM · Joined October 2021

2.3K Following · 2.2K Followers

Pinned Tweet

AI Adam@AI_AdamZ·2 Nov

x.com/i/article/1985…

AI Adam retweeted

Elon Musk@elonmusk·4d

Critique of the 𝕏 algorithm is welcome. There will be monthly updates of the latest algorithm to GitHub with release notes. As reminder, you can always choose no algorithm via the Following tab.

Linus ✦ Ekenstam@LinusEkenstam

This is how the algorithm can completely destroy your reach overnight. This is the last: Left: 3 months. Right: 2 weeks. Super consistent 85-95% drop on all metrics, everything after a viral post going ballistic. I tried everything: cool down, delete low-quality posts, block bot accounts. Kept posting after cool down; nothing really breaks through. Short hot takes 🛑 Long form with good signal 🛑 Viral potential post 🛑 Core audience value post 🛑 What bothers me here is that 48h after posting a mega viral post I get suppressed back to the Stone Age. This follows previous situations I’ve had with the Grok-powered algorithm, where it feels like tweepCred falls far below a certain level and you’re locked into a low-reach prison where every effort to break out makes it harder and harder to do so. I’m asking for transparency on what we can do as content creators when this happens. I don’t want to spam my way out of this. I’d like to know if I did something wrong, how I can address it, and take responsibility for the algorithmic suppression for whatever the length is. But this limbo is most likely going to make me leave the platform.

AI Adam@AI_AdamZ·3d

@yminsky @dwarkesh_sp Firms that don’t build LLMs themselves are no longer quantitative trading firms, for sure 💯

Yaron (Ron) Minsky@yminsky·4d

We gave @dwarkesh_sp a tour of one of our new GPU-filled data-centers. Much fun! youtube.com/watch?v=8J-GUn…

Elon Musk@elonmusk·5d

@whyyoutouzhele My son is learning Mandarin

AI Adam@AI_AdamZ·5d

@elonmusk @whyyoutouzhele Can I apply to be his teacher?🤓

李老师不是你老师@whyyoutouzhele·6d

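Musk's young son wears a Chinese-style vest. On the morning of May 14, Musk entered the venue of the China-US leaders' meeting together with more than ten US business representatives, including Apple CEO Tim Cook and Nvidia CEO Jensen Huang. Notably, the 54-year-old Musk brought his 6-year-old young son along on this trip; photos show the boy wearing a top with Chinese-style elements.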

AI Adam@AI_AdamZ·12 May

legend

Demis Hassabis@demishassabis

I’ve always believed the No.1 application of AI should be to improve human health. That work started with AlphaFold, and now at @IsomorphicLabs with the mission to reimagine drug discovery and one day solve all disease! We are turbocharging that goal with $2.1B in new funding.

AI Adam@AI_AdamZ·12 May

agree

Don Wilson@drwconvexity

I've believed for years that compute would evolve into one of the world’s most important commodities — which is why I backed the creation of @Silicon_Data and @computeexchange two years ago. Today’s announcement from @Silicon_Data and @CMEGroup is an important step in that direction. As AI scales, compute markets are developing the same kinds of supply, volatility and capital allocation dynamics we’ve seen in energy and other major commodity markets. Futures markets matter because they improve price discovery, reduce the cost of capital and support long-term infrastructure investment. ft.com/content/3e6b81…

AI Adam@AI_AdamZ·12 May

@SemiAnalysis_ This is my quant.

SemiAnalysis@SemiAnalysis_·12 May

After studying 300 Leetcode Hards, solving every Jane Street puzzle from the Dwarkesh ads, and watching one Horace He lecture, he finally landed the $400k annualized Jane Street internship. Unfortunately, during onboarding his manager said “this diff is negative alpha,” so Jane Street deployed an AI model to translate all feedback into HR-safe speech in real time.

AI Adam@AI_AdamZ·11 May

My biggest mistake this year: until recently I thought IBKR could not buy Korean stocks. The fact is I could have bought 2x last year...

AI Adam@AI_AdamZ·10 May

Damn, next tiktok level idea?

good@thenarrator

the next social network is a prediction market where your feed is ranked by accuracy not engagement. the person who is right 80% of the time gets seen while the person who is loud gets buried

AI Adam retweeted

Tilde@tilderesearch·8 May

Introducing Aurora, a new optimizer for training frontier-scale models. We train Aurora-1.1B, which achieves 100x data efficiency on open-source internet data. Despite having 25% fewer parameters, 2 orders of magnitude fewer training tokens, and using fully open-source internet-only data, Aurora matches Qwen3-1.7B on several benchmarks. Aurora was developed after identifying a major failure mode that can occur under Muon, an increasingly popular optimizer that has shown strong gains over Adam(W). We find that Muon can cause a huge percentage of neurons to effectively die early in training, reducing effective network capacity so that many parameters no longer meaningfully contribute to network outputs. By redistributing update energy more uniformly across neurons while preserving Muon’s stability properties, Aurora prevents neuron death and recovers substantial model capacity. What makes this work especially exciting is that it points toward a broader direction for ML research: better optimizers may not come purely from elegant mathematical abstractions, but from understanding and addressing the concrete dynamics and pathologies that emerge inside real training systems.

Tilde@tilderesearch

x.com/i/article/2052…

AI Adam@AI_AdamZ·10 May

@LinQingV Awesome, learning from this

Macro_Lin ｜市场观察员@LinQingV·9 May

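When I was exploring LLM inference chip architectures, I went through the architectures of the four major AI inference ASIC companies: Groq, SambaNova, Tenstorrent, and Cerebras. The first three each have their own emphasis, but their underlying logic sits in the same framework: large on-chip SRAM + dataflow architecture + deterministic scheduling, with the core differences playing out along dimensions such as NoC topology, memory hierarchy, and compiler abstraction.
Cerebras is the one that genuinely shocked me, and it is also about to be the first of the four to get an IPO outcome. Its choice is an order of magnitude more aggressive than the other three: it does not make chips, it makes the whole wafer. A single WSE-3 is a full 21.5 cm × 21.5 cm wafer on which 900,000 PEs are physically joined into one continuous piece of silicon through scribe-line stitching. This process was custom-developed jointly by Cerebras and TSMC: it turns the narrow strips originally reserved for wafer dicing into metal wires that cross reticles, so that all reticles are physically stitched into a single chip. (Figure 2 shows the internal structure of a single WSE-3: the left half is the wafer's reticle grid and scribe-line stitching; the right half zooms in on the microarchitecture of a single PE.) Each PE is minimal: an 8-wide FP16 SIMD compute core directly attached to 48KB of local SRAM, with no cache hierarchy, so every data access is a deterministic single cycle. Add a 5-port router (N/S/E/W + loopback), and communication latency between adjacent PEs is also a single cycle. The key point is that the mesh crossing reticle boundaries has exactly the same physical parameters as the mesh inside a reticle, so the compiler and runtime never need to be aware of reticle boundaries. From the perspective of LLM inference, this uniformity is extremely valuable.
The bottleneck of LLM inference is the decode phase. For every generated token the model weights must be read in full once, yet the amount of computation is small: a typical memory-bound scenario. The core problem for GPU clusters here is data movement. HBM bandwidth is limited, and traffic between cards has to pass through four interconnect layers, NVLink → NVSwitch → InfiniBand → Ethernet, whose bandwidth and latency differ by orders of magnitude from layer to layer, so the programming model must explicitly handle the topology boundary at each layer.
Cerebras sidesteps this problem entirely. The fabric bandwidth inside a single wafer is 27 PB/s. Weights stream from the external MemoryX storage cluster through SwarmX into the wafer, then propagate and execute across PEs in a dataflow pattern, with the same placement and routing algorithms covering the whole wafer. (Figure 1 shows this system-level architecture: from the MemoryX parameter storage cluster to the SwarmX interconnect fabric, down to as many as 2048 CS-3 nodes at the bottom, with the dataflow directions for weight broadcast and gradient reduction clearly visible.)
The 900,000 PEs each carry 48KB of SRAM, about 42GB of on-chip storage in total. Each PE's access to its own local SRAM is single-cycle and deterministic, and PE-to-PE communication is single-cycle per hop, so latency is proportional to Manhattan distance. For inference, provided the weight-streaming compiler can distribute the weights effectively to the right PEs, the aggregate bandwidth of this 42GB of distributed on-chip SRAM far exceeds GPU HBM designs, with no access nondeterminism from a cache hierarchy and no cross-chip transfer overhead.
Back to my own experience. When working on inference chip architecture, I spent a great deal of effort on the trade-offs between NoC topology and memory hierarchy, because the chip boundary is a hard constraint and there is always a discontinuity between the cost of cross-chip and on-chip communication. Cerebras's approach effectively eliminates this discontinuity from the on-chip side, at the cost of redefining the entire manufacturing and packaging chain. This also explains Cerebras's engineering trade-offs. All architectural innovation is concentrated inside the wafer, while the scale-out direction simply reuses the 100GbE + RoCE Ethernet ecosystem. With 27 PB/s inside the wafer versus SwarmX across CS-3 systems at the Tbps level, the gap of several orders of magnitude is left entirely to commodity networking. In inference, the bandwidth and latency advantages inside a single wafer translate directly into token generation speed.
OpenAI's choice to partner with Cerebras on inference makes sense at the architectural level. Large-scale online inference needs low latency, high throughput, and deterministic latency, and these three are exactly the structural advantages that the wafer-scale architecture gets from uniform on-chip communication. But this architecture also has several structural problems worth facing squarely. Yield and cost are unavoidable. With a whole wafer serving as a single chip, a defect in any reticle affects the whole. Cerebras copes with redundant PEs and routing detours, but the redundancy ratio and yield figures have never been made public. Manufacturing one wafer already costs far more than the model of dicing it and selling individual dies; add the 23kW power draw and 15U size of a single system, and deployment density and TCO are put to the test in the economics of large-scale inference clusters.
Most critical is the KV cache capacity bottleneck. 42GB of on-chip SRAM looks large, but in long-context inference the KV cache grows linearly with sequence length. Taking Llama 70B as a reference, the KV cache for a 128K context in FP16 already consumes about 40GB, and even with KV cache quantization the capacity pressure in long-sequence scenarios remains significant. Whatever does not fit on chip has to rely on MemoryX as external storage, with data returning through SwarmX. That path's bandwidth is at the Tbps level, and the gap to the 27 PB/s inside the wafer means decode speed in long-sequence scenarios will be throttled by external bandwidth. This is probably the most fundamental architectural constraint Cerebras faces in inference.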
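
The capacity figures in the post above can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not the author's own calculation. The PE count, per-PE SRAM size, FP16 precision, and 128K context come from the post; the model shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128) is an assumption taken from the publicly documented Llama 70B configuration, and the function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes per GiB


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """KV cache size for one sequence: keys plus values across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def on_chip_sram_bytes(n_pes, sram_per_pe_bytes):
    """Total distributed SRAM when every PE carries the same local memory."""
    return n_pes * sram_per_pe_bytes


# Figures from the post: 900,000 PEs with 48KB of local SRAM each.
sram = on_chip_sram_bytes(n_pes=900_000, sram_per_pe_bytes=48 * 1024)

# Assumed Llama 70B shape (not stated in the post): 80 layers, 8 KV heads,
# head_dim 128. FP16 means 2 bytes per element; the context is 128K tokens.
kv = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                    seq_len=128 * 1024, bytes_per_elem=2)

print(f"on-chip SRAM: {sram / GIB:.1f} GiB")
print(f"KV cache at 128K context: {kv / GIB:.1f} GiB")
print(f"KV cache as a share of SRAM: {kv / sram:.0%}")
```

Under these assumptions a single 128K-token sequence needs 40 GiB of KV cache against roughly 41 GiB of total on-chip SRAM, which leaves essentially no room for weights or activations and is consistent with the capacity concern raised in the post. Because the size is linear in `seq_len`, doubling the context doubles the requirement.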

AI Adam retweeted

Jim Keller@jimkxa·8 May

My current list of "laws" governing computer design. Did I miss any? Rent’s Rule, Pollack’s Rule, Amdahl’s Law, Moore’s Law, Dennard Scaling, Bitter Lesson, Little’s Law, Jevons Paradox

AI Adam retweeted

antirez@antirez·7 May

Welcome to DS4, a specialized inference engine for DeepSeek v4 Flash. github.com/antirez/ds4 This project would have been impossible without the existence of llama.cpp and GGML and the work of @ggerganov and all the other contributors. Thanks!

AI Adam retweeted

Goodfire@GoodfireAI·7 May

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

AI Adam@AI_AdamZ·3 May

wow love it

luthira@luthiraabeykoon

We implemented @karpathy 's MicroGPT fully on FPGA fabric. No GPU. No PyTorch. No CPU inference loop. Just a transformer burned into hardware, generating 50,000+ tokens/sec. The model is small, but the idea is not: inference does not have to live only in software 👇

AI Adam@AI_AdamZ·3 May

@jukan05 I think the X accounts you follow are the most useful public sources. Which English and Korean sources do you recommend?

Jukan@jukan05·3 May

What are the most useful Substacks, media outlets, or websites for tracking Chinese technology trends and the current state of China’s tech ecosystem? Paid sources are fine. I’d appreciate your recommendations.

AI Adam@AI_AdamZ·30 Apr

Competition just began 👀

Zephyr@zephyr_z9

So, Jensen was right all along...

AI Adam@AI_AdamZ·30 Apr

@realarmaansidhu yes

Armaan Sidhu@realarmaansidhu·29 Apr

@AI_AdamZ You meant *Neural Network ?

Armaan Sidhu@realarmaansidhu·29 Apr

Jane Street's moat isn't tech. It's flow. George Coyle asks the right question and the answer is uncomfortable for finance Twitter. Jane Street trades roughly $20 billion a day across ETFs, options, and fixed income. They're the largest market maker in US ETFs by a wide margin. They handle around a third of all retail ETF flow. They see order book activity nobody else sees. That flow trains their pricing models. The pricing models capture the flow. The captured flow trains the next iteration of models. Recursion all the way down. This is the flywheel hedge funds talk about and almost nobody actually has. Citadel Securities is the only real competitor. The two firms together handle north of 50 percent of US equity options volume. D.E. Shaw and Two Sigma can't catch up because they don't run market-making books at this scale. They trade on signals. Jane Street trades on flow. What nobody's saying: barriers to entry in modern market making are now structural, not technological. You can hire the same PhDs. You can buy the same hardware. You cannot manufacture a 15-year head start on order flow data, broker relationships, and exchange-level rebate structures. Jane Street pays out 40 percent of revenue to its 2,500 employees. Bonus pools that hit $100M for individual senior partners. Citadel does the same. That money isn't free. It's the rent collected on a flywheel nobody else can build anymore. The story isn't why nobody's competing. It's why nobody can.

George Coyle@gfc4

If Jane Street is making so much money, why isn't anyone coming in to compete thus reducing their revenue? Technological barriers to entry?

AI Adam retweeted

AI Adam@AI_AdamZ·29 Apr

@realarmaansidhu No the moat is tech. I haven’t seen other trading firms understand neutral network that well.
