Caleb Eom

327 posts

@calebfoundry

CalebWritesCode YT // Google Developer Expert

USA · Joined March 2025
149 Following · 1.3K Followers
Caleb Eom
Caleb Eom@calebfoundry·
huge fan of @bycloudai ever since I started my AI YT channel. i feel so honoured to finally meet him in person.. thanks for inspiring me!
Caleb Eom tweet media
English
1
0
5
122
Caleb Eom
Caleb Eom@calebfoundry·
Nemotron 3 Full Breakdown
With the help of Joey Conway from @NVIDIAAI, getting into the specifics of why Nemotron 3 is kind of a big deal.
The biggest headlines with Nemotron are Hybrid Mamba Transformer, Latent MoE, and MTP.
Hybrid Mamba Transformer goes straight at the attention mechanism to make the overhead sub-quadratic. Rather than quantizing the KV cache or swapping out attention heads, NVIDIA chose Mamba-2.
Latent MoE further optimizes sparsity by down-projecting the dimensions, so you do less math and move less memory between HBM and SRAM. That saves a ton, and NVIDIA made a conscious choice to spend the surplus on more experts.
Finally, MTP, or multi-token prediction: the model predicts several future tokens at once, which makes training more expressive and gives the option of speculative decoding during inference.
Oh, and the model adopts the new OpenMDW 1.1 License.
English
5
19
153
47.3K
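The Latent MoE point in the Nemotron 3 post above can be made concrete with a rough FLOP count. This is a minimal sketch under stated assumptions, not NVIDIA's implementation: every dimension below is a hypothetical placeholder, each expert is modeled as a plain two-matmul feed-forward block, and one multiply-add is counted as 2 FLOPs.

```python
def standard_moe_flops(d_model: int, d_ff: int, active_experts: int) -> int:
    """Per-token FLOPs when every active expert works at full width d_model."""
    per_expert = 2 * d_model * d_ff + 2 * d_ff * d_model  # up + down matmuls
    return active_experts * per_expert


def latent_moe_flops(d_model: int, d_latent: int, d_ff: int, active_experts: int) -> int:
    """Per-token FLOPs when tokens are down-projected once to d_latent,
    the experts run in the latent space, and the result is projected back."""
    shared = 2 * d_model * d_latent + 2 * d_latent * d_model  # down + up projection
    per_expert = 2 * d_latent * d_ff + 2 * d_ff * d_latent
    return shared + active_experts * per_expert


# Hypothetical sizes, chosen only to illustrate the trade-off.
D_MODEL, D_LATENT, D_FF = 4096, 1024, 2048

baseline = standard_moe_flops(D_MODEL, D_FF, active_experts=4)
latent_same = latent_moe_flops(D_MODEL, D_LATENT, D_FF, active_experts=4)
latent_more = latent_moe_flops(D_MODEL, D_LATENT, D_FF, active_experts=8)

print(f"standard, 4 experts: {baseline:,} FLOPs/token")
print(f"latent,   4 experts: {latent_same:,} FLOPs/token")
print(f"latent,   8 experts: {latent_more:,} FLOPs/token")
```

With these made-up sizes, the latent variant with twice as many active experts still costs less per token than the full-width baseline, which is one way to read "add more experts given the surplus".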
Caleb Eom
Caleb Eom@calebfoundry·
MiniMax M3 ditched full attention and adopted sparse attention. This follows a broader trend: more labs are focusing on token efficiency and inference throughput, and the M3 architecture shows it cleverly in how KV is processed. I'm personally impressed by the I/O between HBM and SRAM and how tokens are read contiguously in tiles, without wasting operations. Great work @MiniMax_AI
English
1
0
9
596
Caleb Eom
Caleb Eom@calebfoundry·
in case the first message wasn't clear
Caleb Eom tweet media
English
0
0
5
181
Caleb Eom
Caleb Eom@calebfoundry·
Pi, the framework that built OpenClaw. So many coding agents these days look the same and feel the same. Pi goes against the current by shedding weight rather than gaining more. And harnesses keep changing with the ebb and flow, which means being a minimalist adds durability.
English
0
0
7
301
Caleb Eom
Caleb Eom@calebfoundry·
Typical day: Gemini, ChatGPT, Claude for research
Caleb Eom tweet media
English
1
0
2
177
Caleb Eom
Caleb Eom@calebfoundry·
California weather is not friendly to my hair but here's a quick interview about me and my journey as a content creator covering AI. Shout out to Greg and Tilde for the interview! youtu.be/TjPr_-X0Mko?si…
YouTube video
YouTube
English
1
0
5
249
Compute King
Compute King@Compute_King·
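Continuing the thought: Huawei does not discuss heat dissipation among the challenges, which surprised me.
With today's two-layer stacking, I think there are still workable cooling solutions, but once you reach a three-layer Active Logic Stack or more, heat dissipation turns from an engineering problem into the central architectural problem...
The two-layer stacking technologies popular today (AMD V-Cache, Intel Foveros, and TSMC SoIC) still stack cold cache on hot logic. SRAM draws little power and has low heat density, so it can serve as the top die and cooling remains acceptable.
The structure looks like this:
SRAM
||||
CPU
But Huawei's paper describes Logic-on-Logic, that is:
Active Logic
||||
Active Logic
||||
Active Logic
That is completely different. With multiple layers of active logic, heat cannot spread laterally, so the middle logic die effectively becomes an oven, and conventional cooling cannot cope at all.
At three or more layers of stacked active logic, we have to enter the era of active cooling! Coolant must flow inside the package.
It becomes:
Active Logic + Microfluidic Channel
||||
Active Logic + Microfluidic Channel
||||
Active Logic + Microfluidic Channel
Liquid cooling, liquid cooling, liquid cooling is the key! Important things get said three times...
Chip design will need a Thermal Topology Architect in the future, because the thermal path itself will determine the layout.
So yes, my assessment is this: on Huawei's future LogicFolding path at three layers and beyond, thermal will be the biggest unsolved problem, harder even than EDA!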
Compute King tweet media
Compute King@Compute_King

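More thoughts on the paper: AI compute clusters consume enormous amounts of electricity, and 80% of that power and 70% of the cost go not to computation but to "Data Move" and data "Load/Save".
To compress these overheads at the macro scale, Huawei's paper names three things:
1. Unified Bus (UB): we covered this in depth before. UB abandons the traditional complex protocol stacks (PCIe, NVLink, Ethernet, and so on) in favor of direct low-level interconnect with memory semantics. This cuts end-to-end remote access latency from tens of microseconds to about 100 ns (an orders-of-magnitude reduction), achieving "system as a chip" at the scale of multiple racks or even a whole machine room.
2. Hi-ONE (near-package optical engine): a single module of this optical I/O delivers 8 Tb/s of bandwidth, cuts the reach required of conventional electrical SerDes from 100 cm to about 5 cm, and extends rack-to-rack interconnect distance to 100 m, securing high-density compute at the physical layer.
3. 3D Folding: in conventional 2.5D packaging, compute grows with die size but is also limited by it. Remember CoWoS-S, and the CoWoS-L used for GB300? Huawei's 3D Folding forcibly moves power delivery (a backside power delivery network), high-speed memory, and optical I/O from the chip's "edge" to its vertical "surface". This is where it gets interesting: each of them gains the ability to scale in 3D, so bandwidth and compute can finally scale in lockstep...

Chinese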
87
92
685
533.2K
Chubby♨️
Chubby♨️@kimmonismus·
This is hilarious. This is what AI was made for. I love it. 100% accurate.
English
87
449
5.6K
440.3K
Caleb Eom
Caleb Eom@calebfoundry·
@krishdotdev I think the aggregate demand for inference is bigger than what DeepSeek can serve at such prices. Certainly a huge feat for DeepSeek, but I don't buy the narrative that it's the end-all for the US inference market.
English
0
0
7
373
Kr$na
Kr$na@krishdotdev·
DeepSeek just popped the American AI bubble.
DeepSeek V4 Pro: Input: $0.435 per 1M tokens, Output: $0.87 per 1M tokens
OpenAI GPT-5.5: Input: $5.00, Output: $30.00
Claude Opus 4.7: Input: $5.00, Output: $25.00
Claude Sonnet 4.6: Input: $3.00, Output: $15.00
English
46
28
348
20.1K
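To put the price list in the post above on a common footing, here is a small sketch that costs out one job across the listed models. The per-1M-token prices are copied from that post and are not independently verified; the workload size is a hypothetical assumption.

```python
# (input, output) prices in USD per 1M tokens, as quoted in the post.
PRICES = {
    "DeepSeek V4 Pro": (0.435, 0.87),
    "OpenAI GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a job with the given token counts on one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more expensive `model` is than `baseline` for the same job."""
    return job_cost(model, input_tokens, output_tokens) / job_cost(
        baseline, input_tokens, output_tokens
    )


# Hypothetical workload: 1M input tokens and 1M output tokens.
INPUT_TOKENS, OUTPUT_TOKENS = 1_000_000, 1_000_000

for name in PRICES:
    cost = job_cost(name, INPUT_TOKENS, OUTPUT_TOKENS)
    ratio = cost_ratio(name, "DeepSeek V4 Pro", INPUT_TOKENS, OUTPUT_TOKENS)
    print(f"{name:18s} ${cost:7.3f}  ({ratio:.1f}x DeepSeek V4 Pro)")
```

The ratio depends on the input-to-output mix, so an output-heavy workload widens the gap and an input-heavy one narrows it.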
Caleb Eom
Caleb Eom@calebfoundry·
@deepseek_ai Who would ACTUALLY change providers from US providers to DeepSeek because of this?
English
1
0
0
64
Caleb Eom
Caleb Eom@calebfoundry·
interesting question.. i think memory will eventually be served like an app. think of g-suite like calendar, gmail, keep, etc. but now memory. so similar to my inbox containing thousands of emails, i think memory will be locked based on my cloud account (with the ability to export) and sync via other ecosystems. there will be companies that solve it differently but i think it'll all converge into one standard at some point.
English
0
0
1
144
iiviie
iiviie@iiviieee·
Caleb, I'm really interested in your thoughts on context engineering, or just memory for short. I know a lot of major labs are trying to tackle this problem, and everyone has their own novel architecture for it. What do you think about having a standard for memory, kind of like how Anthropic made MCP a standard for agentic tool calling?
English
1
0
1
42
Caleb Eom
Caleb Eom@calebfoundry·
Brief history of harness engineering. In case the buzzwords around what a harness even is, and why we're talking about harnesses at all, are confusing, here's a video explaining why the industry evolved from prompt, to context, and now to harness engineering. Enjoy.
English
1
0
16
395
Caleb Eom
Caleb Eom@calebfoundry·
@deepseek_ai The closest neocloud pricing is 3X more expensive. I feel bad for inference providers having to compete with a subsidized plan from DeepSeek on this.
English
0
0
0
1.7K
Chubby♨️
Chubby♨️@kimmonismus·
I’m flying back to Germany now, carrying with me so much optimism and a real sense of momentum from San Francisco and the U.S. There is something incredibly energizing about being here, surrounded by people who genuinely believe the future can be built, improved, and accelerated. I hope I can bring some of that optimism back home to Germany. Thanks, everyone! @itsolelehmann, @itsPaulAi, @Futurenvesting, Fawzi and so many more amazing people I’ve met! Thank you!
Chubby♨️ tweet media (4 images)
English
24
5
283
18.1K
Caleb Eom
Caleb Eom@calebfoundry·
Great chat with @Sam_Witteveen last night. Sam is the 🐐 with insane insights into the AI industry.
Caleb Eom tweet media
English
2
0
11
629
Caleb Eom
Caleb Eom@calebfoundry·
Assuming a YC team of 2-3 people, you're trading around 3 years of running Codex (~70 TPS at $14/M tokens, running 24/7) for equity. I wouldn't take the deal.
Tyler Bosmeny@bosmeny

A mic drop moment @ycombinator tonight: @sama just offered $2M in OpenAI tokens to EVERY YC startup in the current batch in exchange for equity. Just like Yuri Milner offering to invest in every startup back when Sam was a YC partner. I can't wait to see what's unlocked when you let the most driven, creative and formidable founders tokenmaxx.

English
1
0
5
1.8K
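The back-of-envelope math in the post above can be sketched as follows. The throughput (~70 tokens per second), price ($14 per 1M tokens), 3-year horizon, and $2M figure come from the two posts; the number of parallel sessions is not stated there, so the code derives it rather than assuming it.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def yearly_stream_cost(tokens_per_second: float, usd_per_million_tokens: float,
                       days: int = 365) -> float:
    """USD cost of one session generating tokens nonstop for `days` days."""
    tokens = tokens_per_second * SECONDS_PER_DAY * days
    return tokens / 1_000_000 * usd_per_million_tokens


def streams_covered(budget_usd: float, years: float,
                    tokens_per_second: float, usd_per_million_tokens: float) -> float:
    """How many always-on sessions a token budget funds for `years` years."""
    per_stream = years * yearly_stream_cost(tokens_per_second, usd_per_million_tokens)
    return budget_usd / per_stream


one_year = yearly_stream_cost(70, 14)
sessions = streams_covered(2_000_000, 3, 70, 14)

print(f"one session, one year: ${one_year:,.2f}")
print(f"sessions funded by $2M over 3 years: {sessions:.1f}")
```

Under these inputs, $2M covers roughly 20 or so always-on sessions for 3 years, so the "2-3 people" framing only lines up if each person runs several sessions in parallel.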