cloud
@cloud11665

Robotics + Sora @openai
SF · Joined July 2017
2.1K Following · 15.3K Followers · 4.1K posts

Pinned Tweet
cloud
cloud@cloud11665·
Life update: moved to the USA on an O-1 visa 🇺🇸 I’m happy to announce I’ll be joining @OpenAI to help build the American Empire. I’ll be working on pioneering the future of world models with Sora through infrastructure development and optimization. While this country has a constantly evolving regulatory/immigration environment, I’m grateful that the doors are still open for high-skilled labor. I am immensely grateful to @rauchg, @drewhouston, @stephenbalaban, @evanjconrad, @model_mechanic and @gabriel1 for backing me.
tender
tender@tenderizzation·
>linear memory overhead with model depth
>let’s solve this by introducing dynamism over which depth entries are needed
>top-k on depth entries
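The tweet is poking fun at a memory-saving scheme: rather than storing an entry per layer (linear in depth), score the depth entries and keep only the top-k. As a generic illustration of that selection step (all names here are mine, not from any specific paper):

```python
# Hypothetical sketch: score each depth entry, keep only the top-k.
# The scores and the meaning of "entry" are placeholders for whatever
# the mocked scheme actually uses.
import heapq

def topk_depth_entries(scores, k):
    """Return the indices of the k highest-scoring depth entries, sorted."""
    return sorted(heapq.nlargest(k, range(len(scores)), key=scores.__getitem__))

scores = [0.1, 0.9, 0.3, 0.7, 0.2]   # one relevance score per layer
print(topk_depth_entries(scores, 2))  # -> [1, 3]
```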
cloud retweeted
LaurieWired
LaurieWired@lauriewired·
I bet you’ve never seen this connector. This is a HyperTransport (HTX) slot. Arguably ahead of its time, it had some amazing latency advantages over PCI-E. Cray (and the broader HPC world) did some really interesting things with it.
adi
adi@adonis_singh·
massive thanks to @theo for giving me the creds to make this run happen! the total run cost ~$210, took ~11 hours, and cost per question was ~$2.2. Not great price to perf but we don’t really care about that when using the pro models anyways lol (models with * are no-reasoning variants, there might be mistakes with this though, didn’t look too much into it)
Air Katakana
Air Katakana@airkatakana·
why does codex have 3 "latest" models? why is there a gpt-5.4 but no gpt-5.4 codex? am i supposed to use regular gpt-5.4 in codex? is gpt-5.4 actually gpt-5.4-codex but they just named it differently? it says it's an agentic coding model please help @OpenAIDevs
prinz
prinz@deredleritt3r·
@cloud11665 @nitbean Embrace the chaos instead of rejecting it. I am fully prepared for AGI to be named GPT-7.1-Pro-Codex-Max (xhigh)
prinz
prinz@deredleritt3r·
They're benchmaxxxing on prinzbench
prinz
prinz@deredleritt3r·
@nitbean Insane model naming schemes are a core component of OpenAI culture
cloud retweeted
Sam Altman
Sam Altman@sama·
GPT-5.4 is launching, available now in the API and Codex and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1m tokens of context.
cloud
cloud@cloud11665·
@JasonBotterill No magic, just hard work. We have the best inference team in the world
cloud retweeted
Daniel Lemire
Daniel Lemire@lemire·
Years ago, we wrote a C++ library which implements float parsing (std::from_chars). That is, you go from the string "3.1416" to a number (e.g., of type 'double'). I started this work after realizing that in many cases, float parsing was the bottleneck when parsing number-heavy JSON documents (e.g., geojson files). Our code uses an algorithm that is 4 times faster than old float parsing functions in important cases. There are ports in Java, C#, Rust...

BUT up until now, we did not have a straight C implementation (to my knowledge). This is annoying for a project like Redis, which uses our C++ code and therefore needs a C++ compiler. @antirez initiated a C port with the help of AI for this reason. But I think we have something better. Koleman Nix did a full (hand-coded) port to C. I threw exhaustive tests at it and it passes! It is also incredibly fast. And it is just plain C. It is not officially released yet, but you can check it out at github.com/kolemannix/ffc…

In some benchmarks, the new C port is the fastest float parser!!! 100 million floats per second, corresponding to 2 GB/s!!! That's not as fast as your fast disk, but it is getting there!
ptr noalias nonnull %koleman@kolemannix

I'm working to get my library, `ffc`, ready for an initial release. As far as I know it is the fastest string-to-float parser in the world. It's a direct port of @lemire's wonderful fast_float library, but in pure C99 instead of C++. Check it out if you need to parse floats or just like C! github.com/kolemannix/ffc…

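fast_float and the ffc port are C++/C, and their internals aren't shown here, but the classic speedup they build on is well known (Clinger, 1990): when the decimal significand fits exactly in a double and the power of ten is also exact, one rounded multiply or divide produces the correctly rounded result, with no arbitrary-precision fallback. A minimal Python sketch of that fast path, assuming a plain `digits.digits` input (not the libraries' actual code):

```python
# Fast-path float parsing sketch (Clinger): if the significand fits in
# 53 bits and |exp10| <= 22, both operands below are exact doubles, so a
# single correctly rounded multiply/divide gives the right answer.
# Real parsers fall back to a slow path otherwise; we just return None.

def fast_path_parse(s: str):
    mantissa, _, frac = s.partition(".")
    digits = mantissa + frac
    exp10 = -len(frac)           # implicit decimal exponent
    sig = int(digits)
    if sig >= 1 << 53 or not (-22 <= exp10 <= 22):
        return None              # would need the slow, exact path
    if exp10 >= 0:
        return float(sig) * float(10 ** exp10)   # both factors exact
    return float(sig) / float(10 ** -exp10)      # one rounded divide

print(fast_path_parse("3.1416") == float("3.1416"))  # -> True
```

Powers of ten up to 10^22 are exactly representable as doubles (10^22 = 5^22 · 2^22, and 5^22 fits in 53 bits), which is why the |exp10| ≤ 22 bound appears.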
cloud
cloud@cloud11665·
@sahuang97 What exactly was the regression?
sahuang
sahuang@sahuang97·
@cloud11665 i'm just hoping there's no regression like the one i saw on 5.3-codex in some tasks compared with 5.2🙏
cloud
cloud@cloud11665·
@Lazin @prajdabre I have another project where I maxed it out using io_uring and O_DIRECT
Raj Dabre
Raj Dabre@prajdabre·
Technical interview question: Suppose you have 5 TB worth of text data and you want to count the total number of words, how will you do this?
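cloud's answer to the interview question above is about saturating the drives (io_uring + O_DIRECT, in C); the I/O layer aside, the correctness subtlety in any chunked word count is a word straddling a chunk boundary. A minimal Python sketch of the streaming inner loop, independent of how the bytes arrive:

```python
# Streaming word counter: read fixed-size chunks and count
# whitespace -> non-whitespace transitions, carrying the "inside a word"
# flag across chunks so boundary-straddling words aren't double-counted.
# The 5 TB version would shard this loop across files/drives; the state
# machine per shard is the same.
import io

def count_words(stream, chunk_size=1 << 20):
    count, in_word = 0, False
    while chunk := stream.read(chunk_size):
        for ch in chunk:
            if ch.isspace():
                in_word = False
            elif not in_word:
                in_word = True
                count += 1
    return count

text = "hello world  foo\nbar"
print(count_words(io.StringIO(text), chunk_size=4))  # -> 4
```

The tiny `chunk_size=4` in the demo deliberately splits "world" across chunks to show the carried flag doing its job.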
cloud
cloud@cloud11665·
@Lazin @prajdabre These numbers were taken on my 4x 4TiB gen4 nvme raid0 cluster + nowadays a single gen6 drive can do like 30GiB/s
cloud retweeted
karminski-牙医
karminski-牙医@karminski3·
Apple's ANE has been successfully reverse-engineered! Is the 38 TOPS figure actually a numbers game? Just came across a hardcore open-source project by maderix: reverse-engineering Apple's private APIs to bypass CoreML and run neural-network training directly on the Apple Neural Engine (ANE)!

Wait, what's the ANE? It's the neural-network accelerator unit inside Apple silicon; on the M4 it's up to a 16-core unit, officially rated at 38 TOPS. But it has always been a black box: you can only reach it through the CoreML framework, with no public interface, no docs, no ISA, nothing.

So this guy peeled off the CoreML shell. Using reverse-engineering techniques (dyld_info scanning, method swizzling to intercept CoreML, etc.), he recovered the full compile-and-run pipeline. Most importantly, he got the in-memory compilation path working: MIL (roughly analogous to NVIDIA's PTX) can be compiled to ANE binaries directly in memory, which makes training large models on the ANE practical.

The reverse-engineering turned up several bombshells:

First, the ANE is fundamentally a convolution engine, not a matrix-multiply engine. Rewriting the same computation as a convolution triples throughput! (Apple's own ml-ane-transformers reference implementation hints at this pattern but never states it outright.)

Second, the ANE has roughly 32 MB of internal SRAM (inferred from a performance cliff in matmul scaling tests).

Third, a single operator reaches only about 30% of the ANE's peak, because its 16 cores are pipelined: submit one op and most cores sit idle. Chain 16-64 ops into one compute graph, so different cores process different stages of the graph simultaneously, and utilization climbs to 94%.

Finally, the most explosive finding: "38 TOPS" is a numbers game. The author ran identical operations in FP16 and INT8 and got the same throughput. The conclusion: the ANE dequantizes INT8 back to FP16 before computing, so Apple's "38 TOPS INT8" is just 19 TFLOPS FP16 multiplied by two. The real peak is 19 TFLOPS FP16.

One more detail: the ANE has hardware-level power gating. Idle power really is 0 mW; not low-power standby, but fully powered off with zero leakage. That power management is seriously impressive, and extremely friendly for mobile.

Of course, the main value is educational. The two blog posts are packed with far more information than fits here, so if you're interested, read the originals (inside-the-m4-apple-neural-engine); this is just a teaser:

Project: github.com/maderix/ANE
Blog Part 1 (reverse engineering): maderix.substack.com/p/inside-the-m…
Blog Part 2 (benchmarks): maderix.substack.com/p/inside-the-m…

#ANE #CoreML #AppleSilicon #NPUTraining #KCORES
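The "rewrite matmuls as convolutions" claim in the thread rests on a standard identity, not anything ANE-specific: Y = X · Wᵀ is exactly a 1×1 convolution in which each row of X is one spatial position with C_in channels and each row of W is one 1×1 filter. A pure-Python sketch of that equivalence (illustration only, not ANE code; the 3× speedup is about which hardware path each form takes, not the math):

```python
# Matmul vs. 1x1 convolution: same arithmetic, different framing.

def matmul(X, W):
    # Y[i][j] = sum_k X[i][k] * W[j][k]  (i.e. X @ W.T)
    return [[sum(x * w for x, w in zip(row, kern)) for kern in W] for row in X]

def conv1x1(X, W):
    # Treat each row of X as one pixel of a 1x1 feature map with
    # len(row) input channels; each row of W is one 1x1 filter.
    return [[sum(px[c] * kern[c] for c in range(len(px))) for kern in W]
            for px in X]

X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # 2 positions, 3 channels
W = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # 2 filters
print(matmul(X, W) == conv1x1(X, W))      # -> True
```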