li
1.9K posts
@geraldlee193
Joined April 2025
30 Following · 26 Followers
li
li@geraldlee193·
@vivilinsv Who broke the rules first?
0
0
0
21
Vivi
Vivi@vivilinsv·
What is truly chilling about the Manus affair is not just that the deal was unwound. It is that it effectively repudiates the entire exit logic that Chinese tech startups have taken for granted for more than a decade. The default playbook used to be: do the R&D in China, build the team in China, validate the market in China, and then, through an offshore structure, global fundraising, and international operations, eventually exit via global capital markets or an overseas acquisition. That model powered the rise of countless Chinese tech companies. The new signal the Manus affair sends is: you cannot enjoy China's dividends and then freely decide where your future belongs. Some "wise men" have even stepped in to offer advice: "If you don't want to be part of China's technology ecosystem, you should leave from day one. Go to the US, Singapore, or Europe from the start. A founder must be clear about whom they will belong to." That sounds logical, but on reflection it is absurd, because a startup is never a pre-drawn straight line. Markets change. Funding environments change. Teams change. Regulation changes. Strategy, of course, changes too. Pivoting is the norm of entrepreneurship, not the exception. If founders don't even have the basic freedom to adjust course, restructure, or change strategic direction in response to reality, what is the point of starting a company at all? It is like demanding, on a first date: "You must decide today whether you will marry this person; otherwise don't date at all." Absurd, isn't it? Yes, Manus took local resources and later laid off the team and started over elsewhere. You can call that ungracious, unseemly, even unethical. But unethical is not the same as illegal. Because an ex once supported you, gave you resources, and grew alongside you, are you forbidden forever from breaking up and starting a new relationship? The scariest part is not that the deal fell through; it is the institutional signal it sends: you do not truly own what you create. What you hold is only "temporary usage rights within the bounds the system allows." When your choices no longer match the will of those above, the rules can be reinterpreted, the boundaries redrawn, and paths that were tacitly permitted suddenly disallowed. The damage this does to the startup environment goes far beyond the Manus & Meta deal itself, because what it strikes at is not one company's exit but every China-based founder's confidence in three things: Are property-rights boundaries clear? Are the rules stable and predictable? Is strategic adjustment still allowed? The vaguer the rules and the more arbitrary the punishment, the harder it is for innovation to truly flourish. What founders fear most is not competition; it is running half the race and discovering that someone has moved the finish line.
Vivi@vivilinsv

The Manus situation is bigger than one deal. It signals a chilling message for founders - especially Chinese entrepreneurs (still in China): Some “wise men” commented on this deal and said “Choose your destiny on Day 1 — and never change your mind.” What they meant is: if you choose to start your company in China, then stick to it. If you’ve decided to go overseas, start a foreign company from day 1. That sounds reasonable. But it is fundamentally incompatible with how startups actually work. Startups pivot. Markets change. Regulations evolve. Founders adapt. Telling entrepreneurs they must decide at incorporation exactly where the company will end up — and then punishing them for changing strategy later — is absurd. That’s like saying: “If you date someone, you must know from Day 1 whether you’ll marry them. Otherwise don’t date at all.” Yes, Manus took local support, laid off the original team, and restarted elsewhere; people can debate whether that was ethical. But ethics and legality are not the same thing. If founders are no longer allowed to restructure, relocate, pivot, or rebuild without political consequences, then the message is clear: You don’t truly own what you build. That is bad for entrepreneurship. Bad for innovation. Bad for long-term trust in the startup ecosystem. The game has changed — and founders everywhere should pay attention.

406
121
703
501.5K
li
li@geraldlee193·
@marcushkheroes That ad is way over the top. Does anyone still buy Japanese cars these days?
0
0
0
38
Marcus huang
Marcus huang@marcushkheroes·
Some Toyota orders are already booked out five years, and most of them were placed by Chinese buyers. The body is honest, as they say.
hamammm@hamammm4

@marcushkheroes Toyota's production capacity simply can't keep up with demand. In Japan, half of Toyota's models have stopped taking orders, and the models still accepting orders have waits of over six months.

65
4
108
71.8K
li
li@geraldlee193·
@denisewu NVDA's chips don't belong to NVDA; they belong to the White House
0
0
2
80
Denise Wu
Denise Wu@denisewu·
I feel sorry for the Chinese engineers who, after the Manus order, realize their intellectual property belongs to the state, not to them. 🥲
169
36
395
23.8K
li retweeted
Tom Yeh
Tom Yeh@ProfTomYeh·
Single vs Multi-head Attention by hand ✍️ Resize matrices yourself 👉 byhand.ai/qNmYKw
The most important fact about multi-head attention: it has the same parameter count as single-head attention. The difference is purely structural: same total Wqkv weights, partitioned into smaller q–k–v triples. Look at the two diagrams below. Both Wqkv matrices have the same height: same number of weight rows, same number of parameters. What changes is how that single tall block is sliced.
• Left. One head. The full Wqkv produces one big QKV: a tall Q (36 rows), a tall K, a tall V. One scoring computation runs over those full-width tensors.
• Right. 3 heads. The same-height Wqkv is sliced into 3 smaller q–k–v triples, each 12 rows tall. 3 scoring computations run in parallel, each a thinner version of the left.
The compute trade-off — kind of. Same Wqkv weights. Multi-head runs the attention scoring S = Kᵀ × Q once per head, so the dot-product count multiplies by H.
• Single-head: seq × seq = 40² = 1600 dot products
• Multi-head: seq × seq × H = 40² × 3 = 4800 dot products (3×)
But each multi-head dot product is narrower: its inner dimension is head_dim instead of H × head_dim. So when you count actual scalar multiplications, the totals are equal:
• Single-head: seq² × (H × head_dim) = 40² × 36 = 57600
• Multi-head: seq² × H × head_dim = 40² × 3 × 12 = 57600
Same FLOPs. Multi-head buys you H independent attention patterns at no extra weight cost and no extra arithmetic cost: it's the same total compute, sliced into H finer-grained heads.
14
95
556
34.6K
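For anyone who wants to sanity-check the arithmetic in the retweet above, here is a minimal numpy sketch of the counting argument. It uses the numbers from the post (seq=40, d_model=36, H=3, head_dim=12); the (seq, features) layout and variable names are my own, not the byhand.ai notebook.

```python
# Minimal sketch: single-head and multi-head attention share the same Wqkv
# parameter count and the same number of scalar multiplications.
import numpy as np

seq, d_model, H = 40, 36, 3
head_dim = d_model // H                        # 12

x = np.random.randn(seq, d_model)
Wqkv = np.random.randn(d_model, 3 * d_model)   # the single tall block; identical in both cases
Q, K, V = np.split(x @ Wqkv, 3, axis=-1)       # each (seq, d_model)

# Single head: one scoring pass over the full width.
scores_single = Q @ K.T                        # (seq, seq)
muls_single = seq * seq * d_model              # 40*40*36 = 57600

# Multi-head: slice the same Q/K into H thinner pieces and score each one.
muls_multi = 0
for h in range(H):
    q = Q[:, h * head_dim:(h + 1) * head_dim]  # (seq, head_dim)
    k = K[:, h * head_dim:(h + 1) * head_dim]
    scores_h = q @ k.T                         # (seq, seq), one attention pattern per head
    muls_multi += seq * seq * head_dim         # 40*40*12 per head

print(Wqkv.size)                 # parameter count: identical whether 1 head or 3
print(muls_single, muls_multi)   # 57600 57600 -> same scalar multiplications
```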
li
li@geraldlee193·
@zmx8067 They probably didn't even know it had been launched 😂
0
0
4
1.8K
油爆琵琶拌着面🇨🇳
According to the Defence Security Asia website, a 🇨🇳 PLA Navy Type 055 destroyer fired a YJ-20 hypersonic missile at point-blank range in waters near the Philippines. The 🇺🇸🇯🇵🇦🇺 joint exercise mustered only a dozen or so aging warships; I honestly can't tell who is flexing here 🤣
14
8
156
31.2K
David LaLonde
David LaLonde@autobypayment·
@loong_of No oil for China. It takes oil to run the fleet. Xi is scared, and he should be.
5
0
0
415
Loong of the east
Loong of the east@loong_of·
Chinese Fleet Deters Provocations by the U.S., Japan, and the Philippines
As the militaries of the United States, Japan, and the Philippines assembled for their largest annual joint exercise—"Balikatan"—aimed at countering China, China dispatched a powerful mixed fleet to the immediate vicinity to conduct surveillance and military drills. Imagery from Sentinel-2 reconnaissance satellites recently revealed the presence of a large Chinese naval combat formation in the South China Sea, comprising both an aircraft carrier strike group and an amphibious readiness group. Reportedly, the vessels visible in the satellite imagery alone included one aircraft carrier, one amphibious assault ship, one large destroyer, two standard destroyers, and six frigates; additionally, a number of auxiliary combat vessels and underwater warfare assets were captured in the images. This marks the largest concentration of Chinese naval forces in the South China Sea observed so far this year.
3
20
114
5.9K
li retweeted
Eason Mao☢
Eason Mao☢@KELMAND1·
A Type 055 fired a YJ-20 hypersonic missile near the Philippines, right in front of the US, the Philippines, and Japan. According to the Defence Security Asia website (DEFENCE SECURITY ASIA), April 25, 2026: a Type 055 destroyer, exercising in waters near the Philippines at the same time as the aircraft carrier Liaoning, launched another YJ-20 hypersonic anti-ship missile. This was far more than a routine naval drill, because it placed China's most advanced sea-based anti-access weapon system squarely within the strategic picture of the ongoing "Balikatan 2026" joint exercise. The strategic message is unmistakable: any future contingency in the South China Sea will no longer be decided solely by territorial disputes around reefs and shoals, but by the credible threat posed by long-range hypersonic maritime strike systems capable of targeting carrier strike groups, expeditionary forces, and allied naval logistics nodes. defencesecurityasia.com/en/china-type-…
Eason Mao☢@KELMAND1

According to CCTV, China Military Bugle, and other sources, the Southern Theater Command recently organized a formation of 2 destroyers, 1 frigate, and 1 replenishment ship for live-fire drills in the waters east of Luzon, a forceful response to the "Balikatan" exercise staged by the US, Japan, the Philippines, and others.
Type 055 destroyer Zunyi (107): 12,500+ tons
Type 052D Hefei (174): 7,000+ tons
Type 054A Xianning (500): 4,000+ tons
Type 903A replenishment ship Luomahu (907): 23,000+ tons
Total displacement over 46,000 tons, 208 VLS cells in all.
Recently, the PLA Southern Theater Command deployed the 107 formation for training in the waters east of Luzon, focusing on live-fire shooting, sea-air coordination, rapid maneuver, and underway replenishment drills to test integrated joint-operations capability.

37
25
304
91.6K
li retweeted
田字格
田字格@tianzige4·
"Shoulder to shoulder" has turned into hugging and cuddling 😂😂😂 Chinese warships are hard-core onlookers at the seven-nation US-Philippine Balikatan ("shoulder to shoulder") exercise
103
65
605
165.4K
晴清青
晴清青@UFesenbek70153·
@geraldlee193 @mudiaoLOL @daneishe888 Plenty of people on the outside internet can't make sense of that post of mine above either, so is this all your so-called "not blocking" amounts to? 😅 Besides, anyone who actually wants to get over the wall these days can find a way. Or do you think the kind of fools who can be conned even by health-supplement peddlers like "Sheng Xin" won't strike a few sparks once they come over to read Twitter? Will you take responsibility for the fallout when the idiots start resonating with each other?
1
0
0
50
she888🇯🇵🤝🇨🇳日中友好!
Even the idealistic students burning with "the Chinese Communist Party is no good!" will, once they learn about the outside world online, without question end up thinking "actually, the Party is fine after all..." (´・ω・`). In China, politicians don't spread disinformation to win elections, soaring staple-food prices aren't left unaddressed with a one-word appeal to "market forces," and lawmakers don't call women "machines for bearing children" 🤔
小雪妹妹@xiangsaixue520

@zi68795486 Only after you get out do you realize how good the Communist Party really is, and what other parties and countries are actually like

53
30
347
57.1K
li retweeted
Sebastian Raschka
Sebastian Raschka@rasbt·
April was a pretty strong month for LLM releases:
- Gemma 4
- GLM-5.1
- Qwen3.6
- Kimi K2.6
- DeepSeek V4
All are now added to the LLM Architecture Gallery. More details once I am fully back in May!
76
440
3K
124.1K
li retweeted
Akshay 🚀
Akshay 🚀@akshay_pachaar·
CPU vs GPU vs TPU vs NPU vs LPU, explained visually:
5 hardware architectures power AI today. Each one makes a fundamentally different tradeoff between flexibility, parallelism, and memory access.
> CPU
Built for general-purpose computing. A few powerful cores handle complex logic, branching, and system-level tasks, backed by deep cache hierarchies and off-chip main memory (DRAM). It's great for operating systems, databases, and decision-heavy code, but not that great for repetitive math like matrix multiplications.
> GPU
Instead of a few powerful cores, GPUs spread work across thousands of smaller cores that all execute the same instruction on different data. This is why GPUs dominate AI training: the parallelism maps directly to the kind of math neural networks need.
> TPU
TPUs go one step further with specialization. The core compute unit is a grid of multiply-accumulate (MAC) units where data flows through in a wave pattern. Weights enter from one side, activations from the other, and partial results propagate without going back to memory each time. The entire execution is compiler-controlled, not hardware-scheduled. Google designed TPUs specifically for neural network workloads.
> NPU
An edge-optimized variant. The architecture is built around a Neural Compute Engine packed with MAC arrays and on-chip SRAM, but instead of high-bandwidth memory (HBM), NPUs use low-power system memory. The design goal is to run inference at single-digit-watt power budgets on devices like smartphones, wearables, and IoT hardware. Apple's Neural Engine and Intel's NPU follow this pattern.
> LPU (Language Processing Unit)
The newest entrant, from Groq. The architecture removes off-chip memory from the critical path entirely: all weight storage lives in on-chip SRAM. Execution is fully deterministic and compiler-scheduled, which means zero cache misses and zero runtime scheduling overhead. The tradeoff is limited memory per chip, so you need hundreds of chips linked together to serve a single large model. But the latency advantage is real.
AI compute has evolved from general-purpose flexibility (CPU) to extreme specialization (LPU). Each step trades some generality for efficiency.
The visual below maps the internal architecture of all five side by side.
👉 Over to you: Which of these 5 have you actually worked with or deployed?
70
872
3.2K
240.8K
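As a concrete illustration of the MAC-grid idea described for TPUs and NPUs in the retweet above, here is a tiny software mock in numpy. It is purely illustrative (no vendor's silicon works like a Python loop, and the shapes are made up); the point it demonstrates is that each output cell keeps a local accumulator and partial sums never round-trip to main memory between steps.

```python
import numpy as np

def mac_grid_matmul(weights, activations):
    """Output-stationary MAC grid: weights (K, M), activations (K, N) -> (M, N)."""
    K, M = weights.shape
    _, N = activations.shape
    acc = np.zeros((M, N))                      # one local accumulator per MAC cell
    for k in range(K):                          # one "wave" of data flows in per step
        # every cell (m, n) performs a single multiply-accumulate this step
        acc += np.outer(weights[k], activations[k])
    return acc                                  # partial sums never left the grid

W = np.random.randn(8, 4)                       # toy sizes
A = np.random.randn(8, 5)
print(np.allclose(mac_grid_matmul(W, A), W.T @ A))   # True
```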
li retweeted
Nainsi Dwivedi
Nainsi Dwivedi@NainsiDwiv50980·
Nvidia trained a billion-parameter LLM without a single gradient, without backprop, without fp32 weights anywhere. And it is 100x faster. For the last decade, every major AI model has been trained the exact same way. Backpropagation. It requires massive, expensive GPUs. It requires complex floating-point math. It requires a massive memory footprint just to calculate the gradients. It’s the reason why only mega-corporations can afford to train foundation models. Until today. Nvidia and Oxford published a paper called "Evolution Strategies at the Hyperscale." They completely bypassed backpropagation. Instead of calculating gradients, they use Evolution Strategies (ES), a method that randomly mutates the AI's parameters, sees what works best, and literally evolves the model. In the past, this was way too computationally expensive for billion-parameter models. But they fixed it by inventing EGGROLL (Evolution Guided General Optimisation via Low-rank Learning). By compressing the mutations into low-rank matrices, they achieved a 100x increase in training speed for large models. But that isn't the craziest part. Because it doesn't use backpropagation, it doesn't need high-precision math. They successfully trained a massive language model entirely on pure integer datatypes. Raw, basic, low-level math. This completely rewrites the economics of open-source AI. If you can train models directly on the cheap, fast integer datatypes they use for inference, the hardware requirements collapse. You don't need a multi-million dollar cluster of high-end GPUs just to do the math anymore.
10
50
212
13.6K
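The mechanism the retweet above describes, gradient-free training by perturbing weights with low-rank noise and moving toward the perturbations that score well, can be sketched in a few lines. The toy example below is my own illustration, not the EGGROLL algorithm from the paper; the objective, shapes, and hyperparameters are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 32, 16, 2        # weight matrix shape and mutation rank
pop, sigma, lr = 64, 0.05, 0.5       # population size, noise scale, step size

W_target = rng.normal(size=(d_out, d_in))       # toy objective: recover a target matrix
W = np.zeros((d_out, d_in))
x = rng.normal(size=(d_in, 128))                # fixed batch of inputs

def fitness(w):
    return -np.mean((w @ x - W_target @ x) ** 2)   # higher is better, no gradients used

print("before:", round(fitness(W), 3))
for step in range(300):
    # Low-rank mutations: each candidate perturbs W by A @ B instead of a full matrix,
    # so far fewer random numbers are drawn per candidate.
    As = rng.normal(size=(pop, d_out, rank))
    Bs = rng.normal(size=(pop, rank, d_in))
    noise = As @ Bs                               # (pop, d_out, d_in)
    scores = np.array([fitness(W + sigma * n) for n in noise])
    z = (scores - scores.mean()) / (scores.std() + 1e-8)
    # ES update: move toward mutations that scored above the population average
    W += lr / (pop * sigma) * np.einsum('p,pij->ij', z, noise)
print("after: ", round(fitness(W), 3))            # should be noticeably less negative
```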
li retweeted
Phoenix Yin
Phoenix Yin@Phoenixyin13·
🚨 Stunning! A blockbuster Science paper, April 2026: a Stanford team (first author a Chinese postdoc) finds that a protein can directly template the synthesis of DNA. Nearly 70 years of the central dogma have been fundamentally rewritten!
The old picture was DNA → RNA → protein (information flows one way). Now the Stanford team has shown that a protein can also go in reverse → DNA! With no DNA/RNA template at all, the protein's own amino-acid structure serves as the mold, precisely producing AC-repeat DNA sequences.
This is the biggest revision since Crick proposed the central dogma: for the first time, a protein has been shown to directly carry and transmit genetic information. Bacteria use it to fight viruses; we humans may be on the verge of a new era of genetic engineering.
The two things that delight me most:
1. The paper mentions using AlphaFold3 to help model the structure of Drt3b. That is routine practice in biology now, but I still feel the organic fusion of AI and biology in it.
2. The mechanism of a protein serving directly as the template for DNA opens a new way for proteins to direct genetic information. If Drt3b can one day be engineered to synthesize arbitrary DNA sequences according to AI-designed protein structures, then gene synthesis, gene editing, and synthetic biology could take a great leap forward, no longer depending entirely on traditional template-dependent polymerases. That would be a magnificent future.
In short, April 2026 delivered a shock to the foundational theory of the life sciences. Remember this moment; remember this magical and wonderful April. #CentralDogma #ProteinDNA #Science #BiologyBreakthrough #Stanford
242
219
1K
254.6K
Andy O
Andy O@andyz8818576155·
A former ByteDance employee spills the inside story of China's top AI companies, and it largely matches my own view:
1) He believes the US-China AI gap is widening, not narrowing.
2) Taking shortcuts via distillation is widespread.
3) Training runs entirely on Nvidia cards.
4) Domestic AI agents are completely impractical.
思维怪怪@0xLogicrw

Zhang Chi: Zhejiang University undergrad, UCLA PhD (advised by Zhu Songchun), joined ByteDance Seed in 2025 to work on mathematical reasoning, and left after a year to become an assistant professor at Peking University. He recently looked back on his year at ByteDance on the podcast Into Asia, and several of his judgments clash directly with the current public narrative.
His core points:
1. A full iteration at ByteDance (pre-training + post-training) takes roughly six months; Google reportedly takes three. He sees this gap in iteration speed as the fundamental reason the gap cannot be closed.
2. Benchmaxxing is rampant inside Seed. Leadership evaluates by benchmark scores, so everyone games the leaderboards, but matching on paper does not mean the models are actually good: "the real-world experience isn't there."
3. At the end of 2024, Seed believed it had caught up with GPT-4o; then DeepSeek came out and the gap was still there. When he joined, his whole group pivoted to reinforcement learning in a hurry.
4. He believes the US-China AI gap is widening, not narrowing. In his words: "I don't even agree with the claim that China is catching up; we are still far behind." Colleagues and students agree, but the leadership of listed companies like Zhipu and MiniMax would not.
5. Taking shortcuts via distillation is widespread: many companies simply call Claude/GPT/Gemini and use the outputs as training data. He does acknowledge, though, that DeepSeek made genuine architectural innovations in V3/R1.
6. ByteDance's main chip is the NVIDIA H20, with the fastest cards reserved for the pre-training and post-training teams. Domestic chips exist, but nobody uses them for training. ByteDance buys the newest generation of NVIDIA chips overseas, but "definitely not in mainland China."
7. US companies have a user-feedback flywheel: a good model brings more users, more users bring better feedback, and better feedback makes the model better. Chinese models start out worse, nobody wants to use them for important work, the data never arrives, and the cycle turns vicious.
8. When he interned at Google, he found the infrastructure "just excellent," a huge gap from ByteDance. It is not only the chips; the training frameworks and the whole infrastructure are a step behind.
9. Chinese AI practitioners mostly use American agent tools. He himself uses Claude Code and Copilot; his verdict on Chinese models' coding agents is "completely impractical." ByteDance's overseas teams simply use Cursor.
10. Claude Code is good enough that he wonders whether he should still train PhD students, yet he also worries that if nobody trains the next generation, nobody will be left to do research.
Background: Zhang Chi spent only about a year at ByteDance, and the math group he was in was, by his own account, more promotional in nature and not part of the core pre-training/post-training teams. His views are a personal perspective, not a full picture of ByteDance.

80
99
796
520.8K
li
li@geraldlee193·
@TommyLindada @wngbg @daneishe888 I still think China is better, thanks! You're a good fit for today's American empire; Trump loves restricting other people's freedom and has no tolerance for anyone who speaks ill of him. You can stand guard for him.
1
0
0
40
li
li@geraldlee193·
@TommyLindada @wngbg @daneishe888 Your values are suited to life in North Korea. That place fits you: no internet, no porn, no need to worry about the kids being corrupted 😄
1
0
0
36
li
li@geraldlee193·
@TommyLindada @wngbg @daneishe888 There was, but it showed neither resolve nor strength. Only a fist landing on the body has any effect; verbal condemnations and warnings are useless. Without the PLA as a backstop and without the Hong Kong police striking hard, you think it could have been settled just by cutting off the internet? 😂
0
0
0
45