simingg yyan

769 posts

@Samoyansiming

Quantitative research. Futures and options in China and Europe.

Shanghai, China · Joined March 2023
498 Following · 30 Followers

simingg yyan retweeted
Firecrawl (@firecrawl)
Starting today, you can try Firecrawl for free without an API key 🔥 Search, scrape, and interact with any web page, plus parse any PDF into clean markdown, with no setup at all! Start using our endpoints and only sign up when you scale. Live on our MCP, CLI, and API now!

simingg yyan retweeted
小互 (@xiaohu)
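OpenAI is thinking big: it announced that Codex (including the desktop app, the command-line CLI, and the SDK) supports plugging in any open-source model directly, with no requirement to use OpenAI's own models. It also published a document that walks developers step by step through replacing the "brain" underneath the Codex client with a free open-source model…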
Quoting Tibo (@thsottiaux):

Reminder that you can use the Codex App, CLI and SDK with any open source model, not just with OpenAI models. developers.openai.com/codex/config-a…


simingg yyan retweeted
Noam Shazeer (@NoamShazeer)
I’m excited to share that I’ll be joining OpenAI and look forward to working with the exceptional team there. It was a difficult decision to move on. I’m incredibly proud of the amazing team at Google and everything we’ve built together. It has been an honor and a pleasure to work with all of you.

simingg yyan retweeted
Kafka (@kfk_ai)
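Prof. Richard Xu (徐亦达), professor in the Department of Mathematics at Hong Kong Baptist University and founder of "TadReamk Limited". He has only one genuine breakout project on GitHub: `machine-learning-notes` (9663 stars, 1767 forks), a 2000+ page collection of slides on machine learning, probabilistic models, and deep learning, with accompanying video links. Most of his other 13 repositories are "three-nothings" products: no description, no updates, no code. He has 5347 followers but follows 0 accounts: a pure knowledge broadcaster, a social black hole.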

simingg yyan retweeted
Z.ai (@Zai_org)
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5.2
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Chat: chat.z.ai

simingg yyan retweeted
Przemek Chojecki | PC (@prz_chojecki)
Kimi 2.7 ranked 2nd, after Fable 5 and ahead of GPT-5 xhigh. We re-ran our ErdosBench smoke test on 14 problems with Kimi 2.7, Qwen 3.7 Max, and Grok 4.3, and compared the results with the top performers from previous runs. Kimi 2.7 is amazingly good. More below.

simingg yyan retweeted
jietang (@jietang)
GLM-5.2 Is Fully Open: Frontier Intelligence Belongs to Everyone

Today, the sudden restriction of certain frontier models is deeply regrettable. At a time when access to frontier models is abruptly cut off for non-technical reasons, we are even more convinced of one thing: science should be global. The path to AGI (Artificial General Intelligence) must never be enclosed by high walls. We have always believed that AGI should be the cornerstone for all of humanity to collaboratively explore the boundaries of intelligence and solve complex challenges, rather than a privilege monopolized by a few and governed by rules that can revoke it at any moment.

In the face of external blockades and restrictions, our attitude is one of radical openness. Frontier intelligence must remain open-source, accessible, and buildable, serving every dedicated developer.

GLM-5.2 is Zhipu's most capable open-source model to date. It not only supports a truly usable 1M context window but also maintains a consistent lead in completing long-horizon tasks independently, providing a solid foundation for building complex agent applications. It also remains our main engine for building the strongest domestic coding model.

Tonight at 5:21, at this special moment, GLM-5.2 will officially become available to all GLM Coding Plan users (including Lite / Pro / Max). The API will also go live next week.

A step closer to frontier intelligence for everyone. The future of AI is open, and it is for the people.

ModelKey: GLM-5.2

simingg yyan retweeted
Santi Torres (@SantiTorAI)
A Ukrainian developer built a black hole in his terminal to force himself to take breaks. The longer you work without stopping, the bigger it grows and the more it warps your code with its gravitational lensing. Take a break and it shrinks.

simingg yyan retweeted
Megumin_SAMA (@Meguminsama2009)
Chinese apps have entered the era of the giant "lewd crest".

AI最严厉的父亲 (@dashen_wang)
Elon Musk, get out here! Why doesn't CodeX have a Linux desktop version?

Sizhe思哲 (@Sizhe_bitcat)
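Wealthy families have remarkably strict standards for dress: even in midsummer heat of over 30°C, they still wear suits and dress shirts. I hear these suits are actually quite cool, lined entirely with ice silk 🙉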

琵琶牧々 (@anblk984)
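@goshi_aoki I'm genuinely bearish on China in this AI race. These Chinese companies are all scrambling for customers and talent without caring about customer quality, so they are all handing out red envelopes, slashing prices, and hyping patriotism. American companies, by contrast, compete for investment and energy, so their focus is on digging moats and on technical R&D.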

Goshi Aoki (@goshi_aoki)
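A year ago I completed a master's in Computer Science at Zhejiang University in China. Drawing on conversations with top Chinese AI talent and what I learned while enrolled in graduate school, I've summarized how China's system for developing AI talent works ↓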

simingg yyan retweeted
恒星 (@vintcessun)
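Why does Muon train large models nearly twice as fast as Adam, yet no one has clearly explained why? This puzzle directly affects the design of next-generation optimizers. This paper approaches it through curvature, splitting the loss decrease into a first-order gain and a second-order curvature penalty. Surprisingly, it finds that Muon and Adam take comparable step sizes, but Muon's normalized directional sharpness (NDS) is significantly lower: the steps are not bigger, the direction is smarter. Data imbalance in particular amplifies this advantage, and in the middle and late stages of training the advantage comes mainly from smaller intra-layer curvature. With theory plus experiments, it finally turns black magic into geometric intuition. arxiv.org/abs/2606.04662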
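The split described above matches the standard second-order Taylor expansion: for a step theta -> theta - lr*d with gradient g and Hessian H, L(theta) - L(theta - lr*d) ≈ lr*g^T d - 0.5*lr^2*d^T H d. The minimal sketch below computes both terms on a toy quadratic loss (where the expansion is exact), plus a normalized directional sharpness that we assume to be d^T H d / ||d||^2; the paper's exact definitions and setup may differ, and the code implements neither Muon nor Adam. The helper names `quadratic_loss` and `decrease_terms` are hypothetical.

```python
Vector = list[float]
Matrix = list[list[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def quadratic_loss(h: Matrix, b: Vector, theta: Vector) -> float:
    """Toy loss L(theta) = 0.5 * theta^T H theta - b^T theta, whose Hessian is H."""
    return 0.5 * dot(theta, matvec(h, theta)) - dot(b, theta)


def decrease_terms(h: Matrix, b: Vector, theta: Vector, d: Vector, lr: float):
    """Split the loss decrease of the step theta -> theta - lr * d.

    Returns (first-order gain, second-order curvature penalty, NDS). The
    predicted decrease is gain - penalty; NDS is ASSUMED to be d^T H d / ||d||^2.
    """
    g = [x - y for x, y in zip(matvec(h, theta), b)]  # gradient: H theta - b
    curvature = dot(d, matvec(h, d))                  # d^T H d
    gain = lr * dot(g, d)
    penalty = 0.5 * lr ** 2 * curvature
    nds = curvature / dot(d, d)
    return gain, penalty, nds


if __name__ == "__main__":
    H = [[10.0, 0.0], [0.0, 1.0]]  # one sharp and one flat direction
    b = [0.0, 0.0]
    theta = [1.0, 1.0]
    # Two unit-norm directions, so both steps have the same size lr.
    for d in ([1.0, 0.0], [0.6, 0.8]):
        gain, penalty, nds = decrease_terms(H, b, theta, d, lr=0.1)
        print(d, f"gain={gain:.4f} penalty={penalty:.4f} NDS={nds:.2f}")
```

For steps of equal size, a lower NDS means a smaller curvature penalty; whether the total decrease is larger also depends on how well the direction aligns with the gradient (the first-order gain).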

simingg yyan retweeted
南宫远 (@nangongyuan)
Aesthetic taste really is a strange thing. I'll admit Yua Mikami is all right, but she's nothing compared to Tsukasa Aoi.

simingg yyan retweeted
Mathieu (@miniapeur)
[Image]

simingg yyan retweeted
Anna 🇺🇸 (@realAnn_29)
Big dog parents know this struggle TOO well. 😂😂

simingg yyan retweeted
Cameron R. Wolfe, Ph.D. (@cwolferesearch)
Interested in learning how to run RL at scale? Here are the best resources to read…

Research on Scaling RL
1. The Art of Scaling RL compute for LLMs: arxiv.org/abs/2510.13786
2. Scaling Behaviors of LLM RL Post-Training: arxiv.org/abs/2509.25300
3. Optimally Scaling Sampling Compute for LLM RL: arxiv.org/abs/2603.12151
4. Scaling up RL: arxiv.org/abs/2507.12507
5. ProRL V2 - Prolonged Training Validates RL Scaling Laws: hijkzzz.notion.site/prorl-v2
6. Polaris - A Recipe for Scaling RL with Reasoning Models: hkunlp.github.io/blog/2025/Pola…

RL Frameworks
1. Hybrid Flow (early outline of the verl framework): arxiv.org/abs/2409.19256
   a. More up-to-date info can be found here: arxiv.org/abs/2601.18150
2. AReal - Large-Scale Async RL: arxiv.org/abs/2505.24298
3. PipelineRL - Fast On-Policy RL: arxiv.org/abs/2509.19128
4. AsyncFlow - Async Streaming RL: arxiv.org/abs/2507.01663

RL for Agents
1. DeepSWE - Open Coding Agent Trained w/ RL: together.ai/blog/deepswe
2. AutoForge - Environment Synthesis for Agentic RL: arxiv.org/abs/2512.22857
3. Agent-R1 - Training Agents w/ End-to-End RL: arxiv.org/abs/2511.14460
4. AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: arxiv.org/abs/2510.04206
5. The Landscape of Agentic RL: arxiv.org/abs/2509.02547
6. Training SWE Agents with RL: arxiv.org/abs/2508.03501

Case Studies & Tech Reports
1. Kimi tech reports:
   a. Kimi K2 - Open Agentic Intelligence: arxiv.org/abs/2507.20534
   b. Kimi End-to-end Agentic RL: moonshotai.github.io/Kimi-Researche…
   c. Kimi K1.5 - Scaling RL for LLMs: arxiv.org/abs/2501.12599
2. Composer series from Cursor:
   a. Composer 2: arxiv.org/abs/2603.24477
   b. Composer 2.5: cursor.com/blog/composer-…
3. Olmo 3 (also has open code / data): arxiv.org/abs/2512.13961
4. MiniMax tech reports:
   a. MiniMax-M2: arxiv.org/abs/2605.26494
   b. MiniMax-M1: arxiv.org/abs/2506.13585
5. Nemotron 3 (NVIDIA): arxiv.org/abs/2512.20856

simingg yyan retweeted
Muyu He (@HeMuyu0327)
I am a big fan of Jianlin Su's blog because it always approaches a typical ML problem (e.g. training-free MoE load balancing) from first principles in mathematics rather than from "ML tricks". Here I try to "reinvent" one such post, which provides an elegant alternative way to compute Muon, by filling in all the derivations the blog skips for a less math-savvy audience (the original is also written entirely in Chinese).

The goal of the blog is to compute an essential component of Muon, i.e. the left and right singular vector matrices U and V of the gradient G, **individually**. In its standard form, Muon only needs their product UV^T, which is why the standard approach obtains it by repeatedly applying a low-degree polynomial of G (Newton-Schulz). But if we can get U and V individually, more variants of Muon become possible that control the properties of the model updates; hence the blog's proposal to revisit some fundamental linear algebra techniques for the computation.

The methodological takeaway from the blog's thought process is that breaking down an ML problem has three components: (1) how to compute something at all (power iteration), (2) how to compute it fast (Cholesky decomposition), and (3) how to compute it accurately given finite floating-point precision (repeated orthogonalization). The goal of reading inspiring blogs like this is, in Feynman's terms, to be able to "reinvent" them at any time and so grasp the fundamental approach to doing similar work.

Original blog: kexue.fm/archives/11654
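To make the two routes above concrete, here is a minimal pure-Python sketch of two standard building blocks; it is not the blog's algorithm. `newton_schulz_polar` applies the textbook cubic Newton-Schulz iteration X <- 1.5X - 0.5XX^TX to approximate the product UV^T (Muon implementations commonly use a tuned quintic polynomial instead), and `top_singular_pair` uses power iteration on G^TG to recover the leading singular value and vectors individually. All function names are hypothetical, and the Cholesky and re-orthogonalization steps the blog uses for speed and floating-point accuracy are not shown.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius(a: Matrix) -> float:
    return math.sqrt(sum(x * x for row in a for x in row))


def newton_schulz_polar(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate U V^T for G = U S V^T (full rank assumed) without an SVD.

    Dividing by the Frobenius norm puts every singular value in (0, 1], inside
    the region (0, sqrt(3)) where the cubic map s -> 1.5 s - 0.5 s^3 drives it
    to 1. In exact arithmetic the iteration changes only the singular values.
    """
    norm = frobenius(g)
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(r1, r2)] for r1, r2 in zip(x, xxtx)]
    return x


def top_singular_pair(g: Matrix, iters: int = 100) -> tuple[float, list[float], list[float]]:
    """Power iteration on G^T G; returns (sigma_1, u_1, v_1) with G v_1 = sigma_1 u_1."""
    gtg = matmul(transpose(g), g)
    v = [1.0] * len(g[0])  # assumed start vector; must not be orthogonal to v_1
    for _ in range(iters):
        w = [sum(p * q for p, q in zip(row, v)) for row in gtg]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    gv = [sum(p * q for p, q in zip(row, v)) for row in g]
    sigma = math.sqrt(sum(x * x for x in gv))
    u = [x / sigma for x in gv]
    return sigma, u, v


if __name__ == "__main__":
    G = [[1.0, 2.0], [3.0, 4.0]]
    Q = newton_schulz_polar(G)
    print(matmul(Q, transpose(Q)))  # approximately the 2x2 identity
    print(top_singular_pair(G)[0])  # approximately 5.465
```

The polynomial route never forms U or V separately, which is exactly the limitation the blog sets out to remove; power iteration yields individual singular vectors but only one pair at a time, which is where faster and numerically safer techniques become necessary.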