Lei Zhang (Harry)

2.3K posts

@resouer

ML infra @NVIDIA / Prev: Alibaba and Microsoft

San Francisco, CA · Joined September 2012
891 Following · 1.9K Followers
Jeff Hollan@jeffhollan·
Excited to announce the new preview for Microsoft Foundry Agents 🎉! You can now build, run, and deploy your agent using any model, any framework, any harness in the cloud 🧑‍💻 - check out the demo below.

This is not just any cloud compute environment; it's an agent-optimized platform with:
🖥️ Persistent microVMs - securely scale up and down without losing context
🛠️ Built-in tools (1000+)
👀 Observability and evaluations
👷 Guardrails
🔐 Private networking
... and more
English
30
96
620
138.9K
Lei Zhang (Harry)@resouer·
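@yadong_xie @tualatrix If the whole develop-and-deploy workflow ends up happening on phones, this kind of one-click experience, vibe-coding straight to launch and making money right away, is still quite useful. Something like this will probably focus on lighter-weight apps, similar to Lovable / no-code.
Chinese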
0
0
0
17
Yadong Xie@yadong_xie·
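@resouer @tualatrix Still, I lean toward thinking this is only good for building toys. It can flatten communication costs, but once people have used it to build a PoC and validate the requirements, they will still deploy somewhere else.
Chinese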
1
0
0
35
图拉鼎@tualatrix·
Got early access to Claude Design... Is it really planning to build every productivity tool involved in software production?
Chinese
22
7
170
40.2K
Yadong Xie@yadong_xie·
@tualatrix Product managers, designers, frontend and backend engineers: all bundled up and taken away in one go.
Chinese
1
0
7
2.5K
Matt Stockton@mstockton·
This post is so so good. We are at this interesting point where we’ve started to really figure out ‘memory’ with these LLM-based systems. And @hwchase17 is totally right - memory is basically just context collected and injected at the right times - it’s probably the most important context, and it *must* be interpretable to you, and portable for you. There is a real danger in adopting systems that won’t allow this for you, or actively prevent you from doing it. I hadn’t put it all together before reading this post, but this is definitely one of the reasons I love building on top of DeepAgents
Harrison Chase@hwchase17

x.com/i/article/2042…

English
8
16
204
87.9K
Lei Zhang (Harry)@resouer·
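@CoooolXyh Compared with the mainstream cloud vendors, CF's CDN service is priced so low it is practically free. The real profit comes from providing security scanning and hardening to enterprise customers alongside the hosting. Many small infra companies in North America have the same business model: give the main product away to drive volume, then make money on security / privacy / compliance, the hard currencies that sell on anxiety. Now the market is worried that this value will be eaten by customers using AI tools themselves.
Chinese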
0
0
1
204
Lei Zhang (Harry)@resouer·
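@9hills Yes, exactly. When we hosted agents at scale in the cloud, we never needed a hack like agent-in-sandbox. Each agent was simply a hypervisor container instance: the agent could do whatever it wanted inside with the highest privileges, sessions were all stored in a cloud database, and whenever third-party code had to be executed we just pulled a remote code interpreter container from a pool to run it.
Chinese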
0
0
0
53
九原客@9hills·
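Anthropic's Managed Agents may be underrated. What it mainly solves is the engineering problem of hosting autonomous agents at scale. Take one small point that LangChain has also raised: when an agent needs to run Bash, should it be Agent in Sandbox or Sandbox as Tool? Claude Code is the former, but real agent infra must be the latter.
Chinese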
14
5
74
12.3K
Lei Zhang (Harry)@resouer·
Training a model that's good at both math and code sounds like a data mixing problem. It's not. RL gradient updates are far more aggressive than SFT's, so optimizing for code actively degrades math. The domains fight each other. Kudos to Nemotron-Cascade 2 for making a different bet!
Wei Ping@_weiping

🚀 Introducing Nemotron-Cascade 2 🚀
Just 3 months after Nemotron-Cascade 1, we’re releasing Nemotron-Cascade 2: an open 30B MoE with 3B active parameters, delivering best-in-class reasoning and strong agentic capabilities.
🥇 Gold Medal-level performance on IMO 2025, IOI 2025, and ICPC World Finals 2025:
• Capabilities once thought achievable only by frontier proprietary models (e.g. Gemini Deep Think) or frontier-scale open models (i.e. DeepSeek-V3.2-Speciale-671B-A37B).
• Remarkably high intelligence density with 20× fewer parameters.
🏆 Best-in-class across math, code reasoning, alignment, and instruction following:
• Outperforms the latest Qwen3.5-35B-A3B (2026-02-24) and even larger Qwen3.5-122B-A10B (2026-03-11).
🧠 Powered by Cascade RL + multi-domain on-policy distillation:
• Significantly expand Cascade RL across a much broader range of reasoning and agentic domains than Nemotron-Cascade 1, while distilling from the strongest intermediate teacher models throughout training to recover regressions and sustain gains.
🤗 Model + SFT + RL data: 👉 huggingface.co/collections/nv…
📄 Technical report: 👉 research.nvidia.com/labs/nemotron/…

English
0
0
1
388
Lei Zhang (Harry)@resouer·
Just read SWE-CI arxiv.org/abs/2603.03823, finally a benchmark measuring AI coding maintainability over time. The score weights later iterations more heavily, so technical debt shows up. Though: what if weak requirements generated by the Architect tank the Programmer's score? Ablation study?
English
0
0
4
249
Yuhang@CoooolXyh·
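Typeless is seriously amazing. I see people talking about it every day, yet I only downloaded it today. I always assumed it was just an ordinary voice input method, so how amazing could it be? After trying it today, I understand why everyone is praising it. This whole post was typed for me by Typeless, too.
Chinese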
62
15
240
108.7K
Darren Shepherd@ibuildthecloud·
So this is what I've been working on for a while. Otto8 is just an example of Obot Platform which has all the real code for this. Obot is the thing that gets sold to enterprise, but it's also open source too. Otto8 is just a bunch of configurations on top of Obot. Oh, and the user UI I built. Yep, I'm full stack now.
Darren Shepherd@ibuildthecloud

Announcing Otto8: ChatGPT for DevOps (link in thread) It's like ChatGPT, but has automated tasks, easy integration with devops CLI tools, integrated shell, and many other features. And it's fully open source.

English
7
4
59
8.1K
Lei Zhang (Harry)@resouer·
@the_sttts @tsaha Hence I believe that in practice we simply yield control of those knobs to the platform. Git or any other config source should not be a replacement for your automation system.
English
0
0
0
43
Stefan Schimanski@the_sttts·
@tsaha A promise of server-side-apply that in practice unfortunately is not fulfilled because there is only fail-on-conflict or force-overwrite, not ignore-that-one-conflict because I actually don't care enough.
English
1
0
0
56
Tamal Saha@tsaha·
How does GitOps work with "intelligent" #Kubernetes operators? By "intelligent" I mean operators that can modify a resource automatically to perform operations like scaling, auth / tls rotation etc.
English
2
0
0
139
Lei Zhang (Harry)@resouer·
@dims Just watched, this is a great learning! So it seems they used EKS and Karpenter directly, right? Not with AWS HyperPod.
English
0
0
1
55
Chris Gillum@cgillum·
I'm playing with AutoGen and think it's really cool, but it also feels like a low-code tool to me. As a professional developer, I feel like I'd rather write loops and termination conditions myself in code rather than using declarative constructs, like termination message filters or human-in-the-loop config.
English
3
0
9
1.3K
Lei Zhang (Harry)@resouer·
@clare_liguori This is awesome Clare! Is the state machine generated by Step Functions or manually created?
English
1
0
0
758
Clare Liguori@clare_liguori·
I just published a complete guide to serverless Gen AI prompt chaining with Bedrock and Step Functions 🚀 Write prompt templates, chains, loops, if-thens, model response validations, and call other AWS services, all with the Amazon States Language github.com/aws-samples/am…
English
4
51
180
26.7K
Lei Zhang (Harry)@resouer·
@sunbains @TreybigDavis @reneeshah123 @cra I'd argue that non-hyperscale-cloud db vendors, if possible, would love to pivot in a similar direction too. If this is valid, I expect the cloud will adapt to the needs of the industry eventually; it's just a matter of time.
English
1
0
2
184
Sunny Bains @TiDB@sunbains·
Not sure the industry is heading in this direction. Unless the cloud providers start providing direct access to RDMA. Alibaba (PolarDB/PolarFS) can use RDMA because they own the entire stack and do as they please. AFAIK, it’s not that cheap either but it does wonders for latency.
English
1
0
1
89
Renee Shah@reneeshah123·
This paper from 2022 was compelling. Right now, databases are separating storage and compute. In the future, databases may separate memory and compute: arxiv.org/pdf/2207.03027
English
7
14
123
19.4K
Lei Zhang (Harry)@resouer·
@TreybigDavis @reneeshah123 Yes, it's an accurate definition of "cloud-native database" and IMO where the industry is heading. Which means I am not sure about some recent debates on StarRocks: github.com/cncf/sandbox/i… /cc @cra Instead, this trend will grow K8s adoption as the de facto deployment platform for db.
English
1
0
3
267
Davis Treybig@TreybigDavis·
Alibaba has published a lot of cool papers in this space too like users.cs.utah.edu/~lifeifei/pape… and vldb.org/pvldb/vol16/p4…. Their PolarDB product disaggregates memory. I have asked some people why this is not more commonplace in cloud data systems and what I have heard is that it is primarily enabled by RDMA blogs.nvidia.com/blog/what-is-r…, but the problem is that RDMA is very low level and therefore only something the cloud vendors themselves can take advantage of since they run the datacenter too. Third party DB vendors can't really make use of it though. I think otherwise the latency overhead of near vs. far memory is too high - e.g. see slide 12 from this talk: mvdirona.com/jrh/talksandpa…
English
2
0
6
754
Lei Zhang (Harry)@resouer·
@zongheng_yang @kelseyhightower SkyPilot is very promising, as it's the first time we position the "driver for clouds" at the center. This may create an adoption barrier, though, as replacing the "driver" for an existing system could be painful. Not sure what such a brownfield scenario looks like?
English
1
0
0
42
Jeff Hollan@jeffhollan·
Today @SnowflakeDB announced Snowpark Container Services. Run cloud native multi container apps easily right next to your most valuable data. Super excited about the newest member of the Snowpark family ❄️🎉 snowflake.com/blog/snowpark-…
English
1
2
16
2.4K
David Fowler@davidfowl·
This repository and guidance came out of frustration answering questions related to .NET and async github.com/davidfowl/AspN… If you haven’t seen it before, I hope you learn something new. #dotnet
English
30
141
668
76.3K