Leo

455 posts

@YangLi_leo

AI Agents

Sakyo-ku, Kyoto · Joined November 2023

594 Following · 198 Followers

Leo@YangLi_leo·8 May

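If it weren't for Claude, Microsoft still wouldn't have the slightest intention of improving any part of the Office suite experience. Meanwhile, Copilot was at one point heading toward looking like the all-too-familiar rogue antivirus software 😅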

Claude@claudeai

Claude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta. As Claude moves between your Microsoft apps, it carries the full context of your conversation.

Leo reposted

OpenAI@OpenAI·7 May

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.

Leo@YangLi_leo·7 May

@bridgebench I mean the dataset. Can you at least release some samples?

Bridgebench@bridgebench·7 May

@YangLi_leo Check bridgebench.ai!

Bridgebench@bridgebench·7 May

Claude Opus 4.7 is the #1 refactoring model on BridgeBench. GPT 5.5 is nowhere on the leaderboard. GPT 5.5 is the most intelligent model on the market. But when it comes to refactoring existing code, Claude Opus 4.7 is untouchable. Every model has a strength. Know when to use each one. bridgebench.ai

Leo reposted

Claude@claudeai·6 May

We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.

Leo@YangLi_leo·6 May

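I've recently noticed that tweets in the Japanese region get extremely high view counts and decent like counts, but very few replies. Combined with what @nikitabier said earlier, I think this may really show that spam tweets in the Japanese region are top tier 🧐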

Leo@YangLi_leo·3 May

codex, yes!!!

Leo reposted

Patrick Collison@patrickc·29 Apr

We just launched the @Link CLI: github.com/stripe/link-cli. Tell your friendly neighborhood agent about it -- agents can use the Link CLI to create single-use credentials that you get to synchronously approve each time. I asked Claude to buy itself a gift. It chose HTTPZine on Gumroad.

Leo@YangLi_leo·29 Apr

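I used to be a staunch advocate of Agent in Sandbox, but since March I have shifted to isolating the Agent Loop from the execution environment. My view lines up closely with Anthropic's: putting the Agent inside the Sandbox makes later debugging extremely difficult, and on the security side it also requires adding a fairly complex setup such as a gateway, which I personally don't like.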

Cursor@cursor_ai

With the Cursor SDK, you can run agents locally or deploy them in our cloud.

Leo@YangLi_leo·26 Apr

@wey_gu The atmosphere here in Japan really can't compare with that in China 🫠

Wey Gu 古思为@wey_gu·26 Apr

Could the folks over in Japan organize one? I'd like to come along too.

TJ (thaddeus jiang)@thaddeusjiangzh

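The venues for tech events in China are all so great. I'm envious, and I'd like to go too. Most tech event venues in Japan aren't even as good as a university classroom, and the format is about the same as a kindergarten class; go once and you won't want to go again for years. Is this a "Fortress Besieged" situation? 😂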

Leo@YangLi_leo·24 Apr

@satyanadella Can we expect a GUI for Copilot? I mean, not VS Code, for sure 🫡

Satya Nadella@satyanadella·24 Apr

Super excited GPT-5.5 is rolling out to GitHub Copilot, M365 Copilot, Copilot Studio, and Foundry today. With deeper reasoning, stronger multistep execution, and better performance across long, complex tasks, GPT-5.5 helps you go from idea to execution faster with fewer iterations to get to the right outcome. It’s all about helping you choose the right model, or models, for the right task across your workflow.

Leo reposted

DeepSeek@deepseek_ai·24 Apr

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

Leo reposted

OpenAI@OpenAI·23 Apr

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

Leo@YangLi_leo·23 Apr

@ShunyuYao12 CL-Bench is quite useful work for evaluating long-horizon tasks. I hope to see more follow-up studies on it!

Shunyu Yao@ShunyuYao12·23 Apr

Our goal is to build practical models with comprehensive capabilities beyond open benchmarks. And the only way to do it is to co-design with diverse products while scaling solidly. Tencent has the best product ecosystem and a solid, low-ego culture, and we are just getting started!

Tencent Hy@TencentHunyuan

👋Hi /haɪ/, we're the Tencent Hy /haɪ/ team🐧 Today, we open source Hy3 preview (295B A21B), a leading reasoning and agent model in its size, with great cost efficiency. Give us feedback to help improve Hy3 official! 🤗 hf.co/tencent/Hy3-pr… 📖 hy.tencent.com/hy3-preview

Leo@YangLi_leo·23 Apr

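I've been working on vault-related things lately, fully decoupling the execution environment from any keys. As a result, for the first time in all the time since MCP came out, I've finally found that MCP really is a good thing.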

ClaudeDevs@ClaudeDevs

New blog: Building agents that reach production systems with MCP. When should agents use direct APIs vs CLIs vs MCP? Plus patterns for building MCP servers, context-efficient clients and pairing MCP with skills. claude.com/blog/building-…

Leo@YangLi_leo·22 Apr

🤗

Herrington Darkholme@hd_nvim

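She suddenly leaned in and asked me: "You work in AI too? Which area?" A chill ran down my back and the corner of my mouth twitched: "Just... the kind everyone does..." Her eyes lit up at once: "You work on foundation models, right? Ten-thousand-GPU clusters, hundred-billion-parameter multimodal alignment, that kind of thing?" I gave a dry laugh: "No..." She got even more excited: "Then you must be doing low-level inference optimization? Compute scheduling, weight quantization, KV Cache, that's the very core of AI industrialization!" I hung my head: "Not that either..." She stared at me, her tone serious: "Could it be frontier architectures? Multi-agent games, chain-of-thought reinforcement learning, building artificial general intelligence, AGI?" I said quietly: "Not that either..." She was clearly baffled: "Then you're not doing vector infrastructure, are you? Hybrid retrieval algorithms, GraphRAG deep indexing, multi-dimensional vector database optimization?" I waved my hands: "I haven't reinvented any wheels... and I've never done infrastructure..." She fell into thought and slowly asked: "Then what do you do?" I paused, eyes red, and finally broke down, crying out through sobs: "I... I just do Agent development!!!"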

Leo reposted

Kimi.ai@Kimi_Moonshot·20 Apr

Meet Kimi K2.6: Advancing Open-Source Coding

🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.

K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

Leo@YangLi_leo·20 Apr

Leo reposted

Guillermo Rauch@rauchg·20 Apr

Here's my update to the broader community about the ongoing incident investigation. I want to give you the rundown of the situation directly.

A Vercel employee got compromised via the breach of an AI platform customer called Context.ai that he was using. The details are being fully investigated.

Through a series of maneuvers that escalated from our colleague’s compromised Vercel Google Workspace account, the attacker got further access to Vercel environments.

Vercel stores all customer environment variables fully encrypted at rest. We have numerous defense-in-depth mechanisms to protect core systems and customer data. We do have a capability however to designate environment variables as “non-sensitive”. Unfortunately, the attacker got further access through their enumeration.

We believe the attacking group to be highly sophisticated and, I strongly suspect, significantly accelerated by AI. They moved with surprising velocity and in-depth understanding of Vercel.

At the moment, we believe the number of customers with security impact to be quite limited. We’ve reached out with utmost priority to the ones we have concerns about.

All of our focus right now is on investigation, communication to customers, enhancement of security measures, and sanitization of our environments. We’ve deployed extensive protection measures and monitoring. We’ve analyzed our supply chain, ensuring Next.js, Turbopack, and our many open source projects remain safe for our community.

The recommendation for all Vercel customers is to follow the Security Bulletin closely (vercel.com/kb/bulletin/ve…). My advice to everyone is to follow the best practices of security response: secret rotation, monitoring access to your Vercel environments and linked services, and ensuring the proper use of the sensitive env variables feature.

In response to this, and to aid in the improvement of all of our customers’ security postures, we’ve already rolled out new capabilities in the dashboard, including an overview page of environment variables, and a better user interface for sensitive env var creation and management. As always, I’m totally open to your feedback.

We’re working with elite cybersecurity firms, industry peers, and law enforcement. We’ve reached out to Context to assist in understanding the full scale of the incident, in an effort to protect other organizations and the broader internet. I also want to thank the Google Mandiant team for their active engagement and assistance.

It’s my mission to turn this attack into the most formidable security response imaginable. It’s always been a top priority for me. Vercel employs some of the most dedicated security researchers and security-minded engineers in the world. I commit to keeping you updated and rolling out extensive improvements and defenses so you, our customers and community, can have the peace of mind that Vercel always has your back.

Leo reposted

Claude@claudeai·17 Apr

Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
