Leo

455 posts

@YangLi_leo

AI Agents

Sakyo-ku, Kyoto · Joined November 2023
594 Following · 198 Followers
Leo retweeted
OpenAI (@OpenAI)
Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper — a new set of audio capabilities for the next generation of voice interfaces.
English
688
1.4K
14.8K
3.5M
Leo (@YangLi_leo)
@bridgebench I mean the dataset. Can you at least release some samples?
English
0
0
0
30
Bridgebench (@bridgebench)
Claude Opus 4.7 is the #1 refactoring model on BridgeBench. GPT 5.5 is nowhere on the leaderboard. GPT 5.5 is the most intelligent model on the market. But when it comes to refactoring existing code, Claude Opus 4.7 is untouchable. Every model has a strength. Know when to use each one. bridgebench.ai
[image]
English
32
6
122
13.1K
Leo retweeted
Claude (@claudeai)
We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity. This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.
English
4.8K
12.1K
131K
23.7M
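Leo (@YangLi_leo)
Lately I've noticed that tweets in the Japanese-language sphere get extremely high view counts and decent like counts, but very few replies. Combined with what @nikitabier said earlier, I think this may really show that spam tweets in the Japanese sphere are top tier 🧐
Chinese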
0
0
0
31
Leo (@YangLi_leo)
codex, yes!!!
[image]
English
0
0
0
32
Leo retweeted
Patrick Collison (@patrickc)
We just launched the @Link CLI: github.com/stripe/link-cli. Tell your friendly neighborhood agent about it -- agents can use the Link CLI to create single-use credentials that you get to synchronously approve each time. I asked Claude to buy itself a gift. It chose HTTPZine on Gumroad.
[image]
English
133
170
2.4K
382K
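Leo (@YangLi_leo)
I used to be a staunch advocate of Agent in Sandbox, but since March I've shifted to keeping the Agent Loop isolated from the execution environment. My view lines up closely with Anthropic's: putting the Agent inside the Sandbox makes later debugging extremely difficult, and on the security side you also have to add fairly complex setup such as a gateway, which I personally don't like.
Cursor (@cursor_ai)

With the Cursor SDK, you can run agents locally or deploy them in our cloud.

Chinese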
0
0
0
47
Leo (@YangLi_leo)
@wey_gu The atmosphere here in Japan really can't compare with what it's like back in China 🫠
Chinese
0
0
0
25
Leo (@YangLi_leo)
@satyanadella can we expect a GUI for Copilot? I mean, not VS Code for sure 🫡
English
0
0
0
1.3K
Satya Nadella (@satyanadella)
Super excited GPT-5.5 is rolling out to GitHub Copilot, M365 Copilot, Copilot Studio, and Foundry today. With deeper reasoning, stronger multistep execution, and better performance across long, complex tasks, GPT-5.5 helps you go from idea to execution faster with fewer iterations to get to the right outcome. It’s all about helping you choose the right model, or models, for the right task across your workflow.
English
259
414
4.5K
465.7K
Leo retweeted
DeepSeek (@deepseek_ai)
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n
[image]
English
1.6K
7.7K
45.3K
9.7M
Leo retweeted
OpenAI (@OpenAI)
Introducing GPT-5.5 A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.
English
2.5K
7K
51.8K
13M
Leo (@YangLi_leo)
@ShunyuYao12 CL-Bench is quite useful work for evaluating long-horizon tasks. Hope to see more follow-up studies on that!
English
0
0
1
2.8K
Shunyu Yao (@ShunyuYao12)
Our goal is to build practical models with comprehensive capabilities beyond open benchmarks. And the only way to do it is to co-design with diverse products while scaling solidly. Tencent has the best product ecosystem and a solid, low-ego culture, and we are just getting started!
Tencent Hy (@TencentHunyuan)

👋Hi /haɪ/, we're the Tencent Hy /haɪ/ team🐧 Today, we open source Hy3 preview (295B A21B), a leading reasoning and agent model in its size, with great cost efficiency. Give us feedback to help improve Hy3 official! 🤗 hf.co/tencent/Hy3-pr… 📖 hy.tencent.com/hy3-preview

English
50
152
1.9K
868.6K
Leo retweeted
Kimi.ai (@Kimi_Moonshot)
Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…
[image]
English
929
2.4K
18.2K
7.5M
Leo (@YangLi_leo)
[image only, no text]
0
0
0
39
Leo retweeted
Guillermo Rauch (@rauchg)
Here's my update to the broader community about the ongoing incident investigation. I want to give you the rundown of the situation directly. A Vercel employee got compromised via the breach of an AI platform customer called Context.ai that he was using. The details are being fully investigated. Through a series of maneuvers that escalated from our colleague’s compromised Vercel Google Workspace account, the attacker got further access to Vercel environments. Vercel stores all customer environment variables fully encrypted at rest. We have numerous defense-in-depth mechanisms to protect core systems and customer data. We do have a capability however to designate environment variables as “non-sensitive”. Unfortunately, the attacker got further access through their enumeration. We believe the attacking group to be highly sophisticated and, I strongly suspect, significantly accelerated by AI. They moved with surprising velocity and in-depth understanding of Vercel. At the moment, we believe the number of customers with security impact to be quite limited. We’ve reached out with utmost priority to the ones we have concerns about. All of our focus right now is on investigation, communication to customers, enhancement of security measures, and sanitization of our environments. We’ve deployed extensive protection measures and monitoring. We’ve analyzed our supply chain, ensuring Next.js, Turbopack, and our many open source projects remain safe for our community. The recommendation for all Vercel customers is to follow the Security Bulletin closely (vercel.com/kb/bulletin/ve…). My advice to everyone is to follow the best practices of security response: secret rotation, monitoring access to your Vercel environments and linked services, and ensuring the proper use of the sensitive env variables feature. 
In response to this, and to aid in the improvement of all of our customers’ security postures, we’ve already rolled out new capabilities in the dashboard, including an overview page of environment variables, and a better user interface for sensitive env var creation and management. As always, I’m totally open to your feedback. We’re working with elite cybersecurity firms, industry peers, and law enforcement. We’ve reached out to Context to assist in understanding the full scale of the incident, in an effort to protect other organizations and the broader internet. I also want to thank the Google Mandiant team for their active engagement and assistance. It’s my mission to turn this attack into the most formidable security response imaginable. It’s always been a top priority for me. Vercel employs some of the most dedicated security researchers and security-minded engineers in the world. I commit to keeping you updated and rolling out extensive improvements and defenses so you, our customers and community, can have the peace of mind that Vercel always has your back.
English
447
1K
7.2K
2.6M
Leo retweeted
Claude (@claudeai)
Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude. Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
English
4.1K
15.1K
148.6K
63.6M