Weikai Huang

10 posts

@BrightChihiro

Master's student at Shanghai Jiao Tong University, focusing on LLM reasoning.

Joined October 2021
183 Following 2 Followers
Weikai Huang retweeted
Allen
Allen@mumaren_2·
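This tweet keeps getting more valuable. Follow Builders, Not Influencers~
@karpathy: formerly OpenAI/Tesla AI, now Eureka Labs; an AI education legend
@swyx: founder of the AI Engineer movement, host of the Latent Space podcast
@joshwoodward: Google Labs VP, responsible for the Gemini App and AI Studio
@kevinweil: former OpenAI CPO, former head of product at Instagram/Twitter
@petergyang: product lead at Roblox, author of Behind the Craft
@thenanyu: Head of Product at Linear, a hands-on AI product builder
@realmadhuguru: product lead for Google Gemini, champion of a "build fast" culture
@AmandaAskell: philosopher at Anthropic who shapes Claude's personality and character
@_catwu: product lead for Claude Code at Anthropic
@trq212: Claude Code engineer at Anthropic, shares AI agent practice in depth
@GoogleLabs: Google's official AI experiments account
@amasad: Replit CEO, driving AI coding tools
@rauchg: Vercel CEO, creator of Next.js
@alexalbert__: PM on Anthropic's Claude team
@levie: Box CEO, insights on enterprise AI and business trends
@ryolu_: Head of Design at Cursor, formerly Notion/Stripe
@garrytan: Y Combinator CEO, AI startup ecosystem
@mattturck: AI investor, host of the MAD Podcast
@zarazhangrui: author of the follow-builders project, AI builder and curator
@nikunj: partner at FPV Ventures, thinking about SaaS in the AI era
@steipete: iOS/macOS development legend, now focused on AI developer tools
@danshipper: founder of Every, explores AI's impact on work and creativity
@adityaag: GP at South Park Commons, former Dropbox CTO
@sama: OpenAI CEO
[handle missing]: Anthropic's official Claude account
@mumaren_2: Mumaren (木马人), many years of big-tech experience, focused on sharing AI knowledge and tools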
Allen@mumaren_2

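Fed up with the junk on your timeline? Scrolling for ages without finding anything useful? Here is a small X tip: three steps that fix your information sources.
1. Pick a creator you like, click the menu at the top right, and choose to add them to a list.
2. Choose to create a list and give it a custom name. For example, I keep an "AI sources" list with all the AI creators I think are good, and I update it whenever I find a new one.
3. Configure the timeline: under Settings > Timeline > Home tabs, you can customize the position and content of your lists.
Once this is set up, add anyone interesting to the list as you come across them, then just browse the list each day. If you found this useful, give it a like!

Chinese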
49
153
607
83.2K
Weikai Huang
Weikai Huang@BrightChihiro·
@sirbayes Super clever idea to make smaller LLMs reliable in TextArena! Quick question: do you plan to open-source the code? Would love to try it out and build on it. Thanks!
English
0
0
0
215
Kevin Patrick Murphy
Kevin Patrick Murphy@sirbayes·
I am pleased to share our 'AutoHarness' paper (ICLR'26 ws), which uses LLM-based code synthesis to generate a Python harness around an LLM policy. AutoHarness + small Gemini Flash beats Gemini-2.5-Pro and GPT-5.2-High on #TextArena games! openreview.net/forum?id=g9rEY…
English
2
14
119
12.5K
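To make the idea of a "harness around an LLM policy" concrete, here is a minimal sketch. It is not the paper's code: it assumes the harness's job is to check each proposed move against the game rules and fall back to a legal move when the policy keeps proposing illegal ones. `legal_moves`, `harnessed_policy`, and the stub `flaky_policy` are hypothetical names, and the harness here is hand-written, whereas the tweet describes one synthesized by an LLM.

```python
from typing import Callable, List

Board = List[str]  # 9 cells of a tic-tac-toe board: "X", "O", or " "


def legal_moves(board: Board) -> List[int]:
    """Return the indices of empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def harnessed_policy(
    policy: Callable[[Board], int], board: Board, max_retries: int = 3
) -> int:
    """Query the policy, reject illegal moves, and fall back to the first legal one."""
    legal = legal_moves(board)
    if not legal:
        raise ValueError("no legal moves left")
    for _ in range(max_retries):
        move = policy(board)
        if move in legal:
            return move
    return legal[0]  # deterministic fallback after repeated illegal proposals


def flaky_policy(board: Board) -> int:
    """Stand-in for an LLM call (hypothetical): always proposes cell 0."""
    return 0


board = ["X", " ", " ", " ", "O", " ", " ", " ", " "]
chosen = harnessed_policy(flaky_policy, board)
print(chosen)
```

The point of the design is that the game never sees an illegal move, which is one plausible way a small model could become more reliable in a text game.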
SVC_0
SVC_0@xjhylmy·
@Jiaxi_Cui And in a corner of ByteDance that very few people know about, there is even a fourth piece of the puzzle beyond infra, research, and product 😆
Chinese
8
0
3
5.7K
Panda
Panda@Jiaxi_Cui·
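Continuing to sort through the history of this round: 2023 and 2024 already feel like the distant past, which is remarkable in itself.
In 2023, ByteDance's AI was, frankly, a mess, and the industry still assumed big companies had their usual problem of reacting slowly.
But then Zhang Yiming got involved personally, not with token attention but by meeting a large number of researchers and founders at high intensity. He worked through researchers with strong backgrounds who had published LLM-related papers in the previous two years, talking to them one by one. At the same time, he reached out to founders who had just raised a round and whose direction and team showed real potential. Back then, if you had a decent background and were active on Jike, you almost certainly got an invitation.
If you were genuinely valuable, ByteDance's offer was not just a high salary; it resolved the whole situation at once. On one side, it used very strong cash, stock options, and reporting level to settle your personal incentives; on the other, it handled your previous company's equity, team, business, and investor relations. Then it brought you into ByteDance on an eight-figure annual salary, reporting directly to him. That solved your financial problem and let you work with the best people and be part of a historic wave. Almost no talented person could resist, because doing meaningful work with smart people is deeply satisfying.
ByteDance then quickly settled into a clear structure: Volcano leaning toward infra, Seed toward research, and Doubao toward product. This division of labor was not just talk; outsiders could later clearly feel that it really did connect infrastructure, model capability, and product delivery. By the end of 2024, people gradually realized that ByteDance was unlike the big companies of the past: it moved fast, with none of the bloated, rigid organizational structure. The popular saying was "this elephant really can dance."
Meanwhile, over the same period in 2023, Qwen gradually beat models such as Baichuan and GLM and established itself as the leader. Half a year later, more of the market and ordinary investors began to notice, and Alibaba's stock strengthened on Qwen's leadership narrative and AI expectations.
But it never showed the outside world an infra - research - product setup as clear, as forceful, and with responsibilities as sharply defined as ByteDance's. That is a bit of a pity.
Chinese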
32
69
619
90.9K
Weikai Huang retweeted
Shu Lynn Liu
Shu Lynn Liu@shulynnliu·
Great question! Two main lenses: 1. Systems: most prior frameworks share a similar conceptual loop but are tightly coupled, so changing one component often requires rewriting the system. 2. Algorithmic: many existing methods rely on fixed strategies/parameters even though the search dynamics change over time. That motivated a modular framework + adaptive algorithms that can adjust (AdaEvolve), or even evolve the evolver (EvoX) itself!
English
0
1
5
207
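The contrast between fixed and adaptive search parameters in the reply above can be illustrated with a toy hill-climber. This is a sketch under stated assumptions, not AdaEvolve or EvoX: the objective, the function name `adaptive_search`, and the grow-on-success, shrink-on-failure step rule are all chosen here purely for illustration.

```python
import random


def adaptive_search(score, x0, steps=200, sigma=1.0, seed=0):
    """Hill-climb on `score`, widening the step after a success and narrowing it after a failure."""
    rng = random.Random(seed)
    best_x, best_s = x0, score(x0)
    for _ in range(steps):
        candidate = best_x + rng.gauss(0.0, sigma)
        s = score(candidate)
        if s > best_s:
            best_x, best_s = candidate, s
            sigma *= 1.5  # search is paying off: explore more boldly
        else:
            sigma *= 0.9  # stalled: narrow the search around the incumbent
    return best_x, best_s, sigma


def objective(x):
    """Toy objective (an assumption for the demo) with its maximum at x = 3."""
    return -(x - 3.0) ** 2


x, s, final_sigma = adaptive_search(objective, x0=-10.0)
print(f"x={x:.3f} score={s:.5f} sigma={final_sigma:.4f}")
```

A fixed `sigma` would be either too small far from the optimum or too large near it; letting it track recent success is the simplest version of a strategy that changes as the search dynamics change.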
victor-wu.eth
victor-wu.eth@victor_wu·
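I saw a case on Twitter: a lawyer wrote a Skill for handling legal documents.
His approach is interesting. He did not start by discussing with the Agent what SOP or rules to define; he began with ordinary conversation, for example stating a request directly to the Agent and then revising the result using his own expertise.
That sounds ordinary, but the next step is the interesting part. After two weeks to a month, he had the AI automatically analyze all of his conversations with it over that period, distill them into an SOP, and finally generate the Skill.
I have been writing product documents recently too. I'm a "third-rate product manager," and on the way to delivering the final product document I did a lot of back-and-forth and trimming with the AI:
1. Some content was missing at first, and I only thought of it later
2. Some content I decided was unnecessary
So today I did the same thing: I asked Claude to automatically analyze the exchanges and lessons from my past two weeks of working with AI on product documents. I plan to consolidate my experience this way and then generate a Skill.
I think that when people distill or build Skills in the future, the first step can look like this:
1. Don't rush to define a Skill
2. Start by communicating extensively with the AI in the most basic way
3. Let the AI discover for itself how the Skill is best written
After all, our earlier Skills were often based on SOPs produced in the "pre-AI era." So now that we have AI, how do we get it more involved in the whole Skill process? I think simply having the AI come back and summarize is enough.
Chinese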
7
23
193
25.5K
Weikai Huang
Weikai Huang@BrightChihiro·
@shulynnliu Exciting work! What lens did you use to identify the shortcomings of the previous work?
English
1
0
1
328
Shu Lynn Liu
Shu Lynn Liu@shulynnliu·
AlphaEvolve is closed-source. We release 🌟SkyDiscover🌟, a flexible, modular open-source framework with two new adaptive algorithms that match or exceed AlphaEvolve on many benchmarks and outperform OpenEvolve, GEPA, and ShinkaEvolve across 200+ optimization tasks. Our new algorithms dynamically adapt their search strategy, and can even let the AI optimize its own optimization process on the fly! Results: 📊 +34% median score improvement on 172 Frontier-CS problems. 🧮 Matches/exceeds AlphaEvolve on many math benchmarks ⚙️ Discovers system optimizations beyond human-designed SOTA 🧵👇
English
12
105
583
142.3K
Weikai Huang
Weikai Huang@BrightChihiro·
@emrecanacikgoz Do you think self-play will be widely adopted by industry and academia to improve model capabilities? 👀
English
0
0
0
134
Emre Can Acikgoz
Emre Can Acikgoz@emrecanacikgoz·
Can LLMs self-evolve into general-purpose tool-calling agents without any external data? Yes. Introducing "Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data"! ♾️ Initialized from the same LLM, Tool-R0 co-evolves a Generator and Solver through self-play RL, training general-purpose tool-calling agents entirely from scratch with zero data. (1/n)
English
11
49
288
26.4K
Weikai Huang retweeted
Andrew Zhanke Zhou
Andrew Zhanke Zhou@zhankezhou·
Tired of debugging LLMs by reading the extremely long chain of thoughts? We built Landscape of Thoughts (LoT) to transform complex thoughts into intuitive visual maps to help you understand model behaviors. Paper and findings in 🧵 1/10 youtu.be/Zb8CfYxSvik?si… via @YouTube
English
2
8
24
5.7K
Weikai Huang retweeted
Jason Wei
Jason Wei@_jasonwei·
There are traditionally two types of research: problem-driven research and method-driven research. As we’ve seen with large language models and now AlphaEvolve, it should be very clear now that total method-driven research is a huge opportunity. Problem-driven research is nice because you have a consistent and specific goal. The goal is usually virtuous, so it feels good to have a mission and identity. However, it just doesn’t work due to The Bitter Lesson. Basically everything in classical NLP (machine translation, summarization, chatbots) lost to simple scaling. ChatGPT is a prime example—it used nothing from chatbot research and certainly wasn’t the intended end goal of OpenAI’s 2022 research program, but was a huge hit because someone (John Schulman et al) figured out the right way to package large language models as a product. Method-driven research feels less stable because you’re constantly searching for problems and you have to be opportunistic. But I believe AI will allow method-driven research to dominate progress in most fields of science, one-by-one. The latest method (or “hammer”), as we’ve seen in AlphaEvolve, is ruthless search and optimization against a reward function (whether this requires RL or not is a separate discussion). Things that problem-driven researchers have been trying to solve for a long time like the kissing number problem will become nails hit by the hammer. Eventually the hammer will become bigger, stronger, and more general and will hit more and more nails. So a very important meta-skill for the next decade will be knowing how to create the right environments to use The Hammer. Ironically, the problem-driven researchers, who by definition are experts in a specific problem, are well-positioned to create these environments. If, that is, they can put down their egos and pick up the hammer.
English
22
92
712
78.5K
Weikai Huang retweeted
Dongfu Jiang
Dongfu Jiang@DongfuJiang·
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and ToolRL. While these achieve impressive performance, their training code is either not fully open-sourced or too difficult to modify and customize with new tools, creating unexpectedly high engineering costs for the community when exploring new ideas. To address these issues and reduce engineering overhead, we propose verl-tool.
Key Features:
1. 🔧 Complete decoupling of actor rollout and environment interaction - We use verl as a submodule to benefit from ongoing verl repo updates. All tool calling is integrated via a unified API, allowing you to easily add new tools by simply adding a Python file and testing independently.
2. 🌍 Tool-as-environment paradigm - Each tool interaction can modify the environment state. We store and reload environment states for each trajectory. For each training, you can launch
3. ⚡ Native RL framework for tool-calling agents - verl-tool natively supports multi-turn interactive loops between agents and their tool environments.
4. 📊 User-friendly evaluation suite - Launch your trained model with OpenAI API alongside the tool server. Simply send questions and get final outputs with all interactions handled internally.
We've successfully reproduced ToRL results using our verl-tool framework, demonstrating its correctness and achieving comparable performance on mathematical benchmarks. VerlTool is an active ongoing project! We aim to incorporate more tools covering a wide range of use cases and expect they can be trained together in a single framework. Suggestions and contributions are highly welcomed! Check out our GitHub: github.com/TIGER-AI-Lab/v… More details: 👇 (0/4)
English
6
73
383
79.8K
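As a rough illustration of the tool-as-environment idea in the announcement above, where per-trajectory state is stored and reloaded and new tools plug in behind one call signature, consider the sketch below. It is not verl-tool's actual API: `TOOLS`, `STATES`, `register_tool`, `call_tool`, and the `counter` tool are hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, Tuple

# A tool maps (state, action) -> (observation, new_state).
Tool = Callable[[Dict[str, Any], str], Tuple[str, Dict[str, Any]]]

TOOLS: Dict[str, Tool] = {}             # tool name -> implementation
STATES: Dict[str, Dict[str, Any]] = {}  # trajectory id -> saved environment state


def register_tool(name: str) -> Callable[[Tool], Tool]:
    """Decorator that adds a tool to the registry under `name`."""
    def wrap(fn: Tool) -> Tool:
        TOOLS[name] = fn
        return fn
    return wrap


@register_tool("counter")
def counter(state: Dict[str, Any], action: str) -> Tuple[str, Dict[str, Any]]:
    """Toy stateful tool: adds the integer in `action` to a running total."""
    total = state.get("total", 0) + int(action)
    return f"total={total}", {**state, "total": total}


def call_tool(trajectory_id: str, name: str, action: str) -> str:
    """Reload the trajectory's state, run the tool, and store the updated state."""
    state = STATES.get(trajectory_id, {})
    observation, new_state = TOOLS[name](state, action)
    STATES[trajectory_id] = new_state
    return observation


print(call_tool("traj-a", "counter", "2"))
print(call_tool("traj-a", "counter", "3"))
print(call_tool("traj-b", "counter", "10"))
```

Because each trajectory keeps its own state, the two calls on `traj-a` accumulate while `traj-b` starts fresh, and adding another tool only requires another decorated function.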