Jianbo Wu

87 posts

Jianbo Wu banner
Jianbo Wu

Jianbo Wu

@jwu323

training & inference / opinions are mine

San Francisco Katılım Ekim 2023
1.9K Takip Edilen131 Takipçiler
Jianbo Wu
Jianbo Wu@jwu323·
FedEx appears to have lost my expensive package. It was one of three packages in the shipment; the other two were delivered, but this one has had no tracking updates for two weeks. Today, FedEx told me they have exhausted their search and still cannot locate it. What can I do at this point? @FedEx @FedExHelp
English
1
0
2
249
Jianbo Wu retweetledi
Z.ai
Z.ai@Zai_org·
Intelligence should be open, accessible, and ready to build with, empowering every developer, everywhere. GLM-5.2 is now available to all GLM Coding Plan users, including Lite, Pro, Max, and Team plans. docs.z.ai/devpack/latest… As our new flagship model, GLM-5.2 delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and Chatbot services will launch next week. The model will also be officially open-sourced next week under the MIT License. The future of AI is open, and it belongs to the people.
English
360
993
8.3K
2.5M
Xiangyi Li
Xiangyi Li@xdotli·
2023 prompt engineering 2024 context engineering 2025 harness engineering 2026 loop engineering what's next?
English
9
2
16
2K
Amogh Mishra
Amogh Mishra@MishraAmogh·
Are there any sub‑1B coding agents that are surprisingly powerful for RLM use cases? @a1zhang , you may know? Thanks
English
1
0
0
76
jason
jason@jxnlco·
I need Google Docs but just for markdown files. Multiplayer comments. Syncing resolving comments. Suggestion mode Edit mode Edit history Maybe some sense of multi edits. Easy cli access.
English
288
26
1.8K
494.7K
Jintao Zhang 张晋涛
Jintao Zhang 张晋涛@aiandcloud·
前几天和朋友聊到在 coding agent 中进行上下文压缩的策略。 我们可以把动态规划和经济学模型结合起来,不再简单按阈值触发,配合经济学模型进行精算,只有在收益比为正时再进行压缩。 同时由于压缩后会有失真,再引入失真惩罚,缓存失效以及压缩成本等,最大化缓存利用率,尤其是用 DeepSeek API 时
Jintao Zhang 张晋涛 tweet media
中文
36
17
173
15.6K
FENG DONG
FENG DONG@middlefeng·
一直很惊讶为什么 agent 能很快洞悉 code base 一些全局的联系,因为它的上下文显然也读不了太多代码。今天突然想明白了。比如说用一个很 general 的 expression 去 grep 整个 code base,结果可能有几千行。这个结果对人来说就等于没用。但是对 agent 来说就相当于去读整个 code base 的指南。
中文
7
1
47
31.8K
Julien Chaumond
Julien Chaumond@julien_c·
Today I'm launching a new project called SynthTraces 🔥 It is a minimal codebase to generate synthetic coding agent session traces using Pi (from @badlogicgames) I wanted a large number of coding-agent traces, so I built a tiny harness where two models talk to each other: - an open model (served via HF Inference Providers) plays the coding agent. It gets read + bash access to a real open source codebase (the huggingface OSS projects) - a small local model (llama.cpp) plays the human user, asking simple questions like "how do I run this?" or "how is CI set up?" The result is more than 2,000 Pi session traces which can be used to train or fine-tune LLMs, and optimize them for Pi 🤯 And ofc everything is published on @huggingface
Julien Chaumond tweet media
English
38
52
355
52.9K
Han Xiao
Han Xiao@hxiao·
Sharing a project I've been heavily using - Dataroom. It's a local-first harness that runs deep research with a small language model and gives a zip file at the end. Deep research is becoming an important first step for long-horizon tasks (the 2nd step being implementation), and I believe a small local model in a disciplined harness handles it well - we shouldn't waste frontier-model tokens on it. Dataroom runs on your own GPU at near-zero marginal cost, and it can keep going for hours until the dataroom is genuinely comprehensive, instead of stopping when a metered budget runs out.
Han Xiao tweet media
English
8
17
189
15K
Jianbo Wu
Jianbo Wu@jwu323·
Maybe dropping an LLM with trace analysis into the loop could help steer things a bit more intentionally — not just tweaking the model arch and checking the loss, but actually asking why some changes work better than others. source: blog.huikang.dev/2026/05/31/aut…
Jianbo Wu tweet media
English
1
1
2
184
Jianbo Wu retweetledi
Kyle Lo
Kyle Lo@kylelostat·
happy to share another quality tech report w/ the wider research community 🫶 great read for ppl who want to see all the details for methods + infra for scaling up pretraining & RL, esp detailed discussion about data which is often kept vague by other labs
Kyle Lo tweet media
Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end. Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing. Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost. All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat. Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost. Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare. Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: microsoft.ai/news/building-…

English
13
24
388
26.6K
Criska
Criska@yiliuai·
被自己做的泡菜炒饭好吃哭了,就是泡菜太多了有点像饭炒泡菜
Criska tweet media
中文
9
0
19
1.6K