erik@try.works banner
erik@try.works

@trydotworks

Building @Pocketmodelnet

Joined March 2026
117 Following · 3 Followers
erik@try.works
erik@try.works@trydotworks·
@blocmates @grok how much of the work in microfish is original and how much is just built on existing capabilities of OASIS
English
1
0
0
3
Chetaslua
Chetaslua@chetaslua·
Holy shit, this is an insane result 🤯 I used Opus 4.6 as reviewer/planner and 4 MiniMax M2.7 worker agents, and this is the result: voxel art of the Eiffel Tower <loop set to 5, meaning Opus will tell @MiniMax_AI to make it better 5 times>
English
19
36
546
62.1K
erik@try.works
erik@try.works@trydotworks·
@xiongchun007 I once tried to subscribe to the Qwen 3 Coder Plus API on Alicloud but was so confused and left. Got a Kimi subscription instead.
English
0
0
1
7
程序员老熊
程序员老熊@xiongchun007·
Successfully subscribed to DeepSeek. Let me explain why I planned to subscribe to Qwen but ended up ordering DeepSeek. The reason is simple: Alibaba's damn Bailian console is a jumbled mess of models, a jumbled mess of billing schemes, a jumbled mess of layouts, a jumbled mess of menu settings, a jumbled mess of pages, a jumbled mess of images; in short, everything about it is a mess. So I opened DeepSeek and took a look. Damn, this is exactly what I wanted: simple, clean, clear at a glance. That's the one. Sold!
程序员老熊 tweet media
Chinese
152
10
364
92.3K
erik@try.works
erik@try.works@trydotworks·
@JJEnglert @tenex_labs @AnthropicAI Looks like AI theatre. You should have an implementation plan when using sub-agents, so both images are the same. The main agent needs to integrate the work in the end anyway, so the difference is minor.
English
0
0
1
3
JJ Englert
JJ Englert@JJEnglert·
Our engineers at @tenex_labs are slamming this new Claude Code feature. Here's how it works:

@AnthropicAI just shipped Agent Teams — and it's a big deal even if you're not writing code yourself. Here's the simple version:

Instead of one AI working on your problem alone, you can now spin up a whole team of AI agents that work together. One acts as the lead. The others are teammates. They each focus on a different piece of the work, talk to each other directly, and coordinate through a shared task list.

Think of it like hiring a project manager who breaks the work into pieces, assigns it to specialists, and makes sure nothing falls through the cracks. Except all of those people are Claude, and they spin up in seconds.

Why this matters even if you're not a developer:

If you've ever used Claude Code to research a problem, write a report, or automate a workflow, you've been working with one brain at a time. That brain has a limit on how much it can hold in its head before things start slipping.

Agent Teams removes that bottleneck. Each teammate gets its own full memory. One can be deep in your financials while another is reviewing your competitor landscape while a third is drafting recommendations. They don't confuse each other's work because they literally can't see each other's context.

And the best part: they talk to each other. The lead doesn't have to relay everything. Teammates share findings, challenge each other's conclusions, and build on each other's work directly.

"Wait, didn't Claude Code already have subagents?"

Yes. And this is where most people get confused. Here's the difference:

Subagents are like sending an intern to go research something and come back with an answer. They do the work, hand you a summary, and they're done. They never talk to each other. You manage everything. Simple, cheap, good for focused tasks where you just need a result.

Agent Teams are like assembling a working group. The teammates coordinate with each other, not just with you. They claim tasks from a shared list, message each other when they find something relevant, and the lead synthesizes everything at the end. More expensive, but the output is fundamentally different.

When to use which:
- Need a quick answer or a focused task done? Subagent. Fast, cheap, gets the job done.
- Need multiple people looking at different angles of the same problem? Agent Team. The coordination is the point.
- One person can do it without talking to anyone else? Subagent.
- The work benefits from debate, cross-checking, or parallel exploration? Agent Team.

What our team is using it for right now:
1. Parallel code reviews: 3 teammates reviewing the same PR simultaneously. One on security, one on performance, one on test coverage. A single reviewer gravitates toward one issue type. Three specialists catch everything.
2. Competing hypotheses: 5 agents investigating the same bug, each with a different theory, actively trying to disprove each other. The theory that survives is almost always the root cause.
3. Cross-layer features: frontend, backend, and tests each owned by a different teammate. No one steps on anyone else's work.

Quick start if you want to try it:
Requires Claude Code v2.1.32 or later. Add one environment variable to your settings.json: "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
Tell Claude what kind of team you want in plain English. "Create a team with 3 teammates to review this project from different angles."
Claude proposes the team, you confirm, and it handles the rest: spawning teammates, assigning tasks, coordinating work.
Start with 3 teammates. Keep tasks independent. Don't let two teammates touch the same files.
Still experimental. But this is the first multi-agent architecture I've seen actually hold up on real dev work. Link to docs in comments below.
JJ Englert tweet media
English
14
7
122
11.2K
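The quick-start flag quoted in the thread above would sit in Claude Code's settings.json. A minimal sketch, assuming the `env` block Claude Code settings support; the flag name is taken verbatim from the tweet and is marked experimental there:

```json
{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}
```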
Ian Mabie
Ian Mabie@callmemabie·
@_catwu @RobertJBye How does the team think about balancing product velocity with the ability for consumers to figure out how to integrate what’s new into their daily lives? Feels like that becomes the limiting constraint if the team’s taste is on point and building isn’t the blocker?
English
2
0
1
4.7K
cat
cat@_catwu·
The PM playbook was built on an assumption that the technology underneath your product is roughly stable. With the current pace of model progress, this is no longer true. Here's how we've evolved the PM role:
English
52
147
1.7K
251.2K
erik@try.works
erik@try.works@trydotworks·
@cursor_ai could have played this as "we know how to turn open models into SOTA in 2 months". Instead it now looks like @Kimi_Moonshot is the winner: "our models are performant enough that 2 months of post-training achieves SOTA". All due to the fact that Cursor tried to obscure the model.
English
0
0
0
1
erik@try.works
erik@try.works@trydotworks·
Wait until people realize that model performance is constrained by human capacity to post-train and fine-tune per use case.
English
0
0
0
4
erik@try.works
erik@try.works@trydotworks·
@ivanburazin @steipete What would the use case for openclaw in Daytona look like? I did some quick calculations and it seems keeping it running 24/7 would cost $200 per month.
English
0
0
0
13
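The $200/month figure in the tweet above can be sanity-checked with back-of-envelope arithmetic; the hourly rate below is a hypothetical placeholder, not Daytona's actual price list:

```python
# Back-of-envelope cost of keeping a sandbox running 24/7.
# hourly_rate is an assumed placeholder, not a real Daytona price.
hourly_rate = 0.27            # assumed $/hour for a small always-on sandbox
hours_per_month = 24 * 30     # ~720 hours in a month
monthly_cost = hourly_rate * hours_per_month
print(f"${monthly_cost:.0f}/month")  # roughly $194/month at this rate
```

At an assumed rate near $0.27/hour, an always-on instance lands in the ~$200/month range the tweet describes.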
Sanjay
Sanjay@sanjaycodee·
@kimmonismus the disillusionment narrative feels overplayed. having been there, most of the time these "failures" are just the brutal reality of integrating a moonshot team into a shipping product org.
English
1
0
0
159
Chubby♨️
Chubby♨️@kimmonismus·
Mustafa Suleyman and his team were hired by Microsoft for nearly $700 million to further develop Copilot for the future of AI. After two years, disillusionment set in, and Satya Nadella became increasingly dissatisfied. Alongside Meta, Microsoft remains arguably the biggest laggard among companies, despite its multi-billion dollar investments.
Chubby♨️ tweet media
Pedro Domingos@pmddomingos

The inevitable has happened: Copilot no longer reports to Mustafa Suleyman. theinformation.com/briefings/micr…

English
50
31
644
103.9K
erik@try.works
erik@try.works@trydotworks·
@theo You absolutely should present that as supporting evidence and see what happens. Must.
English
0
0
0
2
Theo - t3.gg
Theo - t3.gg@theo·
According to Opus 4.6, T3 Code is compliant with the Anthropic TOS. This should hold up in court right?
Theo - t3.gg tweet media
English
88
13
1.4K
96.9K
erik@try.works
erik@try.works@trydotworks·
@elithrar Same. Kimi is my backup when Codex is out of quota. It handles small and medium tasks fine. For large ones I have to run double audits to fix all the implementation gaps. K3 should be a big step up, hopefully about GPT 5.2-ish.
English
0
0
0
2
Gail Weiner
Gail Weiner@gailcweiner·
@ns123abc Hasn’t been a good run for her at OpenAI has it ? 😏
English
2
0
11
2K
Mati 💻
Mati 💻@buildwithmati·
Been using Composer 2 for the whole day and I'm surprised that I didn't miss Opus 4.6 at all. Incredibly cheaper and feels way faster too. Will keep testing it for some days across different projects/stacks, but it looks really promising. Do I have a new favorite model? 🤔 Have you tried it out?
English
4
1
23
2K
erik@try.works
erik@try.works@trydotworks·
@trishaepan There's no deal. They're just spinning this in a positive way in the short term. The next model will have different licensing, and I promise you the BD team is spinning up a new project with licensing, post-training support, etc.
English
0
0
0
11
trisha pan
trisha pan@trishaepan·
No wonder the 3 Moonshot employees (Moonshot made Kimi) deleted their posts after accusing Cursor of using Kimi without attribution. Now the question is: did the deal/authorization happen before or after the leak??
Kimi.ai@Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

English
5
0
10
2.6K
Haitham Bou Ammar
Haitham Bou Ammar@hbouammar·
Why is your 405B model losing to an 8B model? 📉 "Context Rot" is a logic problem, not a scale problem. Based on the amazing work of RLMs, we built λ-RLM: replacing messy AI-generated code with a typed λ-calculus runtime.
The results:
✅ +21.9 accuracy gain
✅ 4.1x faster latency
✅ 8B models beating 405B
We used the Y-combinator to "tie the knot" of recursion, giving LLMs formal guarantees on termination and cost. Stop scaling. Start structuring.
Paper Link: github.com/lambda-calculu…
Code Link: github.com/lambda-calculu…
Enjoy!!! #AI #MachineLearning
Haitham Bou Ammar tweet media
English
5
23
162
14.1K
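The "tie the knot" phrase in the post above refers to the classic fixed-point combinator from λ-calculus. A minimal illustrative sketch in Python, using the eager-evaluation Z-combinator variant; this is just the textbook idea, not the paper's actual typed runtime:

```python
# Z combinator: the eager-evaluation variant of the Y combinator.
# It produces a fixed point of f, letting an anonymous function
# recurse without ever naming itself ("tying the knot" of recursion).
def Z(f):
    return (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial with no self-reference inside the lambda itself:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))  # 120
```

The inner `lambda v: x(x)(v)` eta-expansion is what keeps Python's strict evaluation from looping forever, which is why the Z variant is used instead of the pure Y combinator.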
Lee Robinson
Lee Robinson@leerob·
I'm a big believer in open source, especially as AI improves. It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model 🙏 Their team clarified our usage was licensed in the tweet below. x.com/Kimi_Moonshot/…
Kimi.ai@Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

English
195
106
2.1K
328.6K
Reden
Reden@Reden799·
@Presidentlin
>china steals everything mercilessly
>china distills claude for their own training
>cursor takes and uses an open source model (unenforceable license)
>NOOOOO NOOOOO YOU CAN'T DO WHAT WE WERE DOING TO YOU! 🤡
(assuming it's even a real tweet)
English
2
0
2
1.3K
Lincoln 🇿🇦
Lincoln 🇿🇦@Presidentlin·
Chinese open source is cancelled. Sorry everyone. Cursor didn't want to pay or give creds. They stole the IP. Anyway, Q1 and Q2 models are still coming down the pipe. Q3 and Q4 management are deciding on the best course of action. My sources in China tell me it doesn't look good. We have to go back to the MS Phi models.
English
26
20
336
65.7K
Tanner Linsley
Tanner Linsley@tannerlinsley·
Ghostty was fun, but time for something else. I still love opencode, too but with CC plans dead on it… I’m feeling lost. Full GUI? T3 Code? Opencode GUI? Warp? Back to cursor? Try CC again? Raw Codex? My 🧠 hurts and I just need to keep shipping.
English
387
7
1.1K
242.4K