Ruirui Wan
167 posts



@MilksandMatcha @cerebras I haven't renewed my Windsurf subscription since the acquisition, so I want to try it and see how it goes

Giving away 5 Windsurf Max ($200/month) plans
Each person will get 3 months of free Windsurf Max (highest tier). Try out SWE-1.6, Cognition's latest, fastest, and most intelligent model, powered by @cerebras.
Winners will be selected from comments in 48 hours, comment below why you want it.
Cognition @cognition
We’re releasing SWE-1.6, our best model in both intelligence & model UX. SWE-1.6 matches our Preview model on SWE-Bench Pro while dramatically improving on various behavioral axes. It’s available today in Windsurf in two modes: free tier (200 tok/s) and fast tier (950 tok/s).

@bytebot Cool, makes sense and thank you. We are working on some new product stuff, so stay tuned.

Today is my first day using Cursor, and it also falls within my last two days with it. Why? I bought a one-year Cursor membership in April 2025, but I barely used it. At the time, I had also subscribed to Claude and Codex, and I never imagined they would become so popular.
As a result, I didn't use my Cursor membership much for a whole year. With only three days left until it expires, I remembered I had it. I wanted to try out the new Composer 2, especially since the $20 usage limit for Cursor Pro membership is so small. I figured $20 would only cover one or two tasks before running out.
So, I tried Composer 2. I found the model to be quite fast, but it's completely incapable of handling tasks it's not specialized in. For example, today it spent about 20 minutes gathering context and reviewing the surrounding text. However, its context window quickly filled up, leading to an endless loop of re-examining the context.
Its cache usage was also enormous, reaching into the millions, even though it only made minor code changes. I now feel it's just an average model. It's acceptable as a basic model for Cursor, but it truly can only perform specific tasks. It can't fully replace humans the way Claude or OpenAI can, especially when it comes to producing a 90-100% perfect result. It probably only achieves 60-70%. That's why I spent most of today finding, catching, and fixing bugs, which was quite frustrating. @cursor_ai


@CtrlAltDwayne @Zai_org And when you see speeds of 8 tokens per second right now, I think they have already maxed out their GPU capacity


@nzinfo @wan_ruirui @Zai_org Yeah that checks, the few people I know who did manage to snag subs there said it was on par with openrouter speed.

lmfao westerner tax from @Zai_org is crazy
The same "Max" plan served in China that costs $68 USD (469 RMB) is $160 USD for westerners. Irony is, if you have WeChat or Alipay, you can just buy the Chinese plan and still have API access. So it's not even GEOLOCK related costs. It's just capitalising on the gold rush right now for subscription-based OAuth integrations.
Diabolical is an understatement tbh


@LexnLin can i use deepthink with antigravity or only inside gemini?

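Thank you to our 125 users
A week ago, I was still a student shopping around for a cheap API relay, put off by the high prices. By chance, I learned what most relay services actually pay at cost, and the huge price gap made me wonder: could I build a semi-nonprofit relay myself and sell access at close to cost?
With that idea in mind, I set up a relay on one of my idle servers, bound it to my personal domain, and invited a few friends to use it. Word spread, though, and more and more people wanted to join us: on the first day of trial open registration, we gained 40 users. The huge call volume at one point drained the account pool before I could expand it, and I had to stay up several nights improving the automation bots.
7 days, 125 users, 67,563 requests, 10 billion in call volume. I did not expect us to grow this fast. The original intent was simply a student solving his own token needs and helping out some friends along the way. Either way, I will do my best to provide the service at the lowest price I can currently manage, and to stay open and transparent. Words alone prove nothing, so some measures are on the way, such as publishing account-pool information and costs, to keep the relay open and transparent.
Maintaining the site is really tiring, but seeing everyone use AI to bring their ideas to life makes it all worth it. You're welcome to join the group to chat (QQ: 1093545322, and maybe you can pick up a wild redemption code?!), or buy our service to support the site ❤ Thanks for reading this far!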

芒果刀🥭 @ThirteenYizuka
Registration is now open: router.daoge.me New sign-ups get $10 in credit 🥰, but to prevent abuse, for now only QQ and Gmail email addresses can register 🤔

@0xSero What's different between running GLM-5.1 in Droid and in Claude Code? What changed?

Finally, cannot wait to test
Alex Albert @alexalbert__
We released Claude Opus 4.6 just two months ago. Today we're sharing some info on our new model, Claude Mythos Preview.

@pvncher @RepoPrompt Based on actual testing, Claude encounters many issues when calling the officially supported OpenAI Codex plugin, especially during large tasks. I haven't tried it with RepoPrompt yet.

@wan_ruirui @RepoPrompt You can have Claude start a codex agent with that command (plan and build workflow)!
The built-in agent has some benefits in terms of reviewing its own work and being more token efficient, so it's nice to be able to spin one up.
Up to you on how you want to use it

Just released @RepoPrompt 2.1!
Really big update that makes it possible to use the RP Agent via MCP/CLI, and for the RP Agent to invoke sub agents!
Never been easier to have claude steer codex agents using RP tools to handle ambitious tasks efficiently.
Great for openclaw too!

Yes — Mercury felt extremely fast in practice, with very little noticeable latency. Part of that may be that it is not spending extra time on reasoning like some of the other models I tested. But while it is clearly much faster — close to 2x in my tests — I would not say its translation quality is the best. I have not done a controlled length-scaling benchmark yet, and I am still reading more about this architecture since it is so new. I want to test it more before making a stronger claim, but so far it does seem to stay very responsive as outputs get longer.

@wan_ruirui The Mercury 2 speed advantage isn't just optimization — it's architectural. It's a diffusion LLM: tokens generated in parallel, not sequentially. That gap should widen for longer outputs. Did you notice latency scaling differently vs the others as text length increased?

I have recently tested several newly released small language models. When balancing price and performance specifically for translation, I found that the top four options are:
1. Gemini 3.1 Flash Lite
2. Mercury 2
3. GPT-5.4 Nano
4. Mistral 4 Small
The first three are essentially the best-performing products in this category. Mistral 4 Small makes the list primarily for its low cost, as its performance is relatively poor compared to the top three. This ranking is based on a mix of:
latency / real-world responsiveness
translation naturalness
idiom and tone handling
HTML / formatting preservation
output stability / empty-response rate
token efficiency
API pricing
From my tests:
Mercury 2 was the fastest overall at about 0.66–0.79s average latency, delivered 5/5 successful outputs, and had the best balance of speed, stability, and cost at $0.25/M input, $0.75/M output.
GPT-5.4 Nano was slightly slower at around 0.79s, but produced some of the most natural phrasing, especially on sarcasm, idioms, and social-style text. Its downside is a much higher $1.25/M output price.
Gemini 3.1 Flash Lite Preview had the strongest translation quality in some cases, especially for idioms, tone, and technical clarity, but it was slower at around 1.24s and also the most expensive on output at $1.50/M.
Mistral 4 Small is still worth mentioning as a low-cost option, but it fell behind the top three in translation quality and formatting reliability, including weaker HTML preservation in my tests.
One surprising result: Qwen 3.5 was not competitive for real-time translation at all. Across the lineup, it often spent 97%–111% of usable output budget on reasoning, sometimes burned 500+ thinking tokens just to translate “Hello world,” and frequently returned empty outputs. In practice, that made it far too slow and unreliable for this use case.
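
For anyone who wants to line up the numbers above, here is a rough Python sketch. The latencies and output prices are copied from this post; everything else is an assumption for illustration: the names `ModelResult`, `RESULTS`, and `output_cost_usd` are hypothetical, Mercury 2's latency is entered as the midpoint of the reported 0.66–0.79s range, and the 2M-token workload is made up. Input prices are left out because only Mercury 2's was quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    avg_latency_s: float      # approximate average latency from the post
    output_usd_per_m: float   # output price in USD per million tokens


# Figures copied from the post; Mercury 2 latency is the midpoint of the
# reported 0.66-0.79s range (an assumption, not a measured value).
RESULTS = [
    ModelResult("Mercury 2", (0.66 + 0.79) / 2, 0.75),
    ModelResult("GPT-5.4 Nano", 0.79, 1.25),
    ModelResult("Gemini 3.1 Flash Lite Preview", 1.24, 1.50),
]


def output_cost_usd(model: ModelResult, output_tokens: int) -> float:
    """Output-side cost in USD for a given number of generated tokens."""
    return model.output_usd_per_m * output_tokens / 1_000_000


def cheapest(results: list[ModelResult]) -> ModelResult:
    """Model with the lowest output price."""
    return min(results, key=lambda m: m.output_usd_per_m)


def fastest(results: list[ModelResult]) -> ModelResult:
    """Model with the lowest average latency."""
    return min(results, key=lambda m: m.avg_latency_s)


if __name__ == "__main__":
    output_tokens = 2_000_000  # hypothetical monthly translation volume
    for m in RESULTS:
        cost = output_cost_usd(m, output_tokens)
        print(f"{m.name}: ~{m.avg_latency_s:.2f}s avg, ${cost:.2f} output cost")
    print("Cheapest on output:", cheapest(RESULTS).name)
    print("Fastest:", fastest(RESULTS).name)
```

This only covers the two measurable axes (latency and output price); translation naturalness, idiom handling, and HTML preservation were judged by hand and are not captured here.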
