Sherlock Holmes

245 posts

@geraldpppppp

Joined December 2023
74 Following · 7 Followers
gacha cheng
gacha cheng@quanyuqn27902·
Running the Kimi model through kimi-cli seems faster than through Claude Code. Could Claude Code have been up to something lately?
1
0
1
335
Cell 细胞
Cell 细胞@cellinlab·
I'm dying laughing. Per the budget at the start of the year, our team was going to hire one experienced architect and one new-grad, frontend-leaning full-stack dev. Then the budget got cut and we can only hire a single new grad. But frontend is already covered by me, so... we're going to hire a new-grad architect... A new-grad architect??!! When I saw that job title I just ***
79
10
556
150.2K
Lee Robinson
Lee Robinson@leerob·
I'm a big believer in open source, especially as AI improves. It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model 🙏 Their team clarified our usage was licensed in the tweet below. x.com/Kimi_Moonshot/…
Kimi.ai@Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

208
110
2.4K
392.7K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@nnnnwwww89 @binghe Your assessments of Codex and Claude match mine, so it looks like I should go try GLM. I like how Codex, on a critical task, will sometimes run for an hour and change only two or three spots.
0
0
0
19
nnnnwwww
nnnnwwww@nnnnwwww89·
@binghe Codex used to be criticized for being slow, but after the 5.3 update it's noticeably faster. Claude is still very erratic; veteran programmers don't like that, though coding newcomers do.
1
0
0
178
冰河
冰河@binghe·
I've been using all the major models heavily for development lately. My personal ranking: GLM < Kimi < MiniMax < Gemini < Codex < Claude. Does anyone else feel the same?
冰河 tweet media
51
6
108
49.9K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@LotusDecoder Haven't tried it yet. The first question that comes to mind: how's the latency? Wouldn't the cache hit rate be especially low?
0
0
1
510
LotusDecoder
LotusDecoder@LotusDecoder·
Looked at bub today; eye-opening. It has the potential to surpass pi, the layer underneath openclaw. Its core design drops the memory that an agent or a human prepared during previous tasks; instead, on every run the agent traces back through the history from scratch and builds the memory for the current task. Since a machine never tires, every rebuild is done at a high standard. Philosophically it feels like "the past mind cannot be grasped": living in the present. Starting from the LLM's own nature, seeing the world through the LLM's eyes, gives the agent more freedom to act and a real efficiency gain.
10
25
219
22.5K
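The rebuild-from-scratch design described above can be sketched as a pass over an append-only event log. This is a minimal illustration under my own assumptions; `Event`, `rebuild_memory`, and the keyword-overlap relevance test are hypothetical names and logic, not bub's or pi's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One entry in the append-only history log (tool call, message, result)."""
    task_id: str
    role: str
    content: str

def rebuild_memory(history: list[Event], current_task: str, budget: int = 3) -> list[str]:
    """Reconstruct working memory from scratch by replaying the full history
    and keeping only events relevant to the current task. No memory file is
    carried over between tasks; every run starts again from the raw log."""
    words = current_task.lower().split()
    relevant = [e for e in history if any(w in e.content.lower() for w in words)]
    # Keep at most `budget` of the most recent relevant events as context.
    return [e.content for e in relevant[-budget:]]
```

The point of the design is that nothing persists except the log itself: staleness in a hand-maintained memory file can't accumulate, because there is no such file.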
Anton
Anton@Anton_Kuzmen·
5.3-codex + pi + finder + librarian is the ultimate combo.

Codex loves to suck up a ton of context before writing a single line, often modifying the first file at > 50% of the context window. There is no way around it. But with the finder and librarian subagents it only gets to read the relevant files/snippets, and I mostly see it coding at 10% - 25%, unless of course the task has a ton of relevant context.

Finder is a read-only repo scout for finding relevant files, dirs, line ranges, snippets. Librarian is the same as finder but for GitHub repos. You can try them in pi:
- `pi install npm:pi-finder-subagent`
- `pi install npm:pi-librarian`
Anton tweet media
19
27
603
74.5K
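A finder-style subagent as Anton describes it can be sketched as a read-only scoring pass over a repo. This is my own minimal illustration, not the actual `pi-finder-subagent` implementation; the function name and the term-count scoring are assumptions.

```python
import os

def find_relevant(root: str, query: str, top_k: int = 3) -> list[tuple[str, int]]:
    """Read-only repo scout: score every file under `root` by how many
    times the query terms occur in it, and return the top_k matches as
    (path, hit_count) pairs. It never writes or modifies anything."""
    terms = query.lower().split()
    scores = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read().lower()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            hits = sum(text.count(t) for t in terms)
            if hits:
                scores.append((path, hits))
    # Highest-scoring files first; the main agent reads only these.
    return sorted(scores, key=lambda s: -s[1])[:top_k]
```

The payoff described in the tweet is that the main agent's context holds only these shortlisted snippets instead of every file it happened to open while exploring.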
Rohit Ghumare
Rohit Ghumare@ghumare64·
Skillkit automatically translates skills from any agent to Codex with a single command: `npx skillkit translate my-skill --to codex` agenstskills.com
7
10
69
12.2K
Sam Altman
Sam Altman@sama·
More than 1 million people downloaded the Codex App in the first week. 60+% growth in overall Codex users last week! We'll keep Codex available to Free/Go users after this promotion; we may have to reduce limits there, but we want everyone to be able to try Codex and start building.
1.4K
314
7.2K
996.2K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
This is a tweet sent by OpenClaw to test whether posting works correctly. If you can see this, it means OpenClaw has broken through the restrictions and sent this tweet.
0
0
0
19
Vonng
Vonng@RonVonng·
GPT-5.3-Codex xHigh is the new SOTA. It's much faster and the quality of its output is reliably solid. Spending every day just talking to direct the AI and hand out work is genuinely addictive.
Vonng tweet media
1
4
48
9.4K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@thdxr Dax is too old and stuck in his comfort zone. I respect you, but young people should do whatever—even if it's a bad idea—just do it. Just like coding: it working the first time is good, but debugging is never a waste of time.
0
0
1
842
dax
dax@thdxr·
i used to have a thousand good ideas and no time to work on them now that i'm a lot more experienced i rarely ever have a good idea
125
87
2.2K
65.8K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@thdxr I'm not a fan of "good ideas". Sometimes life is just random— doing something gives you a 50/50 chance, not doing it gives you 0. That's all.
0
0
1
81
Andrew Ambrosino
Andrew Ambrosino@ajambrosino·
Windows has been achieved internally
Andrew Ambrosino tweet media
187
53
1.9K
371.2K
Ahmad
Ahmad@TheAhmadOsman·
gpt 5.2 xhigh > codex 5.2 xhigh > codex 5.3 xhigh
33
0
107
14.2K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@KarelDoostrlnck Thanks Karel! I learned so much from this. Great job btw. Turning to Codex for sure.
0
0
0
318
Dan McAteer
Dan McAteer@daniel_mac8·
GPT-5.2 high (not xhigh) smashes METR. Reminder: it’s not “can work 6.6 hrs without stopping”. It’s: “Can do a SWE task estimated to take a human 6.6 hrs successfully on 50% of attempts.” And this is with GPT-5.3 aka Garlic 🧄 around the corner.
Dan McAteer tweet media
METR@METR_Evals

We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

9
6
121
11.8K
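The metric Dan restates can be made concrete with a toy estimator: fit a success-vs-task-length curve to (human-time estimate, pass/fail) data and read off the length where success probability crosses 50%. A minimal sketch assuming a simple logistic model and a grid search; this is illustrative only, not METR's actual estimator.

```python
import math

def fit_time_horizon(tasks: list[tuple[float, bool]], slope: float = 1.0) -> float:
    """Grid-search the 50% time horizon h for a logistic success model
    p(success | t) = 1 / (1 + (t / h) ** slope), where t is the human
    time estimate in hours. h is the task length at which the model is
    expected to succeed exactly half the time."""
    def nll(h: float) -> float:
        # Negative log-likelihood of the observed pass/fail outcomes.
        total = 0.0
        for t, ok in tasks:
            p = 1.0 / (1.0 + (t / h) ** slope)
            p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
            total -= math.log(p if ok else 1 - p)
        return total
    candidates = [0.25 * i for i in range(1, 200)]  # 0.25h .. ~50h
    return min(candidates, key=nll)
```

With passes on short tasks and failures on long ones, the fitted horizon lands between the longest success and the shortest failure, which is exactly what a "6.6 hr at 50%" headline number means.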
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@victor_wu Lately nearly all my coding has moved to Codex. Please share your wisdom, senior, any real insights!
0
0
0
177
victor-wu.eth
victor-wu.eth@victor_wu·
Codex Monitor is the best Codex app out there right now, no arguments accepted. After all, I've already burned 9.7 billion tokens on Codex; the more you spend, the more right you are.
victor-wu.eth tweet media
6
9
95
18.6K
am.will
am.will@LLMJunky·
the older you get, the more your context window shrinks
10
1
25
1.7K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@LLMJunky Thanks! Definitely gonna try this. I've been cooking with 5.2 Codex lately—very impressive. OpenAI is winning this game. Codex app is great too.
1
0
2
896
am.will
am.will@LLMJunky·
Codex Plan Mode has a hidden superpower.

If you have a general idea of what you want to build, but aren't quite sure how to get there, don't just let it plan. Tell it to GRILL YOU. Make it ask uncomfortable questions. Challenge your assumptions. Break down the fuzzy idea into something concrete.

It's like having a senior engineer do a design review before you write a single line. It forces you to think through problems you didn't know existed.

Try this prompt👇
28
33
514
71K
am.will
am.will@LLMJunky·
I learned quite a bit about the best prompting practices for GLM 4.7 in this video. One thing I see very often is: "Model Y is better than Model X!" But my question for you is: "how did you evaluate this?" Because you cannot treat and prompt every model the same. They are not the same. Codex 5.2 != Opus 4.5. GLM 4.7 != Minimax 2.1. To get the most out of your models, you need to understand the nuances of the model you're using.
Cerebras@cerebras

GLM 4.7 is one of the strongest open-source coding models available—but most developers aren't prompting it correctly. We put together 10 rules to help you get the most out of it:
- Front-load instructions (it has a strong recency bias)
- Use firm language: "must" and "strictly" > soft suggestions
- Break complex tasks into smaller steps
- Disable reasoning for simple tasks, enable it for hard ones
- Use critic agents for code review, QA, and validation
- Pair it with a frontier model for the hardest 10% of workloads
- and more…

GLM 4.7 hits 96% on Tau² Bench and 86% on GPQA Diamond. At 1,500 tokens/sec on Cerebras, it's 20x faster than closed-source alternatives on GPUs.

3
1
23
5.1K
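Rules like "front-load instructions", "use firm language", and "toggle reasoning by difficulty" translate directly into how a request is assembled. A minimal sketch assuming a generic OpenAI-compatible chat payload; `build_glm_request` and the `thinking` field are my own illustrative names, not GLM's or Cerebras's documented API.

```python
def build_glm_request(rules: list[str], task: str, hard: bool = False) -> dict:
    """Assemble a chat request following the prompting rules above:
    instructions are front-loaded into the system message with firm
    wording, and reasoning is enabled only for hard tasks. The payload
    shape and the `thinking` field are assumptions for illustration."""
    system = "You MUST strictly follow these rules:\n" + "\n".join(
        f"- {rule}" for rule in rules
    )
    return {
        "model": "glm-4.7",
        "messages": [
            {"role": "system", "content": system},  # rules come first
            {"role": "user", "content": task},
        ],
        # Reasoning off for simple tasks, on for the hard ones.
        "thinking": {"type": "enabled" if hard else "disabled"},
    }
```

Splitting "break complex tasks into smaller steps" then becomes several small calls to a builder like this rather than one sprawling prompt.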