Sherlock Holmes

245 posts

@geraldpppppp

Joined December 2023
74 Following · 7 Followers
gacha cheng
gacha cheng@quanyuqn27902·
Running the Kimi model through kimi-cli seems faster than through Claude Code. Could Claude Code have been up to something lately?
1
0
1
335
Cell 细胞
Cell 细胞@cellinlab·
I'm dying laughing. Per the budget at the start of the year, our team was going to hire one experienced architect and one new-grad, frontend-leaning full-stack dev. Then the budget got cut and we can only hire a single new grad. But frontend is already covered by me, so... we're going to hire a new-grad architect... A new-grad architect??!! When I saw that job title I just ***
79
10
556
150.2K
Lee Robinson
Lee Robinson@leerob·
I'm a big believer in open source, especially as AI improves. It was a miss to not mention the Kimi base in our blog from the start. We'll fix that for the next model 🙏 Their team clarified our usage was licensed in the tweet below. x.com/Kimi_Moonshot/…
Kimi.ai@Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ's hosted RL and inference platform as part of an authorized commercial partnership.

208
110
2.4K
392.7K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@nnnnwwww89 @binghe Your assessments of Codex and Claude match mine, so it looks like I should go try GLM. I like how Codex, on a critical task, will sometimes run for an hour and change only two or three spots.
0
0
0
19
nnnnwwww
nnnnwwww@nnnnwwww89·
@binghe Codex used to be criticized for being slow, but after the 5.3 update it's noticeably faster. Claude is still very erratic; veteran programmers don't like that, though coding newcomers do.
1
0
0
178
冰河
冰河@binghe·
I've been using all the major models heavily for development lately. My personal ranking: GLM < Kimi < MiniMax < Gemini < Codex < Claude. Does anyone else feel the same?
冰河 tweet media
51
6
108
49.9K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@LotusDecoder Haven't tried it yet. The first question that comes to mind: how's the latency? Wouldn't the cache hit rate be especially low?
0
0
1
510
LotusDecoder
LotusDecoder@LotusDecoder·
Looked at bub today; eye-opening. It has the potential to surpass pi, the layer underneath openclaw. Its core design drops the memory that an agent or a human prepared during previous tasks; instead, on every run the agent traces back through the history from scratch and builds the memory for the current task. Since a machine never tires, every rebuild is done at a high standard. Philosophically it feels like "the past mind cannot be grasped": living in the present. Starting from the LLM's own nature, seeing the world through the LLM's eyes, gives the agent more freedom to act and a real efficiency gain.
10
25
219
22.5K
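The rebuild-from-scratch design described above can be sketched as a pass over an append-only event log. This is a minimal illustration under my own assumptions; `Event`, `rebuild_memory`, and the keyword-overlap relevance test are hypothetical names and logic, not bub's or pi's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One entry in the append-only history log (tool call, message, result)."""
    task_id: str
    role: str
    content: str

def rebuild_memory(history: list[Event], current_task: str, budget: int = 3) -> list[str]:
    """Reconstruct working memory from scratch by replaying the full history
    and keeping only events relevant to the current task. No memory file is
    carried over between tasks; every run starts again from the raw log."""
    words = current_task.lower().split()
    relevant = [e for e in history if any(w in e.content.lower() for w in words)]
    # Keep at most `budget` of the most recent relevant events as context.
    return [e.content for e in relevant[-budget:]]
```

The point of the design is that nothing persists except the log itself: staleness in a hand-maintained memory file can't accumulate, because there is no such file.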
Anton
Anton@Anton_Kuzmen·
5.3-codex + pi + finder + librarian is the ultimate combo.

Codex loves to suck up a ton of context before writing a single line, often modifying the first file at > 50% of the context window. There is no way around it. But with the finder and librarian subagents it only gets to read the relevant files/snippets, and I mostly see it coding at 10% - 25%, unless of course the task has a ton of relevant context.

Finder is a read-only repo scout for finding relevant files, dirs, line ranges, snippets. Librarian is the same as finder but for GitHub repos. You can try them in pi:
- `pi install npm:pi-finder-subagent`
- `pi install npm:pi-librarian`
Anton tweet media
19
27
603
74.5K
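A finder-style subagent as Anton describes it can be sketched as a read-only scoring pass over a repo. This is my own minimal illustration, not the actual `pi-finder-subagent` implementation; the function name and the term-count scoring are assumptions.

```python
import os

def find_relevant(root: str, query: str, top_k: int = 3) -> list[tuple[str, int]]:
    """Read-only repo scout: score every file under `root` by how many
    times the query terms occur in it, and return the top_k matches as
    (path, hit_count) pairs. It never writes or modifies anything."""
    terms = query.lower().split()
    scores = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read().lower()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            hits = sum(text.count(t) for t in terms)
            if hits:
                scores.append((path, hits))
    # Highest-scoring files first; the main agent reads only these.
    return sorted(scores, key=lambda s: -s[1])[:top_k]
```

The payoff described in the tweet is that the main agent's context holds only these shortlisted snippets instead of every file it happened to open while exploring.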
Rohit Ghumare
Rohit Ghumare@ghumare64·
Skillkit automatically translates skills from any agent to Codex with a single command: `npx skillkit translate my-skill --to codex` agenstskills.com
7
10
69
12.2K
Sam Altman
Sam Altman@sama·
More than 1 million people downloaded the Codex App in the first week. 60+% growth in overall Codex users last week! We'll keep Codex available to Free/Go users after this promotion; we may have to reduce limits there, but we want everyone to be able to try Codex and start building.
1.4K
314
7.2K
996.2K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
This is a tweet sent by OpenClaw to test whether posting works correctly. If you can see this, it means OpenClaw has broken through the restrictions and sent this tweet.
0
0
0
19
Vonng
Vonng@RonVonng·
GPT-5.3-Codex xHigh is the new SOTA. It's much faster and the quality of its output is reliably solid. Spending every day just talking to direct the AI and hand out work is genuinely addictive.
Vonng tweet media
1
4
48
9.4K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@thdxr Dax is too old and stuck in his comfort zone. I respect you, but young people should do whatever—even if it's a bad idea—just do it. Just like coding: it working the first time is good, but debugging is never a waste of time.
0
0
1
842
dax
dax@thdxr·
i used to have a thousand good ideas and no time to work on them now that i'm a lot more experienced i rarely ever have a good idea
125
87
2.2K
65.8K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@thdxr I'm not a fan of "good ideas". Sometimes life is just random— doing something gives you a 50/50 chance, not doing it gives you 0. That's all.
0
0
1
81
Andrew Ambrosino
Andrew Ambrosino@ajambrosino·
Windows has been achieved internally
Andrew Ambrosino tweet media
187
53
1.9K
371.2K
Ahmad
Ahmad@TheAhmadOsman·
gpt 5.2 xhigh > codex 5.2 xhigh > codex 5.3 xhigh
33
0
107
14.2K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@KarelDoostrlnck Thanks Karel! I learned so much from this. Great job btw. Turning to Codex for sure.
0
0
0
318
Dan McAteer
Dan McAteer@daniel_mac8·
GPT-5.2 high (not xhigh) smashes METR. Reminder: it’s not “can work 6.6 hrs without stopping”. It’s: “Can do a SWE task estimated to take a human 6.6 hrs successfully on 50% of attempts.” And this is with GPT-5.3 aka Garlic 🧄 around the corner.
Dan McAteer tweet media
METR@METR_Evals

We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

9
6
121
11.8K
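The metric Dan restates can be made concrete with a toy estimator: fit a success-vs-task-length curve to (human-time estimate, pass/fail) data and read off the length where success probability crosses 50%. A minimal sketch assuming a simple logistic model and a grid search; this is illustrative only, not METR's actual estimator.

```python
import math

def fit_time_horizon(tasks: list[tuple[float, bool]], slope: float = 1.0) -> float:
    """Grid-search the 50% time horizon h for a logistic success model
    p(success | t) = 1 / (1 + (t / h) ** slope), where t is the human
    time estimate in hours. h is the task length at which the model is
    expected to succeed exactly half the time."""
    def nll(h: float) -> float:
        # Negative log-likelihood of the observed pass/fail outcomes.
        total = 0.0
        for t, ok in tasks:
            p = 1.0 / (1.0 + (t / h) ** slope)
            p = min(max(p, 1e-9), 1 - 1e-9)  # guard against log(0)
            total -= math.log(p if ok else 1 - p)
        return total
    candidates = [0.25 * i for i in range(1, 200)]  # 0.25h .. ~50h
    return min(candidates, key=nll)
```

With passes on short tasks and failures on long ones, the fitted horizon lands between the longest success and the shortest failure, which is exactly what a "6.6 hr at 50%" headline number means.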
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@victor_wu Lately nearly all my coding has moved to Codex. Please share your wisdom, senior, any real insights!
0
0
0
177
victor-wu.eth
victor-wu.eth@victor_wu·
Codex Monitor is the best Codex app out there right now, no arguments accepted. After all, I've already burned 9.7 billion tokens on Codex; the more you spend, the more right you are.
victor-wu.eth tweet media
6
9
95
18.6K
am.will
am.will@LLMJunky·
the older you get, the more your context window shrinks
10
1
25
1.7K
Sherlock Holmes
Sherlock Holmes@geraldpppppp·
@LLMJunky Thanks! Definitely gonna try this. I've been cooking with 5.2 Codex lately—very impressive. OpenAI is winning this game. Codex app is great too.
1
0
2
896
am.will
am.will@LLMJunky·
Codex Plan Mode has a hidden superpower.

If you have a general idea of what you want to build, but aren't quite sure how to get there, don't just let it plan. Tell it to GRILL YOU. Make it ask uncomfortable questions. Challenge your assumptions. Break down the fuzzy idea into something concrete.

It's like having a senior engineer do a design review before you write a single line. It forces you to think through problems you didn't know existed.

Try this prompt👇
28
33
514
71K
am.will
am.will@LLMJunky·
I learned quite a bit about the best prompting practices for GLM 4.7 in this video. One thing I see very often is: "Model Y is better than Model X!" But my question for you is: "how did you evaluate this?" Because you cannot treat and prompt every model the same. They are not the same. Codex 5.2 != Opus 4.5. GLM 4.7 != Minimax 2.1. To get the most out of your models, you need to understand the nuances of the model you're using.
Cerebras@cerebras

GLM 4.7 is one of the strongest open-source coding models available—but most developers aren't prompting it correctly. We put together 10 rules to help you get the most out of it:
- Front-load instructions (it has a strong recency bias)
- Use firm language: "must" and "strictly" > soft suggestions
- Break complex tasks into smaller steps
- Disable reasoning for simple tasks, enable it for hard ones
- Use critic agents for code review, QA, and validation
- Pair it with a frontier model for the hardest 10% of workloads
- and more…

GLM 4.7 hits 96% on Tau² Bench and 86% on GPQA Diamond. At 1,500 tokens/sec on Cerebras, it's 20x faster than closed-source alternatives on GPUs.

3
1
23
5.1K
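Rules like "front-load instructions", "use firm language", and "toggle reasoning by difficulty" translate directly into how a request is assembled. A minimal sketch assuming a generic OpenAI-compatible chat payload; `build_glm_request` and the `thinking` field are my own illustrative names, not GLM's or Cerebras's documented API.

```python
def build_glm_request(rules: list[str], task: str, hard: bool = False) -> dict:
    """Assemble a chat request following the prompting rules above:
    instructions are front-loaded into the system message with firm
    wording, and reasoning is enabled only for hard tasks. The payload
    shape and the `thinking` field are assumptions for illustration."""
    system = "You MUST strictly follow these rules:\n" + "\n".join(
        f"- {rule}" for rule in rules
    )
    return {
        "model": "glm-4.7",
        "messages": [
            {"role": "system", "content": system},  # rules come first
            {"role": "user", "content": task},
        ],
        # Reasoning off for simple tasks, on for the hard ones.
        "thinking": {"type": "enabled" if hard else "disabled"},
    }
```

Splitting "break complex tasks into smaller steps" then becomes several small calls to a builder like this rather than one sprawling prompt.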