29 posts

Om

@whyomwhy

Touching Grass

Grass · Joined February 2025

41 Following · 0 Followers

Om @whyomwhy · 3d

@theo Hey Theo, have you found a benchmark of how different harnesses perform with Fable? I haven't found anything useful so far and it's too expensive to independently find out lol

Theo - t3.gg @theo · 4d

If Anthropic put out a $1000/month tier that gets 5x the $200 tier limits and also lets us keep Fable access, I'd do it in a heartbeat.

Om @whyomwhy · 9 Jun

@jmvdotdev @theo @sypen231984 @cognition I’m sorry what? What does 1x worse mean, isn’t 1x the same exact thing?

Jorge Vieira @jmvdotdev · 9 Jun

I think it’s possible. If we say 4.7 was at least 1x worse than 4.6 and 4.6 1.5x worse than 4.5, then 4.8 would be what, about 1.6x better than 4.5? Sounds about right. Then again, compared to OpenAI models the numbers here wouldn’t make sense, but who knows, with 40hr runs it could be noise from the context compaction performance.
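Jorge's chain can be checked by multiplying the ratios, assuming each "Nx better/worse" is a ratio of benchmark scores $S$ (neither post defines the scale) and taking the 2.5x figure for 4.8 over 4.7 from Theo's post:

$$
\frac{S_{4.8}}{S_{4.5}}
= \frac{S_{4.8}}{S_{4.7}} \cdot \frac{S_{4.7}}{S_{4.6}} \cdot \frac{S_{4.6}}{S_{4.5}}
= 2.5 \cdot \frac{1}{1} \cdot \frac{1}{1.5}
\approx 1.67
$$

Read literally, "1x worse" is a ratio of 1, i.e. no change between 4.6 and 4.7. Because 4.7 is "at least" 1x worse, 1.67 is an upper bound under this reading.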

Theo - t3.gg @theo · 9 Jun

Fascinating bench. Really like the idea of focusing on mergeability. Confused how Opus 4.8 is 2.5x better than Opus 4.7 though 🙃

Cognition @cognition

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is the first to measure: would you actually merge this code?

Om @whyomwhy · 6 Jun

@benjamin_horne Ironic coming from the lead Ghostwood estate developer

Ben Horne @benjamin_horne · 4 Jun

Dario this morning: “Ok guys, here’s how it’s gonna work. We’re gonna tweet some scary ominous nonsense from our main account. Then, in the hours that follow, everyone quote tweet it and double down on the fearmongering. Coordinated marketing bullshit like this is how we can keep our valuation up before the IPO. Only need to do it a few more times after this, I promise. Ok, everyone ready?? Break!”

Stephen McAleer @McaleerStephen

We need to figure out how to have the option for a coordinated slowdown in the face of recursive self-improvement.

Om @whyomwhy · 28 May

I’ve been getting into LoTR recently, and I really think we should call people suffering from AI psychosis the cult of Denethor. Alternatively, doomer AI posting could be denethorposting.

Om @whyomwhy · 30 Apr

@theo This video felt like what Tucker Carlson said about Trump after the Iran war

Theo - t3.gg @theo · 30 Apr

Github got me where I am today. That's why it's so hard to watch it die.

Om @whyomwhy · 8 Apr

@a24 please help me, guys. I just saw The Drama and can’t figure out who played the gym teacher, for the love of god

Om @whyomwhy · 29 Oct

@tomcrawshaw01 SKILL

Tom @tomcrawshaw01 · 27 Oct

Writing viral posts will never be a struggle again...
I packaged every context file, example, and framework that got me 4.5M+ impressions into one Claude skill.
Just upload it to your project and Claude instantly references:
- 20+ tweets with 100K+ views each
- Proven hook formulas
- Writing principles that actually work
- Real feedback on what converts (and what flops)
Most people prompt Claude with zero context and wonder why their posts sound generic. They're basically asking it to write blind.
This skill changes that completely. You upload it once to your Claude account and it becomes part of Claude's memory. Now when you say "write a post about X," Claude pulls from battle-tested patterns.
- It knows which hooks are overused and market-fatigued.
- It understands how to structure posts for maximum engagement.
- It references actual examples that crushed it.
The skill teaches Claude to write posts that actually convert, and it never runs out of fresh angles. Claude gets better the more you use it together, because you can keep adding to the skill over time.
This is the new way to add context to AI. No more copy-pasting examples into every chat. No more re-explaining your style from scratch.
Follow + comment "SKILL" and I'll DM you the file

Om retweeted

Michael Qizhe Shieh @michaelqshieh · 25 Aug

Introducing MCPMark, a collaboration with @EvalSysOrg and @lobehub! We created a challenging benchmark to stress-test MCP use in comprehensive contexts.
- 127 high-quality data samples created by experts.
- GPT-5 takes the current lead and achieves a Pass@1 of 46.96% while the other models fall in the range of 10-30%.
- Diverse test cases on Notion, Github, Filesystem, Playwright (browser), and Postgres.
9🧵s ahead

Om @whyomwhy · 15 Aug

Every day I wake up full of gratitude that I don't have to use Microsoft Teams

james hawkins @james406

mr. beast says he plans to make 30 people use microsoft teams for 1 whole calendar year, with the last person standing taking home $1 million

Om @whyomwhy · 7 Aug

You can press the Allow button in the auth screen tab and you should be gtg

Om @whyomwhy · 7 Aug

Grab the localhost port from the link, then run the following command on your local machine (not the server): ssh -L <port>:localhost:<port> user@<instance_ip>
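A minimal sketch of this step, assuming the auth link carries a `redirect_uri` query parameter pointing at `http://localhost:<port>/...` (a common loopback-redirect pattern for browser-based OAuth; this thread does not confirm that every MCP server does it this way). The helper names `local_port_from_auth_link` and `tunnel_command`, the example URL, and the host are hypothetical. It pulls the port out of the link and builds the same `ssh -L <port>:localhost:<port> user@<instance_ip>` command to run locally:

```python
from urllib.parse import parse_qs, urlparse


def local_port_from_auth_link(auth_link: str) -> int:
    """Return the localhost port that an auth link redirects back to.

    Assumption (hypothetical helper): the port lives in a `redirect_uri`
    query parameter such as http://localhost:54545/callback. If there is
    no redirect_uri, the link itself is checked in case it is already a
    localhost URL.
    """
    query = parse_qs(urlparse(auth_link).query)
    # parse_qs percent-decodes values, so %3A%2F%2F comes back as "://".
    candidate = query.get("redirect_uri", [auth_link])[0]
    parsed = urlparse(candidate)
    if parsed.hostname not in ("localhost", "127.0.0.1"):
        raise ValueError(f"no localhost redirect in {auth_link!r}")
    if parsed.port is None:
        raise ValueError(f"localhost redirect has no explicit port: {candidate!r}")
    return parsed.port


def tunnel_command(auth_link: str, target: str) -> list[str]:
    """Build the ssh -L command to run on the local machine (not the server).

    `target` is a placeholder such as "user@<instance_ip>".
    """
    port = local_port_from_auth_link(auth_link)
    # Forward local <port> to <port> on the server's own loopback interface,
    # so a browser redirect to localhost:<port> reaches the server's listener.
    return ["ssh", "-L", f"{port}:localhost:{port}", target]


if __name__ == "__main__":
    # Hypothetical link and host, for illustration only.
    link = ("https://auth.example.com/authorize?client_id=abc"
            "&redirect_uri=http%3A%2F%2Flocalhost%3A54545%2Fcallback")
    print(" ".join(tunnel_command(link, "user@203.0.113.10")))
    # prints: ssh -L 54545:localhost:54545 user@203.0.113.10
```

With the tunnel open, the browser's redirect to localhost after you press Allow lands on your machine's forwarded port and is carried over SSH to the listener waiting on the server.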

Om @whyomwhy · 7 Aug

Claude Code SDK has been out for a while, and it has been really interesting to use for automation. But connecting to remote MCPs from a server is a hassle, because authentication needs a browser client.