Om
29 posts


@jmvdotdev @theo @sypen231984 @cognition I’m sorry, what? What does 1x worse mean? Isn’t 1x the exact same thing?

I think it’s possible. If we say 4.7 was at least 1x worse than 4.6 and 4.6 was 1.5x worse than 4.5, then 4.8 would be what, about 1.6x better than 4.5? Sounds about right. Then again, compared with OpenAI models the numbers here wouldn’t make sense, but who knows, with 40-hour runs it could be noise from context compaction performance.

Fascinating bench. Really like the idea of focusing on mergeability.
Confused how Opus 4.8 is 2.5x better than Opus 4.7 though 🙃
Cognition@cognition
Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

Dario this morning:
“Ok guys, here’s how it’s gonna work. We’re gonna tweet some scary ominous nonsense from our main account.
Then, in the hours that follow, everyone quote tweet it and double down on the fearmongering.
Coordinated marketing bullshit like this is how we can keep our valuation up before the IPO. Only need to do it a few more times after this, I promise.
Ok, everyone ready?? Break!”
Stephen McAleer@McaleerStephen
We need to figure out how to have the option for a coordinated slowdown in the face of recursive self-improvement.

Writing viral posts will never be a struggle again...
I packaged every context file, example, and framework that got me 4.5M+ impressions into one Claude skill.
Just upload it to your project and Claude instantly references:
- 20+ tweets with 100K+ views each
- Proven hook formulas
- Writing principles that actually work
- Real feedback on what converts (and what flops)
Most people prompt Claude with zero context and wonder why their posts sound generic.
They're basically asking it to write blind.
This skill changes that completely.
You upload it once to your Claude account and it becomes part of Claude's memory.
Now when you say "write a post about X," Claude pulls from battle-tested patterns.
- It knows which hooks are overused and market-fatigued.
- It understands how to structure posts for maximum engagement.
- It references actual examples that crushed it.
The skill teaches Claude to write posts that actually convert, and it never runs out of fresh angles.
Claude gets better the more you use it together, because you can keep adding to the skill over time.
This is the new way to add context to AI.
No more copy-pasting examples into every chat.
No more re-explaining your style from scratch.
Follow + comment "SKILL" and I'll DM you the file

Om retweeted

Introducing MCPMark, a collaboration with @EvalSysOrg and @lobehub!
We created a challenging benchmark to stress-test MCP use in comprehensive contexts.
- 127 high-quality data samples created by experts.
- GPT-5 takes the current lead and achieves a Pass@1 of 46.96% while the other models fall in the range of 10-30%.
- Diverse test cases on Notion, Github, Filesystem, Playwright (browser), and Postgres.
9🧵s ahead


Every day I wake up full of gratitude that I don't have to use Microsoft Teams
james hawkins@james406
mr. beast says he plans to make 30 people use microsoft teams for 1 whole calendar year, with the last person standing taking home $1 million
Om retweeted

Our open models are here.
Both of them.
openai.com/open-models

@BorisMPower Hoping it can finally break the Claude monopoly on coding use cases



