Reasoning Models

347 posts

Reasoning Models

@reasoningmodels

Reasoning models are cool I guess. Thoughts and ideas about reasoning models from the wacky mind of @morganlinton.

Joined September 2024
519 Following · 271 Followers
Reasoning Models@reasoningmodels·
@iruletheworldmo I feel like it’s probably a solid analogy, just rather than lots of leaks, it’s the biggest leak ever
1 reply · 0 reposts · 2 likes · 112 views
🍓🍓🍓@iruletheworldmo·
@reasoningmodels yeah i wasn’t sure if it worked but. hopefully it’s close enough haha
1 reply · 0 reposts · 1 like · 3.3K views
🍓🍓🍓@iruletheworldmo·
🚨BREAKING FRONTIER MODEL NEWS

claude mythos set for release april 16th. dario has more leaks than the titanic, here's some info from anthropic staff.

>95 or higher on every single benchmark. except arc agi 3, yet to be tested on.
>dramatically outperforms opus 4.6 on coding, reasoning, and cyber
>anthropic privately warning government officials about its capabilities
>so powerful they're calling it "unprecedented cybersecurity risk"
>already being tested with early access customers
>priced at $120/$600 per million tokens
>10 million token context window
>enterprise use only

capybara is here. capygpt is agi.
185 replies · 117 reposts · 2.3K likes · 817.8K views
Reasoning Models@reasoningmodels·
@0xSero What tool are you using to track usage that outputs it like that? Looks interesting 👀
1 reply · 0 reposts · 2 likes · 4.1K views
0xSero@0xSero·
6% used in an hour of use, 1 session. They massacred my boy, I feel like I'm in a Casino and Anthropic is the house. Claude Max 20x
0xSero tweet media
36 replies · 4 reposts · 354 likes · 67.8K views
Reasoning Models@reasoningmodels·
@SawyerMerritt Pretty incredible, generating $2B/mo, so $24B/year. So 35x annual revenue multiple. This is like a company doing $10M/year in ARR raising at a $350M valuation. Which I think does happen quite a bit, so maybe nothing too unusual here right??
1 reply · 0 reposts · 0 likes · 921 views
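The multiple in the reply above works out as follows, using the figures from the quoted announcement ($2B/month in revenue, $852B post-money); the exact quotient is 35.5x, which the tweet rounds to 35x:

```python
# Revenue-multiple arithmetic from the reply above, using figures
# stated in the quoted OpenAI announcement.
monthly_revenue = 2e9                    # $2B per month
annual_revenue = monthly_revenue * 12    # $24B per year
valuation = 852e9                        # $852B post-money valuation

multiple = valuation / annual_revenue    # 35.5x annual revenue

# The same multiple applied to a hypothetical $10M/year ARR startup:
implied_valuation = 10e6 * multiple      # ~$355M

print(round(multiple, 1))                     # 35.5
print(round(implied_valuation / 1e6))         # 355
```

At 35.5x, a $10M ARR company would be valued at roughly $355M, close to the $350M figure the reply cites from rounding to 35x.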
Sawyer Merritt@SawyerMerritt·
NEWS: OpenAI just announced that it has officially closed their latest funding round with $122 billion in committed capital at a post-money valuation of $852 billion.

"We are now generating $2B in revenue per month. At this stage, we are growing revenue four times faster than the companies who defined the Internet and mobile eras, including Alphabet and Meta. ChatGPT has more than 900 million weekly active users, and over 50 million subscribers. Search usage has nearly tripled in a year, and our ads pilot reached more than $100 million in ARR in under six weeks.

Momentum is just as strong on the enterprise side, which now makes up more than 40% of our revenue, and is on track to reach parity with consumer by the end of 2026. GPT‑5.4 is driving record engagement across agentic workflows. Our APIs now process more than 15 billion tokens per minute. Codex now serves over 2 million weekly users, up 5x in the past three months, with usage growing more than 70% month over month."
Sawyer Merritt tweet media
269 replies · 262 reposts · 2.6K likes · 1.9M views
Yoonho Lee@yoonholeee·
How can we autonomously improve LLM harnesses on problems humans are actively working on? Doing so requires solving a hard, long-horizon credit-assignment problem over all prior code, traces, and scores. Announcing Meta-Harness: a method for optimizing harnesses end-to-end
Yoonho Lee tweet media
74 replies · 261 reposts · 1.6K likes · 420.8K views
Reasoning Models@reasoningmodels·
If you use Claude, and run out of tokens quickly, this could be why.
Alex Volkov@altryne

[quoted tweet: Alex Volkov's cache-invalidation PSA, shown in full below]
0 replies · 0 reposts · 1 like · 40 views
Alex Volkov@altryne·
PSA: If you've been running out of Claude session quotas on Max tier, you're not alone. Read this.

Some insane Redditor reverse engineered the Claude binaries with MITM to find 2 bugs that could have caused cache invalidation. Tokens that aren't cached are 10x-20x more expensive and are killing your quota. If you're using your API keys with Claude this is even worse. This is also likely why this isn't uniform: while over 500 folks replied to me and said "me too", many (including me) didn't see this issue.

There are 2 issues that are compounded here (per the Redditor, I haven't independently confirmed this):

1st bug he found is a string replacement bug in bun that invalidates cache. Apparently this has to do with the custom @bunjavascript binary that ships with the standalone Claude CLI. The workaround there is to use Claude with `npx @anthropic-ai/claude-code`.

2nd bug is worse: he claims that --resume always breaks cache. And there doesn't seem to be a workaround there, except pinning to a very old version (that will miss out on tons of features).

This bug is also documented on GitHub and confirmed by other folks. I won't entertain the conspiracy theories that Anthropic "chooses" to ignore these bugs because it gets them more $$$; they are actively benefiting from everyone hitting as many cached tokens as possible, so this is absolutely a great find and it does align with my thoughts earlier. The very sudden spike in reporting for this, and the non-uniform nature (some folks are completely fine, some folks are hitting quotas after saying "hey"), definitely points to a bug.

cc @trq212 @bcherny @_catwu for visibility in case this helps all of us.
Alex Volkov tweet media
Alex Volkov@altryne

My feed is showing me a bunch of folks who tapped out their whole usage limits on Mon/Tue. Is this your experience? Please comment, I want to understand how widespread this is

223 replies · 428 reposts · 5K likes · 1.6M views
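The quota math behind that PSA can be sketched in a few lines. The 10x-20x figure for uncached tokens comes from the tweet; the specific cached-read discount, context size, and turn count below are made-up illustration numbers, not Anthropic's real limits:

```python
# Illustrates why broken prompt caching burns quota fast. Per the PSA,
# uncached tokens cost roughly 10x-20x more than cached ones; we assume
# a 10x gap here (cached reads at 0.1x full price). The 100K-token
# context and 20 turns are hypothetical round numbers.

def session_cost(context_tokens: int, turns: int, cache_works: bool,
                 cached_discount: float = 0.1) -> float:
    """Billed token-units for `turns` requests resending a fixed context."""
    per_turn = context_tokens * (cached_discount if cache_works else 1.0)
    return per_turn * turns

healthy = session_cost(100_000, 20, cache_works=True)   # 200,000 units
broken = session_cost(100_000, 20, cache_works=False)   # 2,000,000 units
print(broken / healthy)  # 10.0 -- same session, 10x the quota burn
```

Under these assumptions, a session that silently loses its cache (e.g. via the `--resume` bug above) consumes its quota an order of magnitude faster than an identical session with a warm cache.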
🍓🍓🍓@iruletheworldmo·
dario has closed the loop.
17 replies · 5 reposts · 250 likes · 27.2K views
Reasoning Models@reasoningmodels·
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 is a great model. Just make sure you're looking at v2. This is the one everyone is fanatical about right now, and for good reason.
Reasoning Models tweet media
1 reply · 0 reposts · 0 likes · 76 views
Sebastian Raschka
It’s done. All chapters of Build A Reasoning Model (From Scratch) are now available in early access. The book is currently in production and should be out in the coming months, including full-color print and syntax highlighting. There’s also a preorder up on Amazon.
Sebastian Raschka tweet media
123 replies · 261 reposts · 2.5K likes · 114.5K views
Reasoning Models@reasoningmodels·
@cryptopunk7213 Kinda sad to watch honestly, but so it goes - big tech eats the world. Soon, everything we use and buy will be from Amazon and Google 🫠
0 replies · 0 reposts · 0 likes · 16 views
Ejaaz@cryptopunk7213·
lol google destroying duolingo and every language tutor with this

270 million people can now understand 70+ different languages in real-time via gemini AI audio translation. when someone speaks to you in chinese you hear english.

people spend fucking YEARS and $1000’s trying to get fluent in a language. now ai removes the need to even learn one.

same shit happening with vibe-coders who don’t know code. why learn a language when you can just have ai translate?
Ejaaz tweet media
Google@Google

Your headphones just became a personal translator in 70+ languages. 🎧✨ Google Translate’s “Live translate” with headphones is officially on iOS. We're also expanding this capability to more countries around the world for both @Android and iOS users. To try it, open the Translate app, tap “Live translate” and connect your headphones.

169 replies · 84 reposts · 1.1K likes · 295.3K views
🍓🍓🍓@iruletheworldmo·
unless you work at a frontier ai lab. i promise you. you will be blown away by the progress made from both anthropic and openai.

every single researcher at both labs follows the delightful strawberry man. so be sure that i’ve seen the future. and it’s here.
49 replies · 20 reposts · 818 likes · 50.2K views
Reasoning Models@reasoningmodels·
@0xSero Thanks for sharing all the great stuff lately, I’d say it’s you and strawberry man that are my favs these days 🫶
0 replies · 0 reposts · 0 likes · 83 views
0xSero@0xSero·
Best models to run on your hardware level. I'll be doing this every week, I hope you guys enjoy.

---- 8 GB ----
Autocomplete for coding (like Cursor Tab)
- huggingface.co/NexVeridian/ze…
- huggingface.co/bartowski/zed-…
Tool calling, assistant style
- huggingface.co/nvidia/NVIDIA-…

---- 16 GB ----
Here things get better:
Multimodal
- huggingface.co/Qwen/Qwen3.5-9B
- huggingface.co/Tesslate/OmniC…
- huggingface.co/unsloth/Qwen3.…

---- 24 GB ----
- The best model you can get (thanks Qwen) huggingface.co/Qwen/Qwen3.5-2…
- Great model (strong agents) huggingface.co/nvidia/Nemotro…
- Mine hehe huggingface.co/0xSero/Qwen-3.…

I'm doing a weekly series
221 replies · 375 reposts · 3.7K likes · 570.5K views
Reasoning Models@reasoningmodels·
@karpathy Such a neat idea, and curious what LLM did you decide to pick for this? I’m assuming ChatGPT right??
0 replies · 0 reposts · 0 likes · 17 views
Andrej Karpathy@karpathy·
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea: let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol

The LLMs may elicit an opinion when asked but are extremely competent in arguing almost any direction. This is actually super useful as a tool for forming your own opinions, just make sure to ask different directions and be careful with the sycophancy.
1.7K replies · 2.4K reposts · 31.1K likes · 3.3M views