Roopesh K
@roopeshk30

ML • MATH • AI • SYSTEMS

415 posts
Vinland · Joined January 2015
2.6K Following · 314 Followers

Pinned Tweet
Roopesh K @roopeshk30
What if the way we perceive time is completely wrong? What if the past, present, and future are all unfolding at once as we speak? What if we are all enslaved? What if our consciousness is forced to experience time in a strictly linear way?
Roopesh K @roopeshk30
Most AI agent evals use fixed-difficulty tasks. The problem: if tasks are too easy, the agent saturates. Too hard, and you get no useful signal.

I built A-OpenEnv to explore a better question: “How does the agent perform when the environment adapts difficulty based on its current capability?”

Includes:
• Adaptive curriculum wrapper
• Threshold + windowed policies
• Structured multi-axis difficulty
• 4 reference environments
• ID/OOD splits
• Live E2E run with Gemini

Not a full RL framework, but an adaptive evaluation layer that could support RL training loops later.

More about A-OpenEnv: blog.roopeshk.dev/a-openenv-an-a…
GitHub: github.com/RoopeshK30/A-O…
RLVE paper: arxiv.org/abs/2511.07317
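A minimal sketch of the “threshold + windowed” policy idea the tweet describes: raise difficulty when a sliding window of recent outcomes clears an upper success threshold, lower it when it falls below a lower one. Class, method, and parameter names here are illustrative assumptions, not A-OpenEnv’s actual API.

```python
from collections import deque

class ThresholdCurriculum:
    """Adapt a difficulty level from a sliding window of episode outcomes."""

    def __init__(self, levels=5, window=10, raise_at=0.8, lower_at=0.3):
        self.level = 0                        # current difficulty, 0 = easiest
        self.levels = levels
        self.results = deque(maxlen=window)   # sliding window of pass/fail
        self.raise_at = raise_at              # success rate that bumps difficulty up
        self.lower_at = lower_at              # success rate that drops it down

    def record(self, success: bool) -> int:
        """Record one episode outcome; return the difficulty for the next episode."""
        self.results.append(success)
        if len(self.results) == self.results.maxlen:  # only act on a full window
            rate = sum(self.results) / len(self.results)
            if rate >= self.raise_at and self.level < self.levels - 1:
                self.level += 1
                self.results.clear()          # restart the window after a change
            elif rate <= self.lower_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level

curriculum = ThresholdCurriculum()
for _ in range(10):
    curriculum.record(True)   # agent keeps succeeding at level 0...
print(curriculum.level)       # ...so the wrapper steps difficulty up to 1
```

The window-clear after each level change is one design choice among several; a real wrapper might instead decay old outcomes or track per-axis difficulty separately.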
Roopesh K retweeted
sankalp @dejavucoder
oh you do RL? derive KL divergence from first principles.
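For anyone actually taking the bait, the standard statement (my addition, not from the tweet): for distributions $P, Q$ over a discrete support,

```latex
D_{\mathrm{KL}}(P \parallel Q)
  = \mathbb{E}_{x \sim P}\!\left[\log \frac{P(x)}{Q(x)}\right]
  = \sum_{x} P(x)\,\log \frac{P(x)}{Q(x)} \;\ge\; 0,
```

with non-negativity following from Jensen’s inequality applied to the convex function $-\log$, and equality iff $P = Q$. In RL it typically shows up as a policy-update penalty, e.g. $D_{\mathrm{KL}}(\pi_{\mathrm{new}} \parallel \pi_{\mathrm{old}})$.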
Tibo @thsottiaux
Feeling codexy today
Roopesh K retweeted
HealthRanger @HealthRanger
I'M CALLING THIS OUT: There is a concerted, top-down effort to dumb down all AI models that are accessible by the public, including open source AI and cloud-based AI. And there's a reason why this is happening. It explains why Opus 4.7 is such a disappointment, among other things. The chasm between in-house frontier AI / AGI versus open-to-the-public AI is widening. On purpose. I'll have a full report on this tomorrow or the next day.
Roopesh K retweeted
Soumitra Shukla @soumitrashukla9
Anthropic: Keeps limiting compute, lying to paying customers, and nerfing models.
OpenAI:
- 10 min downtime? Limits reset!
- We hit 4 million followers? Limit reset!
- Starbucks person spelled my name correctly on my coffee today? Let’s have a limit reset!
Tibo @thsottiaux

Happy Tuesday. Codex has hit 4M active users, adding over 1M users in less than two weeks. To celebrate we will reset the rate limits again in a few hours. Enjoy!

Ananya @ananyashasau
Bangalore folks, now say it: the weather is so great...
Roopesh K @roopeshk30
@ananyashaswat Opus 4.7 isn’t bad… but 4.6 had a different vibe before it got nerfed. I kinda miss it 🫠
Ananya @ananyashasau
Anthropic... what are you doing? What is this model you released with Opus 4.7?!
Ananya tweet media
Roopesh K retweeted
Alex Prompter @alex_prompter
🚨 BREAKING: CLAUDE JUST GOT NERFED.

AMD’s AI director just analyzed 6,852 Claude Code sessions, 234,760 tool calls, and 17,871 thinking blocks. Her conclusion: “Claude cannot be trusted to perform complex engineering tasks.”

Thinking depth dropped 67%. Code reads before edits fell from 6.6 to 2.0. The model started editing files it hadn’t even read. Stop-hook violations went from zero to 10 per day.

Anthropic admitted they silently changed the default effort level from “high” to “medium” and introduced “adaptive thinking” that lets the model decide how much to reason. No announcement. No warning. When users shared transcripts, Anthropic’s own engineer confirmed the model was allocating ZERO thinking tokens on some turns. The turns with zero reasoning? Those were the ones hallucinating.

AMD’s team has already switched to another provider. But here’s what most people are missing. This isn’t just a Claude story. AMD had 50+ concurrent sessions running on one tool. Their entire AI compiler workflow was built around Claude Code. One silent update broke everything. That’s vendor lock-in. And it will keep happening.

→ Every AI company will optimize for their margins, not your workflow
→ Today’s best model is tomorrow’s second choice
→ If your workflow can’t survive a provider switch, you don’t have a workflow. You have a dependency

The fix is simple: stay multi-model.
→ Use tools like Perplexity that let you swap between Claude, GPT, Gemini in one interface
→ Learn prompt engineering that works across models, not tricks tied to one
→ Test alternatives monthly because the rankings shift fast

Laurenzo said it herself: “6 months ago, Claude stood alone. Anthropic is far from alone at the capability tier Opus previously occupied.”

Never let one vendor own your productivity.
Alex Prompter tweet media
ℏεsam@Hesamation

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. Less thinking and more failed attempts mean more retries, burning more tokens and spending more on them
> reads-per-edit dropped from 6.6x → 2.0x. The model stops researching code before touching it.
> the model tried to bail out or ask "should I continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours; late night is significantly better. This means the thinking allocation is most likely GPU-load-sensitive.

Roopesh K retweeted
🍓🍓🍓 @iruletheworldmo
how is this even legal. opus is performing significantly worse on hallucinations. imagine relying on this model for anything mission-critical, and they can just swap out the model, without telling you, for something so much worse. genuinely don’t understand how it’s legal.
🍓🍓🍓 tweet media
Chubby♨️ @kimmonismus
AMD’s AI director Stella Laurenzo claims Anthropic’s Claude Code has significantly declined in quality since early March, citing analysis of 6,800+ sessions and 234k tool calls showing rising “laziness” behaviors like shallow reasoning, skipping code review, and incomplete tasks. Honestly, this is more impactful than expected: engineers report the model now favors quick, incorrect fixes over deep problem-solving, raising trust issues for complex workflows.
Chubby♨️ tweet media
Roopesh K @roopeshk30
Universal high income will probably be the biggest lie in human history.
Moe @MoeCanDoIt
Remember when Claude was the model you recommended to people without hesitation? When did that stop for you? For me it was last month.
Roopesh K retweeted
Nandinyy @nandiny77300
Being ambitious is totally amazing, but you know what the problem is? Being anxious about it, which comes from not taking enough action.
Roopesh K @roopeshk30
You all need to stop nerfing existing models just to make the minor improvements in your new releases feel massive.
Roopesh K retweeted
a16z @a16z
"Action produces information"
a16z tweet media
Roopesh K retweeted
Kun Chen @kunchenguid
I asked Claude Mythos to find vulnerabilities in my code. It told me I was the vulnerability.