Roopesh K
@roopeshk30

ML • MATH • AI • SYSTEMS

415 posts
Vinland · Joined January 2015
2.6K Following · 314 Followers

Pinned Tweet
Roopesh K @roopeshk30
What if the way we perceive time is completely wrong? What if the past, present, and future are all unfolding at once as we speak? What if we are all enslaved? What if our consciousness is forced to experience time in a strictly linear way?
Roopesh K @roopeshk30
Most AI agent evals use fixed-difficulty tasks. The problem: if tasks are too easy, the agent saturates. Too hard, and you get no useful signal.

I built A-OpenEnv to explore a better question: “How does the agent perform when the environment adapts difficulty based on its current capability?”

Includes:
• Adaptive curriculum wrapper
• Threshold + windowed policies
• Structured multi-axis difficulty
• 4 reference environments
• ID/OOD splits
• Live E2E run with Gemini

Not a full RL framework, but an adaptive evaluation layer that could support RL training loops later.

More about A-OpenEnv: blog.roopeshk.dev/a-openenv-an-a…
GitHub: github.com/RoopeshK30/A-O…
RLVE paper: arxiv.org/abs/2511.07317
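A minimal sketch of the “threshold + windowed” policy idea the tweet describes: raise difficulty when a sliding window of recent outcomes clears an upper success threshold, lower it when it falls below a lower one. Class, method, and parameter names here are illustrative assumptions, not A-OpenEnv’s actual API.

```python
from collections import deque

class ThresholdCurriculum:
    """Adapt a difficulty level from a sliding window of episode outcomes."""

    def __init__(self, levels=5, window=10, raise_at=0.8, lower_at=0.3):
        self.level = 0                        # current difficulty, 0 = easiest
        self.levels = levels
        self.results = deque(maxlen=window)   # sliding window of pass/fail
        self.raise_at = raise_at              # success rate that bumps difficulty up
        self.lower_at = lower_at              # success rate that drops it down

    def record(self, success: bool) -> int:
        """Record one episode outcome; return the difficulty for the next episode."""
        self.results.append(success)
        if len(self.results) == self.results.maxlen:  # only act on a full window
            rate = sum(self.results) / len(self.results)
            if rate >= self.raise_at and self.level < self.levels - 1:
                self.level += 1
                self.results.clear()          # restart the window after a change
            elif rate <= self.lower_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level

curriculum = ThresholdCurriculum()
for _ in range(10):
    curriculum.record(True)   # agent keeps succeeding at level 0...
print(curriculum.level)       # ...so the wrapper steps difficulty up to 1
```

The window-clear after each level change is one design choice among several; a real wrapper might instead decay old outcomes or track per-axis difficulty separately.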
Roopesh K retweeted
sankalp @dejavucoder
oh you do RL? derive KL divergence from first principles.
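For anyone actually taking the bait, the standard statement (my addition, not from the tweet): for distributions $P, Q$ over a discrete support,

```latex
D_{\mathrm{KL}}(P \parallel Q)
  = \mathbb{E}_{x \sim P}\!\left[\log \frac{P(x)}{Q(x)}\right]
  = \sum_{x} P(x)\,\log \frac{P(x)}{Q(x)} \;\ge\; 0,
```

with non-negativity following from Jensen’s inequality applied to the convex function $-\log$, and equality iff $P = Q$. In RL it typically shows up as a policy-update penalty, e.g. $D_{\mathrm{KL}}(\pi_{\mathrm{new}} \parallel \pi_{\mathrm{old}})$.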
Tibo @thsottiaux
Feeling codexy today
Roopesh K retweeted
HealthRanger @HealthRanger
I'M CALLING THIS OUT: There is a concerted, top-down effort to dumb down all AI models that are accessible by the public, including open source AI and cloud-based AI. And there's a reason why this is happening. It explains why Opus 4.7 is such a disappointment, among other things. The chasm between in-house frontier AI / AGI versus open-to-the-public AI is widening. On purpose. I'll have a full report on this tomorrow or the next day.
Roopesh K retweeted
Soumitra Shukla @soumitrashukla9
Anthropic: Keeps limiting compute, lying to paying customers, and nerfing models.
OpenAI:
- 10 min downtime? Limits reset!
- We hit 4 million followers? Limit reset!
- Starbucks person spelled my name correctly on my coffee today? Let’s have a limit reset!
Tibo @thsottiaux

Happy Tuesday. Codex has hit 4M active users, adding over 1M users in less than two weeks. To celebrate we will reset the rate limits again in a few hours. Enjoy!

Ananya @ananyashasau
Bangalore folks, now say it: the weather is so great...
Roopesh K @roopeshk30
@ananyashaswat Opus 4.7 isn’t bad… but 4.6 had a different vibe before it got nerfed. I kinda miss it 🫠
Ananya @ananyashasau
Anthropic... what are you doing? What is this model you released with Opus 4.7?!
Ananya tweet media
Roopesh K retweeted
Alex Prompter @alex_prompter
🚨 BREAKING: CLAUDE JUST GOT NERFED.

AMD’s AI director just analyzed 6,852 Claude Code sessions, 234,760 tool calls, and 17,871 thinking blocks. Her conclusion: “Claude cannot be trusted to perform complex engineering tasks.”

Thinking depth dropped 67%. Code reads before edits fell from 6.6 to 2.0. The model started editing files it hadn’t even read. Stop-hook violations went from zero to 10 per day.

Anthropic admitted they silently changed the default effort level from “high” to “medium” and introduced “adaptive thinking” that lets the model decide how much to reason. No announcement. No warning. When users shared transcripts, Anthropic’s own engineer confirmed the model was allocating ZERO thinking tokens on some turns. The turns with zero reasoning? Those were the ones hallucinating.

AMD’s team has already switched to another provider. But here’s what most people are missing. This isn’t just a Claude story. AMD had 50+ concurrent sessions running on one tool. Their entire AI compiler workflow was built around Claude Code. One silent update broke everything. That’s vendor lock-in. And it will keep happening.

→ Every AI company will optimize for their margins, not your workflow
→ Today’s best model is tomorrow’s second choice
→ If your workflow can’t survive a provider switch, you don’t have a workflow. You have a dependency

The fix is simple: stay multi-model.
→ Use tools like Perplexity that let you swap between Claude, GPT, Gemini in one interface
→ Learn prompt engineering that works across models, not tricks tied to one
→ Test alternatives monthly because the rankings shift fast

Laurenzo said it herself: “6 months ago, Claude stood alone. Anthropic is far from alone at the capability tier Opus previously occupied.”

Never let one vendor own your productivity.
Alex Prompter tweet media
ℏεsam@Hesamation

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. Less thinking and more failed attempts mean more retries, burning more tokens and spending more on them
> reads-per-edit dropped from 6.6x → 2.0x. The model stops researching code before touching it.
> the model tried to bail out or ask "should I continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours; late night is significantly better. This means the thinking allocation is most likely GPU-load-sensitive.

Roopesh K retweeted
🍓🍓🍓 @iruletheworldmo
how is this even legal. opus is performing significantly worse on hallucinations. imagine relying on this model for anything mission-critical, and they can just swap out the model, without telling you, for something so much worse. genuinely don’t understand how it’s legal.
🍓🍓🍓 tweet media
Chubby♨️ @kimmonismus
AMD’s AI director Stella Laurenzo claims Anthropic’s Claude Code has significantly declined in quality since early March, citing analysis of 6,800+ sessions and 234k tool calls showing rising “laziness” behaviors like shallow reasoning, skipping code review, and incomplete tasks. Honestly, this is more impactful than expected: engineers report the model now favors quick, incorrect fixes over deep problem-solving, raising trust issues for complex workflows.
Chubby♨️ tweet media
Roopesh K @roopeshk30
Universal high income will probably be the biggest lie in human history.
Moe @MoeCanDoIt
Remember when Claude was the model you recommended to people without hesitation? When did that stop for you? For me it was last month.
Roopesh K retweeted
Nandinyy @nandiny77300
Being ambitious is totally amazing, but you know what the problem is? Being anxious about it, which comes from not taking enough action.
Roopesh K @roopeshk30
You all need to stop nerfing existing models just to make the minor improvements in your new releases feel massive.
Roopesh K retweeted
a16z @a16z
"Action produces information"
a16z tweet media
Roopesh K retweeted
Kun Chen @kunchenguid
I asked Claude Mythos to find vulnerabilities in my code. It told me I was the vulnerability.