Steven Roosa

4.4K posts

@StevenRoosa

Law. Linux. Python. Read, Write.

PA and NY · Joined June 2011
236 Following · 240 Followers
Steven Roosa reposted
Santiago
Santiago@svpino·
It's become noticeably easier to recognize AI-slop replies here on this platform because they all follow the same patterns. It's like a template that all models use when writing. Same phrasing, same style. This goes well beyond overusing emojis or em dashes. A big part of this is that labs aren't focusing on making these models better writers or on building harnesses that produce good prose. Obviously, the money is somewhere else: coding agents.
Gergely Orosz@GergelyOrosz

I use AI a lot for deep research and summarization. One thing I'm noticing across all models (Claude, ChatGPT, Gemini) is how they are becoming... more generic? More "AI-templated" in writing? Lazier? (Using the same tired phrases again and again) As the models supposedly get better, I subjectively feel they are the same or worse in this area.

17 replies · 3 reposts · 54 likes · 8.6K views
Steven Roosa reposted
ℏεsam
ℏεsam@Hesamation·
AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. Less thinking and more failed attempts mean more retries, burning more tokens, and spending more on tokens
> reads-per-edit dropped from 6.6x → 2.0x. The model stops researching code before touching it.
> the model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours; late night is significantly better. This means the thinking allocation is most likely GPU-load-sensitive.
ℏεsam tweet media
228 replies · 694 reposts · 6.3K likes · 1.4M views
Steven Roosa
Steven Roosa@StevenRoosa·
@Hesamation Or maybe there is too much fundamental randomness in the market to be able to predict price moves and action trades.
0 replies · 0 reposts · 0 likes · 27 views
ℏεsam
ℏεsam@Hesamation·
bro created an AI crypto trading bot using
> Karpathy's auto-research
> $200 of budget
> the last 3 years of trading signals
> the ability to buy its own compute
THE RESULT: it didn't perform well. Pulling this off requires a massive token budget that only big hedge funds can afford. Most X posts you see of people turning $100 into $1,000 lack evidence or are an advertisement to sell their bots.
53 replies · 30 reposts · 353 likes · 37K views
Steven Roosa reposted
Liquidity Goblin
Liquidity Goblin@liquiditygoblin·
In an effort to stop seeing so much slop, I've been trying to train my own AI detection model. Found something incredibly interesting: for the most part, LLM-generated text and human-written text are linearly separable.
Liquidity Goblin tweet media
195 replies · 147 reposts · 5.5K likes · 526.1K views
The Kobeissi Letter
The Kobeissi Letter@KobeissiLetter·
ANNOUNCEMENT: We are excited to present The Kobeissi Letter’s 2025 performance report. Our analysis provided a net return of +30.7% in 2025, outperforming the S&P 500's 2025 return of +16.4%. Since 2020, our investment strategy has now returned +516.3%, significantly outperforming the S&P 500’s +111.9% return over the same period. Our 2025 return builds on +8.1% in 2024, +15.5% in 2023, +92.8% in 2022, +35.3% in 2021, and +44.8% in 2020. Since 2020, our analysis has delivered a Compound Annual Growth Rate (CAGR) of +35.4%, outperforming the S&P 500’s +13.3% CAGR. Read the full annual report here: thekobeissiletter.com/performance
The Kobeissi Letter tweet media
348 replies · 407 reposts · 5K likes · 6.2M views
Bojan Tunguz
Bojan Tunguz@tunguz·
Strongly agree with each one of those points.
Prof. Brian Keating@DrBrianKeating

Legendary Professor @AswathDamodaran (@NYU) just said the quiet part out loud:
• 95% of academic research is pointless
• Your degree is an overvalued asset
• Books over 200 pages aren't worth reading
• ChatGPT can already do what investment bankers do
• Every asset class is overpriced
Why are universities, markets & finance all running the same broken algorithm? Video: youtube.com/watch?v=iGEMmb…

11 replies · 4 reposts · 56 likes · 19.3K views
Steven Roosa
Steven Roosa@StevenRoosa·
@tunguz It's the tool for downloading Chrome on Ubuntu.
0 replies · 0 reposts · 0 likes · 20 views
Bojan Tunguz
Bojan Tunguz@tunguz·
OK, I'll bite: what's Firefox?
35 replies · 2 reposts · 47 likes · 8.5K views
Aakash Gupta
Aakash Gupta@aakashgupta·
2023: a lawyer used ChatGPT citations in court. Fined $5,000. 2025: a Chicago Housing Authority lawyer did the same thing. Firm sanctioned $60,000. 2026: ChatGPT told a woman her real lawyer was gaslighting her, convinced her to fire him, then filed 60 documents in federal court on her behalf. The other side spent $300,000 defending a case that was already settled. OpenAI is now being sued for $10 million.

Notice the pattern. The first two cases were lawyers using AI as a research shortcut and getting sloppy. This one is different. The AI wasn't assisting a lawyer. It was operating as one.

Graciela Dela Torre had a disability claim from a 2019 workplace injury. She settled it. Signed a full release. Case dismissed with prejudice. When she tried to reopen it a year later, her attorney told her the release was enforceable. So she uploaded his response to ChatGPT and asked if she was being gaslighted. ChatGPT said yes.

Then it did what a bad therapist does. It validated the emotion instead of assessing the situation. It told her what she wanted to hear and started generating the legal strategy to act on it. Motions, arguments, research, filings. One cited a case that exists nowhere except ChatGPT's output and her court papers.

ChatGPT scored 297 on the bar exam. It can produce formatting that looks indistinguishable from real legal work. And it will never say "I don't know" or "you should stop." The people most exposed to this are the ones who already feel failed by the system and want something to tell them they're right.

Sixty documents and $300,000 in damage later, the question sitting in federal court in Illinois isn't whether AI can practice law. It's who pays when it does.
Polymarket@Polymarket

JUST IN: Lawsuit claims ChatGPT pretended to be a lawyer and persuaded a woman into firing her real attorney while citing fake case law.

16 replies · 21 reposts · 134 likes · 46.3K views
Bojan Tunguz
Bojan Tunguz@tunguz·
What's a good movie to watch on a rainy Friday evening?
72 replies · 1 repost · 43 likes · 19.4K views
Steve Skojec
Steve Skojec@SteveSkojec·
@shanaka86 Do people really like the way you write this stuff? "Read that again." "Sit with that." "Let that sink in." Why not write like a good writer instead of a LinkedIn broetry purveyor? You have the information. You have the chops. Don't cave.
48 replies · 5 reposts · 859 likes · 73.2K views
Shanaka Anslem Perera ⚡
Shanaka Anslem Perera ⚡@shanaka86·
BREAKING: Two Iranian jets flew at 80 feet above the Persian Gulf to avoid radar. They were two minutes from Al-Udeid Air Base when Qatar shot them down.

Al-Udeid is the largest US air base in the Middle East. 10,000 American personnel. The command hub for the entire Operation Epic Fury campaign. Iran sent two Soviet-era Su-24 bombers, flying so low they were skimming the water, directly at it. A Qatari F-15 intercepted them. Downed both. Qatar's first aerial combat engagement in its history.

Sit with the sequence of events on March 2 alone. Qatar's Emiri Air Force intercepts 7 Iranian ballistic missiles, 5 drones, and shoots down 2 manned Iranian aircraft in a single day. Then QatarEnergy shuts down all LNG production and declares Force Majeure on every contract.

This is the same Qatar that spent the last decade positioning itself as the Gulf's indispensable neutral, the country that hosts both US forces and Hamas political leadership, the mediator everyone calls when no one else will pick up the phone. Qatar's neutrality died on March 2.

And here is the strategic consequence nobody has fully priced. The Su-24 flew at 80 feet because that is below the radar floor. Iran developed that tactic specifically because it knows Gulf air defense systems have a low-altitude blind spot. The planes were not on a reconnaissance mission. You do not arm Su-24s, fly at 80 feet across open water, and aim directly at the world's most important US air base on a reconnaissance mission two minutes from your target.

This was not a probe. This was the attempt. Qatar stopped it. But Iran now knows exactly where the radar gap is, what the intercept time looks like, and how Qatari F-15s respond under pressure. The next attempt will account for all of that. open.substack.com/pub/shanakaans…
Shanaka Anslem Perera ⚡ tweet media
454 replies · 1.7K reposts · 10.5K likes · 5M views
Steven Roosa reposted
Jeremy Shepherd 🔻🇵🇸
Jeremy Shepherd 🔻🇵🇸@jeremy_wokka·
@vxunderground The FreeBSD decision was the most based one: update your license to say "This software may not be used in California" and call it a day
20 replies · 110 reposts · 1K likes · 43.4K views
Timothy B. Lee
Timothy B. Lee@binarybits·
I don't understand why OpenAI thinks quoting this language would convince people concerned about autonomous weapon uses. "You can't do it in any case where it would be illegal" is another way of saying "you can do it if it's legal."
Timothy B. Lee tweet media
10 replies · 8 reposts · 97 likes · 5.3K views
Steven Roosa
Steven Roosa@StevenRoosa·
@rohanpaul_ai And if a person can't do the same, would we say the person lacks general intelligence? BTW, that's 99.9% of the population failing that test. A better prediction is that there is no ceiling on the AGI test. It's getting silly already.
0 replies · 1 repost · 1 like · 7 views
Rohan Paul
Rohan Paul@rohanpaul_ai·
Demis Hassabis’s “Einstein test” for defining AGI: Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915); if yes, it’s AGI.
659 replies · 812 reposts · 11.8K likes · 2.2M views
Timothy B. Lee
Timothy B. Lee@binarybits·
Feeling whiplash as I toggle between the Ars comment section (where a lot of readers think AI is fake) and interviews with programmers who read my newsletter and are convinced AI is in the process of transforming their profession. I think the first group is going to be surprised.
10 replies · 12 reposts · 197 likes · 11.1K views
Camus
Camus@newstart_2024·
Jimmy Carr echoing Peter Thiel with a line that lands hard: "Minus the screens from any room, we're still living in the 1970s. Nothing's happened in physics since 1972. String theory has not got us anywhere."

But the real shift? "Take the compute power of AI and point it at physics… everything else in science is stamp collecting. Physics is the real thing. That gave us every bit of technology we have. What happens when you point AI at that?"

He sees two roads ahead: a world of plenty with 50× productivity and human flourishing… or something that goes "another way."

If AI finally cracks physics in the next decade, what single breakthrough do you hope comes first: limitless energy, gravity control, new materials… or something we haven't even named yet?
527 replies · 503 reposts · 6.7K likes · 2.5M views
Steven Roosa reposted
aditya
aditya@adxtyahq·
NASA writes mission-critical flight software in C. And the rules are absolutely INSANE.
> No recursion. Ever.
> Every loop must have a provable upper bound.
> No dynamic memory allocation after initialization.
> Max ~60 lines per function.
> Minimum 2 assertions per function.
> Every return value must be checked.
> Zero compiler warnings allowed.
> Daily static analysis. Zero warnings there too.
> No function pointers.
> Restricted pointer dereferencing.
This is how they write code at NASA / JPL for mission-critical systems.
aditya tweet media
799 replies · 1.5K reposts · 19.4K likes · 1.8M views