Paris

14.2K posts

@blocktivist

overfitting models and breaking contracts

Europe · Joined June 2022
4.6K Following · 3.3K Followers
Pinned Tweet
Paris
Paris@blocktivist·
A collection of Twitter accounts delivering the latest and most relevant insights on Web3 security. 🌐 What accounts should be added to this list? twitter.com/i/lists/174399…
4
1
18
4.4K
Paris
Paris@blocktivist·
@0xPrajwal_ I would, but please log in with your MS account first.
1
0
1
78
Prajwal
Prajwal@0xPrajwal_·
Give me one solid reason to choose a MacBook over a Windows laptop!
Prajwal tweet media
194
4
172
21K
Sahil
Sahil@sahill_og·
Claude Opus 4.7 spent 50 minutes refactoring my codebase. It finished confidently. Nothing compiled. Half the APIs died. Frontend entered another dimension. The app is completely broken.
23
6
300
34.9K
Paris
Paris@blocktivist·
@trikcode That’s what bad agentic engineering looks like. Not a model’s fault.
0
0
0
73
Wise
Wise@trikcode·
i asked claude opus 4.7 to refactor a large codebase. 68 minutes, millions of tokens burned. it finished, and nothing worked. app completely broken
Wise tweet media
495
202
6.3K
377.5K
Paris
Paris@blocktivist·
@ishanxtwt If you’re willing to spend 20 bucks on an LLM subscription, then you are obviously not building something serious and not coding on a regular basis. And if that’s the case, why are you not using Sonnet? You’re just playing around anyway.
0
0
1
267
Ishan
Ishan@ishanxtwt·
Anthropic doubled Claude Code’s five-hour rate limits for Pro, Max, Team, and seat-based Enterprise plans. Why does the $20 plan still feel like a free plan then?
Ishan tweet media
118
14
483
46.1K
Paris
Paris@blocktivist·
It says everything about the X LLM bubble. Most Claude users are enterprise employees who check the built-in usage circle in the app or run /usage in the terminal. These users also have more domain knowledge, chunk their prompts, and manage context better than the average X poster who one-shots entire projects.
0
0
0
487
Om Patel
Om Patel@om_patel5·
THIS GUY BUILT A PHYSICAL DEVICE THAT SITS ON YOUR DESK AND SHOWS YOUR CLAUDE CODE USAGE LIMITS IN REAL TIME. It's called clawdmeter. It runs on a $32 Waveshare ESP32 dev board with a 480x480 AMOLED display. Instead of checking the Claude Code UI or guessing how much usage you have left, you can now just glance at your desk. At this point Anthropic should just mail these to us for free, but I also don't need MORE Claude usage anxiety. AND it's open source on GitHub. This situation says EVERYTHING about the current state of the product. The Claude Code accessory market is booming in real time.
Om Patel tweet media
207
321
5.8K
597.3K
Paris
Paris@blocktivist·
@conorsen How big was the pdf?
0
0
1
24
Conor Sen
Conor Sen@conorsen·
I burned through all my tokens in a session on Claude Pro this morning in maybe 10 minutes trying to pull data out of one PDF — there’s just no way there’s enough compute to disrupt a meaningful number of jobs this year.
360
302
6.7K
517.3K
Nav Toor
Nav Toor@heynavtoor·
Google has a pirate enemy. He's one guy. His name is Raymond Hill. He built uBlock Origin. The world's best ad blocker. 63K stars. GPL-3.0. He literally refuses every dollar you try to send him. Then Google did the unthinkable. July 24, 2025. Manifest V2 disabled everywhere. The full uBlock Origin stopped working on Chrome. The world's biggest ad company nuked the world's biggest ad blocker on its own browser. They called it "security." Coincidence. Here's the wildest part: Raymond didn't fold. Latest release: March 11, 2026. Still alive on Firefox. Still alive on Edge. Still alive on Brave. Still GPL-3.0. Still refusing every dollar. One developer vs. the trillion-dollar ad empire. But DO NOT install it. We should all keep Google richer. 100% Open Source. (Link in the comments)
Nav Toor tweet media
525
3.4K
26.9K
4M
Paris
Paris@blocktivist·
@Mappletons Bundle them to a plugin on GitHub/Lab. Install the plugin. Set all plugins to false in your user’s settings.json. Enable needed plugins in your project’s settings.local.json.
0
0
1
200
Maggie Appleton
Maggie Appleton@Mappletons·
How is everyone managing their agent SKILL.md files? Is it just chaos? Global skills, repo-specific skills, keeping them in sync between machines, figuring out which ones you have installed, authoring new ones. What are we doing? Does anyone have a sane system?
232
13
686
105.8K
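The two-file setup from the reply above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual loader: the `enabledPlugins` key, the `name@marketplace` identifier form, and all plugin names are assumptions made for the example, and the merge only models the idea that the project-local file overrides the user file plugin by plugin.

```python
import json

# Hypothetical user-level settings.json: every installed plugin switched off.
# The "enabledPlugins" key and the "name@marketplace" form are assumptions.
USER_SETTINGS = {
    "enabledPlugins": {
        "team-skills@example-marketplace": False,
        "security-review@example-marketplace": False,
        "docs-writer@example-marketplace": False,
    }
}

# Hypothetical project-level settings.local.json: opt in to what this repo needs.
PROJECT_LOCAL_SETTINGS = {
    "enabledPlugins": {
        "security-review@example-marketplace": True,
    }
}


def effective_plugins(user, project_local):
    """Merge plugin flags, letting the project-local file win per plugin."""
    merged = dict(user.get("enabledPlugins", {}))
    merged.update(project_local.get("enabledPlugins", {}))
    return merged


def enabled_names(flags):
    """Return the sorted names of plugins whose flag is true."""
    return sorted(name for name, on in flags.items() if on)


if __name__ == "__main__":
    flags = effective_plugins(USER_SETTINGS, PROJECT_LOCAL_SETTINGS)
    print(json.dumps(flags, indent=2))
    print(enabled_names(flags))
```

The design point is that the default is "off everywhere", so a new project starts with no plugin context loaded until its local file opts in.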
Paris
Paris@blocktivist·
As you already pointed out, Boris explained why it would be suboptimal to train models on needle-in-a-haystack retrieval unless you expect users to fire up a session and code, ask for pasta recipes, and optimize a travel schedule within that same session. Managing context efficiently has become more important, and the rewards for doing so are now greater.
0
0
0
195
elie
elie@eliebakouch·
this long context MRCR score is extremely scary for opus 4.7, the worst score by far among frontier models
elie tweet media
27
17
415
42.5K
Paris
Paris@blocktivist·
@MINHxDYNASTY Now that you’ve reflected on this, do you think you had a true edge at any point?
0
0
0
39
۟
۟@MINHxDYNASTY·
i never talk about this because it's quite embarrassing but i was up around $700k - $800k last year from sports betting started with $20k, just having fun as i took a break from trading memecoins and thought it was a good escape (what?) but it worked work generational run with the pacers, betting mostly spreads and moneylines of game 1s of every series every single one of those hit... and with size as we scaled up those that watched, remembered how insane some of those games and shots were it's how the "haliban" was born one day while i was playing real basketball with one of my best friends, i told him THIS was the time to quit and walk away. withdraw everything and not to look back. almost adding another milly to the bank. from something that started as just a side quest ive run up a few accounts for fun in the past and always gave it back, so now was the time to execute on the learnings. then it happened.. somehow, im still blinded by the memory, but i gave all of it back. a mixture of sizing up, betting heavy on things i didnt have true confidence in. that led to tilting, revenge trading, and then a spiral to 0 (in that wallet). i dont even remember the bets. i think i was doing like 50k bets on random games and i see the same thing in other categories too. trading, making money, whatever. never let it get to your head. stay calm. go slow to go fast. otherwise you can give it all back and more. and there's no reason to do that. now, i just do it for fun and good vibes. but we're back on another generational run, currently, 9-2 in the playoffs, all bets public. im going to take a few days off, but this has been a vibe. i think as long as i dont go private and only play when i post, i can stay locked in.
۟ tweet media
64
4
318
40K
Paris
Paris@blocktivist·
Insider trading makes prices more accurate, but prices wouldn’t be useless without it. A sufficiently liquid market that reflects all public information, offers financial incentives, and allows for arbitrage still produces reasonably accurate prices. Search “wisdom of the crowd”. Public stock markets are sufficiently efficient and allow for much less insider trading than prediction markets.
1
0
0
142
Simon Dedic
Simon Dedic@sjdedic·
Funny how everyone rages against Polymarket insiders while completely ignoring the fact that prediction markets literally can’t function without them. Hate it or not, but insiders are what make prediction markets accurate. That’s the whole point. Remove them and you don’t get a fairer market, you get a useless one.
51
4
96
12.5K
Paris
Paris@blocktivist·
@peterrhague Most people would also think switching or staying in the Monty Hall problem is irrelevant. That doesn’t change what’s actually optimal. Embarrassing post, Peter.
0
0
0
34
Peter Hague
Peter Hague@peterrhague·
Amazing how lots of self-appointed game theory experts are confidently asserting that blue is the stupid choice. But every time this poll is run, blue wins. Not only is the “game theory” answer predicting the wrong outcome, its explanatory power is based on it being able to predict the right answer. So it’s doubly wrong.
Tim Urban@waitbutwhy

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

1.2K
191
4.3K
843K
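The Monty Hall point in the reply above can be checked by exact enumeration rather than intuition. The sketch below assumes the standard rules (three doors, the host always opens a goat door that is not the contestant's pick, choosing uniformly when two such doors exist); the function name is made up for this example.

```python
from fractions import Fraction

DOORS = (0, 1, 2)


def win_probability(switch):
    """Exact chance of winning the car when always switching or always staying."""
    prob = Fraction(0)
    for car in DOORS:
        for pick in DOORS:
            # The host may only open a door that is neither the pick nor the car.
            host_options = [d for d in DOORS if d != pick and d != car]
            for host in host_options:
                # Car and pick are uniform (1/9); the host splits ties evenly.
                weight = Fraction(1, 9 * len(host_options))
                if switch:
                    final = next(d for d in DOORS if d not in (pick, host))
                else:
                    final = pick
                if final == car:
                    prob += weight
    return prob


if __name__ == "__main__":
    print("stay:", win_probability(False))
    print("switch:", win_probability(True))
```

Under these assumptions switching wins two times out of three, whatever share of people believe the choice is irrelevant.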
Paris
Paris@blocktivist·
Do you really believe that adding this to your CLAUDE.md actually makes Claude better? If the answer is yes, do you really believe that Anthropic’s engineers would not have added it to the system prompts already? This is the exact type of bad context management that makes X folks run out of usage limits: add a bunch of redundant prompts to memory, install 30 MCP servers and 50 skills, and then one-shot the living shit out of a model. Watch it fail and cry on X about it.
0
0
0
170
Sumanth
Sumanth@Sumanth_077·
A single CLAUDE.md file that makes Claude Code 10x more powerful! This repo distills Andrej Karpathy's observations on LLM coding pitfalls into four actionable principles. The problem Karpathy identified: LLMs make wrong assumptions without asking. They overcomplicate code - writing 1000 lines when 100 would do. They change unrelated comments and code they don't understand. They don't surface tradeoffs or push back when they should. This CLAUDE.md file fixes it with four principles: Think Before Coding - Don't assume. State assumptions explicitly. Present multiple interpretations when ambiguity exists. Push back when a simpler approach exists. Stop when confused and ask for clarification. Simplicity First - Minimum code that solves the problem. No features beyond what was asked. No abstractions for single-use code. No speculative flexibility. If 200 lines could be 50, rewrite it. Surgical Changes - Touch only what you must. Don't improve adjacent code or refactor things that aren't broken. Match existing style. Every changed line should trace directly to the request. Goal-Driven Execution - Define success criteria and loop until verified. Transform "fix the bug" into "write a test that reproduces it, then make it pass." Strong success criteria let Claude loop independently. The key insight from Karpathy: LLMs are exceptionally good at looping until they meet specific goals. Don't tell it what to do - give it success criteria and watch it go. Install as a Claude Code plugin or add to your project as CLAUDE.md. 55k+ GitHub stars. I've shared the link in the replies!
Sumanth tweet media
15
46
406
25.6K
Kun Chen
Kun Chen@kunchenguid·
hard to find something more disappointing than:
- having written a perfect prompt of a big task and sent to agent
- thinking "that's probably enough for an hour"
- went to dinner
- came back seeing this agent: "just one more question before I start..."
29
4
138
8.3K
Paris
Paris@blocktivist·
@EXM7777 Same. Ofc X is a bubble, most CC users don’t post on X, they just use it - see Anthropic’s ARR development.
0
0
0
12
Machina
Machina@EXM7777·
X is definitely a bubble man. 90% of issues i see people complain about here, i've never experienced them once. people shat on Opus 4.6 for weeks.. aside from servers going down more than they should, experience was awesome for me. people talk shit about the new Claude Code app, i find it extremely smooth, haven't hit a single bug so far. people complain about limits on the Claude plan, i only hit my 5hr limit when i'm building non-stop for 5 hours straight, which is kinda insane. build your own opinion
71
9
284
15.1K
Paris
Paris@blocktivist·
@Adidotdev New jobs will emerge, as they always do when groundbreaking tech evolves. It’s not that hard to understand; there are plenty of examples in the past.
1
0
1
91
Adit_Yah ☄️
Adit_Yah ☄️@Adidotdev·
Everyone says AI will replace most jobs. But if there are no jobs, there’s no income. No income means no spending. So how does the economy even function? What am I missing?
462
51
503
40.9K
Paris
Paris@blocktivist·
@johnennis Nice prompting skills, John.
0
0
0
24
John Ennis
John Ennis@johnennis·
Honestly starting to hate Opus 4.7
John Ennis tweet media
137
13
505
56.7K
Paris
Paris@blocktivist·
@kannthu1 Would you have found them if you didn’t know what you were looking for?
0
0
0
314
Dawid Moczadło
Dawid Moczadło@kannthu1·
We replicated Mythos findings in opencode using public models, not Anthropic's private stack. The moat is moving from model access to validation: finding vulnerability signal is getting cheaper; turning it into trusted security work is still hard. A better way to read Anthropic's Mythos release is not "one lab has a magical model." It is: the economics of vulnerability discovery are changing. We took the patched public Mythos examples and tried to reproduce them with GPT-5.4 and Claude Opus 4.6 in an open-source harness. Every run stayed below $30 per file. AI models are already good enough to narrow the search space, surface real leads, and sometimes recover the full root cause in battle-tested code. The takeaway: model access is not the moat anymore. Validation is. Finding vulnerability signal is getting cheaper; turning it into trusted security work is still hard. Co-authors: @KlaKlo_, Amadeusz, Marek, Kuba, Mikolaj
Dawid Moczadło tweet media
Dawid Moczadło@kannthu1

UPDATE: We were able to replicate the Mythos findings using existing models (GPT5.4) Writeup coming early next week, no BS prompts, it's real reproduction

22
88
570
109.3K
Paris
Paris@blocktivist·
@trq212 Nice piece. Can we make the hook output size limit configurable? I should be able to populate 3% of a 1M context window at session start if I need to.
0
0
0
87
Thariq
Thariq@trq212·
I edited the intro because I realized I buried the lede originally: The 1M context window is a double-edged sword. It allows Claude to do more complex tasks, but it can also lead to more context pollution if you don't manage your session well. This is how you do that:
Thariq@trq212

x.com/i/article/2044…

94
117
1.6K
272.5K
Paris
Paris@blocktivist·
@peter_szilagyi I have all git commands on deny, never trust an LLM with version control
0
0
0
913
Péter Szilágyi
Péter Szilágyi@peter_szilagyi·
Hey Claude, can you isolate this 3 liner fix into a new branch? Claude: Sure! `git stash && git checkout && git stash pop && git checkout --` And that is how I lost an hour of work.
26
1
340
48K
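What "all git commands on deny" amounts to can be sketched as a small pre-execution check. This is a hypothetical illustration only, not how any particular agent matches its permission rules: `DENIED_PROGRAMS`, `split_commands`, and `is_denied` are names invented for the example, and the operator splitting is deliberately naive (it does not respect operators inside quoted strings).

```python
import re
import shlex

# Hypothetical deny list: any command whose program name is listed is refused.
DENIED_PROGRAMS = {"git"}


def split_commands(line):
    """Split a shell line on &&, ||, ; and | into tokenized commands.

    Naive on purpose: operators inside quotes are not handled.
    """
    parts = re.split(r"&&|\|\||;|\|", line)
    return [shlex.split(part) for part in parts if part.strip()]


def is_denied(line):
    """Return True if any command in the chain starts with a denied program."""
    return any(
        tokens and tokens[0] in DENIED_PROGRAMS
        for tokens in split_commands(line)
    )


if __name__ == "__main__":
    chain = "git stash && git checkout && git stash pop && git checkout --"
    print(is_denied(chain))
    print(is_denied("ls -la && cat README.md"))
```

Checking every segment of a chained command matters here, because the destructive step in the quoted example sits at the end of an `&&` chain rather than at the start.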