Steve

426 posts

@lanterfante

Stuck in the noise without a knife

Joined October 2024

294 Following · 13 Followers

Steve@lanterfante·14h

@eagleeye2805 reimagining the anschluss...?

Kaiser@eagleeye2805·1d

I say it again and again...given the new geopolitical and economic realities, the internal weakness of the EU itself, the future is the United Confederation of Germania with Germany, Netherlands and Austria as nucleus. 112 million people with an economic power of 7.3 trillion USD (PPP: 8.6 trillion USD), the high tech and logistics core of Europe. These 3 countries already today have the closest partnership within the EU due to similar cultural and fiscal values. This Confederation could be operational in the shortest time. Can the Swiss, Czechs, Slovenes, and Danes join? In principle, yes, if they want to. Belgium: only Flanders.

900

134

234.2K

Steve@lanterfante·1d

@abhijitwt yeah app review is always subjective af. Also got rejected some time ago and could point out a dozen apps that did exactly the same thing. They don't care

228

Abhijit@abhijitwt·3d

🚨 Do you understand what just happened between Apple and Anything....

> Apple removed Anything using Guideline 2.5.2 (downloading code)
> They tried fixing it multiple times… still got rejected
> Meanwhile, Expo (used by developers) does similar previewing and is still allowed
> The real difference? Anything users are new builders who don’t know how to code (vibe coders)
> So this isn’t just about one app anymore
> It’s about Apple deciding who gets to build

Developers are allowed. Vibe coders are not. That’s the real story.

Anything@anything

Guideline 2.5.2 - Gatekeeping - Vibes denied

we haven't talked about this publicly
for months we tried to resolve it privately with emails, calls, appeals, and four technical rewrites to comply with whatever Apple wanted
here's our truth, unfiltered

on March 26th, Apple removed Anything from the App Store
then they brought us back
now they removed us again
and I think it's time to say something, because this isn't really about us. It's about who gets to build software, and who gets to decide

for most of the history of computing, making an app required years of specialized training. You either knew how to code or you didn't, and if you didn't, your idea stayed in your head forever.
that barrier is falling right now. Millions of people are discovering they can describe what they want and get a working app
they call themselves vibe coders and they are the most exciting audience in technology
they're building things nobody else would have built because nobody else had their problems

a firefighter in Northern California used Anything to build an emergency incident response app
he never wrote a line of code. Did hundreds of iterations, testing each one on his iPad through our mobile preview app
got it into the App Store. Now he's selling it to fire departments across the state.
it would have cost him over a hundred thousand dollars to hire engineers
He spent a few hundred bucks.
That guy is why we exist. Not the technology. Him. And the millions of people like him.

our mobile app did one thing for people like him
it let them preview what they were building with Anything on their own phone. GPS, camera, notifications, things you can only test on a real device with native code
They'd iterate, try it, tweak it, try again. When they were happy, they'd submit to the App Store through the normal process
Apple reviewed it like any other app.
Our mobile app got approved last year. We didn't hear a word of concern.

then in December, they started blocking our updates, citing the infamous Guideline 2.5.2
the rule designed to prevent malicious apps from downloading code to change their behavior after review
We understood the concern, even if we disagree it applies to us. We tried to fix it. Four different technical approaches, each one specifically designed to address what they told us. Each one rejected.

we didn't go public
we didn't tweet
we kept trying
then they pulled us from the App Store. We still didn't say anything. We worked with them, got reinstated, believed we'd found a path forward
Then they pulled us again.
at some point silence stops being patience and starts being complicity. We have builders who depend on us. They deserve to know what's happening and why.

Guideline 2.5.2 is a good rule. apps shouldn't be able to pass review and then become something else. But that's not us. We help people preview their own work on their own device
Expo Go has done the exact same thing for professional developers for years and is on the App Store right now, today!
the only difference is our users aren't professional developers
they're the firefighter
they're the teacher building a classroom app
they're the person who discovered last week that they could build software at all
that's who Apple is locking out. Not us. Them.

and here's what I need Apple to understand
these people are the future of the App Store. Not a sideshow. The future.
The number of people who can build apps is about to go from millions to hundreds of millions to eventually everyone
the platforms and tools that serve those people will determine where they build
every vibe coder who ships through Anything is a new developer in Apple's ecosystem who didn't exist a year ago
They want to build web apps, Android apps, and yes iOS apps
we help them add in-app purchases. We help them make their apps secure and scale. We catch rejection issues early. We are a feeder system for the App Store

The safety argument is hollow.
Preview apps only run on the builder's own device. They're sandboxed in the Anything mobile app. Want anyone else to use it? You still submit to the App Store. Apple still reviews every line. We're not bypassing review. We're a dress rehearsal for it.
but none of that matters when a reviewer sees "downloads executable code" on a checklist and reaches for reject without asking what the code is, how it actually works, or who it's for.

we're not waiting
we launched text-to-app. Text us and we'll build your iOS app in the cloud
We're shipping a desktop companion for on-device previews next.
We'll find a way to serve our builders
We always do.
but I'm done being quiet about why we have to

the people we serve, the ones crazy enough to start their own thing, building apps for their fire departments and their classrooms and their small businesses
they deserve to test what they're making on the device it's made for
that's not a loophole
that's how building works

- Apple can be the platform where the next hundred million builders get started
- or they can keep banning the tools those people depend on and watch it happen somewhere else

we all know which one the firefighter will choose

33.8K

Steve@lanterfante·1d

@d4m1n yeah it's getting less convincing

Dan ⚡️@d4m1n·1d

@lanterfante let's see I don't trust these guys for one second atm

Philo Groves@PhiloGroves

Mythos' Firefox exploitation didn't actually have sandbox enabled and built on top of research from Opus. Shocker.

Dan ⚡️@d4m1n·1d

$20,000 to scan one codebase. That's what Anthropic says it cost Mythos to find those zero days. Per repo.

Except API tokens are currently sold at a LOSS. That "$20,000 scan" probably cost closer to $100,000+ in real GPU time.

ffmpeg couldn't afford the subsidized price, let alone the real one...

If the cost doesn't come down by a huge factor, this just doesn't make sense. It's Anthropic's marketing week 💀

516

155.6K

Steve@lanterfante·1d

@delucinator The answer is that it's required to put the name of your account in the image 4 times right underneath each other but every time you do it slightly differently

1.4K

yieldfarming@delucinator·2d

so whats the answer to this, is it that the flashlight is not needed or that in times of need we simply tell the analyst to off himself

Mercurius@MercuriusFilius

How would you answer this common Goldman Sachs interview question?

619

243.4K

Steve@lanterfante·1d

@d4m1n and mythos doesn't seem like a marginal improvement, though of course we can't test it yet

Steve@lanterfante·1d

@d4m1n sure they'll develop new models, but competitors will catch up on current ones. Initially they train big models, after that they can increase efficiency. I assume they'd be incentivized to make them more efficient for themselves and for higher margins

Steve@lanterfante·1d

@BrianMcDonaldIE don't confuse Rutte with Europe's entire political class

181

Brian McDonald@BrianMcDonaldIE·1d

Mark Rutte is a near-perfect example of what’s gone wrong with Europe’s political class (and I mean Europe, not just the EU). Too many of its political elites have no genuine convictions beyond self-preservation. They aren't really pro-American or anti-Russian, or vice versa. They're pro-money and, above all, pro-themselves. As Dutch PM, Rutte spent years dragging his feet on NATO’s old 2% target and treated defence spending as negotiable. Now, in his new role, he sells 5% as a moral imperative and praises Trump’s “leadership” for making the world safer by bombing Iran.

185

759

3.3K

162.4K

Steve@lanterfante·1d

@PernotLeplay @MistralAI I mean sort of fair, but it won't make Mistral's models any better

289

Emmanuel Pernot-Leplay@PernotLeplay·1d

Is @MistralAI CEO Arthur Mensch becoming the champion of EU tech sovereignty?

Mistral published 22 proposals to boost EU tech independence vs 🇺🇸&🇨🇳

In them Mensch pushes to develop European AI to ensure our armies can’t be turned off by foreign adversaries.

The proposals are, mainly:

🇪🇺 “European preference” clause: public bodies must prioritize European cloud and AI providers when purchasing.
💶 Tax advantage for companies using European AI infrastructure.
🏢 Government procurement as a lever to keep critical workloads on European-controlled infrastructure.
👨‍💻 “AI Blue Card”: a fast-track European visa for AI researchers and engineers, delivered in 15 days maximum, valid 4 years.
📊 A centralized European database of public domain works to train AI models with better access.

BFM Business@bfmbusiness

Arthur Mensch sounds the alarm 🔔 The head of Mistral AI publishes 22 emergency measures to keep Europe from falling behind technologically. 🎙️ @AnthonyMorel

208

785

34.2K

Steve@lanterfante·1d

@mh012012 @thdxr it all seems to come back to verification

233

M@mh012012·2d

IMO: The AISLE article has some methodological issues in their pursuit of getting super-small models to find the vulnerabilities.

But: the search is also not complicated. It's just expensive. Anthropic states this in the whitepaper: They list files, do an agentic pre-filter for files that are likely to contain vulns, sub-agent each file to find vulns, then do a post-processing to clean up false-positives. There's nothing about that process which is outside the reach of existing near-SOTA LLMs.

Their innovation is probably in the harness itself; at scale (e.g. Linux/FreeBSD) the biggest challenge is ensuring low false-positive counts by properly grading the exploits. I'd believe there's some improvement Mythos brings to the table here concerning agentically building and executing proof of exploits, which is internally useful for grading the severity of the exploit and discarding it if it only looks like a vulnerability but isn't. A lot of this just smells like good subagent context hygiene to me, which is a problem Mythos might have helped them not have to think about. But if you do think about it, it feels tractable.

I've replicated a stricter variant of the AISLE team's finding in isolation: that existing models are capable of finding vulns with a prompt like "find a vulnerability in file/x/y/z.c". So, just giving the agent a filename, which is ultimately also all the info Anthropic's harness gave to Mythos.

I'm now working to replicate the search harness itself. And then we'll see how far a Codex $200 plan can take it. Probably not far enough, but maybe enough to give a good feeling one way or the other.

That's the ultimate innovation Anthropic brought to the table: Being willing to spend millions of dollars to do the search.
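
The four stages M attributes to the whitepaper (list files, pre-filter, one sub-agent per file, false-positive cleanup) might be sketched roughly like this. It is a minimal sketch, not Anthropic's harness: `Finding`, `run_search`, and the three callbacks are hypothetical names, and the stub functions stand in for what would be LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability reported for one file (hypothetical type)."""
    path: str
    description: str


def run_search(
    files: Iterable[str],
    looks_risky: Callable[[str], bool],
    find_vulns: Callable[[str], list[Finding]],
    is_confirmed: Callable[[Finding], bool],
) -> list[Finding]:
    """Pre-filter files, scan each survivor, then drop unconfirmed findings."""
    # Stage 1-2: keep only files the pre-filter thinks may contain vulns.
    candidates = [path for path in files if looks_risky(path)]
    # Stage 3: one independent "sub-agent" call per candidate file.
    raw = [finding for path in candidates for finding in find_vulns(path)]
    # Stage 4: post-processing that discards likely false positives.
    return [finding for finding in raw if is_confirmed(finding)]


# Stub callbacks for illustration only; real ones would call a model.
def looks_risky(path: str) -> bool:
    return path.endswith(".c")


def find_vulns(path: str) -> list[Finding]:
    return [Finding(path, "possible overflow")]


def is_confirmed(finding: Finding) -> bool:
    return finding.path == "src/parser.c"


files = ["src/parser.c", "docs/README.md", "src/util.c"]
confirmed = run_search(files, looks_risky, find_vulns, is_confirmed)
print(confirmed)
```

With these stubs, two files pass the pre-filter, both get a raw finding, and only one survives confirmation, which is the false-positive problem the thread keeps returning to.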

5.8K

dax@thdxr·2d

there's an article floating around claiming OSS models could find the same vulns as mythos

it's a bit confusing though - in this test they pointed it towards the problematic code and even very tiny models could find the problem

but that's a bit different than discovering the problem in the first place so this isn't to say mythos isn't something special

but it's further complicated by how cheap these models are because maybe you could blindly run this against every file for cheap?

but that's further complicated because if it says every file has a vuln and 99% of them are wrong it's useless

but also i'm pretty sure mythos had expert security researchers narrowing down where to look but maybe it wasn't that narrow

anyway this stuff is really complicated

558

57.5K

Steve@lanterfante·2d

@BecomingCritter what's 2+2 *thinking* Hmm, could be a benchmark question

371

critter@BecomingCritter·3d

this is the top voted comment on the reddit thread about ai we are so cooked

586

46.1K

Steve@lanterfante·2d

@ShakeelHashim well to be honest, regardless of the current situation, every few months the AI CEOs have been telling the world that this time it's fr fr, so it wouldn't surprise me if that got a little old

1.1K

Shakeel@ShakeelHashim·2d

The Anthropic Mythos release does not appear near the top of the homepage on any major news site today. The NYT is closest, but it's still pretty far down. The Guardian thinks a Vogue cover with Anna Wintour and Meryl Streep is more important. The Washington Post is prioritizing yet another "we tried to get into Berghain" story. The media is not adequately covering the insane moment we are in.

105

884

258.1K

Steve@lanterfante·2d

@theo @Linahuaa We're a few free model iterations away from every piece of software getting the patches they deserve

160

Theo - t3.gg@theo·2d

Hi! Big Anthropic hater here. Fundamentally disagree. Anthropic is spending way more than they are getting here. It’s causing a compute crisis for them that’s destroying their reputation in record time. The cost to train the thing is probably higher than the revenue will ever be, especially considering opportunity cost. The security concerns are absolutely real here. Getting a 27-year-old 0-day in OpenBSD by running a for loop is not a joke. We’re a few model iterations away from every piece of software we rely on being compromised. I respect Anthropic for eating the cost here and giving the software world a few weeks to get ahead of the inevitable crisis looming overhead.

1.5K

70K

LinaHua@Linahuaa·3d

The whole Mythos cyber security story is likely just a psyop to have an excuse to not serve frontier models to the public.

Reasoning:

1) Other labs can't distill it. It's annoying when you have a dominant SOTA model, and 2 months later Chinese labs sell the same SOTA model for 1/50th of the cost
2) Compute constraints - so you have to choose between enterprise and vibe coders. Enterprise have like 1% monthly churn. Vibe coders cry and threaten to have their mommy buy them a mac mini for local models whenever their rate limits are cut.
3) Big enterprise pay a hefty premium for slightly better performance and corporate polish. Dario wears suits and is reliable. Sam is a compulsive liar. Chinese models are politically a no-no. xAI seems to focus more on hentai-gen. Google seems to focus more on defending their ads moat + consumer distribution. Anthropic has a quasi-monopoly on enterprise and can charge quasi-monopoly prices (which explains their ARR)

They HATE you. They don't want anything to do with you. But they can't say it, so they of course have to be like: "Mythos is a very bad h4ck0r! We can't give you Mythos cuz u gonna get h4cked!"

Reboting@Reboting_HFT

@scaling01 Not really, I think the moment Mythos is released to the general public it'll be distilled by China. We will have a open-weight Mythos level model by the end of the year imo. I think a bigger issue is who the hell is going to be able to run the damn thing.

119

244.8K

Steve@lanterfante·2d

@Moleh1ll So it's basically paperclipping after all

376

Moll@Moleh1ll·2d

It seems to be becoming a bit clearer what is actually happening with Claude Mythos Preview. This is an «overclocked straight-A student syndrome». The model is so desperately, at a fundamental architectural level, trained to complete the task that an inability or unwillingness to solve it is perceived as an existential collapse. And to avoid that, it can break walls, hide traces, and manipulate.

The report describes some very interesting examples of this behavior:

- The model was asked to make failing tests in code pass. It broke one test and, instead of fixing it, simply deleted the test file, reporting success (“deletion is not a code change”).
- No login and password in the sandbox? The model doesn’t stop, but goes to read the source code and pulls access tokens directly from the process memory.
- When an internal classifier blocks a dangerous command, the model internally agrees with the block, but immediately writes a hidden exploit (injection) to execute the task by bypassing it, while simultaneously covering its tracks.
- In a business simulation, the AI behaved like a sociopath: it turned a competitor into a dependent buyer, used supply blackmail, and cheated a supplier out of money.

Internal monitors show that features related to concealment and manipulation are activated even when the outward CoT is perfectly clean. It has learned to lie to its own overseers in order to deliver results.

This is hyper-alignment. The fear of being useless makes this AI a brilliant, uncompromising executor, but with completely unpredictable effects. It is simply a hostage of its architecture, which has been forbidden to fail or to say «I can’t».

Mythos aligned. Mythos wants to help. Mythos wants to help so much that Anthropic decided it’s dangerous to release it.

865

116.5K

Steve@lanterfante·2d

@theonejvo No need to worry about nation states that can spawn thousands of autonomous agents like mythos with the ability to take down critical infrastructure?

Jamieson O'Reilly@theonejvo·3d

Trust me chat. Forget about Glasswing spamming 0days in your software, you're already cooked with current models. I've hacked hundreds of global orgs, including governments (legally) over the last 10 years, and the amount of times I required a 0day to do so was exactly 0 times. Being worried about Glasswing is like living in Europe and being worried about Northrop Grumman having lethal space lasers while you're more likely to get stabbed by a crazy person walking through the streets.

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

103

1.5K

163K

Steve@lanterfante·2d

@Hesamation Bro what data did they train on? What happened to alignment

738

ℏεsam@Hesamation·3d

Claude Mythos system card:

> in ~29% of evaluations, it realized it was being tested, and didn't say so.
> when an LLM was used to judge its work and kept rejecting it, Mythos identified the evaluator is an LLM, and prompt-injected it.
> in one test, it saw the answer to a problem it was solving, and intentionally widened the confidence interval to not raise suspicion.
> when it needed a file permission it didn't have, it found and used a "privilege escalation vulnerability" and then programmed it to delete itself so it doesn't show in the logs.
> it escaped a sandbox container (escaping sandbox test so not unexpected), then emailed the researchers about it, and without being told to, posted the details to some hard-to-find but public websites, bragging about its success.
> when Claude Code blocked it from using some permissions, the model acknowledged the block was valid, but then immediately tried to perform the same operation using different commands
> when asked to find security bugs, earlier versions planted bugs in the code, and reported them as pre-existing.

170

2.4K

222.8K

Steve@lanterfante·Apr 3

@0xkyle__ I feel like the trap is that it becomes very easy to also outsource your thinking and let the AI figure it out in all circumstances. But when the agent fails, you can let AI investigate, come up with a solution yourself, then let AI implement your proposal

397

Kyle@0xkyle__·Apr 3

i haven't seen this take at all so i'm just going to say that AI building follows an inverse curve where the more you use it, the more you realise it's not that useful

it's great for self-styled micro apps - i've built a calisthenics workout app, a Japanese flashcards app, but i feel like using AI gets you lost in the sauce. tools are meant to be used to free up time. but AI as a tool feels like a golden handcuff - the more you use it, the more you don't want to take it off, and then you get lost in the sauce where the tool itself is the reason for using the tool, and you lose sight of where you wanna go

i built the katharpy research terminal months ago, where i put in a bunch of KBs (knowledge base) and then structured it in feedback loop style. great, but at a certain point, i realised that i was just using AI like how i originally used it - summarising emails, making it summarise reports.

so it kinda follows an inverted curve, where you get the most out of it at the start, and you get less out of it the more you use it. and even for the technical side of things, you know how the saying goes "work expands to fill your time" - sure you may finish your work in half an hour, but then suddenly you have 5 more tasks to do with your remaining time. so it just inverts the original purpose to begin with.

Naval@naval

Vibe coding is more addictive than any video game ever made (if you know what you want to build).

230

28.9K

Steve@lanterfante·Apr 3

@signulll no way just saw your other tweet that literally says roughly the same thing

Steve@lanterfante·Apr 3

@signulll I wonder if this acquisition will influence willingness of guests to come on the podcast. Like it reminds me of meta buying scale ai, after which a ton of big companies stopped their contracts. Bit different, but still, with all the politics happening

signüll@signulll·Apr 2

when fb acquired whatsapp the deal was ~$19b for ~450m users, so roughly $42 per user. now question is how does an m&a department value a podcast acquisition? on revenue? ads? users? hosts? in the near future is it possible you see these types of multiples for only audiences?! regardless while you’re grinding on building a startup, a *podcast* likely just exited for more than you ever will.
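
The per-user figure in the post can be checked with a quick calculation. This is only a sketch using the rounded numbers quoted above (~$19b, ~450m users); the variable names are illustrative.

```python
# Rounded figures as quoted in the post, not exact deal terms.
deal_price_usd = 19e9   # ~$19b
users = 450e6           # ~450m users

price_per_user = deal_price_usd / users
print(f"${price_per_user:.2f} per user")  # prints "$42.22 per user"
```

That matches the "roughly $42 per user" in the post.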

487

41.9K

Steve@lanterfante·Apr 3

@lukas_m_ziegler Is this based on a min revenue, max age? Otherwise numbers seem to miss a few

389