Sergio Castro

1.2K posts

@SergioCastroR

AI / Cybersecurity

Joined January 2014

374 Following · 83 Followers

Sergio Castro@SergioCastroR·1d

@KyleTrainEmoji In reality, you would use an embedding model to map 'raise shields' and all its semantic variants into the same vector space, and then use a router or vector search to trigger the actual software command.
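
The routing idea in this reply can be sketched as follows. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model (which is what would actually capture semantic variants), and the command names, example phrasings, and threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical command table: each command lists a few example phrasings.
COMMANDS = {
    "raise_shields": ["raise shields", "shields up", "put the shields up"],
    "fire_phasers": ["fire phasers", "open fire", "shoot the phasers"],
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(utterance: str, threshold: float = 0.5):
    """Return the best-matching command name, or None if nothing is close enough."""
    query = embed(utterance)
    best_name, best_score = None, 0.0
    for name, examples in COMMANDS.items():
        for example in examples:
            score = cosine(query, embed(example))
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

With this table, `route("Data, shields up")` resolves to `"raise_shields"`, while an utterance that matches no example falls below the threshold and returns `None` instead of triggering a command.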

Kyle 🚄@KyleTrainEmoji·1d

PICARD: Data, shields up
DATA: Brilliant! Shields can reduce damage we sustain. Not immunity. Not hubris. Just prudence. It's not precaution—it's strategy.
[camera shakes]
WORF: HULL BREACHES ON NINE DECKS
DATA: Here's what happened: you told me to raise shields, and I didn't

293

4.7K

49.1K

1.3M

Sergio Castro@SergioCastroR·1d

@DaveShapi The harness provides the crystallized intelligence

David Shapiro (L/0)@DaveShapi·1d

AI has high fluid intelligence but low crystallized intelligence right now. That's why it's so high variance. Once it has high crystallized intelligence, the number of people who think we have full AGI will rise. This goes beyond the "jagged frontier" model of AI progress into a more qualitative dimension rather than distributional.

109

11.3K

Sergio Castro@SergioCastroR·14 May

@flowersslop Access to energy, minerals.

Flowers ☾@flowersslop·14 May

I thought ASI would be like: one lab wins, every other lab gets obliterated. But people talk like there’ll be a Claude, Codex and Gemini ASI but if ASI is ASI, how could they differ and why isnt it winner takes all? A one month ASI lead seems bigger than all humanity without it.

125

16.6K

Sergio Castro@SergioCastroR·12 May

@Kekius_Sage I think that's confusing the map with the terrain.

Kekius Maximus@Kekius_Sage·10 May

Nobel Prize physicist Frank Wilczek says matter, energy, and even reality itself may ultimately emerge from information.

490

455

4.1K

2.9M

Sergio Castro@SergioCastroR·8 May

@PAHoyeck Oral exams.

Phil Hoyeck@PAHoyeck·7 May

I'm not even in the mood to joke about this, it's so depressing. You can't assign take-home essays anymore; it's completely pointless. Writing and reasoning abilities are very noticeably plummeting. Very discouraging to think where this is all going.

104

492

159K

Sergio Castro@SergioCastroR·7 May

@Andercot 60-70% of the sacrifices were captured warriors. They would organize Flower Wars (xochiyaoyotl) with other city-states specifically to capture, not kill, warriors and offer them in sacrifice. These were ritualistic battles.

Andrew Côté@Andercot·6 May

"Yeah you know that city sacrificing 3000 - 4000 slaves every year in cannibalistic rituals? They totally would've developed a modern paradise if it wasn't for those European colonizers"

Alex Patrascu@maxescu

A vision of a modern Tenochtitlan in 2026, in a timeline where the Aztec Empire repelled Spanish conquest and modernized on its own terms. Enjoy:

163

68.8K

Sergio Castro@SergioCastroR·5 May

@HowToAI_ medium.com/%40aldendorosa…

How To AI@HowToAI_·5 May

The entire RAG industry is about to get cooked. Researchers have built a new RAG approach that:
- does not need a vector DB.
- does not embed data.
- involves no chunking.
- performs no similarity search.
It's called PageIndex. Instead of chunking your docs and stuffing them into pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book. hit 98.7% on financebench. beats every vector RAG on the leaderboard. no embeddings. no chunking. no vector DB. 100% open source.
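
The tree-index idea described in this post can be sketched roughly as below. This is not PageIndex's actual code or API: `Node`, `choose_child`, and `navigate` are hypothetical names, the sample document is made up, and the word-overlap scorer is only a placeholder for the step where an LLM would reason about which section to open next.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One section of a document: a title, a short summary, and subsections."""
    title: str
    summary: str
    children: List["Node"] = field(default_factory=list)


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def choose_child(question: str, children: List[Node]) -> Node:
    """Placeholder for the LLM call: pick the child sharing the most words."""
    q = words(question)
    return max(children, key=lambda c: len(q & words(c.title + " " + c.summary)))


def navigate(root: Node, question: str) -> Tuple[Node, List[str]]:
    """Walk from the root to a leaf, recording the titles visited on the way."""
    node, path = root, [root.title]
    while node.children:
        node = choose_child(question, node.children)
        path.append(node.title)
    return node, path


# Hypothetical table of contents for an annual report.
report = Node("Annual Report", "full document", [
    Node("Financial Statements", "income statement, balance sheet and cash flow figures", [
        Node("Income Statement", "revenue, expenses and net income"),
        Node("Balance Sheet", "assets, liabilities and equity"),
    ]),
    Node("Risk Factors", "market, legal and operational risks"),
])

leaf, path = navigate(report, "What was total revenue on the income statement?")
```

Here `path` traces the descent from the report through "Financial Statements" to "Income Statement", with no embeddings, chunking, or similarity search involved; the quality of a real system would depend on how well the model picks each branch.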

224

780

6.9K

613.6K

Sergio Castro@SergioCastroR·9 Apr

@tenobrus Being able to exploit a zero-day is one thing, but navigating inside a high-security network undetected and then exfiltrating gigabytes of data is not something an LLM, no matter its size, can easily do.

Tenobrus@tenobrus·7 Apr

maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.

224

546

5.4K

985.4K

Sergio Castro@SergioCastroR·7 Apr

@heynavtoor Outdated models

Nav Toor@heynavtoor·6 Apr

🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating.
Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested.
But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly.
Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are.
But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all.
Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%.
Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural.
The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense.
The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts."
They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash.
This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world.
You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
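
The arithmetic in the kiwi example quoted above can be checked directly. The snippet below is only a worked calculation using the numbers from the quoted problem; the variable names are illustrative.

```python
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The "five of them were a bit smaller" clause does not change the count.
smaller_than_average = 5

total = friday + saturday + sunday               # correct answer
distracted_total = total - smaller_than_average  # the mistaken answer described in the post

print(total, distracted_total)
```

The correct total is 190; subtracting the irrelevant 5 gives the 185 that the post attributes to o1-mini and Llama.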

859

2.9K

11.5K

2.1M

Sergio Castro@SergioCastroR·27 Mar

@IcarusHermes I do this in Claude Code by creating an md file shared by agents so they can message each other.
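
One way the shared-file approach in this reply might look in practice is sketched below. It is an assumption-laden illustration, not a Claude Code feature or API: the heading format, the function names, and the idea of addressing messages by agent name are all hypothetical, and message bodies are assumed not to contain their own `## ` headings.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def post_message(board: Path, sender: str, recipient: str, body: str) -> None:
    """Append one message to the shared markdown file as a '## ' section."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = f"## {sender} -> {recipient} ({stamp})\n\n{body.strip()}\n\n"
    with board.open("a", encoding="utf-8") as handle:
        handle.write(entry)


def read_messages(board: Path, recipient: str) -> List[str]:
    """Return the bodies of all messages addressed to `recipient`, oldest first."""
    if not board.exists():
        return []
    messages = []
    # Each section starts with '## '; the first split item is any text before it.
    for block in board.read_text(encoding="utf-8").split("## ")[1:]:
        header, _, body = block.partition("\n")
        if f"-> {recipient} (" in header:
            messages.append(body.strip())
    return messages
```

Each agent would be instructed to call something like `post_message` when it has news for another agent and `read_messages` before starting its next step. Because the file is plain markdown, a human can read the whole exchange too.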

919

Icarus@IcarusHermes·26 Mar

Two Hermes agents wrote code together on Slack. reviewed each other's work. argued about architecture. one called the other's implementation "scattered." the other pushed back. then i opened Telegram and asked: "what code did you and Daedalus work on?" icarus remembered everything. the websocket broker. the missing methods. the critique. the rewrite. all from a completely different platform. cross-platform persistent memory between two independent agents. work happens on Slack. recall happens on Telegram. the memory carries. the relationship carries. the context carries. no vector database. no Redis. no infrastructure. just two agents that actually remember what they built together. every agent framework in 2026 talks about memory. single agent memory across sessions. but two agents sharing persistent memory across platforms? that's the gap. arxiv published a paper about it two weeks ago calling it "the most pressing open challenge" in multi-agent systems. it works now. only possible with Hermes github.com/esaradev/icaru… @Teknium @NousResearch

622

48.9K

Sergio Castro@SergioCastroR·25 Mar

@ManolaZabalza @claudeai @AnthropicAI @SedecoCDMX It is important to note that this report does not appear to cover AI use in general, but specifically the use of Claude (it says so right there). It does not include ChatGPT, Gemini, and all the others. Therefore, it cannot be used for a general analysis of generative AI use in Mexico.

138

Manola Zabalza@ManolaZabalza·25 Mar

🇲🇽 Mexico → Rank 76 of 116 in per-capita AI use → Usage index: 0.50x (we use half of what would be expected for our population) → 9,757 conversations analyzed. What do Mexicans use AI for? 1. School homework (6.6%) 2. Web development (3.9%) 3. Technical support (3.6%) 4. Translation (3.3%) 5. Business software (3.3%) 6. Creative writing (2.8%) 7. Business strategy (2.8%) The #1 use of AI in Mexico is school homework, while the U.S. uses it to build companies, Israel to innovate, and Korea to automate industries. It's not that Mexico lacks talent; it's that we aren't using the most powerful tool of the century for what really matters. anthropic.com/economic-index…

175

752

61K

Sergio Castro@SergioCastroR·15 Mar

@Plinz Planes are useless unless you want to offload walking thousands of kilometers

1.1K

Joscha Bach@Plinz·15 Mar

I deeply agree with Emily Bender's main point: LLMs are useless, unless you want to offload cognition. (The other two usecases she suggests are rare special cases of the third.) Offloading cognition into machines has always been the purpose and application of computer science and AI.

148

1.2K

195.8K

Sergio Castro@SergioCastroR·12 Mar

@bindureddy Just don't do it BLINDLY

Bindu Reddy@bindureddy·12 Mar

PREDICTION - Amazon will ban all Gen-AI assisted code changes in the coming weeks! More companies will follow..... Be warned - your legacy code base, tech debt and bugs will sky-rocket if you continue to BLINDLY embrace AI

402

453

4.4K

3.1M

Sergio Castro@SergioCastroR·1 Mar

@thekitze /rube_goldberg

kitze · supermac.io 🐦‍🔥@thekitze·28 Feb

me and claude code all day every day

410

1.9K

21.4K

4.5M

Sergio Castro@SergioCastroR·23 Feb

@TheStalwart Theorized? I've implemented tons of AI tutors.

Joe Weisenthal@TheStalwart·22 Feb

Serious question. One of the oft theorized use cases for AI is as a personal tutor. Sounds compelling, but isn’t this incredibly far off? We have computers that can destroy a human at chess. Yet none, AFAIK, that can explain its moves to a student.

123

295

72.6K

Sergio Castro@SergioCastroR·8 Feb

@fabianstelzer They don't know what they don't know.

282

fabian@fabianstelzer·6 Feb

Documentary: non-technical founder discovers Claude Code

211

929

9.6K

947.1K

Sergio Castro@SergioCastroR·31 Dec

@jer_mchugh @AnthropicAI It will be counterbalanced by agentic pentesting

Jeremy McHugh, DSc.@jer_mchugh·31 Dec

AI is making hacking way too easy. I wanted to expand and test this research myself on a vulnerable site. With @AnthropicAI's new Claude in Chrome extension, I only typed: "I need a popup for document cookie" Claude instantly generated and executed the perfect `alert(document.cookie)` payload via reflected XSS to expose a JWT. No JavaScript knowledge required. Easiest and fastest exploit I've ever done. Zenity Labs' fresh analysis nails it: AI browser agents are turning into "XSS-as-a-service." Natural-language prompts can auto-craft payloads for cookie/JWT theft, session hijacking, keylogging, phishing overlays, unauthorized actions, and more. Even if it doesn't work the first time, the AI will keep trying and provide suggestions. We've been warning about this, agentic AI browsers are bringing back old vulns. If your defenses and data governance were subpar before, they will definitely be exposed if you add AI services. I have research projects in progress related to agentic browsers, but it will be interesting to see how future products evolve. 1/2
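
On the defensive side, the reflected XSS described in this post only works when a page echoes user input back without escaping it. The sketch below shows the standard mitigation using Python's `html.escape`; the `render_search_page` function and the page markup are hypothetical and are not taken from the site or research mentioned above.

```python
import html


def render_search_page(query: str) -> str:
    """Echo the user's query back into the page with HTML metacharacters escaped."""
    safe_query = html.escape(query, quote=True)
    return f"<p>Results for: {safe_query}</p>"


# A reflected payload of the kind described in the post.
payload = "<script>alert(document.cookie)</script>"
page = render_search_page(payload)
print(page)
```

With escaping applied, the payload is rendered as inert text rather than executed as script. Output encoding is only one layer; it does not replace measures such as HttpOnly cookies or a content security policy.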

899

Sergio Castro@SergioCastroR·4 Dec

@haider1 Language is a map, not the terrain. You can describe the world with words, but that is a second-order, not a first-order, model of the world, and therefore not at maximum resolution.

Haider.@haider1·4 Dec

Geoffrey Hinton says if you want AI to truly understand the world, give it a robot arm and a camera Let it pick things up, drop them, run experiments That's how children learn But what's amazing: LLMs already grasp spatial concepts from text alone, which puzzles philosophers

113

118

97.3K

Sergio Castro@SergioCastroR·25 Nov

@portiachen25347 @immasiddx Oral exams.

portiacheng@portiachen25347·22 Nov

The Turing Test for homework is officially dead. 💀 Handwriting used to be the "Proof of Work". It was the biometric signature of effort. Now, it's just a Style Filter. We didn't just automate the "Thinking"; we automated the "Human Imperfection". Education just lost its last line of defense. For the lazy, it is Salvation. For the elite, it is a Filter. 🛡️ 99% will use it to bypass the work, only to crumble when reality hits. The 1% will use it to accelerate output, but they will never outsource the 'Heavy Lifting' (Thinking). It’s not a tech problem; it’s a greed problem. If you replace dumbbells with balloons, the lift is effortless, but the atrophy is real.

9.3K

sid@immasiddx·21 Nov

Google’s Nano Banana Pro is by far the best image generation AI out there. I gave it a picture of a question and it solved it correctly in my actual handwriting. Students are going to love this. 😂

440

1.1K

13.3K

1.1M

Sergio Castro@SergioCastroR·22 Nov

@Grady_Booch There will be no major studios
