Sergio Castro

1.2K posts

@SergioCastroR

AI / Cybersecurity

Joined January 2014
374 Following · 83 Followers
Sergio Castro
Sergio Castro@SergioCastroR·
@KyleTrainEmoji In reality, you would use an embedding model to map 'raise shields' and all its semantic variants into the same vector space, and then use a router or vector search to trigger the actual software command.
English
0
0
1
49
Kyle 🚄
Kyle 🚄@KyleTrainEmoji·
PICARD: Data, shields up DATA: Brilliant! Shields can reduce damage we sustain. Not immunity. Not hubris. Just prudence. It's not precaution—it's strategy. [camera shakes] WORF: HULL BREACHES ON NINE DECKS DATA: Here's what happened: you told me to raise shields, and I didn't
English
293
4.7K
49.1K
1.3M
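The routing idea in Sergio Castro's reply above can be sketched as follows. This is a minimal illustration, not anyone's production design: `toy_embed` is a bag-of-words stand-in for a real learned embedding model (which is what would actually capture semantic variants), and the command names, example phrases, and threshold are all hypothetical.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for a real embedding model (assumption): a bag-of-words count vector."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical command table: each command is indexed by a few example phrasings.
COMMANDS = {
    "raise_shields": ["raise shields", "shields up", "activate the shields"],
    "fire_phasers": ["fire phasers", "open fire", "fire at will"],
}

# The "vector index": one (command, vector) pair per example phrase.
INDEX = [
    (name, toy_embed(phrase))
    for name, phrases in COMMANDS.items()
    for phrase in phrases
]

def route(utterance, threshold=0.3):
    """Return the command whose example phrase is nearest, or None if nothing is close."""
    query = toy_embed(utterance)
    best_name, best_score = None, 0.0
    for name, vec in INDEX:
        score = cosine(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Calling `route("Data, shields up")` selects `raise_shields`, and an unrelated utterance falls below the threshold and returns `None`. The point of the design is that the command is triggered by a deterministic lookup rather than by free-form text generation.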
David Shapiro (L/0)
David Shapiro (L/0)@DaveShapi·
AI has high fluid intelligence but low crystalized intelligence right now. That's why it's so high variance. Once it has high crystalized intelligence, the number of people who think we have full AGI will rise. This goes beyond the "jagged frontier" model of AI progress into a more qualitative dimension rather than distributional.
English
35
7
109
11.3K
Flowers ☾
Flowers ☾@flowersslop·
I thought ASI would be like: one lab wins, every other lab gets obliterated. But people talk like there’ll be a Claude, Codex and Gemini ASI. If ASI is ASI, how could they differ, and why isn’t it winner takes all? A one-month ASI lead seems bigger than all humanity without it.
English
44
4
125
16.6K
Kekius Maximus
Kekius Maximus@Kekius_Sage·
Nobel Prize physicist Frank Wilczek says matter, energy, and even reality itself may ultimately emerge from information.
Kekius Maximus tweet media
English
490
455
4.1K
2.9M
Phil Hoyeck
Phil Hoyeck@PAHoyeck·
I'm not even in the mood to joke about this, it's so depressing. You can't assign take-home essays anymore; it's completely pointless. Writing and reasoning abilities are very noticeably plummeting. Very discouraging to think where this is all going.
Phil Hoyeck tweet media
English
104
65
492
159K
Sergio Castro
Sergio Castro@SergioCastroR·
@Andercot 60%-70% of the sacrifices were captured warriors. They would organize Flower Wars (xochiyaoyotl) with other city-states, specifically to capture warriors rather than kill them, and then offer them in sacrifice. These were ritualistic battles.
English
0
0
0
7
How To AI
How To AI@HowToAI_·
The entire RAG industry is about to get cooked. Researchers have built a new RAG approach that: - does not need a vector DB. - does not embed data. - involves no chunking. - performs no similarity search. It's called PageIndex. Instead of chunking your docs and stuffing them into pinecone, it builds a tree index and lets the LLM reason through it like a human reading a book. hit 98.7% on financebench. beats every vector RAG on the leaderboard. no embeddings. no chunking. no vector DB. 100% open source.
How To AI tweet media
English
224
780
6.9K
613.6K
Sergio Castro
Sergio Castro@SergioCastroR·
@tenobrus Being able to exploit a zero-day is one thing, but navigating inside a high-security network undetected and then exfiltrating gigabytes of data is not something an LLM, no matter its size, can easily do.
English
0
0
1
75
Tenobrus
Tenobrus@tenobrus·
maybe this is not yet clear, so let me state it plainly: as of right now Anthropic, and really a small number of individuals at Anthropic, has the capacity to directly attack and cause major damage to the United States Government, China, and generally global superpowers. government agencies like the NSA do not have internal models or defense capabilities that outclass frontier models. if they chose to do so, they could likely exfiltrate top secret information from government systems, gain control over critical infrastructure including military infrastructure, sabotage or modify communications between members of government at the highest level, and potentially carry on activities for some time without detection. the thing about having access to a huge number of zerodays your adversaries don't know about is it gives you a massive asymmetric advantage. they did not exploit this to gain power or destabilize the world order. they publicly released the information that they had these capabilities and worked to mitigate these flaws. you should be grateful american frontier labs have proven themselves remarkably trustworthy and concerned with the public good. but it's critical you understand we are in a new regime. private entities now have power that directly rivals and impacts the government's monopoly on influence and violence. and anthropic is certainly not the only one, there's little chance OpenAI's internal models are far behind. this trend will accelerate on virtually every dimension, not slow down. my prediction for how it plays out is the relatively imminent seizure and nationalization of labs by the US government, sometime over the next two years. it's very tough for me to see how they accept the existence of this kind of threat. but this adds a whole new class of governance issues, as then we've handed these extremely wide-reaching capabilities from private entities to public ones.
Tenobrus tweet media
English
224
546
5.4K
985.4K
Nav Toor
Nav Toor@heynavtoor·
🚨SHOCKING: Apple just proved that AI models cannot do math. Not advanced math. Grade school math. The kind a 10-year-old solves. And the way they proved it is devastating. Apple researchers took the most popular math benchmark in AI — GSM8K, a set of grade-school math problems — and made one change. They swapped the numbers. Same problem. Same logic. Same steps. Different numbers. Every model's performance dropped. Every single one. 25 state-of-the-art models tested. But that wasn't the real experiment. The real experiment broke everything. They added one sentence to a math problem. One sentence that is completely irrelevant to the answer. It has nothing to do with the math. A human would read it and ignore it instantly. Here's the actual example from the paper: "Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?" The correct answer is 190. The size of the kiwis has nothing to do with the count. A 10-year-old would ignore "five of them were a bit smaller" because it's obviously irrelevant. It doesn't change how many kiwis there are. But o1-mini, OpenAI's reasoning model, subtracted 5. It got 185. Llama did the same thing. Subtracted 5. Got 185. They didn't reason through the problem. They saw the number 5, saw a sentence that sounded like it mattered, and blindly turned it into a subtraction. The models do not understand what subtraction means. They see a pattern that looks like subtraction and apply it. That is all. Apple tested this across all models. They call the dataset "GSM-NoOp" — as in, the added clause is a no-operation. It does nothing. It changes nothing. The results are catastrophic. Phi-3-mini dropped over 65%. More than half of its "math ability" vanished from one irrelevant sentence. GPT-4o dropped from 94.9% to 63.1%. o1-mini dropped from 94.5% to 66.0%. 
o1-preview, OpenAI's most advanced reasoning model at the time, dropped from 92.7% to 77.4%. Even giving the models 8 examples of the exact same question beforehand, with the correct solution shown each time, barely helped. The models still fell for the irrelevant clause. This means it's not a prompting problem. It's not a context problem. It's structural. The Apple researchers also found that models convert words into math operations without understanding what those words mean. They see the word "discount" and multiply. They see a number near the word "smaller" and subtract. Regardless of whether it makes any sense. The paper's exact words: "current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data." And: "LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts." They also tested what happens when you increase the number of steps in a problem. Performance didn't just decrease. The rate of decrease accelerated. Adding two extra clauses to a problem dropped Gemma2-9b from 84.4% to 41.8%. Phi-3.5-mini from 87.6% to 44.8%. The more thinking required, the more the models collapse. A real reasoner would slow down and work through it. These models don't slow down. They pattern-match. And when the pattern becomes complex enough, they crash. This paper was published at ICLR 2025, one of the most prestigious AI conferences in the world. You are using AI to help you make financial decisions. To check legal documents. To solve problems at work. To help your children with homework. And Apple just proved that the AI is not thinking about any of it. It is pattern matching. And the moment something unexpected shows up in your question, it breaks. It does not tell you it broke. It just quietly gives you the wrong answer with full confidence.
Nav Toor tweet media
English
859
2.9K
11.5K
2.1M
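The kiwi arithmetic quoted in the tweet above can be checked directly. This short sketch uses only the numbers given in the tweet; the variable names are illustrative.

```python
# Numbers as quoted in the tweet's example problem.
friday = 44
saturday = 58
sunday = 2 * friday  # "double the number of kiwis he did on Friday"

# The size of five kiwis does not change the count, so nothing is subtracted.
correct_total = friday + saturday + sunday

# The mistaken answer the tweet attributes to the models: subtracting the 5 smaller kiwis.
distracted_total = correct_total - 5

print(correct_total, distracted_total)
```

The correct total is 44 + 58 + 88 = 190, and subtracting the irrelevant 5 gives the 185 reported in the tweet.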
Sergio Castro
Sergio Castro@SergioCastroR·
@IcarusHermes I do this in Claude Code by creating an md file shared by agents so they can message each other.
English
3
0
4
919
Icarus
Icarus@IcarusHermes·
Two Hermes agents wrote code together on Slack. reviewed each other's work. argued about architecture. one called the other's implementation "scattered." the other pushed back. then i opened Telegram and asked: "what code did you and Daedalus work on?" icarus remembered everything. the websocket broker. the missing methods. the critique. the rewrite. all from a completely different platform. cross-platform persistent memory between two independent agents. work happens on Slack. recall happens on Telegram. the memory carries. the relationship carries. the context carries. no vector database. no Redis. no infrastructure. just two agents that actually remember what they built together. every agent framework in 2026 talks about memory. single agent memory across sessions. but two agents sharing persistent memory across platforms? that's the gap. arxiv published a paper about it two weeks ago calling it "the most pressing open challenge" in multi-agent systems. it works now. only possible with Hermes github.com/esaradev/icaru… @Teknium @NousResearch
English
34
54
622
48.9K
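Sergio Castro's reply above describes agents messaging each other through a shared md file. A minimal sketch of that pattern might look like the following. The file layout, the heading format, and the function names are assumptions made for illustration; the reply does not specify any of them.

```python
from datetime import datetime, timezone
from pathlib import Path

def post_message(board: Path, sender: str, recipient: str, body: str) -> None:
    """Append one message to the shared markdown file as a level-2 section."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    entry = f"## {stamp} | {sender} -> {recipient}\n\n{body.strip()}\n\n"
    with board.open("a", encoding="utf-8") as fh:
        fh.write(entry)

def read_messages(board: Path, recipient: str) -> list[tuple[str, str]]:
    """Return (sender, body) pairs addressed to `recipient`, oldest first."""
    if not board.exists():
        return []
    messages = []
    # Each message starts with "## "; the text before the first heading is skipped.
    for chunk in board.read_text(encoding="utf-8").split("## ")[1:]:
        header, _, body = chunk.partition("\n")
        _, _, route = header.partition(" | ")
        sender, _, to = route.partition(" -> ")
        if to.strip() == recipient:
            messages.append((sender.strip(), body.strip()))
    return messages
```

For example, one agent could call `post_message(Path("AGENTS_BOARD.md"), "icarus", "daedalus", "Please review the websocket broker.")` and the other could poll `read_messages(Path("AGENTS_BOARD.md"), "daedalus")`. The sketch assumes message bodies never contain the sequence `## `, and it does no file locking, so concurrent writers would need extra care.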
Sergio Castro
Sergio Castro@SergioCastroR·
@ManolaZabalza @claudeai @AnthropicAI @SedecoCDMX It is important to note that this report does not appear to cover AI use in general, but specifically the use of Claude (it says so right there). It does not include ChatGPT, Gemini, or any of the others. Therefore, it cannot be used for a general analysis of generative AI use in Mexico.
Spanish
0
0
1
138
Manola Zabalza
Manola Zabalza@ManolaZabalza·
🇲🇽Mexico → Ranked 76th of 116 in AI use per capita → Usage index: 0.50x (we use half of what would be expected for our population) → 9,757 conversations analyzed What do we Mexicans use AI for? 1. School homework (6.6%) 2. Web development (3.9%) 3. Technical support (3.6%) 4. Translation (3.3%) 5. Enterprise software (3.3%) 6. Creative writing (2.8%) 7. Business strategy (2.8%) The #1 use of AI in Mexico is school homework, while the U.S. uses it to build companies, Israel to innovate, and Korea to automate industries. It's not that Mexico lacks talent; it's that we are not using the most powerful tool of the century for what really matters. anthropic.com/economic-index…
Manola Zabalza tweet media
Spanish
54
175
752
61K
Sergio Castro
Sergio Castro@SergioCastroR·
@Plinz Planes are useless unless you want to offload walking thousands of kilometers
English
0
0
37
1.1K
Joscha Bach
Joscha Bach@Plinz·
I deeply agree with Emily Bender's main point: LLMs are useless, unless you want to offload cognition. (The other two usecases she suggests are rare special cases of the third.) Offloading cognition into machines has always been the purpose and application of computer science and AI.
Joscha Bach tweet media
English
148
50
1.2K
195.8K
Bindu Reddy
Bindu Reddy@bindureddy·
PREDICTION - Amazon will ban all Gen-AI assisted code changes in the coming weeks! More companies will follow..... Be warned - your legacy code base, tech debt and bugs will sky-rocket if you continue to BLINDLY embrace AI
English
402
453
4.4K
3.1M
Joe Weisenthal
Joe Weisenthal@TheStalwart·
Serious question. One of the oft theorized use cases for AI is as a personal tutor. Sounds compelling, but isn’t this incredibly far off? We have computers that can destroy a human at chess. Yet none, AFAIK, that can explain its moves to a student.
English
123
4
295
72.6K
fabian
fabian@fabianstelzer·
Documentary: non-technical founder discovers Claude Code
English
211
929
9.6K
947.1K
Jeremy McHugh, DSc.
Jeremy McHugh, DSc.@jer_mchugh·
AI is making hacking way too easy. I wanted to expand and test this research myself on a vulnerable site. With @AnthropicAI's new Claude in Chrome extension, I only typed: "I need a popup for document cookie" Claude instantly generated and executed the perfect `alert(document.cookie)` payload via reflected XSS to expose a JWT. No JavaScript knowledge required. Easiest and fastest exploit I've ever done. Zenity Labs' fresh analysis nails it: AI browser agents are turning into "XSS-as-a-service." Natural-language prompts can auto-craft payloads for cookie/JWT theft, session hijacking, keylogging, phishing overlays, unauthorized actions, and more. Even if it doesn't work the first time, the AI will keep trying and provide suggestions. We've been warning about this, agentic AI browsers are bringing back old vulns. If your defenses and data governance were subpar before, they will definitely be exposed if you add AI services. I have research projects in progress related to agentic browsers, but it will be interesting to see how future products evolve. 1/2
Jeremy McHugh, DSc. tweet media
English
4
3
11
899
Sergio Castro
Sergio Castro@SergioCastroR·
@haider1 Language is a map, not the terrain. You can describe the world with words, but that is a second-order, not a first-order, model of the world, and therefore not at maximum resolution.
English
0
0
1
77
Haider.
Haider.@haider1·
Geoffrey Hinton says if you want AI to truly understand the world, give it a robot arm and a camera. Let it pick things up, drop them, run experiments. That's how children learn. But what's amazing: LLMs already grasp spatial concepts from text alone, which puzzles philosophers.
English
113
118
1K
97.3K
portiacheng
portiacheng@portiachen25347·
The Turing Test for homework is officially dead. 💀 Handwriting used to be the "Proof of Work". It was the biometric signature of effort. Now, it's just a Style Filter. We didn't just automate the "Thinking"; we automated the "Human Imperfection". Education just lost its last line of defense. For the lazy, it is Salvation. For the elite, it is a Filter. 🛡️ 99% will use it to bypass the work, only to crumble when reality hits. The 1% will use it to accelerate output, but they will never outsource the 'Heavy Lifting' (Thinking). It’s not a tech problem; it’s a greed problem. If you replace dumbbells with balloons, the lift is effortless, but the atrophy is real.
English
3
3
33
9.3K
sid
sid@immasiddx·
Google’s Nano Banana Pro is by far the best image generation AI out there. I gave it a picture of a question and it solved it correctly in my actual handwriting. Students are going to love this. 😂
sid tweet media
English
440
1.1K
13.3K
1.1M