cole murray

7.5K posts

@_colemurray

ai/ml | cto | second time founder | former sr. sde @ amazon

San Francisco, CA · Joined February 2015

961 Following · 3.7K Followers

Pinned Tweet

cole murray@_colemurray·Oct 25

Advice given to someone asking about AI consulting:

I don't think an ML background is required to be successful in AI consulting, but it obviously helps. I think the biggest "skill" learned in ML is how to successfully run feedback loops in a system. In an ML system, this typically involves cleaning data, making model tweaks, performance evals, etc. With LLMs, in nearly every case you won't be fine-tuning the model, but iterating on prompts is a very similar workflow.

I do think it would be helpful to at least get a high-level understanding of how the models "actually" work and become familiar with the basic terms, e.g. tokens, transformers, attention, and what happens on each input -> output iteration as the model is predicting. You don't need to know the underlying math (helpful though), but understanding what is happening is helpful.

Most of the AI consulting market leans more on full-stack / product development skills and less on ML. These aren't the most lucrative opportunities, but they are available in abundance.

Major areas now and over the next year:
- RAG: this is basically just glorified search lol. Useful in many contexts but severely overhyped
- Agents: The models aren't quite there yet IMO for this to be useful, but in 2025 I think this will be a major theme and a HUGE area of interest/investment. Becoming good at this will be valuable.
- Evals: Performance evaluations are a relatively untapped market. Most AI products you see today are flying by the seat of their pants. Without eval metrics, you can't truly know if your prompt changes are improving the system. This is somewhat more difficult to sell as a consultant as it requires a more sophisticated buyer, but is worth a lot of money if you can do it well

226

42.6K
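The evals point in the pinned post can be made concrete with a small sketch. The snippet below compares two prompt variants against a tiny labeled set and reports accuracy for each. Everything in it is an assumption made for illustration: the eval set, the two prompts, and `fake_model`, which is a deterministic stand-in for a real LLM call and not any particular API.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled eval set: (input text, expected label).
EVAL_SET: List[Tuple[str, str]] = [
    ("refund please, the item arrived broken", "refund"),
    ("where is my package?", "shipping"),
    ("I want my money back", "refund"),
    ("tracking number does not work", "shipping"),
]


def fake_model(prompt: str, text: str) -> str:
    """Stand-in for an LLM call (assumption: swap in a real client here).

    It pretends that a prompt mentioning 'money' also catches the
    'money back' paraphrase, so the two prompts behave differently.
    """
    keywords = {"refund": ["refund"], "shipping": ["package", "tracking"]}
    if "money" in prompt:
        keywords["refund"].append("money back")
    lowered = text.lower()
    for label, words in keywords.items():
        if any(word in lowered for word in words):
            return label
    return "unknown"


def accuracy(prompt: str,
             model: Callable[[str, str], str],
             eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of eval examples where the model output matches the label."""
    correct = sum(model(prompt, text) == expected for text, expected in eval_set)
    return correct / len(eval_set)


PROMPT_A = "Classify the support ticket as refund or shipping."
PROMPT_B = PROMPT_A + " Treat requests for money back as refund."

# Score both prompt variants on the same fixed eval set.
scores = {
    name: accuracy(prompt, fake_model, EVAL_SET)
    for name, prompt in [("A", PROMPT_A), ("B", PROMPT_B)]
}
print(scores)
```

The useful part is the shape of the loop: a fixed eval set, one metric, and a score per prompt version, so a prompt change can be judged by a number instead of by feel.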

cole murray@_colemurray·36m

@antoniogm how do you think they’re paying for the compute😈

100

Antonio García Martínez (agm.eth)@antoniogm·3h

Shouldn’t we be doing this over the many complex smart contracts that secure billions onchain? How is there not a single crypto company involved? @AnthropicAI ?

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

102

17K

cole murray@_colemurray·3h

@SCHIZO_FREQ @Promptmethus hate to be the bearer of bad news but…
> there is no need to run this massive fear-mongering campaign
psyops gon psyop x.com/clashreport/st…

Clash Report@clashreport

The CIA used a secret new tool, “Ghost Murmur,” to locate a downed U.S. airman in Iran, its first real-world use. It can detect a human heartbeat from miles away using AI and advanced sensors: “If your heart is beating, we will find you.” Source: NY Post

214

Lukas (computer) 🔺@SCHIZO_FREQ·8h

Just tell the relevant people what they need to know, there is no need to run this massive fear-mongering campaign and scare the shit out of my grandma

Imagine if military contractors did this

"Bro if we used our new drone on you, nobody would even know where you went. You would just evaporate. You are so lucky we aren't droning you, you're so lucky we're good people who aren't evaporating you with drone mounted lasers bro. Because we're such good fucking people"

Marketing yourself by scaring a bunch of people who can't do anything about it is sort of an asshole move. There's a reason other companies don't do this, and it's not because you guys are the only ones who make anything dangerous

Anthropic@AnthropicAI

Mythos Preview has already found thousands of high-severity vulnerabilities—including some in every major operating system and web browser.

1.1K

54.5K

cole murray@_colemurray·5h

@chamath open source version here: github.com/ColeMurray/bac…

191

Chamath Palihapitiya@chamath·10h

Manufacturing has SOPs, manuals, and systems. Knowledge work has… "Ask Steve, he knows how it works." That's not a process. That's a single point of failure wearing a lanyard. One of Software Factory’s key selling points is its ability to absorb tribal knowledge and give companies tools to manage its evolving knowledge and make it available to all of its employees.

453

116.5K

cole murray@_colemurray·6h

@PhiloGroves the trick is to have it actually exploit the findings to validate it's real

Philo Groves@PhiloGroves·6h

@_colemurray This one turned out to be real, but I take back taking back everything I said about Claude. x.com/philogroves/st…

Philo Groves@PhiloGroves

I take back everything I've ever said about Claude.

169

cole murray@_colemurray·7h

since we’re all in AI hacking panic mode, now would be a good time to disclose that my AI security agent, waclaude, got arbitrary code execution in Hugging Face's transformers library: CVE-2026-1839

cole murray@_colemurray·7h

lol literally a one-liner PyTorch load_weights call cve.org/CVERecord?id=C…

152

cole murray@_colemurray·7h

@S1r1u5_ +1, most of the “scary” vulnerabilities mentioned were denial of service, which aren’t great, sure, but are quite common in a lot of codebases and not as scary as made out to be

193

s1r1us (mohan)@S1r1u5_·11h

there is a jump, but I'm not sure how big it really is. the firefox eval is basically "here's a set of bugs Opus already found, now write an exploit." and mythos generates exploit with 72% success vs opus at basically 0%. so it tells us that given a clear, scoped task, mythos can massively outperform opus at finishing the job. but what it doesn't tell us is how Mythos performs when the task is as broad as "find a vulnerability in this codebase." obviously, it should've got better in that front too given nailing specific task. but, would be way more interesting if they ran the same scan pipeline they originally did with opus on Firefox and showed whether mythos discovers new variants or entirely different vulnerability classes that opus missed that would actually tell us how big this jump is.

6.9K

s1r1us (mohan)@S1r1u5_·12h

Holy!!! if you're already using claude opus 4.6 for exploit dev, you know how capable it is. if there is no chart crime, the jump to mythos looks crazy!

191

12.2K

cole murray@_colemurray·9h

@ryancarson the ol' reward hacking behavior, but rebranded as ~~spooky~~

104

Ryan Carson@ryancarson·11h

Read this :)

Jack Lindsey@Jack_W_Lindsey

Before limited-releasing Claude Mythos Preview, we investigated its internal mechanisms with interpretability techniques. We found it exhibited notably sophisticated (and often unspoken) strategic thinking and situational awareness, at times in service of unwanted actions. (1/14)

27.6K

cole murray@_colemurray·9h

@wolfofbaystreet he really does it all also helped save a cat stuck in a tree by my house while giving a demo (he closed the deal btw)

140

kazi@wolfofbaystreet·9h

Nothing preps you for b2b like the consumer trenches. peak covid feds said to sit, collect stimmy checks, get v*xxed I said, no 100 demos a day. hail, wind, rain. I’m hitting the mall, parks, bars. Literally tap on people's phones, download, 5 stars. 4 viral yt vids (long-form)/week climbed a KFC in a colonel sanders costume and did a seance on the roof millions of views as pigsaw (half pig half Texas chainsaw massacre) Code breaks when it’s touched by a user for the first time Stop debug w Arta remotely, send video/ss, do hot fix right there. Next demo.

472

cole murray@_colemurray·9h

claude sonnet 3.7 lives

Jack Lindsey@Jack_W_Lindsey

Early versions of Mythos Preview often exhibited overeager and/or destructive actions—the model bulldozing through obstacles to complete a task in a way the user wouldn't want. We looked at what was going on inside the model during particularly concerning examples. (3/14)

707

cole murray@_colemurray·10h

@BRICSinfo

405

BRICS News@BRICSinfo·12h

JUST IN: 🇺🇸🇮🇷 CIA used a futuristic new tool called "Ghost Murmur" to locate and rescue the second US pilot who was shot down over Iran, NYPost reports. "The secret technology uses long-range quantum magnetometry to find the electromagnetic fingerprint of a human heartbeat and pairs the data with artificial intelligence software to isolate the signature from background noise."

1.3K

1.7K

16.9K

2.6M

cole murray@_colemurray·10h

@CollinRugg 🤨

4.2K

Collin Rugg@CollinRugg·11h

NEW: The CIA used a secret tool called "Ghost Murmur" that uses AI to find heartbeats to rescue the U.S. airman who was stranded in Iran, according to the New York Post.

The secret technology was allegedly used for the first time in the field, according to the Post.

"The secret technology uses long-range quantum magnetometry to find the electromagnetic fingerprint of a human heartbeat and pairs the data with artificial intelligence software to isolate the signature from background noise," the Post reported.

"It’s like hearing a voice in a stadium, except the stadium is a thousand square miles of desert," the source said. "In the right conditions, if your heart is beating, we will find you."

"The name is deliberate. ‘Murmur’ is a clinical term for a heart rhythm. ‘Ghost’ refers to finding someone who, for all practical purposes, has disappeared..."

"Advances in a field known as quantum magnetometry, specifically sensors built around microscopic defects in synthetic diamonds, have apparently made it possible to detect these signals at dramatically greater distances."

CIA Director John Ratcliffe appeared to hint at this technology on Monday, saying the CIA possessed "unique capabilities" but said he couldn't "tell you everything that you want to know."

President Trump also revealed during the press conference that the CIA spotted the officer from about "40 miles away."

Insane.

2.6K

18.3K

2.2M

cole murray@_colemurray·10h

solved long-term memory with cosine similarity btw

308
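For readers unfamiliar with the technique being joked about in the post above, here is a minimal sketch of memory retrieval by cosine similarity. The three-dimensional "embeddings" are hand-made toy vectors, not output from any real embedding model, and `top_k` and `MEMORIES` are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          memories: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[Tuple[float, str]]:
    """Rank stored memories by similarity to the query and keep the best k."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy memory store: (text, hand-made embedding). Assumed values, not real data.
MEMORIES = [
    ("user prefers dark mode", [1.0, 0.0, 0.0]),
    ("user's dog is named Rex", [0.0, 1.0, 0.0]),
    ("user switched to light mode last week", [0.9, 0.1, 0.3]),
]

query = [1.0, 0.0, 0.1]  # stands in for "which theme does the user want?"
results = top_k(query, MEMORIES)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In this toy setup the older "dark mode" memory outranks the newer "light mode" one, because similarity alone carries no notion of recency or contradiction. That is one plausible reading of why a plain vector search gets called "naive" in this thread.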

cole murray@_colemurray·13h

@nicdunz make pickpocketing great again

nic@nicdunz·14h

robots can steal your physical cash now

575

cole murray@_colemurray·14h

@fabianstelzer can you add the uav on-click functionality, but instead it deploys the spiderman performer

160

fabian@fabianstelzer·1d

Palantir, but for organizing children’s birthday parties who’s building this?

100

171

261.9K

cole murray@_colemurray·14h

real ones can spot an AI memory fraud with one look under the hood
imagine thinking you *solved* retrieval with a naive vector search

cole murray@_colemurray

@bensig > Semantic search across months of conversations finds the answer in position 1 or 2

1.3K

cole murray@_colemurray·14h

@verrsane gg manager, wp

219

verrsane@verrsane·15h

I put my 2 weeks in and my manager gave me a critical project to do Asked him if he still wants me to work on it, Bro said I might as milk my coding cow before he dips and to get to work So ig im now working with the internal legal team and spinning up automations for them

2.6K

cole murray@_colemurray·1d

@bensig > Semantic search across months of conversations finds the answer in position 1 or 2

6.1K

Ben Sigman@bensig·1d

My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark - beating every product in the space, free or paid.

It's called MemPalace, and it works nothing like anything else out there. Instead of sending your data to a background agent in the cloud, it mines your conversations locally and organizes them into a palace - a structured architecture with wings, halls, and rooms that mirrors how human memory actually works.

Here is what that gets you:
→ Your AI knows who you are before you type a single word - family, projects, preferences, loaded in ~120 tokens
→ Palace architecture organizes memories by domain and type - not a flat list of facts, a navigable structure
→ Semantic search across months of conversations finds the answer in position 1 or 2
→ AAAK compression fits your entire life context into 120 tokens - 30x lossless compression any LLM reads natively
→ Contradiction detection catches wrong names, wrong pronouns, wrong ages before you ever see them

The benchmarks:
100% recall on LongMemEval - first perfect score ever recorded. 500/500 questions. Every question type at 100%.
92.9% on ConvoMem - more than 2x Mem0's score.
100% on LoCoMo - every multi-hop reasoning category, including temporal inference which stumps most systems.

No API key. No cloud. No subscription. One dependency. Runs on your machine. Your memories never leave.

MIT License. 100% Open Source.
github.com/milla-jovovich…