cole murray

8.2K posts

@colemurray

ai/ml | cto | prev founder | former sr. sde @ amazon

San Francisco, CA · Joined February 2015

980 Following · 4.1K Followers

Pinned Tweet

cole murray@colemurray·25 Oct

Advice given to someone asking about AI Consulting:

I don't think an ML background is required to be successful in AI consulting, but it obviously helps. I think the biggest "skill" learned in ML is how to successfully run feedback loops in a system. In an ML system, this typically involves cleaning data, making model tweaks, performance evals etc. With LLMs, in nearly every case you won't be fine-tuning the model, but iterating on prompts is a very similar workflow.

I do think it would be helpful to at least get a high-level understanding of how the models "actually" work and become familiar with the basic terms, e.g. tokens, transformers, attention, what happens on each input -> output iteration as the model is predicting. You don't need to know the underlying math (helpful though), but understanding what is happening is helpful.

Most of the AI consulting market leans more on full-stack / product development skills and less on ML. These aren't the most lucrative opportunities, but they are available in abundance.

Major areas now and over the next year:
- RAG: this is basically just glorified search lol. Useful in many contexts but severely overhyped
- Agents: The models aren't quite there yet IMO for this to be useful, but in 2025 I think this will be a major theme and a HUGE area of interest/investment. Becoming good at this will be valuable.
- Evals: Performance evaluations are a relatively untapped market. Most AI products you see today are flying by the seat of their pants. Without eval metrics, you can't truly know if your prompt changes are improving the system. This is somewhat more difficult to sell as a consultant as it requires a more sophisticated buyer, but is worth a lot of money if you can do it well

237

48.9K
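
To make the evals point in the pinned tweet concrete, here is a minimal sketch of comparing two prompt variants against a small labeled set. Everything in it is an assumption for illustration: `EVAL_SET`, `stub_model`, and both prompts are hypothetical, and the stub is a deterministic stand-in for a real LLM call so the example runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled examples: (ticket text, expected label).
EVAL_SET: List[Tuple[str, str]] = [
    ("refund my order", "billing"),
    ("app crashes on login", "bug"),
    ("how do I export data", "how-to"),
    ("charged twice this month", "billing"),
]

def stub_model(prompt: str, text: str) -> str:
    """Deterministic stand-in for an LLM call (assumption, not a real API).

    It only "knows" a keyword if the prompt mentions it, so changing the
    prompt changes the predictions in a reproducible way.
    """
    rules = {"refund": "billing", "charged": "billing",
             "crash": "bug", "export": "how-to"}
    for keyword, label in rules.items():
        if keyword in prompt and keyword in text:
            return label
    return "unknown"

def accuracy(predict: Callable[[str], str],
             eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the prediction equals the expected label."""
    correct = sum(1 for text, expected in eval_set if predict(text) == expected)
    return correct / len(eval_set)

def score(prompt: str) -> float:
    """Evaluate one prompt variant against the whole labeled set."""
    return accuracy(lambda text: stub_model(prompt, text), EVAL_SET)

PROMPT_V1 = "Classify the ticket. Hints: refund, crash"
PROMPT_V2 = "Classify the ticket. Hints: refund, charged, crash, export"

if __name__ == "__main__":
    print(f"v1 accuracy: {score(PROMPT_V1):.2f}")
    print(f"v2 accuracy: {score(PROMPT_V2):.2f}")
```

With a real model the scoring function would likely need something fuzzier than exact match, but the loop is the same: fix the eval set, change the prompt, compare the metric.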

cole murray@colemurray·1h

@sqs

GIF

Quinn Slack@sqs·21h

You can now proactively verify your identity (with a passport or government ID) in case it’s needed for future frontier model access in Amp. We think it probably will be, and we want Amp to keep giving you access to the best models available to you. We can’t guarantee access criteria or timelines. Those depend on (highly uncertain) government and model lab policy. We don’t plan to impose any additional restrictions beyond what is required by law and the model labs. We are covering the cost for identity verification for all users, and we’re using Stripe for identity verification, so Amp stores nothing and sees only the outcome. ampcode.com/settings/ident…

146

67.8K

cole murray@colemurray·3h

@charliebcurran need a whole series

142

Charles Curran@charliebcurran·5h

I used AI to explain the Anthropic drama to my girlfriend, with fruit.

128

201

2.5K

249.5K

cole murray@colemurray·10h

me remembering fable is banned so you can’t

GIF

527

cole murray@colemurray·11h

can someone run this on some of the coding benchmarks? impressive, but I don’t particularly need deep research and want raw eng performance.

OpenRouter@OpenRouter

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

8.7K

cole murray@colemurray·10h

@VicVijayakumar was AI involved in the diagnosis process?

276

Vic 🌮@VicVijayakumar·1d

The AC stopped working in my 8yo’s room sometime on Thursday. I called the HVAC company but they were busy. Yesterday her room hit 97F! This morning we went to the hardware store and grabbed a replacement capacitor ($22.97) and everything is back to normal. She screamed “DAD YOU’RE AMAZING” and that was the actual best part.

1.1K

80K

cole murray@colemurray·21h

i have a solution to the fable issue what if we just rename it...

1.1K

cole murray@colemurray·1d

@DennisonBertram @yingyangwins sir, they’re asking us to show something useful we made with those 250 agents

745

Dennison@DennisonBertram·1d

Major life hack: DeepSeek in the Claude Code harness can also build and drive workflows, at a fraction of the cost and Opus 4.6/7 quality. I've got it running over 250 subagents in a workflow in adversarial reviews. Pennies on the dollar. Use my tool "Deep-Claude"

588

71.2K

cole murray@colemurray·1d

@maxktz co-sign x.com/colemurray/sta…

cole murray@colemurray

conversely, i think most teams probably shouldn’t be building their own harness. it is unlikely you will have novel ideas around sub-agent orchestration, compaction, progressive disclosure etc that are worth owning the entire harness

spend your time investing in the pieces around the harness:
- execution infrastructure
- custom tools, MCPs and skills
- self improvement on trajectories

744

Max Katz@maxktz·1d

life lesson: never bet on a custom harness like Pi

been loving my custom Pi setup for the last few weeks, the fact that I can build any extension, use any models

but things are moving too fast today

huge teams behind Claude / Codex change the way we develop almost every month

so by building and maintaining a custom agent you're more likely to get left behind

most models perform better in their native harnesses anyway, and using external ones is likely to get banned

so I recommend betting your workflow on portable primitives, like prompts, skills or scripts, instead of custom agents

170

63K

cole murray@colemurray·1d

@HamelHusain i find MCP-based skill retrieval has issues with non-determinism, where often the "correct" skills don't get loaded when needed.

Vercel had some good research on this here: vercel.com/blog/agents-md…

TLDR: jam all the skills in agents.md

289

Hamel Husain@HamelHusain·1d

Out of all the replies this solution looks especially clean x.com/lolbrandonk/st…

Hamel Husain@HamelHusain

What’s the best way for non developers to
1. share skills with their team
2. automatically enforce that it’s always updated for everyone if changes are made
3. allow others to update it centrally

Github is not the best solution as it’s too clunky and doesn’t solve #2

Notion is a little better but can’t put code there

I’m tempted to create my own tools but someone surely has created this already??

20.1K
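
One way to read the "jam all the skills in agents.md" suggestion in the reply above is as a build step that inlines every skill file into a single instructions file, so nothing depends on retrieval at run time. The sketch below assumes a hypothetical layout (one Markdown file per skill in a `skills/` directory) and a hypothetical helper name, `build_agents_md`; it is not taken from the Vercel post.

```python
from pathlib import Path

def build_agents_md(skills_dir: Path, output: Path) -> int:
    """Concatenate every *.md file in skills_dir into one instructions file.

    Returns the number of skills written. The directory layout and the
    heading format are assumptions made for this illustration.
    """
    sections = []
    # Sort for a stable, reviewable order across runs.
    for path in sorted(skills_dir.glob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        sections.append(f"## Skill: {path.stem}\n\n{body}\n")
    header = "# Agent instructions\n\n"
    output.write_text(header + "\n".join(sections), encoding="utf-8")
    return len(sections)
```

Calling `build_agents_md(Path("skills"), Path("AGENTS.md"))` from a commit hook or CI job would keep the combined file in sync with the individual skill files. Keep the output outside `skills/`, otherwise a rerun would pick it up as a skill.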

cole murray@colemurray·1d

@tekbog nobody escapes uncle jeff

297

terminally onλine εngineer@tekbog·1d

crazy how AWS brings everyone down eventually

Polymarket@Polymarket

NEW: Amazon researchers are reportedly behind the jailbreak report that led to the U.S. crackdown on Anthropic’s top models.

495

14.7K

cole murray@colemurray·1d

assuming it had a similar running cost as an opus-level model, i don't think much would change. Opus is already quite good at cyber capabilities and there is plenty of low-hanging fruit left for motivated actors

Low-level actors do not have the finances to be able to run a Mythos-level model for anything significant (using Anthropic's $20,000 OpenBSD bug as a reference)

For sophisticated actors, i'm not convinced it changes the existing landscape much. Exploit development is a fairly small part of a much larger kill chain and letting a non-deterministic system operate in a stealth-sensitive environment seems like a good way to burn the operation.

561

Zack Korman@ZackKorman·1d

I feel everyone is talking about cyber risk with very little input from cybersecurity. For people in cyber, I want your take: How good or bad would it be for cyber if an open-weight no-guardrails Mythos-level model released tomorrow?

169

223

58.3K

cole murray@colemurray·1d

@emily_yuan_ @UseCorgi would this help with legal fees for an export control directive? asking for a friend

158

Emily Yuan@emily_yuan_·2d

Imagine getting sued because your AI agent messes up. That's why we built AI coverage at @UseCorgi to help cover new types of risks that AI is creating during this technological shift. We give our AI agents a lot of autonomy today (e.g. pushing code to production, talking to customers, processing payments). And sometimes, they get things wrong.

10.6K

cole murray@colemurray·1d

imagine being so gpu poor, you have to stage an export controls ban to avoid hosting your model

746

cole murray@colemurray·1d

@signulll hate to be the bearer of bad news

444

signüll@signulll·1d

i had a convo with someone at a big lab recently where she framed my account as an umpire calling balls & strikes. & i deeply appreciated that.

124

10.5K

cole murray@colemurray·2d

@keennay tire folded under pressure

cole murray@colemurray·2d

@gwenshap custom cli wrapper over the logs api/service

268

Gwen (Chen) Shapira@gwenshap·3d

Folks who use Codex/Claude Code for SRE-like stuff... what's your solution to log files eating up context and tokens like crazy?

23.8K
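
The "custom cli wrapper over the logs api/service" reply above could look something like the following at its core: filter to the lines that matter, collapse repeats, and hard-cap what reaches the agent's context. This is a sketch under assumptions, not the author's tool; `condense_logs` and its parameters are hypothetical names, and a real wrapper would fetch lines from the logs service instead of taking them as an argument.

```python
import re
from typing import Iterable, List

def condense_logs(lines: Iterable[str],
                  pattern: str = r"ERROR|WARN",
                  max_lines: int = 20,
                  max_chars: int = 200) -> List[str]:
    """Return a small, agent-friendly view of a log stream.

    Keeps only lines matching `pattern`, skips consecutive duplicates,
    truncates long lines, and stops collecting after `max_lines`,
    appending a one-line note about how many matches were left out.
    """
    regex = re.compile(pattern)
    kept: List[str] = []
    omitted = 0
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not regex.search(line):
            continue  # not interesting, never reaches the context window
        if line == previous:
            omitted += 1  # consecutive repeat of the same message
            continue
        previous = line
        if len(kept) < max_lines:
            kept.append(line[:max_chars])
        else:
            omitted += 1  # over the cap
    if omitted:
        kept.append(f"... {omitted} more matching lines omitted")
    return kept
```

Exposed as a command the agent can call with a pattern and a cap, this keeps the token cost of a log query bounded no matter how large the underlying file is.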

cole murray@colemurray·2d

@zeeg OpenInspect! github.com/ColeMurray/bac…

147

David Cramer@zeeg·2d

What open source are folks building today?

20.9K

cole murray@colemurray·2d

@liran_tal @vercel_dev where we are going, you'll need way more than 20 containers lol

Liran Tal@liran_tal·2d

@colemurray @vercel_dev Hmmm, I don't know about that. In one of my past engineering orgs about 10 years ago we had a stock config of about 20 running containers as part of the local dev setup

Most of the "precious work" is token generation and that's a remote async i/o job

cole murray@colemurray·3d

After a week or so of using Vercel Sandboxes, @vercel_dev, some thoughts:
- easy to integrate with
- expensive, especially snapshots
- UI/UX in console needs improvements

The good:

Integrating the sandbox into OpenInspect was fairly straightforward. I was able to support all of the existing functionality OpenInspect relies on (snapshots, pre-builds, tunnel ports etc). The sandboxes themselves are snappy enough and comparable to the other providers integrated (at least specific to my needs)

Having both the web-app and sandboxes in one provider is nice, as it's one less set of API keys and one less service to manage.

The bad:

Vercel Sandbox UI

The UI is pretty lacking. There is no visibility into the stdout/stderr going on in the sandbox. This is a pretty big dealbreaker, as i now need to set up some telemetry to observe what's going on. spoiler: i'm not going to

Additionally, there is no way to bulk delete snapshots from the UI, which is quite tedious / under-thought IMO. Yes it works from CLI, but sometimes it's nice to just click buttons.

Costs

The realized usage costs are significantly more than the other sandbox providers. Specifically, sandbox data transfer and sandbox storage are both quite inflated.

Data transfer: $0.15/GB (lol)
Data storage: $0.08/GB (lol)

Within ~50 sessions, I racked up $15 or so in costs doing effectively just Git repo pulls

Data storage specifically is pretty brutal, as OpenInspect snapshots between each turn, and the snapshots are not billed incrementally.

Additionally, the min compute size is 2 CPU / 4 GB RAM, which is more than needed. Would like to see lower options available!

TLDR; overall it's a nice experience and having more services on one provider is nice. Looking forward to the service improving!

3.1K
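
A rough worked example of why non-incremental snapshot billing hurts, using the two per-GB rates quoted in the post above. Everything else is an assumption for illustration: the function name `session_cost`, the session shape (pull size, number of turns, snapshot size, per-turn delta), and treating the storage rate as a flat per-GB charge, since the post does not state the billing period. The output is not meant to reproduce the ~$15 figure.

```python
TRANSFER_USD_PER_GB = 0.15  # rate quoted in the post
STORAGE_USD_PER_GB = 0.08   # rate quoted in the post; billing period not stated there

def session_cost(pull_gb, turns, snapshot_gb, delta_gb=None):
    """Estimate one session's transfer + snapshot storage cost in USD.

    pull_gb:     data transferred for the repo pull (hypothetical input)
    turns:       agent turns, with one snapshot taken per turn
    snapshot_gb: full size of each snapshot
    delta_gb:    if given, model incremental billing, where only the first
                 snapshot is billed in full and later ones by their delta
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    transfer = pull_gb * TRANSFER_USD_PER_GB
    if delta_gb is None:
        stored_gb = turns * snapshot_gb  # every snapshot billed at full size
    else:
        stored_gb = snapshot_gb + (turns - 1) * delta_gb
    return transfer + stored_gb * STORAGE_USD_PER_GB

if __name__ == "__main__":
    # Hypothetical session: 0.5 GB pull, 5 turns, 1 GB snapshots, 0.05 GB deltas.
    full = session_cost(0.5, 5, 1.0)
    incremental = session_cost(0.5, 5, 1.0, delta_gb=0.05)
    print(f"full-size billing:   ${full:.3f} per session")
    print(f"incremental billing: ${incremental:.3f} per session")
```

Under these made-up inputs the storage term dominates and grows linearly with the number of turns, which is the effect the post is pointing at.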

cole murray@colemurray·2d

@liran_tal @vercel_dev x.com/colemurray/sta…

cole murray@colemurray

localhost doesn't scale

and it will only get worse as agents improve on long running tasks

there's a better way

Liran Tal@liran_tal·2d

@colemurray @vercel_dev Why would you not run isolated environments locally based on containers?

401

cole murray@colemurray·3d

@natolambert stockholm syndrome

339

Nathan Lambert@natolambert·3d

Props to Anthropic for quick action here. I'm okay with this outcome. Some people may, but I don't think they'd silently degrade performance without telling users.

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

237

33K
