cole murray

8.2K posts

@colemurray

ai/ml | cto | prev founder | former sr. sde @ amazon

San Francisco, CA Joined February 2015
980 Following 4.1K Followers
Pinned Tweet
cole murray
cole murray@colemurray·
Advice given to someone asking about AI Consulting:

I don't think an ML background is required to be successful in AI consulting, but obviously helps.

I think the biggest "skill" learned in ML is how to successfully do feedback loops in a system. In an ML system, this typically involves cleaning data, making model tweaks, performance evals etc. In LLMs, in nearly every case you won't be fine-tuning the model, but iterating on prompts is a very similar workflow.

I do think it would be helpful to at least get a high-level learning of how the models "actually" work and become familiar with the basic terms. e.g. tokens, transformers, attention, what happens on each input -> output iteration as the model is predicting. You don't need to know the underlying math (helpful though), but having the understanding of what is happening is helpful.

Most of the AI consulting market is more on full-stack / product development skills and less ML. This isn't the most lucrative opportunities, but they are available in abundance.

Major areas now and over the next year:

- RAG: this is basically just glorified search lol. Useful in many contexts but severely overhyped
- Agents: The models aren't quite there yet IMO for this to be useful, but in 2025 I think this will be a major theme and a HUGE area of interest/investment. Becoming good at this will be valuable.
- Evals: Performance evaluations are a relatively untapped market. Most AI products you see today are flying by the seat of their pants. Without eval metrics, you can't truly know if your prompt changes are improving the system. This is somewhat more difficult to sell as a consultant as it requires a more sophisticated buyer, but is worth a lot of money if you can do it well
English
10
16
237
48.8K
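The evals point in the pinned post can be made concrete with a small sketch. This is not the author's setup: the eval set, the labels, and the two "prompt variants" below are hypothetical stand-ins (plain functions instead of real LLM calls), used only to show how a fixed set of labeled examples turns "is the new prompt better?" into a number.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled examples: (input text, expected label).
EVAL_SET: List[Tuple[str, str]] = [
    ("refund my order", "billing"),
    ("app crashes on login", "bug"),
    ("how do I export data", "how-to"),
    ("charged twice this month", "billing"),
]


def accuracy(predict: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the prediction matches the expected label."""
    correct = sum(1 for text, expected in examples if predict(text) == expected)
    return correct / len(examples)


# Stand-ins for "model + prompt" variants. In a real system each of these
# would wrap an LLM call with a different prompt (assumption for illustration).
def prompt_v1(text: str) -> str:
    return "billing" if "refund" in text else "bug"


def prompt_v2(text: str) -> str:
    if "refund" in text or "charged" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    return "how-to"


def compare(variants: Dict[str, Callable[[str], str]],
            examples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every variant against the same eval set so results are comparable."""
    return {name: accuracy(fn, examples) for name, fn in variants.items()}


scores = compare({"v1": prompt_v1, "v2": prompt_v2}, EVAL_SET)
print(scores)
```

The key design choice is that both variants are scored on the same frozen examples, so a change in the score reflects the prompt change rather than a change in the test data.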
cole murray
cole murray@colemurray·
i have a solution to the fable issue what if we just rename it...
[image attached]
English
0
0
7
487
Dennison
Dennison@DennisonBertram·
Major life hack: DeepSeek in the Claude Code harness can also build and drive workflows, at a fraction of the cost and Opus 4.6/7 quality. I've got it running over 250 subagents in a workflow in adversarial reviews. Pennies on the dollar. Use my tool "Deep-Claude"
[image attached]
English
36
34
533
64.5K
Max Katz
Max Katz@maxktz·
life lesson: never bet on a custom harness like Pi

been loving my custom Pi setup for the last few weeks, the fact that I can build any extension, use any models

but things are moving too fast today

huge teams behind Claude / Codex change the way we develop almost every month

so by building and maintaining a custom agent you're more likely to get left behind

most models perform better in their native harnesses anyway, and using external ones is likely to get banned

so I recommend betting your workflow on portable primitives, like prompts, skills or scripts, instead of custom agents
English
51
0
158
48.9K
cole murray
cole murray@colemurray·
@HamelHusain i find MCP-based skill retrieval has issues with non-determinism, where often the "correct" skills don't get loaded when needed. Vercel had some good research on this here: TLDR: jam all the skills in agents.md vercel.com/blog/agents-md…
English
0
0
5
278
Hamel Husain
Hamel Husain@HamelHusain·
Out of all the replies this solution looks especially clean x.com/lolbrandonk/st…
Hamel Husain@HamelHusain

What’s the best way for non developers to

1. share skills with their team
2. automatically enforce that it’s always updated for everyone if changes are made
3. allow others to update it centrally

Github is not the best solution as it’s too clunky and doesn’t solve #2

Notion is a little better but can’t put code there

I’m tempted to create my own tools but someone surely has created this already??

English
7
1
38
19.6K
cole murray
cole murray@colemurray·
assuming it had a similar running cost as an opus-level model, i don't think much would change. Opus is already quite good at cyber capabilities and there is plenty of low-hanging fruit left for motivated actors.

Low-level actors do not have the finances to be able to run a Mythos-level model for anything significant (using Anthropic's $20,000 OpenBSD bug as a reference).

For sophisticated actors, i'm not convinced it changes the existing landscape much. Exploit development is a fairly small part of a much larger kill chain and letting a non-deterministic system operate in a stealth-sensitive environment seems like a good way to burn the operation.
English
2
1
13
542
Zack Korman
Zack Korman@ZackKorman·
I feel everyone is talking about cyber risk with very little input from cybersecurity. For people in cyber, I want your take: How good or bad would it be for cyber if an open-weight no-guardrails Mythos-level model released tomorrow?
English
163
11
220
54.5K
Emily Yuan
Emily Yuan@emily_yuan_·
Imagine getting sued because your AI agent messes up. That's why we built AI coverage at @UseCorgi to help cover new types of risks that AI is creating during this technological shift. We give our AI agents a lot of autonomy today (e.g. pushing code to production, talking to customers, processing payments). And sometimes, they get things wrong.
English
20
13
86
10.3K
cole murray
cole murray@colemurray·
imagine being so gpu poor, you have to stage an export controls ban to avoid hosting your model
English
0
0
18
726
signüll
signüll@signulll·
i had a convo with someone at a big lab recently where she framed my account as an umpire calling balls & strikes. & i deeply appreciated that.
English
8
0
122
10.1K
Gwen (Chen) Shapira
Gwen (Chen) Shapira@gwenshap·
Folks who use Codex/Claude Code for SRE-like stuff... what's your solution to log files eating up context and tokens like crazy?
English
32
8
52
23.8K
David Cramer
David Cramer@zeeg·
What open source are folks building today?
English
90
4
75
20.5K
Liran Tal
Liran Tal@liran_tal·
@colemurray @vercel_dev Hmmm, I don't know about that. In one of my past engineering orgs about 10 years ago we had a stock config of about 20 running containers as part of the local dev setup.

Most of the "precious work" is token generation and that's a remote async i/o job
English
1
0
0
16
cole murray
cole murray@colemurray·
After a week or so of using Vercel Sandboxes, @vercel_dev, some thoughts:

- easy to integrate with
- expensive, especially snapshots
- UI/UX in console needs improvements

The good:

Integrating the sandbox into OpenInspect was fairly straightforward. I was able to support all of the existing functionality OpenInspect relies on (snapshots, pre-builds, tunnel ports etc). The sandboxes themselves are snappy enough and comparable to the other providers integrated (at least specific to my needs).

Having both the web-app and sandboxes in one provider is nice, as it's one less set of API keys and one less service to manage.

The bad:

Vercel Sandbox UI

The UI is pretty lacking. There is no visibility into the stdout/stderr going on in the sandbox. This is a pretty big dealbreaker, as i now need to set up some telemetry to observe what's going on. spoiler: i'm not going to

Additionally, there is no way to bulk delete snapshots from the UI, which is quite tedious / under-thought IMO. Yes it works from CLI, but sometimes it's nice to just click buttons.

Costs

The realized usage costs are significantly more than the other sandbox providers. Specifically, sandbox data transfer is quite costly and sandbox storage is quite inflated.

Data transfer: $0.15/GB (lol)
Data storage: $0.08/GB (lol)

Within ~50 sessions, I racked up $15 or so in costs doing effectively just Git repo pulls.

Data storage specifically is pretty brutal, as OpenInspect snapshots between each turn, and the snapshots are not billed incrementally.

Additionally, the min compute size is 2 CPU / 4 GB RAM, which is more than needed. Would like to see lower options available!

TLDR: overall it's a nice experience and having more services on one provider is nice. Looking forward to the service improving!
English
11
1
18
3.1K
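To see why non-incremental snapshots add up, here is a rough back-of-the-envelope sketch using the two per-GB prices quoted in the post above. Everything else is an assumption for illustration: the function name, the example sizes and turn count, and the simplification that every turn stores one full-size snapshot billed at the flat per-GB rate (the post does not state the billing period or actual snapshot sizes). It is not Vercel's billing formula and is not meant to reproduce the $15 figure.

```python
# Per-GB prices as quoted in the post.
TRANSFER_USD_PER_GB = 0.15
STORAGE_USD_PER_GB = 0.08


def session_cost(transfer_gb: float, snapshot_gb: float, turns: int) -> float:
    """Estimate one session's cost under the assumptions stated above.

    transfer_gb: data pulled into the sandbox (e.g. a Git clone), hypothetical.
    snapshot_gb: size of one full snapshot, hypothetical.
    turns: number of turns, each assumed to store a full (non-incremental) snapshot.
    """
    transfer = transfer_gb * TRANSFER_USD_PER_GB
    storage = snapshot_gb * turns * STORAGE_USD_PER_GB
    return transfer + storage


# Hypothetical session: 1 GB pulled, 0.5 GB snapshot, 5 turns.
one_session = session_cost(transfer_gb=1.0, snapshot_gb=0.5, turns=5)
print(f"estimated cost per session: ${one_session:.2f}")
```

Under this model the storage term grows linearly with the number of turns, which is why per-turn full snapshots dominate once sessions get long.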
cole murray
cole murray@colemurray·
i'll take the under on this. there is such a thing as too much data.

i don't think it's incremental value to have observability into every change, conversation, doc edit, etc. At a point, it becomes a distraction and more noise than signal.

i find a lot of teams when first rolling out analytics to their product have this same idea. "we're going to instrument everything: every impression, every scroll, every hover, every click, mouse cursor position, time of day, ..."

in practice, they end up making up narratives about the noise they're seeing, rather than validating their actual experiment hypothesis.

simple is often better
John Suh@john_ssuh

Increasingly, I believe companies may need to be rebuilt from the ground up, where you have a single timeline of all observability + product metrics + file changes laid out in a retrievable system, like Datadog + Posthog + Google Drive + Slack (really unified filesystem of Claude Code chats + Codex chats). This might be the new data foundation for any and all companies to maximize AI. Needs to be rebuilt because keeping track of diffs on existing system basically impossible to produce longitudinal information on decisions and rollbacks, something coding agent storage companies are actively trying to figure out, but this should extend to businesses as a whole. Highly skeptical existing businesses will adopt this though because it means overhauling everything about their instrumentation and business data, but I think businesses built on this foundation probably can execute 100x better and faster

English
2
1
20
2.4K
cole murray
cole murray@colemurray·
@joannejang probably an expert in the field someone who just joined 🤔
English
0
0
9
628
Joanne Jang
Joanne Jang@joannejang·
kinda crazy that someone's full-time job was to steer claude to sabotage ML research capabilities for paying customers
English
71
162
3.5K
140.2K
cole murray
cole murray@colemurray·
@bosmeny lol imagine saying this while delve skated by without oversight
English
4
1
244
16.1K
Tyler Bosmeny
Tyler Bosmeny@bosmeny·
I once had to kick a founder out of YC for lying. Thankfully we caught them before the batch started. Today I found out they're running a European startup accelerator 🤷‍♂️
English
129
18
1.6K
247.5K