cole murray

8.2K posts

@colemurray

ai/ml | cto | prev founder | former sr. sde @ amazon

San Francisco, CA · Joined February 2015
980 Following · 4.1K Followers
Pinned Tweet
cole murray @colemurray
Advice given to someone asking about AI Consulting:

I don't think an ML background is required to be successful in AI consulting, but it obviously helps. I think the biggest "skill" learned in ML is how to successfully run feedback loops in a system. In an ML system, this typically involves cleaning data, making model tweaks, performance evals etc. With LLMs, in nearly every case you won't be fine-tuning the model, but iterating on prompts is a very similar workflow.

I do think it would be helpful to at least get a high-level understanding of how the models "actually" work and become familiar with the basic terms, e.g. tokens, transformers, attention, and what happens on each input -> output iteration as the model is predicting. You don't need to know the underlying math (helpful though), but understanding what is happening is helpful.

Most of the AI consulting market leans more on full-stack / product development skills and less on ML. These aren't the most lucrative opportunities, but they are available in abundance.

Major areas now and over the next year:
- RAG: this is basically just glorified search lol. Useful in many contexts but severely overhyped
- Agents: The models aren't quite there yet IMO for this to be useful, but in 2025 I think this will be a major theme and a HUGE area of interest/investment. Becoming good at this will be valuable.
- Evals: Performance evaluations are a relatively untapped market. Most AI products you see today are flying by the seat of their pants. Without eval metrics, you can't truly know if your prompt changes are improving the system. This is somewhat more difficult to sell as a consultant as it requires a more sophisticated buyer, but is worth a lot of money if you can do it well
10 replies · 16 reposts · 237 likes · 48.9K views
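
The point about evals in the post above can be illustrated with a minimal sketch. Everything here is hypothetical and not from the post: the labelled tickets, the two prompt variants, and `fake_model`, a toy stand-in for a real LLM call. It only shows the shape of the loop: score each prompt version against the same fixed eval set, then compare the numbers.

```python
from typing import Callable, List, Tuple

# Hypothetical labelled eval set: (input text, expected label).
EVAL_SET: List[Tuple[str, str]] = [
    ("refund my order", "billing"),
    ("app crashes on login", "bug"),
    ("how do I export data?", "how-to"),
    ("charged twice this month", "billing"),
]


def fake_model(prompt: str, text: str) -> str:
    """Toy stand-in for an LLM call (assumption: a real system calls a model API)."""
    lowered = text.lower()
    # The toy model can only emit labels that the prompt mentions.
    if "billing" in prompt and any(w in lowered for w in ("refund", "charged")):
        return "billing"
    if "bug" in prompt and "crash" in lowered:
        return "bug"
    if "how-to" in prompt and "how do i" in lowered:
        return "how-to"
    return "unknown"


def accuracy(
    prompt: str,
    model: Callable[[str, str], str],
    eval_set: List[Tuple[str, str]],
) -> float:
    """Fraction of eval examples where the model output matches the expected label."""
    correct = sum(model(prompt, text) == expected for text, expected in eval_set)
    return correct / len(eval_set)


PROMPT_V1 = "Classify the ticket as billing or bug."
PROMPT_V2 = "Classify the ticket as billing, bug, or how-to."

score_v1 = accuracy(PROMPT_V1, fake_model, EVAL_SET)
score_v2 = accuracy(PROMPT_V2, fake_model, EVAL_SET)
print(f"v1: {score_v1:.2f}  v2: {score_v2:.2f}")
```

Keeping the eval set fixed while only the prompt changes is what makes the two scores comparable; a real setup would likely use a larger set and a metric suited to the task.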

Quinn Slack @sqs
You can now proactively verify your identity (with a passport or government ID) in case it’s needed for future frontier model access in Amp. We think it probably will be, and we want Amp to keep giving you access to the best models available to you.

We can’t guarantee access criteria or timelines. Those depend on (highly uncertain) government and model lab policy. We don’t plan to impose any additional restrictions beyond what is required by law and the model labs.

We are covering the cost for identity verification for all users, and we’re using Stripe for identity verification, so Amp stores nothing and sees only the outcome.

ampcode.com/settings/ident…
32 replies · 9 reposts · 146 likes · 67.8K views

Charles Curran @charliebcurran
I used AI to explain the Anthropic drama to my girlfriend, with fruit.
128 replies · 201 reposts · 2.5K likes · 249.5K views

cole murray @colemurray
me remembering fable is banned so you can’t
[GIF]
0 replies · 0 reposts · 4 likes · 527 views

Vic 🌮 @VicVijayakumar
The AC stopped working in my 8yo’s room sometime on Thursday. I called the HVAC company but they were busy. Yesterday her room hit 97F! This morning we went to the hardware store and grabbed a replacement capacitor ($22.97) and everything is back to normal. She screamed “DAD YOU’RE AMAZING” and that was the actual best part.
[2 images]
50 replies · 6 reposts · 1.1K likes · 80K views

cole murray @colemurray
i have a solution to the fable issue

what if we just rename it...
[image]
1 reply · 0 reposts · 13 likes · 1.1K views

Dennison @DennisonBertram
Major life hack: DeepSeek in the Claude Code harness can also build and drive workflows, at a fraction of the cost and Opus 4.6/7 quality. I've got it running over 250 subagents in a workflow in adversarial reviews. Pennies on the dollar. Use my tool "Deep-Claude"
[image]
37 replies · 37 reposts · 588 likes · 71.2K views

Max Katz @maxktz
life lesson: never bet on a custom harness like Pi

been loving my custom Pi setup for the last few weeks, the fact that I can build any extension, use any models

but things are moving too fast

today huge teams behind Claude / Codex change the way we develop almost every month

so by building and maintaining a custom agent you're more likely to get left behind

most models perform better in their native harnesses anyway, and using external ones is likely to get you banned

so I recommend betting your workflow on portable primitives, like prompts, skills or scripts, instead of custom agents
53 replies · 1 repost · 170 likes · 63K views

cole murray @colemurray
@HamelHusain i find MCP-based skill retrieval has issues with non-determinism, where often the "correct" skills don't get loaded when needed. Vercel had some good research on this here: TLDR: jam all the skills in agents.md vercel.com/blog/agents-md…
0 replies · 0 reposts · 5 likes · 289 views

Hamel Husain @HamelHusain
Out of all the replies this solution looks especially clean x.com/lolbrandonk/st…
Quoted post, Hamel Husain @HamelHusain:

What’s the best way for non developers to
1. share skills with their team
2. automatically enforce that it’s always updated for everyone if changes are made
3. allow others to update it centrally

Github is not the best solution as it’s too clunky and doesn’t solve #2

Notion is a little better but can’t put code there

I’m tempted to create my own tools but someone surely has created this already??
7 replies · 1 repost · 40 likes · 20.1K views

cole murray @colemurray
assuming it had a similar running cost as an opus-level model, i don't think much would change. Opus is already quite good at cyber capabilities and there is plenty of low-hanging fruit left for motivated actors

Low-level actors do not have the finances to be able to run a Mythos-level model for anything significant (using Anthropic's $20,000 OpenBSD bug as a reference)

For sophisticated actors, i'm not convinced it changes the existing landscape much. Exploit development is a fairly small part of a much larger kill chain and letting a non-deterministic system operate in a stealth-sensitive environment seems like a good way to burn the operation.
2 replies · 1 repost · 13 likes · 561 views

Zack Korman @ZackKorman
I feel everyone is talking about cyber risk with very little input from cybersecurity. For people in cyber, I want your take: How good or bad would it be for cyber if an open-weight no-guardrails Mythos-level model released tomorrow?
169 replies · 11 reposts · 223 likes · 58.3K views

Emily Yuan @emily_yuan_
Imagine getting sued because your AI agent messes up. That's why we built AI coverage at @UseCorgi to help cover new types of risks that AI is creating during this technological shift. We give our AI agents a lot of autonomy today (e.g. pushing code to production, talking to customers, processing payments). And sometimes, they get things wrong.
20 replies · 13 reposts · 88 likes · 10.6K views

cole murray @colemurray
imagine being so gpu poor, you have to stage an export controls ban to avoid hosting your model
0 replies · 0 reposts · 18 likes · 746 views

signüll @signulll
i had a convo with someone at a big lab recently where she framed my account as an umpire calling balls & strikes. & i deeply appreciated that.
8 replies · 0 reposts · 124 likes · 10.5K views

Gwen (Chen) Shapira @gwenshap
Folks who use Codex/Claude Code for SRE-like stuff... what's your solution to log files eating up context and tokens like crazy?
32 replies · 9 reposts · 52 likes · 23.8K views

David Cramer @zeeg
What open source are folks building today?
90 replies · 4 reposts · 76 likes · 20.9K views

Liran Tal @liran_tal
@colemurray @vercel_dev Hmmm, I don't know about that. In one of my past engineering orgs about 10 years ago we had a stock config of about 20 running containers as part of the local dev setup

Most of the "precious work" is token generation and that's a remote async i/o job
1 reply · 0 reposts · 0 likes · 16 views

cole murray @colemurray
After a week or so of using Vercel Sandboxes, @vercel_dev, some thoughts:
- easy to integrate with
- expensive, especially snapshots
- UI/UX in console needs improvements

The good:

Integrating the sandbox into OpenInspect was fairly straightforward. I was able to support all of the existing functionality OpenInspect relies on (snapshots, pre-builds, tunnel ports etc). The sandboxes themselves are snappy enough and comparable to the other providers integrated (at least specific to my needs)

Having both the web-app and sandboxes in one provider is nice, as it's one less set of API keys and one less service to manage.

The bad:

Vercel Sandbox UI

The UI is pretty lacking. There is no visibility into the stdout/stderr going on in the sandbox. This is a pretty big dealbreaker, as i now need to set up some telemetry to observe what's going on. spoiler: i'm not going to

Additionally, there is no way to bulk delete snapshots from the UI, which is quite tedious / under-thought IMO. Yes it works from CLI, but sometimes it's nice to just click buttons.

Costs

The realized usage costs are significantly more than the other sandbox providers. Specifically, sandbox data transfer is quite costly and sandbox storage is quite inflated.

Data transfer: $0.15/GB (lol)
Data storage: $0.08/GB (lol)

Within ~50 sessions, I racked up $15 or so in costs doing effectively just Git repo pulls

Data storage specifically is pretty brutal, as OpenInspect snapshots between each turn, and the snapshots are not billed incrementally.

Additionally, the min compute size is 2 CPU / 4 GB RAM, which is more than needed. Would like to see lower options available!

TLDR; overall it's a nice experience and having more services on one provider is nice. Looking forward to the service improving!
11 replies · 1 repost · 18 likes · 3.1K views
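
The cost figures in the post above can be turned into a rough back-of-the-envelope calculation. This is only a sketch: the two rates, the ~$15 total, and the ~50 sessions come from the post, while the function name, the sample volumes, and the choice to ignore compute charges and storage billing periods are assumptions made for illustration.

```python
# Rates quoted in the post (USD per GB).
TRANSFER_USD_PER_GB = 0.15
STORAGE_USD_PER_GB = 0.08


def session_cost(transfer_gb: float, stored_gb: float) -> float:
    """Data cost of one session: transfer plus snapshot storage.

    Assumption: compute charges and the storage billing period are ignored.
    """
    return transfer_gb * TRANSFER_USD_PER_GB + stored_gb * STORAGE_USD_PER_GB


# Figures from the post: roughly $15 across roughly 50 sessions.
total_usd = 15.0
sessions = 50
per_session_usd = total_usd / sessions

# If the whole bill were data transfer, this is the implied volume per session.
implied_transfer_gb = per_session_usd / TRANSFER_USD_PER_GB

print(f"per session: ${per_session_usd:.2f}")
print(f"implied transfer per session if transfer-only: {implied_transfer_gb:.1f} GB")
print(f"example, 1 GB pulled + 1 GB snapshot: ${session_cost(1.0, 1.0):.2f}")
```

Since the post says the bill mixes transfer and storage, the transfer-only figure is an upper bound on data moved per session, not a measurement.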