Dylan Murphy

707 posts

@dylanmurphy829

Tech sales leader, dad, spaceflight nerd. Writing at The Slow Layer.

Boston, MA · Joined September 2014

323 Following · 111 Followers

Dylan Murphy@dylanmurphy829·17h

Do you have an opinion on how task-based model routing will start to play out, at least in the enterprise? Specifically I mean: we’re approaching the point where open weight models are getting “good enough” for routine tasks and automations. But the frontier shows no signs of plateauing as you caused to be illustrated here. So it seems we’ll soon be in a state of super cheap “good enough” intelligence and premium-priced SOTA intelligence, with no real routing layer to deal with that in a thoughtful way.
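A minimal sketch of the kind of routing layer the post says is missing, assuming tasks can be labeled by type ahead of time. The tier names, prices, and difficulty scores below are all hypothetical placeholders, not real models or real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str              # hypothetical label, not a real model ID
    cost_per_mtok: float   # made-up price per million tokens
    max_difficulty: int    # hardest task level this tier is trusted with


# Ordered cheapest first; every value here is an illustrative assumption.
TIERS = [
    ModelTier("open-weight-small", 0.20, 1),
    ModelTier("open-weight-large", 1.00, 2),
    ModelTier("frontier", 15.00, 3),
]

# Hypothetical mapping from task type to difficulty (1 = routine, 3 = hardest).
TASK_DIFFICULTY = {
    "classify_ticket": 1,
    "summarize_call": 1,
    "draft_proposal": 2,
    "long_horizon_agent": 3,
}


def route(task_type: str) -> ModelTier:
    """Pick the cheapest tier whose ceiling covers the task's difficulty."""
    # Unknown task types are treated as hardest, so they go to the top tier.
    difficulty = TASK_DIFFICULTY.get(task_type, 3)
    for tier in TIERS:
        if tier.max_difficulty >= difficulty:
            return tier
    return TIERS[-1]
```

In this sketch the "good enough" models absorb routine automations and only the hardest or unrecognized work pays the premium price. A real router would likely need a learned or measured difficulty signal rather than a static table.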

533

Ethan Mollick@emollick·19h

It does seem like meaningfully better AI releases are accelerating, especially from OpenAI & Anthropic. To illustrate, I caused this timeline to be created. It only lists new models that scored 3 points or higher over previous models in the Artificial Analysis index.

589

38K

Jimmy Heaters@CathPoaster·1d

right monitor is 20 codex instances. left monitor has situational awareness on autoscroll. center monitor is my word doc manifesto. two keyboards, one for both hands. left airpod is dwarkesh x eric jang, 3x speed. right airpod tchaikovsky. meta quest 3 overlays my HUD: heart rate, words per minute, blood caffeine content. one assistant hooks me to an iv of chinese peptides, cocktail. the other feeds me kimchi. my unitree robot steps in when my posture slouches. blue light beams down on me in my herman miller chair. efficiency. no wasted movement. no wasted thoughts. think you can keep up with me? good luck. this is just for my morning emails.

225

219.4K

Dylan Murphy@dylanmurphy829·18h

@CathPoaster 🫡

111

Jimmy Heaters@CathPoaster·18h

@dylanmurphy829 calling it here. best dad of the year award has been clinched by mr murphy

2.3K

Dylan Murphy@dylanmurphy829·18h

@Chrisgpt Lmfao

Chris@Chrisgpt·1d

Hell yeahhhh

Tenobrus@tenobrus

fuuuuuuuuuuuuck

14.2K

Dylan Murphy@dylanmurphy829·18h

I agreed with the political incentives take. In the jobs displacement topic, I think you’re missing some nuance. For sure displacement is coming for taxi, truck and Uber drivers, for package sorters and some low skill factory line workers - but those all share the same traits of being narrow physical world labor operating in areas without massive demand elasticity. For general knowledge work I just don’t think it’s playing out this way, and a 10X+ increase in digital economic activity is going to continue to drive up demand for humans. With that said, it’s becoming apparent that knowledge workers who actively opt out of learning and upskilling with the latest tools are probably going to be at greater risk, but I think even that is going to take a loooong time to diffuse through the broader economy. For sure we’ll see public companies get rewarded for doing more with less, but as the great @bgurley put it - competition is a thing. In the meantime we have a Cambrian explosion of startups and solopreneurs creating new categories and existing small businesses experiencing a productivity boom. Is the net of all that massive job displacement? Non-zero chance probably, but I think it’s anything but a sure bet and loaded with nuance and uncertainty.

142

@jason@Jason·21h

If you’re in/around the administration, you're obligated to keep up the narrative that AI is creating jobs. If you're a Democrat or America First/America Only, you're obligated to spread the job destruction narrative. That's how AOC and Steve Bannon find themselves on the same team. If you're an independent thinker, you understand that we will have massive AI job *displacement* combined with profound abundance (lower prices, shorter work weeks and new jobs) This means extreme winners AND losers, which will require thoughtful change management.

273

912

102.8K

Dylan Murphy@dylanmurphy829·19h

Speaking as someone very far from frontier math, this does a way better job of articulating what I’ve been trying to explain to my doomer friends over the past couple months. I have automated entire categories of tasks and the volume and scope of net new work I’m finding daily still seems to be accelerating. I’m seeing this in other parts of my org as well. I’m going to need to ask people on my team to level up and take on more, expanding their strategic impact and career progression. I’m going to need to hire for new roles to take on entirely new categories of work. My customers are going to keep demanding more from my team and that creates its own feedback loop in which the human becomes even more critical. Thankfully the lab CEOs have finally started to come around to this reality and stopped the jobs apocalypse pronouncements. We need better spokespeople for what’s actually happening.

John Ennis@johnennis

In the middle of my longest Codex run ever, coming up on 3 straight days. I've adapted @elves_skill with a workflow for pure math research, and I've been working with my former PhD advisor on some open problems in differential geometry. Looks like I'll get my first pure math paper in a while, and I've learned a ton about what is actually possible when it comes to AI for math. The short of it is that AI is amazingly useful, and any mathematician who doesn't use these tools is at a big disadvantage, but the idea that there will be nothing left for human mathematicians to do is just wrong. I'll write up more about this experience when done if people are interested, and I'm thinking about adding a math module to Elves.

Dylan Murphy@dylanmurphy829·21h

@johnennis @pmarca @elves_skill Bullish for humans

John Ennis@johnennis·1d

1.1K

249.6K

Dylan Murphy@dylanmurphy829·1d

Currently running all my work stuff through Opus 4.8 (luxury of legacy enterprise plan) and all personal stuff through Codex / GPT 5.5 Pro. Each has its quirks: I still trust Opus way more for writing and artifact generation, but Codex feels like it has more latent autonomy. Been experimenting with GPT 5.5 Pro in chat for high level planning and research, and it's been hard to replicate the effectiveness for those discrete tasks with any other tools. @emollick talked about that recently.

323

Katie Parrott@kplikethebird·1d

My life right now is really splitting into GPT-5.5 for tasks and workflows, Opus 4.8 for creative work. Codex in the streets, Claude in the sheets.

113

11.5K

Dylan Murphy@dylanmurphy829·1d

@ajambrosino Can you tell when I switched to Codex :-)

145

Andrew Ambrosino@ajambrosino·1d

show us yours

Andrew Ambrosino@ajambrosino

tokens

144

244

42K

Dylan Murphy@dylanmurphy829·1d

@signulll Comparison is the thief of joy

signüll@signulll·1d

it’s kinda nuts how much social media turned almost all of lifestyle into performance art.

217

2.5K

73.9K

Dylan Murphy@dylanmurphy829·1d

@Kantrowitz Yeah it's pretty easy to see the path at this point where you just go to your preferred interface and all of your work, communications, brainstorming, creation and even entertainment just happens in one place. This is what the frontier labs are building towards.

Alex Kantrowitz@Kantrowitz·2d

Actually, 'super app' is not a misnomer for apps like Codex and Claude Code if you think about where these tools are heading

2.1K

Dylan Murphy@dylanmurphy829·1d

This would also make it much easier to have your agent build a true hill-climbing scaffold. Rather than just relying on the agent to invent the next ladder rung for itself (which falls into the failure mode you mention) .. you instead have a meta eval layer that orchestrates the hill-climb. Damn feels like a hard problem though.

192

λux@novasarc01·2d

i’m increasingly convinced that the best agent evals will come from mining real agent failure traces. my view is that every failed trace contains a potential eval but not in its raw form. raw traces are messy, long and too specific. the research problem is to distill them into clean reproducible tests.

the pipeline i’m interested in is (which i'm currently working on):

failure trace → failure attribution → earliest divergence point → minimal reproducible state → targeted eval → regression suite

this turns trace data from passive observability into an active improvement loop. like can we extract the exact decision point where the agent should have behaved differently? and can we convert that into an eval that catches the same failure class in the future?

i guess this matters because most agent failures are trajectory-level failures and not just output-level failures.

personally i think this is much more realistic than relying only on hand-written benchmarks (imo they should look more like failure memory systems). hand-written evals encode what we think agents will fail on. traces encode what agents actually failed on.

also once you have the mechanism, you can mutate the trace into variants. that is basically fuzzing for agents.
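A rough sketch of the middle of that pipeline (earliest divergence point → minimal reproducible state → targeted eval), under a strong simplifying assumption: a reference sequence of correct actions already exists, so failure attribution reduces to comparing the trace against it. The `Step`, `Eval`, and `distill` names are hypothetical and not taken from any real tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    state: dict   # what the agent observed before acting
    action: str   # what the agent actually did


@dataclass
class Eval:
    state: dict            # minimal state needed to replay the decision
    expected_action: str   # what the agent should have done there

    def passes(self, policy: Callable[[dict], str]) -> bool:
        # A policy passes if it picks the expected action from the saved state.
        return policy(self.state) == self.expected_action


def earliest_divergence(trace: List[Step], reference: List[str]) -> Optional[int]:
    """Return the index of the first step whose action differs from the reference."""
    for i, (step, expected) in enumerate(zip(trace, reference)):
        if step.action != expected:
            return i
    return None


def distill(trace: List[Step], reference: List[str]) -> Optional[Eval]:
    """Turn a failed trace into one targeted eval at its earliest divergence."""
    i = earliest_divergence(trace, reference)
    if i is None:
        return None  # the trace never left the reference path
    return Eval(state=dict(trace[i].state), expected_action=reference[i])


# Toy failed trace: the agent empties the cart instead of checking out.
trace = [
    Step({"page": "login"}, "enter_credentials"),
    Step({"page": "cart"}, "delete_cart"),
]
reference = ["enter_credentials", "checkout"]
regression_suite = [distill(trace, reference)]
```

Each distilled `Eval` can be appended to a regression suite and rerun against later agent versions. Mutating the saved `state` would give the trace variants the post compares to fuzzing.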

296

53.7K

Dylan Murphy@dylanmurphy829·1d

"Your credential should be earned by being right" Having claude grill me on a new idea and we're getting somewhere LOL

Dylan Murphy@dylanmurphy829·1d

@rileybrown

Riley Brown@rileybrown·2d

Computer use with Codex is cool. The animations are cool. It’s fun to watch. But… is there a possibility that sometime soon the computer/browser use is too fast for us to even watch and comprehend what’s going on?

260

17.4K

Dylan Murphy@dylanmurphy829·1d

Me watching the token incineration as I test the new workflows feature with Opus 4.8

Dylan Murphy@dylanmurphy829·2d

If the frontier labs continue to make substantial gains in model performance (especially against real world long horizon agentic tasks), and then also continue to use their own models to steadily improve the surrounding harness and product, does that change the calculus at all for you? Or is this something that you’re betting simply won’t happen? Or, even if it does happen and the frontier labs maintain a significant capability advantage, do you think that won’t have any impact on the negative economic scenarios that you outlined?

957

Gary Marcus, MIT PhD and NYU Professor Emeritus@GaryMarcus·2d

Hot take on what comes next, after the sudden decline of tokenmaxxing:
- OpenAI will struggle
- with the decline of tokenmaxxing Anthropic will struggle (aside from this quarter) to make a profit
- Google will catch up to Anthropic
- some Chinese companies might, too
- LLMs will become commodities; margins will be very very thin
- Most of the companies that invested massively in them will struggle to make back their investments
- SpaceX’s AI efforts will flail
- Nvidia will eventually decline, once all of the above becomes widely recognized.

178

215

1.9K

418.3K

Dylan Murphy@dylanmurphy829·2d

@atomical_on_git Seems better now. This was in a longer running session that had compacted previously but context was still showing under 50%

Adam Hallett@atomical_on_git·2d

@dylanmurphy829 Insanely slow. I see a few people complaining. My context window is pretty full. I wonder if it could be that.

Dylan Murphy@dylanmurphy829·2d

Is Codex exceedingly slow today (even on Fast) or is it just me?

Dylan Murphy reposted

Kiko Dontchev@TurkeyBeaver·2d

Couldn’t agree more with Shana. Thinking about the entire team at Blue. We’ve been there before and there are very few things worse than losing a vehicle on the pad. Remember @blueorigin, it’s the darkest before the dawn and you will be measured not by this anomaly, but by how you respond. We are all rooting for you to get safely back to flight as soon as possible!

Shana Diez@ShanaDiez

Very sad to see Blue Origin’s static fire anomaly tonight. I know that gut wrenching feeling. Keep your heads up, I know they have a team that will come back from this with hard fought lessons learned. Rockets are hard.

184

2.7K

111K

Dylan Murphy reposted

NASA Administrator Jared Isaacman@NASAAdmin·2d

NASA is aware of the anomaly that occurred tonight at Launch Complex 36 involving Blue Origin’s New Glenn rocket at Cape Canaveral Space Force Station. Spaceflight is unforgiving, and developing new heavy-lift launch capability is extraordinarily difficult. We will work with our partners to support a thorough investigation of this anomaly, assess near-term mission impacts, and get back to launching rockets. We will provide information on any impacts to the Artemis and Moon Base programs as it becomes available.

628

1.8K

19.1K

957.7K

@CathPoaster @Chrisgpt @bgurley @johnennis @pmarca @elves_skill @emollick @ajambrosino