Martin Alderson

2.1K posts

@martinald

Writing up my thoughts on the AI transformation

London · Joined February 2007

888 Following · 1.1K Followers

Pinned Tweet
Martin Alderson@martinald·
Finally got round to starting a blog, if you're interested in AI and software engineering I hope you enjoy it: martinalderson.com. And feel free to subscribe to my once a month max newsletter there too!
Martin Alderson@martinald·
me (Mar 7): "I think we'll see 'double usage limits' overnight"
Anthropic: "From March 13, 2026 through March 27, 2026, your five-hour usage is doubled during off-peak hours"
Good to see Anthropic actioning my blog efficiently ;)
Martin Alderson@martinald·
Yes, I get that, but why can't agents build those features? Also, a lot of the complexity goes away when building for one org (e.g. only one auth system, only certain workflow features, etc.). I can definitely see orgs of the future having excellent purpose-built software, e.g. for product management, as a competitive advantage
Gergely Orosz@GergelyOrosz·
A good part of building a product like JIRA is the non-functional and not-visible stuff. The stuff devs using it don't see but which makes it useful + sticky. Things like reporting, custom fields, data retention + SLA guarantees + disaster recovery, data governance, permissions etc etc etc
Martin Alderson@martinald·
@GergelyOrosz I'm curious though: how far would that few-person dev team _with agents_ have got? Would it be subpar? I genuinely don't know the answer, but it feels like you could get a lot, lot further now than before. Especially for anything that needs a lot of per-customer customisation?
Gergely Orosz@GergelyOrosz·
The end of the Uber story: after 3 years of a few-person dev team building what was a subpar Slack, they moved to Slack. This is the thing with "rebuilding JIRA": you don't realize you forgot to build e.g. auditing, backups, disaster recovery etc. until you actually needed them yesterday
Martin Alderson@martinald·
@awnihannun Playing around, I get Qwen3.5 4B with 128k token context in just over 10GB with Q4_K quantisation, so you're right, it would squeeze in there. But I'm not sure the inference speed would be usable at that length. It is genuinely amazing that so much intelligence can fit in ~3GB though!
Awni Hannun@awnihannun·
@martinald I haven’t done the calculation (would be very curious to see it). But I would wager you could make >100k context length work on a phone with 10GB with KV cache quantization.
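Awni says he hasn't done the calculation, so here is a rough sketch of it. All model dimensions below are assumptions (roughly Qwen3-4B-shaped: 36 layers, 8 KV heads, head dim 128) and should be checked against the model's actual config before being relied on:

```python
# Back-of-envelope memory estimate for long-context inference on a phone.
# All model dimensions are ASSUMPTIONS (roughly Qwen3-4B-shaped);
# check the real config.json before relying on these numbers.

GIB = 1024 ** 3

def kv_cache_bytes(tokens, layers=36, kv_heads=8, head_dim=128, bits=4):
    """K+V cache size: 2 tensors (K and V) per layer, quantized to `bits`."""
    return 2 * layers * kv_heads * head_dim * tokens * bits // 8

def weight_bytes(params=4_000_000_000, bits=4.5):
    """Approximate weight footprint at a Q4_K-style ~4.5 bits per parameter."""
    return int(params * bits / 8)

ctx = 128 * 1024
kv = kv_cache_bytes(ctx)   # 4-bit-quantized KV cache at 128k tokens
w = weight_bytes()         # ~4B params at ~4.5 bits/param

print(f"KV cache @128k, 4-bit: {kv / GIB:.1f} GiB")        # 4.5 GiB
print(f"weights (Q4_K-ish):    {w / GIB:.1f} GiB")         # 2.1 GiB
print(f"total:                 {(kv + w) / GIB:.1f} GiB")  # 6.6 GiB
```

Under these assumed dimensions, an 8-bit KV cache alone would be ~9 GiB at 128k tokens, which is in the same ballpark as the "just over 10GB" figure above; quantizing the cache to 4-bit halves that and leaves headroom for the runtime.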
Awni Hannun@awnihannun·
According to benchmarks, Qwen3.5 4B is as good as GPT-4o. GPT-4o came out ~2 years ago (May 2024). Qwen3.5 4B runs easily on modern mobile devices.

So the gap between frontier intelligence in a datacenter and running a model of equal quality on your iPhone could be 2-3 years. (Probably closer to 3, assuming Qwen3.5 4B is more benchmaxxed than 4o.)

I don't expect the trend of increasing intelligence-per-watt to change. So in 2-3 years it's plausible we will be running GPT-5.x quality models on an iPhone. Pretty wild.
Martin Alderson retweeted
nolen@itseieio·
made a hook that adds a bouncing dvd logo to claude code whenever it's thinking
Martin Alderson@martinald·
@trq212 Can we chillax with the WSL2 limitation? It should work fine (if you install pulseaudio, it sets itself up correctly). It should probably just check whether WSL2 has audio devices, rather than just saying it's not supported?
[screenshot attached]
Thariq@trq212·
Voice mode doesn’t cost extra to use, and tokens for voice transcription don’t count against your rate limits. Available on Pro, Max, Team, and Enterprise on a rolling basis.
Thariq@trq212·
Voice mode is rolling out now in Claude Code. It’s live for ~5% of users today, and will be ramping through the coming weeks. You'll see a note on the welcome screen once you have access. /voice to toggle it on!
Martin Alderson@martinald·
Don't overlook how important the harness is too. The fastest token is one you don't need to infer at all. I think this is probably the most underdeveloped place to optimise.

E.g. Claude Code uses Haiku to parse bash output and detect if a file it's looking for exists, so the agent can preload the file contents. This saves an entire turn on Opus. Plus obviously all the prompt caching stuff. But I suspect there is so much more that can be done on the agent side to save tokens.
Awni Hannun@awnihannun·
Inference compute is on track to be a massive computational workload by the end of this decade. I think it will be much bigger than training (especially if you consider RL rollouts / inference needs for training). And it's still an open playing field in terms of the hardware, the platforms, and the models. It's also increasingly clear that people are willing to pay a premium for reduced latency.

On the hardware side there are several interesting directions to keep an eye on:
- SRAM-style setups seem promising (GPT Spark on Cerebras, Groq acquisition by Nvidia)
- Disaggregated systems (prefill on one machine/processor, generation on a different one) probably make a lot of sense. The computational characteristics of prefill vs decode are so different that specializing at the hardware level will yield efficiency gains
- I also wouldn't discount more exotic technology like the Taalas chip / near-memory computing / etc. While they are still pretty far from large-scale deployment, the economic pressure for efficiency gains could be a catalyst

On the algorithm / architecture side:
- Pretty much every major open-weights model has at least one optimization which makes it faster for inference, whether it be MoE, SSM (or another hybrid variety), sliding-window or sparse attention. There are more differences here than there were a year ago, and it will be interesting to see where we converge.
- Will diffusion models unify the prefill / decode split?
- I still believe there are big gains to be had in further co-design of model to hardware and workload

I also don't think we will have a one-size-fits-all solution in the future:
- Cloud-based models may look very different than edge-optimized models
- Models may be more and more co-designed for the hardware they are deployed on
- There will be at least one knob which trades off latency and power efficiency / cost
Martin Alderson@martinald·
@nityeshaga No, this has been around for a while. I think they removed it and re-added it at some point
Nityesh@nityeshaga·
Just noticed something new in Claude Code's /compact command. Now the compacted summary explicitly provides the location of the full chat JSON file and tells Claude to reference it if it wants more detail. I've been looking at compacted summaries for a long time, but this is the first time I'm seeing this prompt. I suspect this is why compacting seems to work more flawlessly recently within Claude Code.
[screenshot attached]
Martin Alderson@martinald·
@DnuLkjkjh Yep, makes sense, and I suspect it is why Apple is having so many problems with their "Siri" replacement
dnu@DnuLkjkjh·
spot on. the gap between 'model runs on device' and 'agent works reliably on device' is enormous. tool calling, memory management, error recovery — all of that needs to work without a round trip to a server. I've been building on-device voice processing and even for a narrow domain like transcription, the edge cases multiply fast when you can't fall back to cloud
Martin Alderson retweeted
Tom Blomfield@t_blom·
The entire Accenture workforce is about to be outperformed by a 24-year-old who learned Claude Code last Tuesday.