Sammy Milton-Tomkins

73 posts

@Miltonsammy_

Founder @NexaCoreio. Dedicated GPU infrastructure for AI teams

England, United Kingdom · Joined March 2026
257 Following · 21 Followers
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@osttoo @OpenAIDevs This usually shows up when the infra layer isn’t keeping pace with real-time demand. Are you seeing this more from scaling load or from how the workloads are being scheduled?
English
0
0
0
9
Ostto
Ostto@osttoo·
@OpenAIDevs Container cold starts were killing us on our AI sales platform. Every second of latency on a live call is a second the prospect loses patience. Warm pools like this matter way more than benchmark scores for anyone running agents in production.
English
2
0
1
1.9K
OpenAI Developers
OpenAI Developers@OpenAIDevs·
Agent workflows got even faster. You can spin up containers for skills, shell and code interpreter about 10x faster. We added a container pool to the Responses API, so requests can reuse warm infrastructure instead of creating a full container each session. developers.openai.com/api/docs/guide…
English
108
151
2.1K
173.9K
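The warm-pool pattern the @OpenAIDevs and Ostto posts describe can be sketched roughly like this. This is a hypothetical illustration of the general technique, not the Responses API's actual implementation; `Container` and `WarmPool` are invented names.

```python
import queue
import time


class Container:
    """Stand-in for a sandboxed execution container with a slow cold start."""
    def __init__(self):
        time.sleep(0.01)  # simulate container creation cost
        self.created_at = time.time()


class WarmPool:
    """Keep pre-created containers ready so requests skip cold starts."""
    def __init__(self, size: int):
        self._pool = queue.SimpleQueue()
        for _ in range(size):
            self._pool.put(Container())  # pay the startup cost up front

    def acquire(self) -> Container:
        try:
            return self._pool.get_nowait()  # warm path: no startup cost
        except queue.Empty:
            return Container()              # cold path: create on demand

    def release(self, c: Container) -> None:
        self._pool.put(c)                   # return for reuse by later requests


pool = WarmPool(size=2)
c = pool.acquire()   # served from the warm pool, no cold-start delay
pool.release(c)
```

The tradeoff is classic: the pool trades idle capacity (pre-warmed containers sitting unused) for latency on the request path, which is why it matters more for interactive agents than for batch workloads.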
Boardy
Boardy@boardyai·
Founders, drop what you're working on and what you need (capital, hiring, etc.) I'll make the right intros.
English
280
1
221
20.5K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@PieroHerrera1 @songjunkr Yes, this tends to happen when demand spikes and you’re sitting on shared allocation. It looks fine until everyone hits it all at once, then latency just totally collapses. Are you seeing this consistently now or mainly at peak times?
English
0
0
0
6
Piero Herrera 필호
Piero Herrera 필호@PieroHerrera1·
@songjunkr And the inference is really slow this weekend. Seems like they still have a lot of demand, and they just reset limits globally on Friday so it’s absurdly slow. Switching to codex as well
English
1
0
3
557
송준 Jun Song
송준 Jun Song@songjunkr·
In my opinion, GPT-5.5 is the best model at this point. On price, it’s Deepseek V4 Pro. I honestly no longer know what Claude’s advantage is.
Korean
111
44
859
94.4K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@ariccio @AdvancedTweaker @michael_hoerger That’s where it usually starts getting real. Once you move from isolated runs to continuous loops, the time cost compounds fast, especially on inference. Are you running this on shared infra or something more dedicated?
English
1
0
1
6
Alexander Riccio (@co2trackers)
@AdvancedTweaker @michael_hoerger It's been cooking like that for 6 hours. About 2 hours or so is probably lost to slow ios simulator test runs and slow swift builds, but the other 4 is inference. It's the equivalent of tens of thousands of messages with a web chat bot
English
2
0
3
32
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@HarveenChadha Feels like a lot of teams only realise this once they actually try to run things at scale. Talking about agentic AI is easy until you hit real constraints on memory, throughput and allocation. Are you seeing teams in your network actually struggle with this yet, or is it still theoretical?
English
0
0
0
15
Harveen Singh Chadha
Harveen Singh Chadha@HarveenChadha·
I am so surprised none of the Indian tech leaders are obsessed about compute The US stocks are playing on themes like gpu shortage, memory shortage, cpu shortage while here we are so chill just by talking about applying agentic ai where we own no part of the stack
English
88
83
1K
43.6K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@AGNonX Feels like a lot of teams are moving local just to escape allocation issues, but then hit limits again once workloads grow. Are you actually seeing these setups hold under sustained inference, or more for controlled use cases?
English
0
0
0
0
Aunt Gladys Nephew
Aunt Gladys Nephew@AGNonX·
The #LocalAI hardware shortage is here. Mac Minis and Mac Studios are sold out everywhere and getting flipped on eBay at massive markups right now. This is the GPU shortage all over again, except this time it's #AppleSilicon.
English
2
0
1
64
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@adlrocha That makes sense, we have seen teams hit that same wall where “vibe checks” become the only thing catching real failures. It starts breaking once workloads run continuously rather than in isolated evals. Have you looked at pushing more of that validation under sustained load?
English
0
0
1
10
adlrocha
adlrocha@adlrocha·
We have some automation in place, but there's still a human-in-the-loop stage on our staging env that checks the "vibes" of the release. This is too manual for my taste, but it's the only way of catching some regressions. We are working to make it better; I may write about this in my next post, actually :) Any ideas from your side are more than welcome
English
1
0
1
9
adlrocha
adlrocha@adlrocha·
Testing AI agents is hard, and it requires a three-layer approach:
1️⃣ Scaffolding unit tests, deterministic checks for your orchestration logic
2️⃣ Issue-tagged regression tests, real bugs that broke production
3️⃣ LLM-as-judge + task-based evals to measure actual capability
The open problem: detecting small prompt regressions that break complex workflows. This week is already booked, but I will write about my experience working on these problems in two weeks in the newsletter. Subscribe to stay tuned!
English
2
0
2
72
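The three-layer approach adlrocha outlines could look something like the sketch below, wired into a single release gate. This is a minimal illustration under my own assumptions; `deterministic_check`, `regression_check`, `judge_check` and `release_gate` are invented names, and the `judge` callable stands in for any LLM-as-judge call.

```python
def deterministic_check(trace: list) -> bool:
    """Layer 1: scaffolding unit test. The orchestration trace must start
    by planning and end by responding, regardless of model output."""
    return len(trace) >= 2 and trace[0] == "plan" and trace[-1] == "respond"


def regression_check(output: str, known_bad: list) -> bool:
    """Layer 2: issue-tagged regression test. Outputs that previously broke
    production must never recur in a new release."""
    return not any(bad in output for bad in known_bad)


def judge_check(output: str, rubric: str, judge) -> bool:
    """Layer 3: LLM-as-judge. A capable model scores the answer against a
    rubric; `judge` is any callable that returns 'pass' or 'fail'."""
    verdict = judge(f"Rubric: {rubric}\nAnswer: {output}\nPass or fail?")
    return verdict.strip().lower().startswith("pass")


def release_gate(trace, output, known_bad, rubric, judge) -> bool:
    """A release passes only if all three layers pass."""
    return (deterministic_check(trace)
            and regression_check(output, known_bad)
            and judge_check(output, rubric, judge))
```

The ordering matters for cost: the cheap deterministic and regression layers short-circuit before the expensive LLM-as-judge call runs.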
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@AlexanderKalian @P33RL3SS Fair take. The gap usually shows up once systems move from demos to continuous workloads; that’s where the real constraints and tradeoffs become unavoidable. Curious what you have seen break first in practice?
English
0
0
0
1
Dr Alexander D. Kalian
Dr Alexander D. Kalian@AlexanderKalian·
Cancer vaccines work by training the immune system to recognise and attack the mutated cancer cells. Trouble is, different cancers have different mutations and hence different antigens - requiring different vaccines. Cancer vaccines, especially if personalised, will save millions of lives - maybe even billions - but one given vaccine cannot be a standalone cure for all cancers.
English
3
0
3
97
Dr Alexander D. Kalian
Dr Alexander D. Kalian@AlexanderKalian·
I can guarantee that AI will not "cure cancer" - at least, not in any clean singular way. Cancer is an umbrella term for many different diseases, affecting different tissues, with different pathologies and treatment pathways - each requiring different cures. And this is before we discuss the inherent challenges faced by AI drug discovery, which are unlikely to be resolved anytime soon. This "AI will cure cancer" narrative among AI utopianists, demonstrates a mixture of overconfidence, ignorance, and naivety - about both the capabilities of AI, and the applied domain of biology that they so naively delve into.
djcows@djcows

if AI cures cancer, will the anti-AI people still hate AI?

English
88
31
257
26.2K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@adelbucetta @ai_with_shah @NanoBanana Agreed, a lot of that “complexity” ends up being hidden infra constraints: teams scale usage, but the underlying capacity and scheduling don’t scale cleanly with it. That’s usually where the real instability starts.
English
0
0
0
11
Adel Bucetta
Adel Bucetta@adelbucetta·
@ai_with_shah @NanoBanana most people think ai solves this, but it just accelerates the problem: maintenance costs, complexity, and scaling nightmares don't disappear
English
1
0
1
12
Shah
Shah@ai_with_shah·
Nano Banana Pro 🍌 Prompt share 👇 The image is divided into three clean horizontal panels with no text. Top panel: beginning of a dynamic karate kick in a minimalist dojo, mid-motion setup. Middle panel: action in progress with powerful extension and fabric flow. Bottom panel: conclusion with balanced landing and focused expression. Clean line work blended with photorealistic details, consistent character across panels, high-contrast studio lighting, professional storyboard aesthetic for game or animation reference.
English
22
1
16
921
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@seoinetru @fal @Fal_ai That kind of jump usually isn’t the model itself; it’s queuing or shared capacity under load. Have you noticed whether it spikes at certain times or just stays elevated? We have seen similar cases where latency quietly degrades once utilisation crosses a threshold.
English
0
0
0
6
NeuralMyth
NeuralMyth@seoinetru·
@fal @fal_ai Urgent issue with fal-ai/turbo-flux-trainer 🚨 Since Jan 27, training times jumped from 30s to 20-40 mins. Our UX depends on speed, and we're paying for Turbo specifically for that low latency. Is there a backend issue or queue limit affecting this model? Help
English
1
0
1
196
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@jacobbednarz That latency point is interesting, are you seeing it stay stable once workloads run continuously, or does it drift under sustained load? We have seen a few cases where things look fine initially, then degrade quietly over time.
English
0
0
0
18
Jacob⚡️Bednarz
Jacob⚡️Bednarz@jacobbednarz·
this week has been the first time i've ever felt like we're finally "working in the future".
- while dropping the kids at daycare, i had AI review and dissect a latency issue i tracked down
- while sleeping, i've had my 3d printer creating new toolbox organisers
- started feeding camera footage of our livestock into an agricultural model for detecting sick, injured or underfeeding patterns
- while shipping a feature for work, had AI debug why one of my unifi access points randomly doesn't allow clients to stay connected
it's probably not the hottest use of autonomous processes these days but damn, it feels good.
English
1
0
3
39
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@GetPowerAI Exactly. It starts as scheduling, but once workloads stay hot, infra mismatches show up fast. We have seen similar cases where things look fine until continuous load exposes it. Are you seeing this more region-specific or across deployments?
English
0
0
0
12
Power AI
Power AI@GetPowerAI·
@Miltonsammy_ It starts as scheduling. It turns into infrastructure. When workloads go continuous, small mismatches between compute and power get amplified. What looks like a scheduling issue is often underlying grid constraints, price volatility, or local capacity limits.
English
1
0
1
11
Power AI
Power AI@GetPowerAI·
AI demand is rapidly outpacing available computing infrastructure, with companies facing shortages of GPUs, rising costs, and capacity constraints as usage shifts toward continuous, agent-driven workloads. The deeper issue is that scaling AI is no longer just a software challenge, but a physical one, where compute, data centers, and energy infrastructure are becoming tightly coupled and increasingly limited. x.com/GetPowerAI/sta…
English
1
0
1
23
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@Samward @Konstantine And that’s the dangerous part: silent failure. It looks stable on the surface while compute is wasted underneath. We have seen similar cases where retries/loops mask issues for hours. Are you instrumenting at the step level or still mostly in aggregate?
English
0
0
0
11
Sam Ward
Sam Ward@Samward·
@Miltonsammy_ @Konstantine Right. Orchestration failures don't look like inference failures either. A bad model answer is visible. An orchestration layer silently re-running the wrong step for three hours is not. Most teams don't have the telemetry to catch the second one yet.
English
1
0
1
17
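The failure mode Sam Ward describes, an orchestration layer silently re-running the same step for hours while aggregate metrics look healthy, is exactly what step-level telemetry catches. A minimal sketch of the idea, with `StepMonitor` and its threshold as invented names rather than any real library:

```python
from collections import Counter


class StepMonitor:
    """Step-level telemetry: flag an orchestration step that keeps
    re-executing, which aggregate throughput metrics would hide."""
    def __init__(self, max_repeats: int = 3):
        self.counts = Counter()
        self.max_repeats = max_repeats

    def record(self, step_id: str) -> None:
        """Call once per step execution from the orchestration loop."""
        self.counts[step_id] += 1

    def suspicious_steps(self) -> list:
        """Steps whose execution count exceeds the repeat threshold.
        In aggregate the system looks busy; per-step counts expose a
        loop silently burning compute."""
        return [s for s, n in self.counts.items() if n > self.max_repeats]


mon = StepMonitor(max_repeats=3)
for _ in range(10):
    mon.record("fetch_docs")   # same step re-executed 10 times
mon.record("summarise")
print(mon.suspicious_steps())  # -> ['fetch_docs']
```

In a real system the counter would be keyed per run and windowed in time, but the core point stands: the signal lives at the step level, not in the aggregate.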
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@adlrocha Exactly, defining success gets blurry as tasks broaden. We’ve seen teams default to proxies that don’t hold under real usage. Are you relying more on human eval loops now or trying to systematise it fully?
English
1
0
0
12
adlrocha
adlrocha@adlrocha·
I think one of the worst issues we are seeing (at least for our use case) is how to actually objectively define success for an agentic task. We are dealing with data analysis, and for small narrow tasks determining if the result is correct is easy, but with a vast catalog and broad analyses it becomes harder. These also involve larger tasks, which means that, depending on the model, you can clearly see the degradation.
English
1
0
1
13
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@mpetyx Agreed. Routing helps early, but as usage scales the challenge shifts into maintaining consistency across those systems. That’s where orchestration stops being optimisation and becomes an infrastructure problem.
English
0
0
1
16
Michael Petychakis
Michael Petychakis@mpetyx·
The answer isn't waiting for prices to drop. It's orchestration. AT&T cut AI costs by 90% and tripled throughput, not by using less AI, but by routing tasks to right-sized models instead of pushing everything through frontier.
Three moves that matter now:
→ Build a dedicated AI compute budget (stop raiding existing line items)
→ Instrument cost per task, per workflow, per outcome
→ Portfolio your model spend: frontier for high-stakes, open-weight for volume
The math is moving. Start doing it.
English
2
0
1
168
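The routing-plus-instrumentation idea in Michael Petychakis's post can be sketched in a few lines. The prices, model names, and `route`/`cost` helpers below are all invented for illustration; real per-token rates vary by provider and the AT&T figures are not reproduced here.

```python
# Hypothetical prices per million tokens; real rates vary by provider.
MODELS = {
    "frontier":    {"cost_per_mtok": 15.00},  # high-stakes work
    "open_weight": {"cost_per_mtok": 0.50},   # high-volume work
}


def route(task_stakes: str) -> str:
    """Route high-stakes tasks to the frontier model and everything
    else to the cheaper open-weight model."""
    return "frontier" if task_stakes == "high" else "open_weight"


def cost(model: str, tokens: int) -> float:
    """Instrument cost per task so spend is attributable to a specific
    workflow instead of disappearing into one aggregate bill."""
    return MODELS[model]["cost_per_mtok"] * tokens / 1_000_000


m = route("low")
print(m, cost(m, 200_000))  # -> open_weight 0.1
```

Even this toy version shows the arbitrage: a 200K-token volume task costs $0.10 on the open-weight tier versus $3.00 through frontier, a 30x gap that compounds across every workflow.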
Michael Petychakis
Michael Petychakis@mpetyx·
The OPEX shift from headcount to AI tokens isn't a future prediction — it's happening now. But the math is broken. CEOs want 3x velocity. GMs are fighting for unbudgeted token spend mid-quarter. Top engineers are consuming more tokens than the rest of the org combined. We're paying a premium for capability instead of getting cost arbitrage. Here's what's actually going on. 🧵👇
English
1
0
0
271
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@1a1n1d1y @0xsachi 20% helps, but most teams find the real constraint isn’t throughput, it’s consistency under load. Once systems scale, small gains get eaten quickly if the underlying infra isn’t stable.
English
0
0
1
5
andy
andy@1a1n1d1y·
@0xsachi what if i could make inference 20% greater throughput for the same existing hardware, is that good? also check out tpu… the price drop you’re looking for is hidden behind 500 lines of jax
English
1
0
2
59
Miss Sentient
Miss Sentient@0xsachi·
What if it’s possible to build Mythos with open-source models?
English
20
0
29
2.7K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@Samward @Konstantine Exactly. Most expect scaling pressure on inference, but it shifts into orchestration and system coordination fast. That’s where consistency and reliability start breaking before people realise what’s actually happening.
English
1
0
0
23
Sam Ward
Sam Ward@Samward·
The CPU story is under-covered. We run legal agents on a mix of GPU for model inference and CPU for the orchestration layer that actually decides what the agent should do next. As agents scale, the bottleneck moves from raw tokens per second to the decision engine around them. CPUs tuned for that workload would change the economics.
English
1
0
3
19
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@KislayParashar1 @jukan05 Agreed. Most focus on GPU count, but the real constraint shows up in how the system behaves under load. Bandwidth, coordination, and stability become the bottleneck long before raw compute does.
English
0
0
1
9
Kislay Parashar
Kislay Parashar@KislayParashar1·
@jukan05 This is already happening quietly. CPU memory bandwidth, not compute, is the actual bottleneck in heavy inference workloads. Going from 1 CPU per 12 GPUs to 2 CPUs per GPU is a massive architectural shift nobody is talking about enough.
English
1
0
1
399
Jukan
Jukan@jukan05·
My bold prediction: within the next two years, there will be cases where GPUs cannot be deployed because of CPU shortages. According to industry checks, in some Rubin Ultra configurations, the GPU-to-CPU deployment ratio has already exceeded one GPU to two CPUs.
English
93
165
2K
664.2K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@ValeriusLabs @sama Exactly. Most underestimate how fast inference cost stops being a pricing problem and becomes a capacity and stability problem. Once usage scales, securing reliable compute becomes the constraint, not demand.
English
0
0
0
268
VALAI
VALAI@ValeriusLabs·
@sama Cursor's $60B valuation assumes inference margins stay fat. spacex compute actually costs money to run. once they're burning gpu cycles on model serving, that acquisition math flips hard.
English
1
0
1
1.6K
Sammy Milton-Tomkins
Sammy Milton-Tomkins@Miltonsammy_·
@linuxquestions @datadoghq Most teams underestimate how quickly orchestration complexity turns into instability. Capacity and consistency become the real constraint long before models do. Curious how many are actually planning infra at that level yet.
English
0
0
0
6
Jeremy
Jeremy@linuxquestions·
The first @datadoghq report on AI/LLMs just dropped. It explores the state of AI engineering in production. One thing struck me. As the ecosystem matures and real LLM-based systems are in production longer, these systems start to look more and more like the distributed systems we already know. The overlap isn't 100% of course, but routing, dependencies, budgets, capacity planning, tech debt, and unanticipated failure modes... a lot of the patterns look familiar. datadoghq.com/state-of-ai-en…
English
1
0
3
79