InferX (@InferXai) - Twitter Profile

Pinned Tweet

InferX@InferXai·18 Mar

Sub-second Cold Starts for 32B Models | Live Demo & Technical Discussion youtube.com/live/oI_eg5x1I…

InferX@InferXai·2h

Why compete for resources at all? Get your own dedicated instance. No shared queues. No priority lanes. No competing for capacity. Just your model, on demand. Inferx.net

Fireworks AI@FireworksAI_HQ

Reliability shouldn't require reserving GPUs. Serverless 2.0 is live on Fireworks: one API, 3 serving paths.
→ Standard: elastic default
→ Priority: sheds last under congestion, pricing ~1.5x standard
→ Fast: >100+ tok/s on Kimi K2.6 and GLM 5.1
Get started: fireworks.ai/blog/serverles…

InferX@InferXai·9h

@NousResearch Hermes with InferX models, it literally never sleeps. 🫶🏽

mikey@mikeyssi

@sudoingX @InferXai I am in love.

InferX retweeted

Prashanth (Manohar) Velidandi@PMV_InferX·1d

Gemma has been our top model for the past four weeks on InferX. Not only that, we started offering longer context, tool calling standard, with a dedicated instance at $10/month. It’s been wild. InferX.net

LangChain@LangChain

The latest finding in the LangSmith Signal: Open Models are having a moment. 1 in 3 AI teams ran an open-weights model in April 2026, up from 1 in 5 nine months ago. The overall number of teams using open weights grew 3x. We’re seeing newer users choose open models at a higher rate than those who came before.

InferX@InferXai·1d

Our goal at InferX is to give developers access to every open model with a dedicated instance behind it. No shared compute. No queuing. No throttling. Starting at $10/month on H100. inferx.net

LangChain@LangChain

The latest finding in the LangSmith Signal: Open Models are having a moment. 1 in 3 AI teams ran an open-weights model in April 2026, up from 1 in 5 nine months ago. The overall number of teams using open weights grew 3x. We’re seeing newer users choose open models at a higher rate than those who came before.

InferX@InferXai·1d

Tired of your coding agent getting throttled mid-task? We just dropped to $10/month. Dedicated H100. Your instance. Nobody else’s. OpenCode + InferX = no interruptions. Ever. inferx.net

InferX@InferXai·2d

@NVIDIAAI Cold starts have been our obsession since day one. Sub-second prompt → first token. In production. You can see how we achieve sub-sec cold starts here. inferx.net

NVIDIA AI@NVIDIAAI·3d

Introducing Dynamo Snapshot, our approach for fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds. In production inference deployments demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests. Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux native AIO and parallel memfd restoration to accelerate CRIU restore performance.

InferX@InferXai·2d

Cold starts have been our obsession since day one. Sub-second prompt → first token. In production. inferx.net

NVIDIA AI@NVIDIAAI

Introducing Dynamo Snapshot, our approach for fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds. In production inference deployments demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests. Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux native AIO and parallel memfd restoration to accelerate CRIU restore performance.

InferX@InferXai·4d

If you’re building with OpenClaw, model quality alone isn’t enough. You need:
• strong tool calling
• long context
• reliable latency
• privacy you can trust

Why does long context matter? Because serious agents don’t run for one prompt. They maintain state, call tools, accumulate context, and execute over extended workflows. Short context means brittle agents, constant resets, and endless tweaking.

At InferX, our Sovereign Endpoints are built for this. Long context by default. Tool-calling capable models. Dedicated instances for predictable performance. Your data stays isolated. No compromise.

Agent infrastructure should feel production-ready, not experimental. inferx.net

InferX@InferXai·5d

SSH’d into the machine from their phone just to keep using InferX. We’ll take it. 🤙 @mikeyssi inferx.net

InferX retweeted

mikey@mikeyssi·6d

Dope setup w/@InferXai

InferX@InferXai·23 May

inferx.net

InferX@InferXai·22 May

It’s pronounced “InferX” not “InferX”

Nous Research@NousResearch

It’s pronounced “Hermes”, not “Hermes”

InferX@InferXai·22 May

This is one source of underutilization. For on-demand inference, cold starts + idle capacity are often even worse. The GPU can’t be busy if the model is still loading. Inferx.net

Anyscale@anyscalecompute

GPU utilization can drop below 50% in batch AI pipelines. Not because the model is slow, but because pipeline can’t feed GPUs fast enough. Learn how a unified CPU+GPU pipeline changes that. na2.hubs.ly/H05Dffy0

InferX@InferXai·22 May

@anyscalecompute GPU starvation inside the pipeline is real. But for bursty inference workloads, the bigger utilization killer is often before inference even starts: cold starts, idle reserved capacity, and always-on instances waiting for traffic. Different bottleneck. Same wasted GPUs.

Anyscale@anyscalecompute·20 May

GPU utilization can drop below 50% in batch AI pipelines. Not because the model is slow, but because pipeline can’t feed GPUs fast enough. Learn how a unified CPU+GPU pipeline changes that. na2.hubs.ly/H05Dffy0

InferX retweeted

Prashanth (Manohar) Velidandi@PMV_InferX·21 May

Big congratulations to the @modal team on an incredible milestone. $355M raised and a bold vision for what AI infrastructure should look like.

This line stood out: “Improve cold starts by 100x with GPU snapshotting — an outcome that seemed impossible.”

It’s exciting to see cold starts enter the mainstream infrastructure conversation. Very few teams in the world have gone this deep on the problem. We’re one of them.

At @InferXai, we’ve been building at the inference runtime layer since day one. The result: 1s cold start to first token on Qwen 35B, measured end-to-end, in production today. They called it impossible. We shipped it.

Two teams. Two architectural paths. The same recognition that traditional cloud was never built for AI workloads. The efficient inference layer is being rebuilt in real time. We’re already here. Inferx.net.

InferX@InferXai·21 May

We quietly launched Sovereign Endpoints™ and 50+ teams are already building on it. Your model. Your instance. No compromises.
→ Dedicated private GPU, no shared compute
→ Sub-second cold starts
→ Long context, built for coding and agents
→ Works with OpenCode, Dify, Continue, OpenWebUI, OpenClaw
→ Scale to zero, pay nothing when idle
$20/month per model while we’re in beta. Try it → inferx.net

InferX@InferXai·19 May

Agents running inside sandboxes is the right direction. The next question developers will ask: which model is running those agents? At InferX that answer is always yours to make. Any model. Dedicated private instance. No lock-in. Sovereign inference for the agentic era. inferx.net

Claude@claudeai

Live from Code with Claude London: we're launching self-hosted sandboxes (public beta) and MCP tunnels (research preview) in Claude Managed Agents. Run agents inside your own perimeter, with your security controls applied by default.

InferX@InferXai·19 May

@claudeai Self-hosted Claude agents are a step in the right direction. Sovereign inference means running any model, not just Claude, inside your perimeter. Your model. Your data. Your infrastructure. inferx.net

Claude@claudeai·19 May

Live from Code with Claude London: we're launching self-hosted sandboxes (public beta) and MCP tunnels (research preview) in Claude Managed Agents. Run agents inside your own perimeter, with your security controls applied by default.
