InferX

386 posts

@InferXai

Serverless GPU inference. Sub-second cold starts, even for models of 32B+ parameters (64 GB+ of weights). Scale to zero. No idle billing. https://t.co/GbD8l2JPy0

San Francisco · Joined March 2025
46 Following · 221 Followers

InferX retweeted
InferX @InferXai
Tired of your coding agent getting throttled mid-task? We just dropped our price to $10/month. Dedicated H100. Your instance. Nobody else’s. OpenCode + InferX = no interruptions. Ever. inferx.net

InferX @InferXai
@NVIDIAAI Cold starts have been our obsession since day one. Sub-second from prompt to first token, in production. See how we achieve sub-second cold starts at inferx.net

NVIDIA AI @NVIDIAAI
Introducing Dynamo Snapshot, our approach to fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds. In production inference deployments, demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests. Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux-native AIO and parallel memfd restoration to accelerate CRIU restore performance.

InferX @InferXai
If you’re building with OpenClaw, model quality alone isn’t enough. You need:
• strong tool calling
• long context
• reliable latency
• privacy you can trust

Why does long context matter? Because serious agents don’t run for one prompt. They maintain state, call tools, accumulate context, and execute over extended workflows. Short context means brittle agents, constant resets, and endless tweaking.

At InferX, our Sovereign Endpoints are built for this. Long context by default. Tool-calling-capable models. Dedicated instances for predictable performance. Your data stays isolated. No compromise.

Agent infrastructure should feel production-ready, not experimental. inferx.net

InferX @InferXai
@mikeyssi SSH’d into the machine from their phone just to keep using InferX. We’ll take it. 🤙 inferx.net

InferX retweeted
mikey @mikeyssi
Dope setup w/ @InferXai

InferX @InferXai
@anyscalecompute GPU starvation inside the pipeline is real. But for bursty inference workloads, the bigger utilization killer is often before inference even starts: cold starts, idle reserved capacity, and always-on instances waiting for traffic. Different bottleneck. Same wasted GPUs.

Anyscale @anyscalecompute
GPU utilization can drop below 50% in batch AI pipelines. Not because the model is slow, but because the pipeline can’t feed GPUs fast enough. Learn how a unified CPU+GPU pipeline changes that. na2.hubs.ly/H05Dffy0

InferX retweeted
Prashanth (Manohar) Velidandi
Big congratulations to the @modal team on an incredible milestone: $355M raised and a bold vision for what AI infrastructure should look like.

This line stood out: “Improve cold starts by 100x with GPU snapshotting — an outcome that seemed impossible.”

It’s exciting to see cold starts enter the mainstream infrastructure conversation. Very few teams in the world have gone this deep on the problem. We’re one of them.

At @InferXai, we’ve been building at the inference runtime layer since day one. The result: a 1-second cold start to first token on Qwen 35B, measured end-to-end, in production today.

They called it impossible. We shipped it.

Two teams. Two architectural paths. The same recognition that traditional cloud was never built for AI workloads. The efficient inference layer is being rebuilt in real time. We’re already here. inferx.net

InferX @InferXai
We quietly launched Sovereign Endpoints™, and 50+ teams are already building on it.

Your model. Your instance. No compromises.
→ Dedicated private GPU: no shared compute
→ Sub-second cold starts
→ Long context, built for coding and agents
→ Works with OpenCode, Dify, Continue, OpenWebUI, OpenClaw
→ Scale to zero: pay nothing when idle

$20/month per model while we’re in beta. Try it → inferx.net

InferX @InferXai
Running agents inside sandboxes is the right direction. The next question developers will ask: which model is running those agents? At InferX, that answer is always yours to make. Any model. Dedicated private instance. No lock-in. Sovereign inference for the agentic era. inferx.net
Claude @claudeai

Live from Code with Claude London: we're launching self-hosted sandboxes (public beta) and MCP tunnels (research preview) in Claude Managed Agents. Run agents inside your own perimeter, with your security controls applied by default.

InferX @InferXai
@claudeai Self-hosted Claude agents are a step in the right direction. Sovereign inference means running any model, not just Claude, inside your perimeter. Your model. Your data. Your infrastructure. inferx.net