Clarifai

10.4K posts

@clarifai

Create and Control Your AI Workloads On Any Compute.

Washington, DC · Joined March 2014
2.1K Following · 10.8K Followers
Pinned Tweet
Clarifai @clarifai
Heading to NVIDIA GTC 2026? As GenAI moves into production, two challenges keep coming up: GPU scarcity and rising inference costs. Scaling AI is no longer just about choosing the right model. It’s about how efficiently you use the compute behind it.

At GTC, Clarifai is joining @Vultr for a Theater Session to discuss how enterprises are improving throughput, controlling inference spend, and making better use of GPUs across cloud and hybrid environments. Our VP of Strategy, Sajai Krishnan, will share what teams are learning as they move from pilots to real production workloads.

March 17 | 4:00 PM
Booth #1631

If you’re building or operating AI systems at scale, join the conversation: clarifai.com/clarifai-at-gt… #GTC

Clarifai @clarifai
Day 3 of GTC is here, and there is a new pole sitter for Qwen3.5 performance: Clarifai Reasoning Engine approaches 300 tokens per second, 245% faster than a vanilla install.
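As a quick sanity check on the speedup claim, here is the arithmetic behind the headline numbers, assuming "245% faster" means 3.45x the vanilla throughput (that interpretation of "X% faster" is our assumption, not stated in the tweet):

```python
# Back out the implied baseline from "approaching 300 tokens/sec,
# 245% faster than a vanilla install".
# Assumption: "X% faster" means (1 + X/100) times the baseline.
optimized_tps = 300.0
speedup = 1.0 + 2.45  # 245% faster -> 3.45x multiplier

vanilla_tps = optimized_tps / speedup
print(f"Implied vanilla throughput: {vanilla_tps:.1f} tokens/sec")
# Under these assumptions, the vanilla baseline is just under 87 tokens/sec.
```

Under that reading, the unoptimized install would be serving roughly 87 tokens per second.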

Clarifai @clarifai
Day 2 at GTC. 🚀 Clarifai Reasoning Engine hit 410 tokens per second on Kimi K2.5, making us one of the first providers to break the 400 TPS barrier on a trillion-parameter reasoning model.

We're also running strong benchmarks on MiniMax M2.5, Qwen3.5, and GPT-OSS-120B. Production-ready performance for reasoning workloads. ⚡

Clarifai @clarifai
In this year's GTC keynote, Jensen breaks down the vital importance of inference architecture in this new economy: "Every. Single. Company. Will be thinking about their token factory effectiveness."

Clarifai @clarifai
👀 What company broke the 400 token per second barrier on @Kimi_Moonshot K2.5 first? Check out Clarifai in Jensen's @nvidia GTC keynote.

Clarifai @clarifai
Clarifai 12.2: Three-Command CLI Workflow for Model Deployment! 🚀

Model deployment shouldn't require juggling multiple tools and configuration files. With Clarifai 12.2, you can go from development to production in three CLI commands:

→ model init – Scaffold with automatic GPU selection
→ model serve – Test locally with production parity
→ model deploy – Deploy with automatic infrastructure provisioning

The CLI handles dependency management, GPU selection, and deployment orchestration automatically. Multi-cloud instance discovery works across AWS, Azure, DigitalOcean, and other providers.

Also new in 12.2:
• Training on Pipelines (Public Preview)
• Video Intelligence for real-time stream analysis
• Dynamic nodepool routing and deployment enhancements

Clarifai @clarifai
Most MCP servers live in local dev environments. But real applications need them as stable endpoints.

MCP servers are how LLMs connect to tools and external systems. Web search. GitHub. Databases. Internal APIs. They expose capabilities that models can call through structured tool definitions.

The problem is that most MCP servers run as simple stdio processes. That works for experiments, but not for production systems where agents, applications, and teams need reliable access. The missing step is turning those servers into accessible API endpoints.

In this guide we walk through how to deploy an MCP server so its tools can be discovered and invoked by any LLM that supports function calling. No changes to the server itself. Just package it, deploy it, and expose the tools through a stable endpoint. Once deployed, your MCP server becomes shared infrastructure for agents and applications.

Full walkthrough here: clarifai.com/blog/how-to-de…

Clarifai @clarifai
Choosing an open-source LLM for production is harder than it looks.

Benchmarks are helpful. But they don’t tell you how a model will behave under your workload, your latency targets, or your cost constraints.

Before comparing models, ask:
• What exact task are we solving?
• Where will this run — single GPU, multi-node, real-time, batch?
• What are our non-negotiables — licensing, privacy, predictable cost?

Different workloads require different trade-offs. Multi-step reasoning, structured outputs, long context, code generation — they all stress models differently. And evaluation should reflect production reality, not leaderboard scores.

Model choice is only half the equation. How you deploy, scale, and optimize that model determines whether it actually works in production.

We broke this down in detail here: clarifai.com/blog/how-to-ch…

Clarifai @clarifai
Clarifai will be at NVIDIA GTC 2026.

As GenAI moves from pilot to production, two pressures are hitting hard: rising inference costs and GPU scarcity across environments. Scaling AI is not about bigger models alone. It’s about orchestration and making better use of the compute you already have.

At GTC, we’re joining @Vultr for a live Theater Session to discuss what this looks like in practice. Our VP of Strategy, Sajai Krishnan, will share how enterprises are improving throughput, lowering compute spend, and gaining greater control over production AI systems without locking into a single accelerator ecosystem.

Tuesday, March 17 | 4:00 PM
Booth #1631

If you’re building or operating AI at scale, join us: clarifai.com/clarifai-at-gt… #GTC2026

Clarifai @clarifai
MCP servers are easy to run locally. But running them reliably as shared infrastructure is a different problem.

We wrote a guide on how to deploy any stdio-based MCP server as a managed endpoint on Clarifai. Just define the MCP command, deploy it, and access its tools through a stable endpoint from any LLM that supports tool calling.

We used the DuckDuckGo MCP server as an example, but this works for any MCP server.

Full guide: clarifai.com/blog/how-to-de…

Clarifai @clarifai
Everyone is building agentic AI. Very few are running it in production.

The model performs. The demo impresses. Then reality shows up.

Agents do not just generate text. They plan across steps. Call tools. Hit APIs. Maintain state. Run workloads that may take minutes or hours. And when they fail, they fail mid-process. That is not a model problem. That is a systems problem.

Production agentic AI demands infrastructure that can:
• Deploy agents reliably across environments
• Manage tool and API dependencies centrally
• Persist and recover state across long-running workflows
• Scale across cloud, on-prem, or hybrid without lock-in

This is exactly what Clarifai's Compute Orchestration was built for. Deploy any model. On any compute. At any scale. With autoscaling, multi-environment support, and centralized control.

And when paired with the Clarifai Reasoning Engine, teams are seeing up to 2x performance at half the cost, because optimization does not stop at the model layer.

If you’re serious about taking agentic AI beyond prototypes, this is where the conversation shifts.

Explore how we’re building production-ready agentic AI: clarifai.com/blog/clarifai-…
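Taken at face value, "2x performance at half the cost" compounds: doubling throughput while halving spend quadruples the tokens you get per dollar. A minimal back-of-envelope sketch, with baseline numbers that are purely illustrative (not from the source):

```python
# Illustrative cost math for "up to 2x performance at half the cost".
# The baseline figures below are hypothetical, chosen only for the example.
baseline_tps = 100.0          # tokens per second (hypothetical)
baseline_cost_per_hour = 4.0  # dollars per GPU-hour (hypothetical)

optimized_tps = baseline_tps * 2.0                      # 2x performance
optimized_cost_per_hour = baseline_cost_per_hour * 0.5  # half the cost

def tokens_per_dollar(tps: float, cost_per_hour: float) -> float:
    """Tokens generated per dollar of compute spend."""
    return tps * 3600.0 / cost_per_hour

base = tokens_per_dollar(baseline_tps, baseline_cost_per_hour)
opt = tokens_per_dollar(optimized_tps, optimized_cost_per_hour)
print(f"Baseline:  {base:,.0f} tokens/$")
print(f"Optimized: {opt:,.0f} tokens/$  ({opt / base:.0f}x)")
# 2x throughput at 0.5x cost works out to 4x tokens per dollar.
```

Whatever the absolute numbers, the ratio is what matters: the two factors multiply, so the per-dollar improvement is larger than either headline figure alone.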

Clarifai @clarifai
Long-running AI workflows produce more than predictions. They produce state. Model checkpoints. Training logs. Evaluation metrics. Preprocessed datasets. Configuration files.

With 12.1, we’re introducing Artifacts, versioned storage designed specifically for pipeline and workload outputs.

Each artifact:
• Supports immutable versions
• Stores large files efficiently in object storage
• Tracks metadata in the control plane for fast lookup
• Works seamlessly with Pipelines, CLI, and SDK

Why this matters:
Reproducibility. Save the exact weights and configs behind a result.
Checkpointing. Resume long-running jobs without recomputing.
Version control. Compare outputs across runs with precision.

As AI systems move beyond single requests into long-running, multi-step workflows, managing outputs becomes part of the infrastructure. Artifacts make that explicit.

Learn more here: clarifai.com/blog/clarifai-…

Clarifai @clarifai
Agentic AI is changing the cost and performance equation of building AI systems. It’s no longer about a single model or a single prompt. Production systems today involve long-running workflows, agentic coordination, and sustained inference under real cost constraints.

In a recent conversation on Liftoff with Keith, Alfredo Ramos, our CPTO, shared how Clarifai approaches this shift — from model optimization and token economics to hardware selection and orchestration at scale.

The discussion covers:
• Why open-weight models now support a significant share of real-world use cases
• How model optimization and infrastructure choices impact production performance
• Where orchestration becomes critical as systems grow more agentic
• What it takes to accelerate deployment while reducing cost

For teams moving from experimentation to production AI, these tradeoffs matter.

Watch the full conversation here: youtu.be/oDXiID_Ehxk?si…

Clarifai @clarifai
Production agentic AI requires more than powerful models! It requires infrastructure that can deploy agents reliably, manage the tools they depend on, and scale across any environment.

Clarifai 12.1 extends the core capabilities for agentic workloads:
✓ Deploy public MCP servers to give models tool-calling capabilities
✓ Store and version pipeline outputs with Artifacts
✓ Monitor and control long-running workflows through improved Pipeline UI

Available now in Public Preview. Read the full release here: clarifai.com/blog/clarifai-…

Clarifai @clarifai
GPU shortages are the headline. But the deeper issue is structural.

AI workloads in 2026 look very different from two years ago. They’re longer-running. More agentic. More iterative. They don’t just spike once and finish. They stay active, coordinate tools, call models repeatedly, and operate under real cost pressure.

When workloads change, infrastructure stress shows up differently. In many cases, the constraint isn’t absolute GPU supply. It’s fragmentation. Idle capacity in one place, saturation in another. Static allocation for dynamic systems.

That’s why the conversation is shifting from “How do we get more GPUs?” to “How do we use the ones we have more intelligently?” Orchestration, scheduling, and workload design matter more than ever.

We wrote about where GPU pressure is actually coming from, and what teams should be thinking about as AI systems scale.

Read here: clarifai.com/blog/gpu-short…