friendliai
@friendliai
350 posts

The Frontier AI Inference Cloud. Deploy frontier open-weight models with unmatched efficiency—maximizing tokens and margins.

San Francisco, CA · Joined April 2022
32 Following · 267 Followers

Pinned Tweet
friendliai @friendliai
What if you got up to $50,000 in inference credit—and ended up with a faster, more reliable model endpoint? 🧵👇

Today, we’re launching the FriendliAI "Switch" promotion. We’re so confident that Friendli’s ‘Orca’ Engine and frontier open models will outperform your current stack that we’re paying you to make the move.
friendliai @friendliai
🎮 At NVIDIA GTC: Win a Nintendo Switch 2 — and get up to $50K inference credit — just for switching to FriendliAI.

Scaling LLMs in production usually means hitting a wall of:
💸 Rising costs
🧱 Throughput ceilings
📈 Unpredictable latency

Whether you're coming from:
→ OpenAI / Anthropic APIs
→ Together / Fireworks Dedicated Endpoints
→ Self-hosted vLLM on AWS/GCP
…we’ve made switching seamless.

Why make the switch? Friendli’s Orca Engine delivers:
🚀 3x throughput vs. vLLM
🛡️ 99.99% reliability under real traffic
💰 50–90% lower costs
⚡ And yes — it’s just 3 lines of code. Fully OpenAI-compatible.

📍 Meet us at NVIDIA GTC — Booth #3305
We’ll help you switch on the spot.
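For readers wondering what "OpenAI-compatible" means in practice: an OpenAI-style chat request keeps its exact shape and only the base URL changes. A minimal stdlib sketch, assuming a hypothetical serverless base URL and model ID (check the Friendli docs for current values):

```python
# Sketch of an OpenAI-compatible chat request aimed at a Friendli endpoint.
# The base URL and model ID are illustrative assumptions, not confirmed values.
import json
import urllib.request

FRIENDLI_BASE_URL = "https://api.friendli.ai/serverless/v1"  # assumed endpoint

def build_chat_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request for a Friendli endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{FRIENDLI_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

req = build_chat_request("zai-org/GLM-5", "Hello!", "YOUR_FRIENDLI_TOKEN")
print(req.full_url)  # → https://api.friendli.ai/serverless/v1/chat/completions
```

With the official OpenAI SDK, the same switch is passing `base_url` and `api_key` to the client constructor, which is presumably where the "3 lines of code" claim comes from.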
friendliai @friendliai
Claude-compatible applications can run on FriendliAI without changing payloads or response parsing, making inference faster, more reliable, and simpler to manage. 📜 Learn more: friendli.ai/docs/openapi/d…
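"Without changing payloads or response parsing" means a Claude-style Messages request keeps its Anthropic shape and only the destination changes. A stdlib sketch, assuming a hypothetical URL path, auth header, and model ID:

```python
# Sketch of an Anthropic Messages-style request pointed at a Friendli endpoint.
# The URL path, auth header, and model ID are illustrative assumptions.
import json
import urllib.request

def build_messages_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a Claude-compatible /messages request without altering the payload shape."""
    payload = {
        "model": model,
        "max_tokens": 256,  # Messages API payloads carry an explicit token cap
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.friendli.ai/serverless/v1/messages",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

msg_req = build_messages_request("zai-org/GLM-5", "Hello!", "YOUR_FRIENDLI_TOKEN")
```

An existing Claude app would keep building this exact payload; only the request URL and credential change.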
friendliai @friendliai
Why skipping the compatibility layer matters:
⚡ Lower latency
🛠️ Full feature coverage: structured content, tools, streaming, etc.
🔁 Easy migration for existing Claude apps
friendliai @friendliai
🛑 Stop rewriting your API just to run inference. FriendliAI now natively supports the Anthropic Messages API. 💬
friendliai @friendliai
@openclaw + @friendliai: your personal AI assistant, running more economically on open-weight models.

Our new blog shows how to integrate FriendliAI with OpenClaw to run powerful multi-model agent systems on high-performance open-weight models like GLM-5 and Qwen3. The result: sophisticated agent workflows without massive infrastructure bills. 🚀

Here’s what the setup unlocks:
→ Serverless inference on leading open-weight models
→ Cross-provider model fallback (so your agents never go silent)
→ Specialized agents for different tasks — fast vs. deep reasoning
→ Channel-based routing (yes, including Discord)

No GPU management. No runaway costs from proprietary model calls. Just clean, scalable agent orchestration on open models.

🟢 Headed to @nvidia GTC this week? Come find us at Booth #3305 — we’ll be running live vibe-coding demos and building real OpenClaw agent workflows on FriendliAI in real time. It’s the kind of thing you have to see to believe.

Read the full blog 👇 friendli.ai/blog/integrati…

#nvidiagtc #OpenClaw
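Cross-provider fallback, at its core, is an ordered try-next loop over providers. A generic sketch of the pattern (the provider names and callables here are hypothetical, not OpenClaw's actual API):

```python
# Generic sketch of cross-provider model fallback: try providers in order and
# return the first success. Provider names and callables are hypothetical.
from typing import Callable, Iterable, Tuple

def call_with_fallback(prompt: str,
                       providers: Iterable[Tuple[str, Callable[[str], str]]]
                       ) -> Tuple[str, str]:
    """Return (provider_name, reply) from the first provider that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would narrow this catch
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky(prompt: str) -> str:
    raise TimeoutError("primary endpoint timed out")

name, reply = call_with_fallback("hi", [("primary", flaky),
                                        ("friendli", lambda p: f"echo: {p}")])
print(name, reply)  # → friendli echo: hi
```

This is what keeps agents from "going silent": a timeout on one endpoint simply routes the same prompt to the next provider in the list.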
friendliai @friendliai
Attending GTC? Visit Booth 3305 tomorrow at 3pm PT to discover "how GLM got here" from Yu Jin, Head of Developer Ecosystem at Z.ai. Join us to learn more about the most advanced open-weight model.

Each day, our team will also be demonstrating how to...
🔹 Optimize AI Workloads at 1pm
🔹 Design Model Benchmarks at 2pm

FriendliAI consistently delivers the lowest latency for GLM-5. Try it now on our Serverless or Dedicated Endpoints: friendli.ai/model/zai-org/…
friendliai @friendliai
Open-source inference is efficient and interoperable, but managing deployments can be difficult. At GTC, we’re demonstrating how to optimize AI workloads, benchmark ML models, vibe code with open-weight models, and scale containerized inference.

Visit Booth 3305 every day for 15-min technical deep dives on the hour, starting with:
🟢 1PM | AI Workload Optimization: Implement continuous batching and online quantization to enhance performance.
🟢 2PM | Model Benchmarking: Run high-speed benchmarks using the Friendli Suite.
🟢 3PM | Vibe Coding with Open-Weight Models: Use coding agents like Claude Code and @kilocode with models like GLM running on FriendliAI.
🟢 4PM | Scaling with AWS: Run containerized inference with FriendliAI and AWS EKS.

Want to learn more? Schedule a dedicated 15-min demo with our technical team to find out how you can deliver 2X–3X throughput while reducing GPU costs by 50–90%: 44419902.hs-sites.com/gtc-book-a-mee…
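The "online quantization" in the 1PM session refers to computing quantization scales from live tensors at serve time rather than from offline calibration data. A toy symmetric int8 example, a sketch of the general idea and not Friendli's implementation:

```python
# Toy sketch of online (dynamic) quantization: derive an int8 scale from the
# live values at request time instead of a precomputed calibration set.
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization; returns (ints, scale)."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # guard all-zero input
    return [round(v / scale) for v in values], scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [x * scale for x in q]

vals = [0.5, -1.0, 0.25]
q, s = quantize_int8(vals)
restored = dequantize(q, s)
print(q)  # → [64, -127, 32]
```

The round trip is lossy but bounded: each value lands within half a quantization step of the original, which is the trade that buys the memory and bandwidth savings.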
friendliai @friendliai
Your coding agent is only as fast as its model API. ⚡️ Stop waiting on "thinking..." pauses and prefill stutters. Whether it's Claude Code, @kilocode, or @opencode — the bottleneck is the infra, not the agent.

Supercharge your workflow with Friendli Serverless APIs:
0️⃣ Zero prefill stutter (context reuse)
🛠️ Stable tool calling (no more retries)
⚙️ Optimized GLM-5 & MiniMax-M2.5

Stop waiting. Start vibing. 💻
Full guide here: 👉 friendli.ai/blog/coding-ag…
friendliai @friendliai
Introducing Friendli InferenceSense™: the "AdSense for GPUs." 🏭💸

InferenceSense helps GPU cloud operators automatically fill idle compute cycles with paid AI inference workloads, just as AdSense helps digital publishers fill empty website space with ads to generate revenue. 🪙💰

Modern data centers are often portrayed as AI factories. Yet most GPU clouds are still missing the crucial inference "assembly line" that produces intelligence—turning raw compute into generated tokens and revenue. When bursty training jobs finish, expensive hardware simply goes dark—but the massive costs of power, cooling, and depreciation never stop.

Today, we are thrilled to officially launch the industry’s first inference monetization platform purpose-built to fix this: Friendli InferenceSense™. Powered by the highly optimized engine built by the inventors of continuous batching, InferenceSense automatically detects idle GPU capacity in your fleet and instantly fills it with paid inference requests for popular open-weight models. We bring the global demand; you simply plug in and earn.

"Most GPU operators still act like traditional landlords, watching revenue evaporate every time a workload finishes or a contract ends," says FriendliAI CEO @bgchun. "InferenceSense provides the missing assembly line. Every idle GPU-hour becomes a chance to serve real AI demand and capture token revenue. The AI factory build-out only makes sense when it actually makes cents."

Why GPU clouds choose InferenceSense:
📈 Monetize underutilized infrastructure: Stop losing margin on dark hardware and transform idle compute cycles into an active, revenue-generating asset that can even surpass traditional rental revenue.
🔒 Zero disruption: Your jobs ALWAYS come first. Immediate preemption guarantees zero downtime for your core workloads.
⚙️ Frictionless integration: You retain full control over participating nodes and schedules, with no upfront costs or minimum commitments.

Heading to NVIDIA GTC? We are currently accepting applications from qualified GPU cloud operators.
📰 Read the full blog here: friendli.ai/blog/inference…
📩 Contact partners@friendli.ai to schedule an executive briefing with us at GTC

#InferenceSense #NVIDIAGTC
friendliai @friendliai
🟢 NVIDIA Nemotron 3 Super is live on day 0 at Friendli.ai

Developers can now deploy NVIDIA’s hybrid MoE model with industry-leading compute efficiency and accuracy on FriendliAI, with Dedicated Endpoints optimized for multi-agent applications and specialized agentic AI systems.

Key Features:
→ Breakthrough Efficiency: The model’s hybrid Mamba–Transformer architecture improves token generation efficiency, supporting faster reasoning and higher response quality.
→ Leading Accuracy: Nemotron 3 Super achieves top accuracy across leading benchmarks, including GPQA Diamond, AIME 2025, LiveCodeBench, IFBench, and BFCL.
→ Optimized Reasoning: Features a "thinking budget" to avoid overthinking and keep inference costs low and predictable.
→ Massive Scale: 120B total parameters with 12B active parameters and a context length of up to 1M tokens.

Running mission-critical workloads like Nemotron 3 Super requires a foundation you can trust. FriendliAI ensures enterprise-grade reliability with a 99.99% uptime SLA and stable production performance.

👉 More about Nemotron 3 Super: friendli.ai/blog/nvidia-ne…
👉 Check out NVIDIA models: friendli.ai/model/nemotron
👉 Try it now: friendli.ai/suite/~/dedica…
friendliai @friendliai
T-6 Days until NVIDIA GTC! 🚀 @FriendliAI will be at Booth #3305. Our team of inference experts will be demonstrating how to…
Optimize workloads & benchmark models 📈
Vibe-code apps with open-source models 🤖
Scale containerized inference with AWS EKS 🛅

We’re offering up to $50,000 in inference credits to help developers switch from their existing deployment to faster, more efficient inference solutions. And, in the spirit of switching… we’re raffling off a Nintendo Switch 2. So stop by our booth to learn more about FriendliAI and enter for a chance to win.

If you’d like to arrange a meeting with us in advance, please request one here: 44419902.hs-sites.com/gtc-book-a-mee…
friendliai @friendliai
Running frontier MoE models efficiently requires optimized inference. FriendliAI enables:
🌟 Continuous batching for higher GPU utilization & throughput
🌟 Ultra-low latency for real-time applications
🌟 Memory-efficient execution optimized for MoE
🌟 99.99% uptime
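Continuous batching, the first item above, admits new requests into the running batch at every decode step instead of waiting for the slowest request in a static batch to finish. A toy scheduler illustrating just the admission logic (a sketch of the general technique, not Friendli's engine):

```python
# Toy continuous batching: freed batch slots are refilled from the waiting
# queue at every decode step, so short requests never wait on long ones.
from collections import deque

def continuous_batching(requests, max_batch=4):
    """requests: list of (id, tokens_to_generate). Returns completion order."""
    waiting = deque(requests)
    running = {}   # request id -> tokens remaining
    finished = []
    while waiting or running:
        # Admit new requests whenever slots free up; static batching would
        # instead wait for the entire batch to drain before admitting any.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decode step generates one token for every running request.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]
                finished.append(rid)
    return finished

order = continuous_batching([("a", 2), ("b", 5), ("c", 1), ("d", 3), ("e", 1)])
print(order)  # → ['c', 'a', 'e', 'd', 'b']
```

Note that "e" finishes well before the long request "b": it slipped into the slot "c" vacated, which is the GPU-utilization win the bullet refers to.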
friendliai @friendliai
Multimodal agents are now the default — meet Qwen3.5-397B-A17B.

Qwen3.5-397B-A17B, the flagship model of the Qwen3.5 native vision-language series from @Alibaba_Qwen, delivers top-tier reasoning, coding, agent workflows, and multimodal understanding — all in one model.