CloudRift

83 posts

@CloudRiftAI

The Operating System for Sovereign AI Deployments

Mountain View, CA · Joined March 2024

39 Following · 76 Followers

CloudRift@CloudRiftAI·12h

Most GPU VMs come configured for general workloads. Our team benchmarked what host-level tuning actually changes: memory bandwidth up to 7x on #H200, #NCCL up to +144% on PRO 6000. On the wrong config, #NUMA exposure cuts NCCL by 57%. cloudrift.ai/blog/benchmark…

CloudRift reposted

ElevenLabs@ElevenLabs·1d

Introducing Dubbing v2, our revolutionary new dubbing model. For the first time, the emotion and performance of the original content is carried over into every language.

CloudRift@CloudRiftAI·1d

Close to half of planned US data center builds this year are projected to be delayed or canceled. The cause is power infrastructure and China-sourced parts, with transformer lead times now up to five years. tomshardware.com/tech-industry/… #DataCenters #AIinfrastructure

CloudRift@CloudRiftAI·2d

61% of Western European CIOs now prioritize local cloud providers over US hyperscalers. With the EU AI Act fully applicable on August 2, regional GPU capacity is shifting from a preference to a procurement requirement. euronews.com/next/2026/03/0… #SovereignAI #EUAIAct

CloudRift reposted

dstack@dstackai·2d

Training models or serving inference on AMD GPUs? We’ve refreshed the AMD accelerator example in the dstack docs, covering on-prem fleets, cloud GPU provisioning, dev environments, training jobs, and production-grade inference. dstack.ai/docs/examples/…

CloudRift@CloudRiftAI·2d

How do you search 24,000 matmul configurations without burning days of GPU time? @ditrifonov's autotuner samples around 207 of them in ~67 seconds with Monte Carlo tree search. Check out part 3 of the writeup: cloudrift.ai/blog/building-… @triton_lang #MLcompilers #CUDA

CloudRift@CloudRiftAI·3d

Check out Part 3 of @ditrifonov's series on building a GPU compiler from scratch: He added autotuning via Monte Carlo tree search, moving the geomean from 0.87x to 0.96x of PyTorch eager. 32 of 84 kernels now beat PyTorch's hand-tuned code. cloudrift.ai/blog/building-… #MLcompilers @PyTorch

CloudRift@CloudRiftAI·23 May

@AMD Instinct #MI350X in our benchmarks: 2.6x faster FP16 matmul throughput than H200. Memory bandwidth: 241 GB/s on default libvirt, 813 GB/s tuned. Full results in the post: cloudrift.ai/blog/benchmark… #AMDInstinct #ROCm

CloudRift@CloudRiftAI·22 May

If you've ever wished you could read PyTorch's compiler end to end, here's the closest thing: Dmitry built a working ML compiler in about 8,000 lines of Python that's faster than PyTorch eager on average and up to 4.7x faster on small kernels like reductions and k/v projections. cloudrift.ai/blog/building-… @PyTorch #MLcompilers #PyTorch

CloudRift@CloudRiftAI·22 May

NUMA exposure raises GPU VM memory bandwidth by 3-7x. But on H200, cross-node NCCL collectives lost 57% of bandwidth when GPUs spanned different NUMA nodes. A real trade-off: cloudrift.ai/blog/benchmark… @nvidia #NCCL #NUMA #GPUcloud

CloudRift@CloudRiftAI·18 May

288 GB HBM3e per accelerator changes the #inference deployment math. Workloads that need 2x or 4x #H100 with tensor parallelism collapse onto a single #MI350X. Fewer failure modes, no cross-GPU latency. cloudrift.ai/mi350x @AMD #AMDinstinct

CloudRift@CloudRiftAI·16 May

#Llama 3 70B in FP16 weighs ~140 GB. A single @AMD #MI350X (288 GB HBM3e) fits it with room for KV cache and long context. On #H100 (80 GB), the same model requires tensor parallelism across two GPUs. cloudrift.ai/mi350x #amdinstinct
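
The arithmetic behind this post can be checked with a minimal sketch. It assumes a nominal 70 billion parameters, 2 bytes per FP16 weight, and decimal gigabytes (1 GB = 1e9 bytes). It counts weights only, so KV cache and activations come on top. The function names are hypothetical, chosen for illustration.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory taken by model weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(weights_gb: float, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights.

    Ignores KV cache, activations, and framework overhead (assumption).
    """
    return math.ceil(weights_gb / gpu_memory_gb)


# Assumed nominal size: 70e9 parameters at 2 bytes each (FP16).
llama_70b_fp16_gb = weight_memory_gb(70e9, 2)

print(llama_70b_fp16_gb)                             # 140.0
print(min_gpus_for_weights(llama_70b_fp16_gb, 80))   # 2  (H100, 80 GB)
print(min_gpus_for_weights(llama_70b_fp16_gb, 288))  # 1  (MI350X, 288 GB)
print(288 - llama_70b_fp16_gb)                       # 148.0 GB left on one MI350X
```

The 148 GB left over on a single 288 GB device is what remains for KV cache and long context under these assumptions.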

CloudRift@CloudRiftAI·15 May

Available now on CloudRift as on-demand VM rentals: $3.65/hr for an @AMD Instinct #MI350X. 288 GB VRAM, HBM3e, 8 TB/s, no minimum commitment. No waitlist. cloudrift.ai/mi350x #AMDInstinct #LLMinference #ROCm

CloudRift@CloudRiftAI·14 May

@ditrifonov's ML compiler, benchmarked on a full transformer block at FP32, #RTX5090. Geomean 1.11x over @PyTorch eager and 1.20x over torch.compile. Small k/v projections reach 4.7x. Large matmuls at seq=512 regress where register pressure dominates. #GPU #CUDA #PyTorch #MLSys cloudrift.ai/blog/building-…
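
A geomean above 1x can coexist with individual kernels that regress, which is what this post describes. The sketch below shows how such a figure is typically computed. The kernel names and speedup values are made up for illustration and are not taken from the benchmark.

```python
from statistics import geometric_mean

# Hypothetical per-kernel speedups over a baseline
# (baseline_time / compiled_time). Values above 1.0 are wins.
speedups = {
    "small_projection": 4.0,
    "reduction": 2.0,
    "elementwise": 1.0,
    "large_matmul": 0.5,  # a regression
}

# Geometric mean: the n-th root of the product of the ratios.
# It is the usual way to average speedups because a 2x win and
# a 0.5x loss cancel out exactly.
geomean = geometric_mean(list(speedups.values()))

# Kernels that are slower than the baseline despite the overall win.
regressions = [name for name, s in speedups.items() if s < 1.0]

print(f"geomean speedup: {geomean:.3f}x")
print("regressions:", regressions)
```

With these made-up values the geomean is the fourth root of 4, so it sits above 1x while `large_matmul` still shows up as a regression.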

CloudRift@CloudRiftAI·14 May

kyln.bio, a CloudRift AI Grant recipient, trains models that generate ligands for drug discovery. They've since won an Ignite grant from @PavaCenter and started wet-lab work at @HopkinsMedicine to test the model's predictions. #AIDrugDiscovery #AIforScience

CloudRift@CloudRiftAI·13 May

Part 2 of @ditrifonov's ML compiler series is up. It covers the lower half of the pipeline: Tile IR, Kernel IR, CUDA emission, and the sixteen rewrite rules that turn a @PyTorch graph into a competitive kernel. About 8,000 lines of Python now. #GPU #CUDA #PyTorch cloudrift.ai/blog/building-…

CloudRift@CloudRiftAI·12 May

Modern ML compilers all share the same shape:

Torch IR → Tensor IR → Loop IR → Tile IR → Kernel IR → CUDA

Each lowering moves closer to the hardware: decomposition → fusion → tiling → scheduling → codegen. @ditrifonov rebuilt the whole pipeline in 5K lines of Python to show why. cloudrift.ai/blog/building-… @modular @PyTorch
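
The staged shape described here can be sketched as a list of passes applied in order. This is a toy illustration, not the code from the writeup. The passes are placeholders that only record which IR they produce, and pairing each transformation one-to-one with a lowering step is an assumption read off the two arrow chains in the post.

```python
from typing import Callable, List, Tuple

# A "program" here is only the list of IR names visited so far.
# Real passes would transform an actual IR; these are placeholders.
Program = List[str]
Pass = Callable[[Program], Program]


def make_pass(target_ir: str) -> Pass:
    """Build a placeholder pass that lowers to `target_ir`."""
    def run(program: Program) -> Program:
        return program + [target_ir]
    return run


# Assumed pairing of transformation name to the IR it lowers into.
PIPELINE: List[Tuple[str, Pass]] = [
    ("decomposition", make_pass("Tensor IR")),
    ("fusion", make_pass("Loop IR")),
    ("tiling", make_pass("Tile IR")),
    ("scheduling", make_pass("Kernel IR")),
    ("codegen", make_pass("CUDA")),
]


def lower(program: Program) -> Program:
    """Run every pass in order; each one moves closer to the hardware."""
    for _name, compiler_pass in PIPELINE:
        program = compiler_pass(program)
    return program


trace = lower(["Torch IR"])
print(" → ".join(trace))
# Torch IR → Tensor IR → Loop IR → Tile IR → Kernel IR → CUDA
```

Keeping the pipeline as data makes it easy to insert, reorder, or skip a pass when experimenting.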

CloudRift@CloudRiftAI·12 May

@CatoDigitalInc redeploys GPU servers retired from Meta and NVIDIA fleets, rather than commissioning new ones. Their capacity is now on CloudRift as V100 32GB VMs at $0.29 per GPU/hour. Good for fine-tuning, batch inference, rendering, and HPC. → cloudrift.ai/gpu-rentals

CloudRift@CloudRiftAI·11 May

V100 32GB VMs are now on CloudRift at $0.29 per GPU/hour, supplied by @CatoDigitalInc. Fits a LoRA fine-tune of Llama 3 8B, Whisper Large inference, or a batch embeddings job on a single GPU. → cloudrift.ai/gpu-rentals @nvidia

CloudRift@CloudRiftAI·8 May

$0.29 per GPU/hour for a V100 32GB VM on CloudRift. The same hardware on AWS and Azure runs above $3 per GPU/hour, and the 32GB variant is usually only sold in 8-GPU bundles. We offer it as a single-GPU VM. Extremely useful if your job runs fine on Volta and does not need Hopper. Supplied by @CatoDigitalInc. → cloudrift.ai/gpu-rentals @huggingface @nvidia
