SkyPilot

341 posts

@skypilot_org

Run, manage, and scale AI workloads on any AI infrastructure. Open-source system for all your AI compute — Kubernetes, Slurm, VMs, 20+ clouds.

Sky · Joined October 2022
69 Following · 5.3K Followers
Pinned Tweet
SkyPilot @skypilot_org
Shopify now runs all AI training on SkyPilot.
• H200s on @nebiusai, L4s on @googlecloud - multi-cloud AI enabled by SkyPilot
• One unified interface for engineers
• Cost tracking, InfiniBand support, fair scheduling
Excited to support @Shopify's journey to build the future of e-commerce with AI 💪🏻
Shopify Engineering@ShopifyEng

Multi-cloud GPU scheduling without losing your mind. Inside our @skypilot_org setup: how we route GPU training workloads across providers, keep scheduling fair, and let engineers stay close to the metal instead of buried in cloud consoles.

1
3
12
1.2K
SkyPilot @skypilot_org
Missed the AI Infra Meetup with SkyPilot and @CoreWeave? Recordings are now live.
• SkyPilot at 100k+ GPU Scale at Meta AI (FAIR) - Lucca Bertoncini @lbz____ (@AIatMeta): youtu.be/Pas2NE76220
• SkyPilot: Your AI Infra, Frontier Capabilities - Zhanghao Wu @Michaelvll1 + Kevin Mingtarja @kevin_mingtarja (SkyPilot): youtu.be/uwLkrJb5pEA
• Training at Scale with Confidence (SkyPilot x CoreWeave SUNK) - Deok Filho @deok_filho (@CoreWeave): youtu.be/vdP0fhqpYZs
0
0
4
244
SkyPilot @skypilot_org
SkyPilot now works natively with @VAST_Data storage 🤝
Training runs often start with a dead period - copying data while GPUs sit idle. With VAST + SkyPilot, you can skip that entirely.
• Mount petabyte-scale data directly. No staging, no prefetch pipelines, no idling
• Stream at NVMe speeds to SkyPilot-managed nodes across any Kubernetes, Slurm, or neocloud
• Switch compute providers without touching your storage config
Read the blog by the @Vast_Data team: vastdata.com/blog/instant-d…
0
2
16
584
SkyPilot @skypilot_org
Announcing SkyPilot v0.12.0! 🎉
• Agent Skill: coding agents can launch and manage cloud GPUs on their own
• Slurm Support: unified interface across Slurm, K8s, and cloud
• Job Groups: run heterogeneous parallel workloads as one unit
• Recipes: templatize your AI workloads and share them across your team
🔗 Full release notes: github.com/skypilot-org/s…
0
0
13
523
SkyPilot @skypilot_org
AI Infra Meetup with SkyPilot and CoreWeave - what a night! Packed room, great conversations, and solid talks on scaling AI infra across K8s, Slurm, and neocloud.
🌟 Highlights:
• @lbz____ (@AIatMeta) shared how Meta unified 100k+ GPUs across Slurm clusters with SkyPilot.
• @Michaelvll1 + @kevin_mingtarja (@skypilot_org) walked through SkyPilot's new features and ran a live multi-cloud demo.
• @deok_filho (@CoreWeave) broke down SUNK and the SkyPilot x CoreWeave integration.
Huge shoutout to all speakers! Thanks to @CoreWeave @wandb for being amazing partners in making this happen. Excited to keep building this AI infra community. More events coming soon.
0
3
19
4.4K
SkyPilot @skypilot_org
Today in SF! The AI Infra Meetup with SkyPilot and CoreWeave is happening tonight. Join engineers from @Meta AI, SkyPilot, and @CoreWeave for tech talks on scaling training and batch jobs across K8s, Slurm, and cloud, plus plenty of time to mix and connect. The event is at capacity, but the waitlist is open! Join now: luma.com/h52qyhmt?utm_s…
1
4
11
2K
SkyPilot retweeted
lucca bertoncini @lbz____
Giving a talk this Wednesday (3/25) in SF on SkyPilot at 100k+ GPU scale. Come through if you're around! luma.com/h52qyhmt
0
1
5
674
SkyPilot @skypilot_org
SkyPilot is now @hcompany_ai's standard AI infrastructure layer. Online RL, previously impossible on SLURM, now runs seamlessly on K8s. Holo 2 was trained on SkyPilot. One unified interface for H100 clusters, zero infra friction for researchers. Thrilled to support H Company's journey to build the future of autonomous AI agents. 🤖💪
H@hcompany_ai

🚀 The H Company Tech Stack: Part 1 We are excited to launch a new series of technical deep dives into the AI Tech Stack powering H Company. Over the coming weeks, we’ll be sharing how we build, scale, and optimize the infrastructure behind our Holo frontier models. First up: Unlocking Online RL and AI Workflows on K8s using SkyPilot. (1/5🧵)

0
0
9
635
SkyPilot @skypilot_org
SkyPilot + @CoreWeave AI Infra Meetup is next week! @Meta AI, SkyPilot, and CoreWeave will give tech talks on scaling training and batch jobs across K8s, Slurm, and cloud — plus plenty of time to mingle and socialize. Spots are limited. Register now: luma.com/h52qyhmt?utm_s…
0
0
9
421
SkyPilot @skypilot_org
Karpathy's Autoresearch is bottlenecked by a single GPU. We removed the bottleneck.
We gave the agent access to our K8s cluster with H100s and H200s and let it provision its own GPUs. Over 8 hours:
• ~910 experiments instead of ~96 sequentially
• Discovered that scaling model width mattered more than all hparam tuning
• Taught itself to exploit heterogeneous hardware: use H200s for validation, screen ideas on H100s
Full setup and results: blog.skypilot.co/scaling-autore… @karpathy
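For a rough sense of scale, the two figures quoted above can be turned into a throughput ratio and an implied sequential pace. This is a back-of-the-envelope sketch, not part of the original post: the constants are the approximate numbers from the tweet, and the helper names are hypothetical.

```python
# Approximate figures quoted in the post; both are rounded in the source.
HOURS = 8
PARALLEL_EXPERIMENTS = 910    # with the agent provisioning its own GPUs
SEQUENTIAL_EXPERIMENTS = 96   # single-GPU, one experiment at a time


def throughput_gain(parallel: int, sequential: int) -> float:
    """Ratio of experiments completed in the same wall-clock window."""
    return parallel / sequential


def minutes_per_experiment(hours: float, experiments: int) -> float:
    """Average wall-clock minutes per experiment when run back to back."""
    return hours * 60 / experiments


gain = throughput_gain(PARALLEL_EXPERIMENTS, SEQUENTIAL_EXPERIMENTS)
pace = minutes_per_experiment(HOURS, SEQUENTIAL_EXPERIMENTS)

print(f"throughput gain: ~{gain:.1f}x")
print(f"sequential pace: ~{pace:.0f} min per experiment")
```

If every experiment took about the same time, the gain would also approximate the average number of experiments running concurrently; the post does not state that, so treat it as an assumption.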
5
53
371
136.1K
SkyPilot retweeted
Nebius @nebiusai
Tuesday 3/17 at #NVIDIAGTC, Booth 713. See how global platforms are scaling AI in production on Nebius:
1:30 pm: @skypilot_org
2 pm: Photoroom
4 pm: @Revolut
5 pm: @DataRobot
Training. Inference. Enterprise scale. #GTC26
0
4
58
3.4K
SkyPilot @skypilot_org
👋 SkyPilot AI Infra Meetup is back!
📆 Wed, Mar 25th, 5:00 PM
📍 San Francisco
Join us in person for the AI Infra Meetup with SkyPilot! We'll be talking about open-source AI infra for batch and training workloads, K8s, Slurm, and more! Come connect with fellow builders, share insights, and learn from experts on the latest in AI infra! 🚀
🔗 RSVP now: luma.com/h52qyhmt?utm_s…
0
1
8
464
SkyPilot @skypilot_org
SkyPilot Recipes lets you templatize your AI workloads and share them across your entire team. Save a YAML config once, and anyone can launch clusters with the same predefined setup, directly from the CLI.
• Standardize dev environments and training infra
• Launch instantly with sky launch recipes:
• Edit and manage recipes from the SkyPilot dashboard
🔗 blog.skypilot.co/skypilot-recip…
0
0
8
772
SkyPilot retweeted
David Bar @observie
half a day with mjlab and Q1 Hoper is walking oh my, this is pure joy thank you @kevin_zakka and thank you SkyPilot for making gpuing a breeze @bromil101 @zongheng_yang
7
10
87
5.2K
SkyPilot @skypilot_org
SkyPilot powers physical AI! Excited to see mjlab ship cloud training on SkyPilot! mjlab brings GPU-accelerated sim to robot learning with minimal setup, and now SkyPilot handles the infra so researchers can focus on the science. Thanks to @kevin_zakka for sharing!
Kevin Zakka@kevin_zakka

mjlab now supports cloud training via SkyPilot. One command launches a GPU instance, syncs your code, trains, and tears down when done. We support 2 modes: direct uv install and Docker. Multi-GPU and hyperparameter sweeps work out of the box. mujocolab.github.io/mjlab/main/sou…

0
3
12
2.5K
SkyPilot @skypilot_org
SkyPilot v0.11.2 is out! 🚀
This release brings Slurm support, Job Groups for RL on heterogeneous hardware, enhanced Pools with autoscaling, external links in the dashboard, 7x faster object store access, and much more!
Better scheduling, security, and multi-backend control for infra teams. Faster iteration and less infra wrangling for ML engineers.
uv pip install "skypilot>=0.11.2"
Learn more: github.com/skypilot-org/s…
0
0
15
574
SkyPilot @skypilot_org
RL post-training needs heterogeneous hardware - beefy GPUs for the trainer, cheap GPUs for rollouts, and high-memory CPU instances for replay buffers. Running it all on top-tier GPUs is wasteful.
SkyPilot Job Groups simplifies workloads with heterogeneous requirements:
• One YAML to run RL workloads on heterogeneous hardware
• Automatic service discovery
• Coordinated creation and shutdown
Define each component with the right resources and launch as one unit.
Read the blog now: 🔗 blog.skypilot.co/job-groups/
2
5
31
2.3K