0xYottascale

287 posts

@0xyottascale

CEO, co-founder @YottaLabs

United States · Joined January 2011
465 Following · 121 Followers
0xYottascale reposted
Torsten Hoefler 🇨🇭
Jack Dongarra's opening keynote at HACI 2026! #HPC in transition - the investments in compute infrastructure are unprecedented but focus mostly on #AI workloads. Scientific numerical simulations need to adjust! Many interesting challenges ahead.
0
10
24
2.3K
0xYottascale reposted
Meryem Arik
Meryem Arik@MeryemArik9·
“Quantize first, ask later” is a weird motto for an inference provider given how concerned customers are with whether there is secret quantisation going on lol
2
3
116
12.3K
0xYottascale
0xYottascale@0xyottascale·
Yet another LMCache?
vLLM@vllm_project

KV cache shouldn't disappear every time vLLM restarts. With @novita_labs, we're sharing PegaFlow: a production-grade external KV cache service that plugs into vLLM through the external KV connector interface.

PegaFlow runs as a standalone Rust daemon owning the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC + gRPC, and cache survives engine crashes, upgrades, and model switches.

In production-oriented evaluations:
🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool
📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache
⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank)
🌐 194 GB/s average remote-read throughput across nodes

Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path, with no vLLM source changes.

📖 vllm.ai/blog/2026-05-1…

0
0
0
22
0xYottascale
0xYottascale@0xyottascale·
@samhogan Yotta Labs also offers both on-demand and long-term reserved contracts for H100/H200 and B200/B300.
2
0
0
63
Sam Hogan 🇺🇸
Sam Hogan 🇺🇸@samhogan·
RunPod is completely out of H100s. Make sure you’re prepared for this summer; it’s going to be a bloodbath.
35
19
503
115.6K
0xYottascale
0xYottascale@0xyottascale·
Welcome to Bellevue!
Zhihao Jia@JiaZhihao

The MLSys’26 program is live! Check out the accepted papers: mlsys.org/virtual/2026/p…

This year marks several exciting firsts:
• 28 industry track papers bridging MLSys research & real-world deployment
• Our inaugural competition track featuring AWS Trainium, Google Graph Scheduling, and NVIDIA FlashInfer AI Kernel contests

Early registration deadline: April 1. Don’t miss it! See you in Seattle this May 🌲

0
0
0
27
0xYottascale reposted
Kraken
Kraken@krakenfx·
Kraken is deprecating its existing cross-chain provider and migrating to @Chainlink CCIP as its exclusive cross-chain infra to secure Kraken Wrapped Bitcoin (kBTC) & all future Kraken Wrapped Assets.

Kraken chose Chainlink CCIP because it offers enterprise-grade infrastructure with strict security & risk management requirements, including:
• ISO 27001 and SOC 2 Type 2 certifications
• Secure by default architecture
• 16 independent nodes
• Native rate limits, and more.

Together, Chainlink and Kraken can help accelerate the global adoption of crypto by unlocking utility and distribution for all Kraken Wrapped Assets across DeFi.

For kBTC customers, no action is required. More details on the migration process to follow on official Kraken channels.
125
282
1.8K
366.3K
0xYottascale
0xYottascale@0xyottascale·
Congrats — huge milestone. Strong thesis. We're attacking it from a different angle at Yotta Labs: inference optimization as a multi-silicon systems problem. Hardware is one variable. Orchestration is the bigger one. Excited to see more teams pushing the frontier here.
RadixArk@radixark

Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.

RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas.

RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.

RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.

We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (Venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix among others.

Thanks for the exclusive interview with @MeghanBobrowsky at @WSJ about our vision.

0
1
8
1.4K
0xYottascale
0xYottascale@0xyottascale·
Excited that other folks are also looking at this interesting problem, which has a large design space to explore! We’ve actually been thinking about this since last summer, when Jensen started framing “AI Factories” and “tokens as the product of intelligence.” Our view is that tokens are a much better unit than GPUs for commoditizing AI compute, and we’ve developed a full market design covering spot, forward, and futures markets. DMed you with more details.
0
2
3
465
Guy Wuollet
Guy Wuollet@guywuolletjr·
Is anyone building physically settled derivatives for 1M tokens from a specific AI model (e.g., a one-month dated future for 1M Opus 4.7 tokens)? Physical settlement here might be the killer app for decentralized compute networks?
27
1
121
21.7K
0xYottascale reposted
Gaurav
Gaurav@gauravisnotme·
If you are really doing a chip startup, I would strongly recommend against taking advice from people who know nothing about it. Especially the lot below. Let me know and I will connect you with 10x better folks. Also, as a bonus - they won’t force GStack upon you.
Y Combinator@ycombinator

Inference Chips for Agent Workflows
@sdianahu

Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.

27
10
575
83.9K
0xYottascale reposted
Johnny
Johnny@johnnygoodai·
My favorite part of an AI-native cloud is that you can easily launch an LLM service on different accelerators (@NVIDIAAIDev, @AMD, @awscloud) with no vendor lock-in
0
2
4
138