0xYottascale

287 posts

@0xyottascale

CEO, co-founder @YottaLabs

United States · Joined January 2011

465 Following · 121 Followers

0xYottascale@0xyottascale·3d

GIF

Homeland Security@DHSgov

An alien who is in the U.S. temporarily and wants a Green Card must return to their home country to apply. This policy allows our immigration system to function as the law intended instead of incentivizing loopholes. The era of abusing our nation’s immigration system is over.

0xYottascale reposted

Torsten Hoefler 🇨🇭@thoefler·3d

Jack Dongarra's opening keynote at HACI 2026! #HPC in transition - the investments in compute infrastructure are unprecedented but focus mostly on #AI workloads. Scientific numerical simulations need to adjust! Many interesting challenges ahead.

2.3K

0xYottascale reposted

Meryem Arik@MeryemArik9·6d

“Quantize first, ask later” is a weird motto for an inference provider given how concerned customers are with whether there is secret quantisation going on lol

116

12.3K

0xYottascale@0xyottascale·6d

Yet another LMCache?

vLLM@vllm_project

KV cache shouldn't disappear every time vLLM restarts. With @novita_labs, we're sharing PegaFlow, a production-grade external KV cache service that plugs into vLLM through the external KV connector interface.
PegaFlow runs as a standalone Rust daemon owning the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC + gRPC, and cache survives engine crashes, upgrades, and model switches.
In production-oriented evaluations:
🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool
📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache
⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank)
🌐 194 GB/s average remote-read throughput across nodes
Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path: no vLLM source changes.
📖 vllm.ai/blog/2026-05-1…

0xYottascale@0xyottascale·18 May

The price for H200 must be wrong. 🤷 Check the GPU prices at mercatus-ai.com

Small Cap Snipa@SmallCapSnipa

Nvidia GPU prices just went NUCLEAR overnight. H200 is now $6.40/hour (+29%), more expensive than the B200 at $5.68/hour (6.4%). Good lord, what an opportunity for cloud service providers: $NBIS $IREN $CRWV

0xYottascale@0xyottascale·18 May

Interesting perspective. I think the key is striking the right balance. Abstracting too much means performance loss.

Patrick C Toulme@PatrickToulme

x.com/i/article/2055…

0xYottascale@0xyottascale·16 May

@samhogan Here is the link to our console: console.yottalabs.ai

0xYottascale@0xyottascale·16 May

@samhogan Yotta Labs also offers both on-demand and long-term reserved contracts for H100/H200 and B200/B300.

Sam Hogan 🇺🇸@samhogan·15 May

RunPod is completely out of H100s. Make sure you’re prepared for this summer. Going to be a blood bath.

503

115.6K

0xYottascale@0xyottascale·16 May

@samhogan console.yottalabs.ai

0xYottascale@0xyottascale·15 May

Welcome to Bellevue!

Zhihao Jia@JiaZhihao

The MLSys’26 program is live! Check out the accepted papers: mlsys.org/virtual/2026/p…
This year marks several exciting firsts:
• 28 industry track papers bridging MLSys research & real-world deployment
• Our inaugural competition track featuring AWS Trainium, Google Graph Scheduling, and NVIDIA FlashInfer AI Kernel contests
Early registration deadline: April 1. Don’t miss it! See you in Seattle this May🌲

0xYottascale reposted

Kraken@krakenfx·14 May

Kraken is deprecating its existing cross-chain provider and migrating to @Chainlink CCIP as its exclusive cross-chain infra to secure Kraken Wrapped Bitcoin (kBTC) & all future Kraken Wrapped Assets.
Kraken chose Chainlink CCIP because it offers enterprise-grade infrastructure with strict security & risk management requirements, including:
• ISO 27001 and SOC 2 Type 2 certifications
• Secure by default architecture
• 16 independent nodes
• Native rate limits, and more.
Together, Chainlink and Kraken can help accelerate the global adoption of crypto by unlocking utility and distribution for all Kraken Wrapped Assets across DeFi.
For kBTC customers, no action is required. More details on the migration process to follow on official Kraken channels.

125

282

1.8K

366.3K

0xYottascale@0xyottascale·13 May

😂

President Trump Commetary@RealPresidentT

Good morning, Beijing! It’s a beautiful airport! Much better than shithole Newark Airport which is a Biden era airport.

0xYottascale@0xyottascale·13 May

💪

Johnny@johnnygoodai

Talking about multi-silicon, multi-cloud at @ComputerHistory

0xYottascale@0xyottascale·7 May

lmao Orz

CuiMao@CuiMao

Welcome to China, Mr. Dalio! 😊 @DarioAmodei @AnthropicAI @claudeai @ClaudeDevs

0xYottascale@0xyottascale·6 May

Congrats — huge milestone. Strong thesis. We're attacking it from a different angle at Yotta Labs: inference optimization as a multi-silicon systems problem. Hardware is one variable. Orchestration is the bigger one. Excited to see more teams pushing the frontier here.

RadixArk@radixark

Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.
RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas.
RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.
RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.
We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (Venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix among others.
Thanks for the exclusive interview with @MeghanBobrowsky at @WSJ about our vision.

1.4K

0xYottascale@0xyottascale·5 May

Excited that other folks are also looking at this interesting problem, which has a large design space to explore! We’ve actually been thinking about this since last summer, when Jensen started framing “AI Factories” and “tokens as the product of intelligence.” Our view is that tokens are a much better unit than GPUs for commoditizing AI compute, and we’ve developed a full market design covering spot, forward, and futures markets. DMed you with more details.

465

Guy Wuollet@guywuolletjr·4 May

Is anyone building physically settled derivatives for 1M tokens from a specific AI model (e.g. a one-month dated future for 1M Opus 4.7 tokens)? Physical settlement here might be the killer app for decentralized compute networks?

121

21.7K

0xYottascale reposted

Gaurav@gauravisnotme·30 Apr

If you are really doing a chip startup, I would strongly recommend against taking advice from people who know nothing about it. Especially the lot below. Let me know and I will connect you with 10x better folks. Also, as a bonus - they won’t force GStack upon you.

Y Combinator@ycombinator

Inference Chips for Agent Workflows @sdianahu Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.

575

83.9K

0xYottascale@0xyottascale·29 Apr

Wow BTW, Yotta is 10^24 :)

Exa@ExaAILabs

We're excited to partner with Google to offer Grounding With Exa inside of Gemini models! Using Exa's agent-first search, Gemini models can now access billions of websites, technical docs, papers, people, companies, and more. 10^18🤝10^100

0xYottascale reposted

Johnny@johnnygoodai·24 Apr

My favorite part of an AI-native cloud is that you can easily launch an LLM service on different accelerators, @NVIDIAAIDev @AMD @awscloud, no vendor lock-in

138
