emi
@gpuemi

1.8K posts

co-founder @wafer_ai (yc s25) -- ai that makes ai chips go faster

san francisco, ca · Joined December 2015
2K Following · 1K Followers
Pinned Tweet
emi @gpuemi
(1/8) we’re launching the wafer vscode / cursor extension to help you develop, profile, and optimize gpu kernels as efficiently as possible. would love feedback from ppl writing cuda / cutlass / cute and training + inference perf folks. links to install below or at wafer dot ai
[image]
6 replies · 6 reposts · 86 likes · 20.4K views
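For context on what kernel optimization tooling is chasing: most gpu kernels are limited by either compute or memory bandwidth, which the roofline model captures in a couple of lines. A minimal sketch (the accelerator numbers below are hypothetical, not anything Wafer ships):

```python
def roofline_seconds(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Lower-bound runtime under the roofline model: attainable
    throughput = min(peak compute, arithmetic intensity * peak bandwidth)."""
    ai = flops / bytes_moved                        # arithmetic intensity (FLOP/byte)
    attainable = min(peak_flops, ai * peak_bw_bytes)
    return flops / attainable

# hypothetical accelerator: 100 TFLOP/s compute, 2 TB/s HBM bandwidth
PEAK_FLOPS, PEAK_BW = 100e12, 2e12

# elementwise add over 1e8 fp32 values: 1 FLOP and 12 bytes per element
t_add = roofline_seconds(1e8, 12e8, PEAK_FLOPS, PEAK_BW)    # bandwidth-bound

# 4096^3 GEMM in fp16: 2*N^3 FLOPs, roughly 3*N^2 elements of 2 bytes moved
t_gemm = roofline_seconds(2 * 4096**3, 3 * 4096**2 * 2, PEAK_FLOPS, PEAK_BW)  # compute-bound
```

The point of profiling is figuring out which side of that `min()` a kernel sits on; the bandwidth-bound add can only be sped up by moving fewer bytes, while the GEMM has headroom in compute scheduling.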
emi reposted
steve @gpusteve
deepseek v4 added to waferpass. 1k req every 5 hours! link below
[image]
1 reply · 2 reposts · 9 likes · 928 views
emi @gpuemi
@gpusteve you steal the best flavor and don’t even appreciate it
1 reply · 0 reposts · 2 likes · 86 views
steve @gpusteve
so desperate for caffeine today drank a celsius retro vibe
1 reply · 0 reposts · 5 likes · 198 views
emi reposted
Arfur Rock @ArfurRock
Indeed the year of agents in bio! Latch is at $15M RR, up 5x QoQ. Targeting $130M in 2026. Closing a Series B now at $500M.
Kenny Workman @kenbwork

This is the year of agents in biology. What you're seeing in code is already unfolding in molecular data analysis, reorganizing workflows in basic research and drug development.

Path forward is focused benchmarking + engineering scoped to specific types of assays. Just as coding agents had to reliably write JavaScript before they could build a browser, biology agents must first learn to accurately process and interpret concrete measurements (e.g. spatial assays) before they can reason about disease, drug mechanism, or patient response.

Our roadmap reflects this progression: procedural skill in analysis -> emergent biological reasoning -> synthesis across data types, translational context, and realistic ambiguity. Towards systems that can eventually support expensive, high-stakes decisions in drug programs or research projects.

Diffusion in biology is slower than software and needs to be thought through carefully. We work directly with the teams building measurement tech (e.g. TakaraBio and Vizgen) and package assay-specific agents alongside their kits and instruments. Scientists complete sample preparation, then use these tech-specific agents to move from raw data to answers and figures. Our partners white-label our platform; we do not run a direct biotech sales motion.

Now hiring rapidly across major assay categories, prioritized by which we believe will contribute most to the area under the molecular data curve over the next several years:
- Spatial
- Single Cell
- Epigenomics
- Genomics
- Perturbation/Screening
- Diagnostics

Looking for talented scientists and engineers with strong foundations in theory and deep experience in these areas to help us build scientifically accurate agents.

8 replies · 5 reposts · 227 likes · 87.9K views
emi @gpuemi
@arankomatsuzaki yes, I rlly respect Diana but was very confused by this RFS
0 replies · 0 reposts · 3 likes · 105 views
emi reposted
Aran Komatsuzaki @arankomatsuzaki
This feels like confusing a serving-runtime problem for a chip-startup opportunity. Agents do change inference patterns: loops, tool calls, branching, long context, KV reuse, burstiness. But most of that is an inference systems problem: scheduling, routing, KV-cache management, etc. Think Dynamo. By the time a new chip co tapes out + builds a compiler stack + wins cloud distribution, NVIDIA/AMD will likely have baked the obvious hardware-level optimizations into existing platforms.
Y Combinator @ycombinator

Inference Chips for Agent Workflows @sdianahu Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.

15 replies · 10 reposts · 99 likes · 25.4K views
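To make the "KV reuse" point concrete: serving stacks dedupe agent loops by caching KV state keyed on token prefixes, so each new turn only recomputes the uncached suffix. A toy sketch of longest-prefix lookup (names and the linear scan are illustrative, not Dynamo's or any serving runtime's actual API; production systems use radix trees):

```python
class PrefixKVCache:
    """Toy prefix cache mapping token prefixes to (mock) KV state."""

    def __init__(self):
        self._store = {}

    def insert(self, tokens, kv_state):
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        # scan from the longest candidate prefix down to length 1
        for n in range(len(tokens), 0, -1):
            kv = self._store.get(tuple(tokens[:n]))
            if kv is not None:
                return n, kv          # n cached tokens can skip prefill
        return 0, None

cache = PrefixKVCache()
cache.insert([1, 2, 3, 4], "kv-for-system-prompt")
hit_len, kv = cache.longest_prefix([1, 2, 3, 4, 9, 9])  # hits the 4-token prefix
```

The scheduling argument follows from this: the win comes from routing requests that share prefixes to the same worker, which is software, not silicon.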
emi @gpuemi
fully cursor pilled again with their v3 agents ui
0 replies · 0 reposts · 2 likes · 74 views
emi reposted
fin @fi56622380
@benitoz @polynoamial @sama @OpenAI CUDA moat eroded somewhat. if you ask an AMD engineer, they would confidently say that any sw moat is eroded with coding agents; CUDA is no exception
5 replies · 10 reposts · 152 likes · 57K views
emi reposted
steve @gpusteve
we've quantized kimi-k2.6 to mxfp4 on amd! download and use today! @AIatAMD
[image]
4 replies · 7 reposts · 85 likes · 7.7K views
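Roughly what "quantized to mxfp4" means: weights are grouped into blocks of 32, each block shares a power-of-two scale, and each element is rounded to a 4-bit E2M1 float. A numpy sketch of the fake-quant round trip (a simplification of the OCP MX format, not AMD's or Wafer's actual kernel):

```python
import numpy as np

# the representable E2M1 (fp4) magnitudes
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def mxfp4_roundtrip(x, block=32):
    """Quantize to MXFP4 and dequantize back (fake quantization)."""
    x = np.asarray(x, dtype=np.float32)
    n = x.size
    blocks = np.pad(x, (0, (-n) % block)).reshape(-1, block)

    # shared per-block power-of-two scale, chosen so the block max lands
    # near 6.0, the largest E2M1 magnitude (floor(log2(6)) == 2)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.maximum(amax, np.finfo(np.float32).tiny)
    scale = 2.0 ** (np.floor(np.log2(safe)) - 2.0)

    scaled = blocks / scale
    # round each magnitude to the nearest representable fp4 value
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * FP4_GRID[idx] * scale
    return deq.reshape(-1)[:n]
```

The payoff is storage: 4 bits per element plus one shared scale byte per 32 elements, i.e. 4.25 bits/weight instead of 16, at the cost of coarse rounding within each block.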
emi reposted
steve @gpusteve
building with ai agents is getting expensive fast. per-token pricing makes it hard to predict cost, slows experimentation, and turns every iteration into a tradeoff. we've used agents to optimize inference pipelines to provide you with the fastest and most affordable inference out there! see below our qwen 3.5 inference against base sglang.
[image]
2 replies · 2 reposts · 8 likes · 3.1K views
Hanchen Li @lihanc02
@gpusteve I am actually curious how much the gpu cost was for you guys, roughly
1 reply · 0 reposts · 1 like · 100 views
emi reposted
Reiner Pope @reinerpope
I chatted with @ysmulki about MatX, chip design and where silicon designed for LLMs is headed
(8:17) Tightly coupling SRAM and HBM on one chip
(14:03) More MoE FLOPS, smaller KV cache load
(16:08) Numerics: from 32-bit to 4-bit
(19:02) Targeting both training and inference
(22:14) Chip timelines
(27:15) Logic and memory scarcity
(29:42) Compute costs
(32:07) Latency: from 20ms to 1ms as the new table stakes
(40:50) Programming the chip
(43:00) Starting MatX
(47:11) Codesign without seeing the models
(51:57) Interconnect design
(55:44) Performance modeling philosophy
(1:07:02) Prefill vs. decode
(1:13:47) What's next
14 replies · 44 reposts · 314 likes · 65.5K views
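The "smaller KV cache load" point is easy to put numbers on: per token, the cache holds K and V vectors for every layer, so bytes = 2 × layers × kv_heads × head_dim × bytes per element. A back-of-envelope calculator (the model shape below is hypothetical, not MatX's target):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    """Total KV cache for one sequence: K and V at every layer and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# hypothetical model: 60 layers, 8 KV heads (GQA), head_dim 128,
# 32k context, fp16 elements (2 bytes)
total = kv_cache_bytes(60, 8, 128, 32_768, 2)
print(total / 2**30, "GiB")  # 7.5 GiB per sequence
```

At decode time that entire cache is re-read for every generated token, which is why shrinking KV load (fewer KV heads, lower precision) trades directly against the HBM bandwidth discussed in the episode.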
emi reposted
steve @gpusteve
excited to share @wafer_ai's seed round led by @fiftyyears with participation from @Liquid2V, @ycombinator, and many amazing angels! we started wafer with a simple idea: maximize intelligence per watt. we’ve since been building agents to optimize kernels, inference engines, and the full stack of ai systems, pushing hardware closer to its limits. today we're launching wafer pass - a high-limit, fast api for running agents on the fastest open models, without managing your own infra. wafer.ai/pass
[image]
21 replies · 19 reposts · 95 likes · 7.7K views
emi @gpuemi
just saw uber driver get a 2 minute voice note from what seemed to be the gf/wife and immediately respond “ok” without listening to it. hell yeah brother
1 reply · 0 reposts · 4 likes · 128 views
emi @gpuemi
use wafer unlimited and pay $10/week to get unlimited tokens on frontier open-source llms for openclaw. starting with qwen3.5 397b turbo (≈2.5× faster vs. generic providers), with more turbo models coming, included at the same price. apply for early access: wafer.ai/unlimited
Marc Andreessen 🇺🇸 @pmarca

Magical OpenClaw experiences that use frontier models cost $300-1,000/day today, heading to $10,000/day and more. The future shape of the entire technology industry will be how to drive that to $20/month.

1 reply · 1 repost · 4 likes · 207 views