emi
@gpuemi

1.8K posts

co-founder @wafer_ai (yc s25) -- ai that makes ai chips go faster

san francisco, ca · Joined December 2015
2K Following · 1K Followers
Pinned Tweet
emi @gpuemi
(1/8) we’re launching the wafer vscode / cursor extension to help you develop, profile, and optimize gpu kernels as efficiently as possible. would love feedback from ppl writing cuda / cutlass / cute and training + inference perf folks. links to install below or at wafer dot ai
[image]
6 replies · 6 reposts · 86 likes · 20.4K views
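For context on what kernel optimization tooling is chasing: most gpu kernels are limited by either compute or memory bandwidth, which the roofline model captures in a couple of lines. A minimal sketch (the accelerator numbers below are hypothetical, not anything Wafer ships):

```python
def roofline_seconds(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Lower-bound runtime under the roofline model: attainable
    throughput = min(peak compute, arithmetic intensity * peak bandwidth)."""
    ai = flops / bytes_moved                        # arithmetic intensity (FLOP/byte)
    attainable = min(peak_flops, ai * peak_bw_bytes)
    return flops / attainable

# hypothetical accelerator: 100 TFLOP/s compute, 2 TB/s HBM bandwidth
PEAK_FLOPS, PEAK_BW = 100e12, 2e12

# elementwise add over 1e8 fp32 values: 1 FLOP and 12 bytes per element
t_add = roofline_seconds(1e8, 12e8, PEAK_FLOPS, PEAK_BW)    # bandwidth-bound

# 4096^3 GEMM in fp16: 2*N^3 FLOPs, roughly 3*N^2 elements of 2 bytes moved
t_gemm = roofline_seconds(2 * 4096**3, 3 * 4096**2 * 2, PEAK_FLOPS, PEAK_BW)  # compute-bound
```

The point of profiling is figuring out which side of that `min()` a kernel sits on; the bandwidth-bound add can only be sped up by moving fewer bytes, while the GEMM has headroom in compute scheduling.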
emi reposted
steve @gpusteve
deepseek v4 added to waferpass. 1k req every 5 hours! link below
[image]
1 reply · 2 reposts · 9 likes · 928 views
emi @gpuemi
@gpusteve you steal the best flavor and don’t even appreciate it
1 reply · 0 reposts · 2 likes · 86 views
steve @gpusteve
so desperate for caffeine today drank a celsius retro vibe
1 reply · 0 reposts · 5 likes · 198 views
emi reposted
Arfur Rock @ArfurRock
Indeed the year of agents in bio! Latch is at $15M RR, up 5x QoQ. Targeting $130M in 2026. Closing a Series B now at $500M.
Kenny Workman @kenbwork

This is the year of agents in biology. What you're seeing in code is already unfolding in molecular data analysis, reorganizing workflows in basic research and drug development.

Path forward is focused benchmarking + engineering scoped to specific types of assays. Just as coding agents had to reliably write JavaScript before they could build a browser, biology agents must first learn to accurately process and interpret concrete measurements (e.g. spatial assays) before they can reason about disease, drug mechanism, or patient response.

Our roadmap reflects this progression: procedural skill in analysis -> emergent biological reasoning -> synthesis across data types, translational context, and realistic ambiguity. Towards systems that can eventually support expensive, high-stakes decisions in drug programs or research projects.

Diffusion in biology is slower than software and needs to be thought through carefully. We work directly with the teams building measurement tech (e.g. TakaraBio and Vizgen) and package assay-specific agents alongside their kits and instruments. Scientists complete sample preparation, then use these tech-specific agents to move from raw data to answers and figures. Our partners white-label our platform; we do not run a direct biotech sales motion.

Now hiring rapidly across major assay categories, prioritized by which we believe will contribute most to the area under the molecular data curve over the next several years:
- Spatial
- Single Cell
- Epigenomics
- Genomics
- Perturbation/Screening
- Diagnostics

Looking for talented scientists and engineers with strong foundations in theory and deep experience in these areas to help us build scientifically accurate agents.

8 replies · 5 reposts · 227 likes · 87.9K views
emi @gpuemi
@arankomatsuzaki yes, I rlly respect Diana but was very confused by this RFS
0 replies · 0 reposts · 3 likes · 105 views
emi reposted
Aran Komatsuzaki @arankomatsuzaki
This feels like confusing a serving-runtime problem for a chip-startup opportunity. Agents do change inference patterns: loops, tool calls, branching, long context, KV reuse, burstiness. But most of that is an inference systems problem: scheduling, routing, KV-cache management, etc. Think Dynamo. By the time a new chip co tapes out + builds a compiler stack + wins cloud distribution, NVIDIA/AMD will likely have baked the obvious hardware-level optimizations into existing platforms.
Y Combinator @ycombinator

Inference Chips for Agent Workflows @sdianahu Most AI chips are designed for "prompt in, response out." Agents don't work that way. They loop, branch, and hold context across dozens of steps, and current GPUs hit 30–40% utilization as a result. That gap is where purpose-built silicon wins.

15 replies · 10 reposts · 99 likes · 25.4K views
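To make the "KV reuse" point concrete: serving stacks dedupe agent loops by caching KV state keyed on token prefixes, so each new turn only recomputes the uncached suffix. A toy sketch of longest-prefix lookup (names and the linear scan are illustrative, not Dynamo's or any serving runtime's actual API; production systems use radix trees):

```python
class PrefixKVCache:
    """Toy prefix cache mapping token prefixes to (mock) KV state."""

    def __init__(self):
        self._store = {}

    def insert(self, tokens, kv_state):
        self._store[tuple(tokens)] = kv_state

    def longest_prefix(self, tokens):
        # scan from the longest candidate prefix down to length 1
        for n in range(len(tokens), 0, -1):
            kv = self._store.get(tuple(tokens[:n]))
            if kv is not None:
                return n, kv          # n cached tokens can skip prefill
        return 0, None

cache = PrefixKVCache()
cache.insert([1, 2, 3, 4], "kv-for-system-prompt")
hit_len, kv = cache.longest_prefix([1, 2, 3, 4, 9, 9])  # hits the 4-token prefix
```

The scheduling argument follows from this: the win comes from routing requests that share prefixes to the same worker, which is software, not silicon.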
emi @gpuemi
fully cursor pilled again with their v3 agents ui
0 replies · 0 reposts · 2 likes · 74 views
emi reposted
fin @fi56622380
@benitoz @polynoamial @sama @OpenAI CUDA moat eroded somewhat. if you ask an AMD engineer, they would confidently say that any sw moat is eroded with coding agents; CUDA is no exception
5 replies · 10 reposts · 152 likes · 57K views
emi reposted
steve @gpusteve
we've quantized kimi-k2.6 to mxfp4 on amd! download and use today! @AIatAMD
[image]
4 replies · 7 reposts · 85 likes · 7.7K views
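Roughly what "quantized to mxfp4" means: weights are grouped into blocks of 32, each block shares a power-of-two scale, and each element is rounded to a 4-bit E2M1 float. A numpy sketch of the fake-quant round trip (a simplification of the OCP MX format, not AMD's or Wafer's actual kernel):

```python
import numpy as np

# the representable E2M1 (fp4) magnitudes
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def mxfp4_roundtrip(x, block=32):
    """Quantize to MXFP4 and dequantize back (fake quantization)."""
    x = np.asarray(x, dtype=np.float32)
    n = x.size
    blocks = np.pad(x, (0, (-n) % block)).reshape(-1, block)

    # shared per-block power-of-two scale, chosen so the block max lands
    # near 6.0, the largest E2M1 magnitude (floor(log2(6)) == 2)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    safe = np.maximum(amax, np.finfo(np.float32).tiny)
    scale = 2.0 ** (np.floor(np.log2(safe)) - 2.0)

    scaled = blocks / scale
    # round each magnitude to the nearest representable fp4 value
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    deq = np.sign(scaled) * FP4_GRID[idx] * scale
    return deq.reshape(-1)[:n]
```

The payoff is storage: 4 bits per element plus one shared scale byte per 32 elements, i.e. 4.25 bits/weight instead of 16, at the cost of coarse rounding within each block.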
emi reposted
steve @gpusteve
building with ai agents is getting expensive fast. per-token pricing makes it hard to predict cost, slows experimentation, and turns every iteration into a tradeoff. we've used agents to optimize inference pipelines to provide you with the fastest and most affordable inference out there! see below our qwen 3.5 inference against base sglang.
[image]
2 replies · 2 reposts · 8 likes · 3.1K views
Hanchen Li @lihanc02
@gpusteve I am actually curious how much the gpu cost was for you guys, roughly
1 reply · 0 reposts · 1 like · 100 views
emi reposted
Reiner Pope @reinerpope
I chatted with @ysmulki about MatX, chip design and where silicon designed for LLMs is headed
(8:17) Tightly coupling SRAM and HBM on one chip
(14:03) More MoE FLOPS, smaller KV cache load
(16:08) Numerics: from 32-bit to 4-bit
(19:02) Targeting both training and inference
(22:14) Chip timelines
(27:15) Logic and memory scarcity
(29:42) Compute costs
(32:07) Latency: from 20ms to 1ms as the new table stakes
(40:50) Programming the chip
(43:00) Starting MatX
(47:11) Codesign without seeing the models
(51:57) Interconnect design
(55:44) Performance modeling philosophy
(1:07:02) Prefill vs. decode
(1:13:47) What's next
14 replies · 44 reposts · 314 likes · 65.5K views
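The "smaller KV cache load" point is easy to put numbers on: per token, the cache holds K and V vectors for every layer, so bytes = 2 × layers × kv_heads × head_dim × bytes per element. A back-of-envelope calculator (the model shape below is hypothetical, not MatX's target):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    """Total KV cache for one sequence: K and V at every layer and position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

# hypothetical model: 60 layers, 8 KV heads (GQA), head_dim 128,
# 32k context, fp16 elements (2 bytes)
total = kv_cache_bytes(60, 8, 128, 32_768, 2)
print(total / 2**30, "GiB")  # 7.5 GiB per sequence
```

At decode time that entire cache is re-read for every generated token, which is why shrinking KV load (fewer KV heads, lower precision) trades directly against the HBM bandwidth discussed in the episode.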
emi reposted
steve @gpusteve
excited to share @wafer_ai's seed round led by @fiftyyears with participation from @Liquid2V, @ycombinator, and many amazing angels! we started wafer with a simple idea: maximize intelligence per watt. we’ve since been building agents to optimize kernels, inference engines, and the full stack of ai systems, pushing hardware closer to its limits. today we're launching wafer pass - a high-limit, fast api for running agents on the fastest open models, without managing your own infra. wafer.ai/pass
[image]
21 replies · 19 reposts · 95 likes · 7.7K views
emi @gpuemi
just saw uber driver get a 2 minute voice note from what seemed to be the gf/wife and immediately respond “ok” without listening to it. hell yeah brother
1 reply · 0 reposts · 4 likes · 128 views
emi @gpuemi
use wafer unlimited and pay $10/week to get unlimited tokens on frontier open-source llms for openclaw. starting with qwen3.5 397b turbo (≈2.5× faster vs. generic providers), with more turbo models coming, included at the same price. apply for early access: wafer.ai/unlimited
Marc Andreessen 🇺🇸 @pmarca

Magical OpenClaw experiences that use frontier models cost $300-1,000/day today, heading to $10,000/day and more. The future shape of the entire technology industry will be how to drive that to $20/month.

1 reply · 1 repost · 4 likes · 207 views