Vasanth Mohan
@v_mohan_

288 posts

Head of Dev Rel @SambaNovaAI

Joined December 2016
202 Following · 80 Followers
Jon Saad-Falcon @JonSaadFalcon
Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.
36 replies · 92 reposts · 314 likes · 96.2K views
Andrej Karpathy @karpathy
With the coming tsunami of demand for tokens, there are significant opportunities to orchestrate the underlying memory+compute *just right* for LLMs.

The fundamental and non-obvious constraint is that, due to the chip fabrication process, you get two completely distinct pools of memory (of different physical implementations too): 1) on-chip SRAM that is immediately next to the compute units, incredibly fast but of very low capacity, and 2) off-chip DRAM, which has extremely high capacity, but whose contents you can only suck through a long straw. On top of this, there are many details of the architecture (e.g. systolic arrays), numerics, etc.

The design of the optimal physical substrate, and then the orchestration of memory+compute across the top-volume workflows of LLMs (inference prefill/decode, training/finetuning, etc.) with the best throughput/latency/$, is probably today's most interesting intellectual puzzle with the highest rewards (\cite 4.6T of NVDA). All of it to get many tokens, fast and cheap.

Arguably, the workflow that may matter the most (inference decode, *and* over long token contexts in tight agentic loops) is the one hardest to achieve simultaneously by both of the camps that exist today (HBM-first NVIDIA-adjacent and SRAM-first Cerebras-adjacent).

Anyway, the MatX team is A++ grade, so it's my pleasure to have a small involvement, and congratulations on the raise!
Reiner Pope@reinerpope

We’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. We call it the MatX One.

The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.

We’ve raised a $500M Series B to wrap up development and quickly scale manufacturing, with tapeout in under a year. The round was led by Jane Street, one of the most tech-savvy Wall Street firms, and Situational Awareness LP, whose founder @leopoldasch wrote the definitive memo on AGI. Participants include @sparkcapital, @danielgross and @natfriedman’s fund, @patrickc and @collision, @TriatomicCap, @HarpoonVentures, @karpathy, @dwarkesh_sp, and others. We’re also welcoming investors across the supply chain, including Marvell and Alchip.

@MikeGunter_ and I started MatX because we felt that the best chip for LLMs should be designed from first principles with a deep understanding of what LLMs need and how they will evolve. We are willing to give up on small-model performance, low-volume workloads, and even ease of programming to deliver on such a chip.

We’re now a 100-person team with people who think about everything from learning rate schedules, to Swing Modulo Scheduling, to guard/round/sticky bits, to blind-mated connections, all in the same building. If you’d like to help us architect, design, and deploy many generations of chips in large volume, consider joining us.

323 replies · 507 reposts · 7.4K likes · 2.5M views
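A back-of-the-envelope sketch of the decode constraint Karpathy describes above: generating one token requires streaming roughly all model weights to the compute units, so decode throughput is bounded by memory bandwidth rather than FLOPs. All numbers below are illustrative assumptions, not vendor specs.

```python
# Back-of-the-envelope roofline for LLM inference decode.
# All numbers are illustrative assumptions, not vendor specs.

def decode_tokens_per_sec(weight_bytes: float, mem_bw_bytes_per_sec: float) -> float:
    """Decode emits one token at a time, so each step must stream roughly
    all model weights to the compute units; throughput is therefore
    bounded by memory bandwidth, not FLOPs."""
    return mem_bw_bytes_per_sec / weight_bytes

weights = 70e9   # hypothetical 70B-parameter model at 8-bit weights: ~70 GB

hbm_bw = 3e12    # ~3 TB/s, an HBM-class bandwidth assumption
sram_bw = 100e12 # ~100 TB/s aggregate, an on-chip-SRAM-class assumption

print(f"HBM-bound decode:  ~{decode_tokens_per_sec(weights, hbm_bw):,.0f} tok/s")
print(f"SRAM-bound decode: ~{decode_tokens_per_sec(weights, sram_bw):,.0f} tok/s")
# The catch: SRAM is tiny (tens of MB per die), so holding 70 GB of weights
# in SRAM means sharding across many chips, while HBM holds the weights
# easily but caps per-token speed -- the two camps named in the post above.
```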
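To illustrate the "splittable systolic array" claim in the quoted MatX post, here is a toy utilization model, a sketch assuming simple output tiling onto a weight-stationary array; the array and matrix sizes are hypothetical, not MatX's actual design parameters.

```python
# Toy utilization model for a systolic array under simple output tiling.
# Array and matrix sizes are hypothetical.
import math

def utilization(m: int, n: int, rows: int, cols: int) -> float:
    """Fraction of processing elements doing useful work when an m x n
    output is mapped tile-by-tile onto a rows x cols array."""
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)

# One large 256x256 array: efficient on big matmuls, mostly idle on small ones.
print(f"4096x4096 on one 256x256 array: {utilization(4096, 4096, 256, 256):.0%}")
print(f"100x100   on one 256x256 array: {utilization(100, 100, 256, 256):.0%}")

# The same silicon split into 64x64 sub-arrays keeps small matrices busy,
# which is the intuition behind a "splittable" design.
print(f"100x100   on a    64x64 array:  {utilization(100, 100, 64, 64):.0%}")
```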
yontr @yontrtwt
Before ChatGPT came out, when most AI was just CNNs, I had really high hopes for SambaNova because they were the only SRAM-based company that also had a lot of DRAM in addition to SRAM. They were also the first company I heard use the term “dataflow compute”. However, we now live in a different world, and unfortunately SambaNova did not pivot well. Now they are neither the fast option nor the cheap one.
1 reply · 0 reposts · 3 likes · 795 views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
SN50 is here, the fastest chip built for agentic AI. Max speeds up to 5X faster; run agentic AI at 3X lower cost than GPUs, unlocking cloud-scale inference economics. We’ve also planned a multi-year strategic collaboration with @intel & raised $350M+ from @Vista_Equity, Cambium Capital & @TRowePrice to scale manufacturing & cloud capacity. Learn more: bit.ly/4qUsx9F
14 replies · 55 reposts · 240 likes · 70.2K views
Vasanth Mohan @v_mohan_
@EchoDrifter1145 Very important to scale to large models and deliver fast performance cost-efficiently!
1 reply · 0 reposts · 1 like · 34 views
犬養 @EchoDrifter1145
Apparently it uses a three-tier memory architecture; you don't hear about that very often. "SambaNova said the SN50 uses three tiers of memory (SRAM, HBM, and DDR), delivering 'breakthrough model capacity' and making it possible to run models with more than 10 trillion parameters and context lengths of more than 10 million." crn.com/news/component…
1 reply · 1 repost · 3 likes · 661 views
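Rough arithmetic for why a third, DDR-class tier matters for the capacity claims in the quoted article. The byte-per-parameter choice and the tier capacities below are illustrative assumptions, not SN50 specs.

```python
# Capacity arithmetic behind the three-tier (SRAM / HBM / DDR) claim.
# Byte-per-parameter and tier capacities are illustrative assumptions.
params = 10e12          # 10T-parameter model, per the quoted article
bytes_per_param = 1     # assume 8-bit quantized weights
weight_tb = params * bytes_per_param / 1e12

print(f"Weights alone: ~{weight_tb:.0f} TB")
# Order-of-magnitude per-socket capacities (assumptions):
#   SRAM: hundreds of MB; HBM: hundreds of GB; DDR: tens of TB.
# Only a DDR-class tier can hold ~10 TB of weights; HBM and SRAM then act
# as progressively faster caches for the hot working set, which is how a
# single system could address 10T+ parameters and very long contexts.
```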
犬養 @EchoDrifter1145
Looks like SoftBank's data centers will be the first in the world to get the SambaNova SN50. sambanova.ai/press/sambanov…
1 reply · 0 reposts · 0 likes · 117 views
Vasanth Mohan retweeted
Hugging Face @huggingface
This might be the biggest AI hackathon ever:
* >6,300 registrants
* Runs for 2 weeks (Nov. 14-30)
* Open to anyone, anywhere virtually
* $20,000 in cash prizes + $3.5M+ in sponsor credits
Hosted by @Anthropic and @Gradio, along with 10 sponsors. Join the kickoff in 30 minutes 👇
Gradio@Gradio

Join us LIVE at MCP's first Birthday kickoff at 10 am PT today!🎂 Don't miss out on details about the celebration from the co-hosts, @Gradio and @AnthropicAI. 🔥 We've also got an exciting lineup of speakers from @Huggingface, @OpenAI, @GoogleDeepMind, @modal, @blaxelAI, @SambaNovaAI, and @nebiustf ready to share their insights.

19 replies · 49 reposts · 488 likes · 80.2K views
Vasanth Mohan retweeted
Jon Saad-Falcon @JonSaadFalcon
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands?

The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×). As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 138 reposts · 455 likes · 225.6K views
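A minimal sketch of the IPW metric as defined in the thread above. The capability score and power draw are hypothetical placeholders; only the 3.2× (models) and 1.7× (accelerators) factors come from the post.

```python
# Minimal sketch of intelligence per watt (IPW) as defined in the thread.
# The capability score and power draw are hypothetical placeholders; only
# the 3.2x (models) and 1.7x (accelerators) factors come from the post.

def intelligence_per_watt(capability_score: float, avg_power_watts: float) -> float:
    """IPW = intelligence delivered per unit of power consumed."""
    return capability_score / avg_power_watts

ipw_then = intelligence_per_watt(0.70, 25.0)  # hypothetical local setup

# The two improvement sources compose multiplicatively:
# 3.2 (better models) * 1.7 (better accelerators) = 5.44, close to the
# reported ~5.3x overall gain.
ipw_now = ipw_then * 3.2 * 1.7

print(f"IPW then: {ipw_then:.4f}  IPW now: {ipw_now:.4f}  "
      f"gain: {ipw_now / ipw_then:.1f}x")
```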
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
Our RDU delivers 4X more intelligence per joule than Nvidia’s latest B200 Blackwell chip. Research from Stanford introduces "Intelligence per Joule", a new metric that best explains AI efficiency from chips to models. Learn more about this benchmark: sambanova.ai/blog/best-inte…
1 reply · 8 reposts · 14 likes · 2.7K views
Risphere @risphereeditor
SambaNova just added OpenAI's big open-source model OSS 120B. It runs at over 700 tokens per second, so around 400 words per second.
SambaNova@SambaNovaAI

🚨 New drop: @OpenAI-OSS 120B on SambaCloud at over 700 t/s
✅ US-built, Apache 2.0. Own, control, trust it
✅ Performance at low cost: $0.22/$0.59
✅ Deploy wherever you need & fine-tune the model with your data
✅ Inference at over 700 t/s
Build here: bit.ly/4nqGlYh

1 reply · 0 reposts · 3 likes · 137 views
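A minimal sketch of calling the model from the announcement above, assuming SambaCloud exposes an OpenAI-compatible API. The base URL and model id are assumptions drawn from memory of SambaNova's public docs and should be verified against the current documentation before use.

```python
# Sketch of calling the OSS 120B model on SambaCloud through an
# OpenAI-compatible API. Base URL and model id are assumptions; verify
# both against SambaNova's docs before relying on this.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key="YOUR_SAMBANOVA_API_KEY",        # placeholder credential
)

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # assumed model id for the OSS 120B drop
    messages=[{"role": "user", "content": "In one line, what does Apache 2.0 allow?"}],
)
print(resp.choices[0].message.content)
```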
SambaNova @SambaNovaAI
🐳 @deepseek_ai V3.1 is on SambaCloud, & it's blazing fast at 169 tokens per second. Verified by @ArtificialAnlys. DeepSeek-V3.1 also shows improvements across various benchmarks compared to prior iterations of both the R1 & V3 models for thinking & non-thinking modes.
3 replies · 2 reposts · 31 likes · 9.5K views
Vasanth Mohan retweeted
Risphere @risphereeditor
SambaNova is now hosting the top-ranked AI chat model on Artificial Analysis (DeepSeek-V3.1) at 200 tokens (~150 words) per second. You can also test the provider on OpenRouter.
SambaNova@SambaNovaAI

🐋 New drop: @DeepSeek_ai V3.1 on SambaCloud @ 200+ t/s
Beats Claude Opus 4 & earlier versions in coding
Hybrid Thinking Mode = reasoning when needed, raw speed when not
Open-source + deploy privately at lower cost

0 replies · 1 repost · 4 likes · 218 views
Elliot Arledge @elliotarledge
@GroqInc So bad that you should replace it with DeepSeek V3.1.
2 replies · 0 reposts · 77 likes · 2.6K views
Groq Inc @GroqInc
Three weeks in, what’s your impression of gpt-oss so far?
85 replies · 5 reposts · 229 likes · 47.4K views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
🐋 New drop: @DeepSeek_ai V3.1 on SambaCloud @ 200+ t/s
Beats Claude Opus 4 & earlier versions in coding
Hybrid Thinking Mode = reasoning when needed, raw speed when not
Open-source + deploy privately at lower cost
2 replies · 12 reposts · 44 likes · 131.5K views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
Watch @hume_ai + GPT-5 on SambaNova Cloud bring characters (and fun chaos) to life in real time. 🎙️Realistic voices + powerful reasoning = stories you didn’t see coming.
0 replies · 5 reposts · 14 likes · 22.2K views