Vasanth Mohan
@v_mohan_

288 posts

Head of Dev Rel @SambaNovaAI

Joined December 2016
202 Following · 80 Followers
Jon Saad-Falcon @JonSaadFalcon
Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.
36 replies · 92 reposts · 314 likes · 96.2K views
Andrej Karpathy @karpathy
With the coming tsunami of demand for tokens, there are significant opportunities to orchestrate the underlying memory+compute *just right* for LLMs.

The fundamental and non-obvious constraint is that, due to the chip fabrication process, you get two completely distinct pools of memory (of different physical implementations too): 1) on-chip SRAM that is immediately next to the compute units, incredibly fast but of very low capacity, and 2) off-chip DRAM, which has extremely high capacity, but whose contents you can only suck through a long straw. On top of this, there are many details of the architecture (e.g. systolic arrays), numerics, etc.

The design of the optimal physical substrate, and then the orchestration of memory+compute across the top-volume workflows of LLMs (inference prefill/decode, training/finetuning, etc.) with the best throughput/latency/$, is probably today's most interesting intellectual puzzle with the highest rewards (\cite 4.6T of NVDA). All of it to get many tokens, fast and cheap.

Arguably, the workflow that may matter the most (inference decode, *and* over long token contexts in tight agentic loops) is the one hardest to achieve simultaneously by both of the camps that exist today (HBM-first NVIDIA-adjacent and SRAM-first Cerebras-adjacent).

Anyway, the MatX team is A++ grade, so it's my pleasure to have a small involvement, and congratulations on the raise!
Reiner Pope@reinerpope

We’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. We call it the MatX One.

The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.

We’ve raised a $500M Series B to wrap up development and quickly scale manufacturing, with tapeout in under a year. The round was led by Jane Street, one of the most tech-savvy Wall Street firms, and Situational Awareness LP, whose founder @leopoldasch wrote the definitive memo on AGI. Participants include @sparkcapital, @danielgross and @natfriedman’s fund, @patrickc and @collision, @TriatomicCap, @HarpoonVentures, @karpathy, @dwarkesh_sp, and others. We’re also welcoming investors across the supply chain, including Marvell and Alchip.

@MikeGunter_ and I started MatX because we felt that the best chip for LLMs should be designed from first principles with a deep understanding of what LLMs need and how they will evolve. We are willing to give up on small-model performance, low-volume workloads, and even ease of programming to deliver on such a chip.

We’re now a 100-person team with people who think about everything from learning rate schedules, to Swing Modulo Scheduling, to guard/round/sticky bits, to blind-mated connections, all in the same building. If you’d like to help us architect, design, and deploy many generations of chips in large volume, consider joining us.

323 replies · 507 reposts · 7.4K likes · 2.5M views
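A back-of-the-envelope sketch of the decode constraint Karpathy describes above: generating one token requires streaming roughly all model weights to the compute units, so decode throughput is bounded by memory bandwidth rather than FLOPs. All numbers below are illustrative assumptions, not vendor specs.

```python
# Back-of-the-envelope roofline for LLM inference decode.
# All numbers are illustrative assumptions, not vendor specs.

def decode_tokens_per_sec(weight_bytes: float, mem_bw_bytes_per_sec: float) -> float:
    """Decode emits one token at a time, so each step must stream roughly
    all model weights to the compute units; throughput is therefore
    bounded by memory bandwidth, not FLOPs."""
    return mem_bw_bytes_per_sec / weight_bytes

weights = 70e9   # hypothetical 70B-parameter model at 8-bit weights: ~70 GB

hbm_bw = 3e12    # ~3 TB/s, an HBM-class bandwidth assumption
sram_bw = 100e12 # ~100 TB/s aggregate, an on-chip-SRAM-class assumption

print(f"HBM-bound decode:  ~{decode_tokens_per_sec(weights, hbm_bw):,.0f} tok/s")
print(f"SRAM-bound decode: ~{decode_tokens_per_sec(weights, sram_bw):,.0f} tok/s")
# The catch: SRAM is tiny (tens of MB per die), so holding 70 GB of weights
# in SRAM means sharding across many chips, while HBM holds the weights
# easily but caps per-token speed -- the two camps named in the post above.
```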
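To illustrate the "splittable systolic array" claim in the quoted MatX post, here is a toy utilization model, a sketch assuming simple output tiling onto a weight-stationary array; the array and matrix sizes are hypothetical, not MatX's actual design parameters.

```python
# Toy utilization model for a systolic array under simple output tiling.
# Array and matrix sizes are hypothetical.
import math

def utilization(m: int, n: int, rows: int, cols: int) -> float:
    """Fraction of processing elements doing useful work when an m x n
    output is mapped tile-by-tile onto a rows x cols array."""
    tiles = math.ceil(m / rows) * math.ceil(n / cols)
    return (m * n) / (tiles * rows * cols)

# One large 256x256 array: efficient on big matmuls, mostly idle on small ones.
print(f"4096x4096 on one 256x256 array: {utilization(4096, 4096, 256, 256):.0%}")
print(f"100x100   on one 256x256 array: {utilization(100, 100, 256, 256):.0%}")

# The same silicon split into 64x64 sub-arrays keeps small matrices busy,
# which is the intuition behind a "splittable" design.
print(f"100x100   on a    64x64 array:  {utilization(100, 100, 64, 64):.0%}")
```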
yontr @yontrtwt
Before ChatGPT came out, when most AI was just CNNs, I had really high hopes for SambaNova because they were the only SRAM-based company that also had a lot of DRAM in addition to SRAM. They were also the first company I heard use the term “dataflow compute”. However, we now live in a different world, and unfortunately SambaNova did not pivot well. Now they are neither the fast option nor the cheap one.
1 reply · 0 reposts · 3 likes · 795 views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
SN50 is here, the fastest chip built for agentic AI. Max speeds up to 5X faster; run agentic AI at 3X lower cost than GPUs, unlocking cloud-scale inference economics. We’ve also planned a multi-year strategic collaboration with @intel & raised $350M+ from @Vista_Equity, Cambium Capital & @TRowePrice to scale manufacturing & cloud capacity. Learn more: bit.ly/4qUsx9F
14 replies · 55 reposts · 240 likes · 70.2K views
Vasanth Mohan @v_mohan_
@EchoDrifter1145 Very important to scale to large models and deliver fast performance cost-efficiently!
1 reply · 0 reposts · 1 like · 34 views
犬養 @EchoDrifter1145
Apparently it uses a three-tier memory architecture; you don't hear about that very often. "SambaNova said the SN50 uses three tiers of memory (SRAM, HBM, and DDR), delivering 'breakthrough model capacity' and making it possible to run models with more than 10 trillion parameters and context lengths of more than 10 million." crn.com/news/component…
1 reply · 1 repost · 3 likes · 661 views
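Rough arithmetic for why a third, DDR-class tier matters for the capacity claims in the quoted article. The byte-per-parameter choice and the tier capacities below are illustrative assumptions, not SN50 specs.

```python
# Capacity arithmetic behind the three-tier (SRAM / HBM / DDR) claim.
# Byte-per-parameter and tier capacities are illustrative assumptions.
params = 10e12          # 10T-parameter model, per the quoted article
bytes_per_param = 1     # assume 8-bit quantized weights
weight_tb = params * bytes_per_param / 1e12

print(f"Weights alone: ~{weight_tb:.0f} TB")
# Order-of-magnitude per-socket capacities (assumptions):
#   SRAM: hundreds of MB; HBM: hundreds of GB; DDR: tens of TB.
# Only a DDR-class tier can hold ~10 TB of weights; HBM and SRAM then act
# as progressively faster caches for the hot working set, which is how a
# single system could address 10T+ parameters and very long contexts.
```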
犬養 @EchoDrifter1145
Looks like SoftBank's data centers will be the first in the world to get the SambaNova SN50. sambanova.ai/press/sambanov…
1 reply · 0 reposts · 0 likes · 117 views
Vasanth Mohan retweeted
Hugging Face @huggingface
This might be the biggest AI hackathon ever:
* >6,300 registrants
* Runs for 2 weeks (Nov. 14-30)
* Open to anyone, anywhere virtually
* $20,000 in cash prizes + $3.5M+ in sponsor credits
Hosted by @Anthropic and @Gradio, along with 10 sponsors. Join the kickoff in 30 minutes 👇
Gradio@Gradio

Join us LIVE at MCP's first Birthday kickoff at 10 am PT today!🎂 Don't miss out on details about the celebration from the co-hosts, @Gradio and @AnthropicAI. 🔥 We've also got an exciting lineup of speakers from @Huggingface, @OpenAI, @GoogleDeepMind, @modal, @blaxelAI, @SambaNovaAI, and @nebiustf ready to share their insights.

19 replies · 49 reposts · 488 likes · 80.2K views
Vasanth Mohan retweeted
Jon Saad-Falcon @JonSaadFalcon
Data centers dominate AI, but they're hitting physical limits. What if the future of AI isn't just bigger data centers, but local intelligence in our hands?

The viability of local AI depends on intelligence efficiency. To measure this, we propose intelligence per watt (IPW): intelligence delivered (capabilities) per unit of power consumed (efficiency).

Today's local LMs already handle 88.7% of single-turn chat and reasoning queries, with local IPW improving 5.3× in 2 years, driven by better models (3.2×) and better accelerators (1.7×). As local IPW improves, a meaningful fraction of workloads can shift from centralized infrastructure to local compute, with IPW serving as the critical metric for tracking this transition. (1/N)
55 replies · 138 reposts · 455 likes · 225.6K views
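A minimal sketch of the IPW metric as defined in the thread above. The capability score and power draw are hypothetical placeholders; only the 3.2× (models) and 1.7× (accelerators) factors come from the post.

```python
# Minimal sketch of intelligence per watt (IPW) as defined in the thread.
# The capability score and power draw are hypothetical placeholders; only
# the 3.2x (models) and 1.7x (accelerators) factors come from the post.

def intelligence_per_watt(capability_score: float, avg_power_watts: float) -> float:
    """IPW = intelligence delivered per unit of power consumed."""
    return capability_score / avg_power_watts

ipw_then = intelligence_per_watt(0.70, 25.0)  # hypothetical local setup

# The two improvement sources compose multiplicatively:
# 3.2 (better models) * 1.7 (better accelerators) = 5.44, close to the
# reported ~5.3x overall gain.
ipw_now = ipw_then * 3.2 * 1.7

print(f"IPW then: {ipw_then:.4f}  IPW now: {ipw_now:.4f}  "
      f"gain: {ipw_now / ipw_then:.1f}x")
```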
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
Our RDU delivers 4X more intelligence per joule than Nvidia’s latest B200 Blackwell chip. Research from Stanford introduces "Intelligence per Joule", a new metric that best explains AI efficiency from chips to models. Learn more about this benchmark: sambanova.ai/blog/best-inte…
1 reply · 8 reposts · 14 likes · 2.7K views
Risphere @risphereeditor
SambaNova just added OpenAI's big open-source model OSS 120B. It runs at over 700 tokens per second, so around 400 words per second.
SambaNova@SambaNovaAI

🚨 New drop: @OpenAI-OSS 120B on SambaCloud at over 700 t/s
✅ US-built, Apache 2.0. Own, control, trust it
✅ Performance at low cost: $0.22/$0.59
✅ Deploy wherever you need & fine-tune the model with your data
✅ Inference at over 700 t/s
Build here: bit.ly/4nqGlYh

1 reply · 0 reposts · 3 likes · 137 views
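A minimal sketch of calling the model from the announcement above, assuming SambaCloud exposes an OpenAI-compatible API. The base URL and model id are assumptions drawn from memory of SambaNova's public docs and should be verified against the current documentation before use.

```python
# Sketch of calling the OSS 120B model on SambaCloud through an
# OpenAI-compatible API. Base URL and model id are assumptions; verify
# both against SambaNova's docs before relying on this.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint
    api_key="YOUR_SAMBANOVA_API_KEY",        # placeholder credential
)

resp = client.chat.completions.create(
    model="gpt-oss-120b",  # assumed model id for the OSS 120B drop
    messages=[{"role": "user", "content": "In one line, what does Apache 2.0 allow?"}],
)
print(resp.choices[0].message.content)
```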
SambaNova @SambaNovaAI
🐳 @deepseek_ai V3.1 is on SambaCloud, & it's blazing fast at 169 tokens per second. Verified by @ArtificialAnlys. DeepSeek-V3.1 also shows improvements across various benchmarks compared to prior iterations of both the R1 & V3 models for thinking & non-thinking modes.
3 replies · 2 reposts · 31 likes · 9.5K views
Vasanth Mohan retweeted
Risphere @risphereeditor
SambaNova is now hosting the top-ranked AI chat model on Artificial Analysis (DeepSeek-V3.1) at 200 tokens (~150 words) per second. You can also test the provider on OpenRouter.
SambaNova@SambaNovaAI

🐋 New drop: @DeepSeek_ai V3.1 on SambaCloud @ 200+ t/s
Beats Claude Opus 4 & earlier versions in coding
Hybrid Thinking Mode = reasoning when needed, raw speed when not
Open-source + deploy privately at lower cost

0 replies · 1 repost · 4 likes · 218 views
Elliot Arledge @elliotarledge
@GroqInc So bad that you should replace it with DeepSeek V3.1.
2 replies · 0 reposts · 77 likes · 2.6K views
Groq Inc @GroqInc
Three weeks in, what’s your impression of gpt-oss so far?
85 replies · 5 reposts · 229 likes · 47.4K views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
🐋 New drop: @DeepSeek_ai V3.1 on SambaCloud @ 200+ t/s
Beats Claude Opus 4 & earlier versions in coding
Hybrid Thinking Mode = reasoning when needed, raw speed when not
Open-source + deploy privately at lower cost
2 replies · 12 reposts · 44 likes · 131.5K views
Vasanth Mohan retweeted
SambaNova @SambaNovaAI
Watch @hume_ai + GPT-5 on SambaNova Cloud bring characters (and fun chaos) to life in real time. 🎙️Realistic voices + powerful reasoning = stories you didn’t see coming.
0 replies · 5 reposts · 14 likes · 22.2K views