Richard Kuzma

235 posts

@rskuzma

Helping AI agents talk to data @google, ex-LLMs https://t.co/U0wNJ4SFex, AI for national security @USSOCOM, Tech for Public Good @DIU_x and Harvard @Kennedy_School

NYC · Joined July 2017

2.1K Following · 457 Followers

Pinned Tweet

Richard Kuzma@rskuzma·28 Mar

LLMs aren't just for GPUs! @cerebras releases a family of models up to 13B parameters on @huggingface to promote open research into scaling laws and demonstrate the capability of CS-2 hardware Why do this? 1/4

175

30.8K

Richard Kuzma retweeted

Peter Reinhardt@reinpk·20 Nov

My lobbyists are very nervous about me posting this, but over-regulation is working against us all. The costs are astronomical to us all, but hidden. So, I'm taking a risk, and sharing my stories from Charm and Revoy: rein.pk/over-regulatio…

184

1.4K

735.8K

Richard Kuzma@rskuzma·25 Oct

Awesome work on MoEs by Daria!

Daria Soboleva@dmsobol

Apologies for the delay on getting MoE 101's last episode out! Originally I planned to cover inference arithmetics only, but we turned it into the MoE inference 101! I know you enjoyed the MoE training perf modeling, this one is for inference. On both gpus and cerebras. cerebras.ai/blog/moe-guide… 🧵1/n

171

Richard Kuzma retweeted

Daria Soboleva@dmsobol·17 Sep

This might be the most information dense blog I've ever written. Added "show me the math" section into MoE 101 p4 episode. We believe it fully models MoE training perf on both gpu and cerebras wse devices. cerebras.ai/blog/moe-guide… 🧵1/n

Cerebras@cerebras

🧮 Calling all Mathletes, this one is for you. We’ve been asked to show the math behind our MoE claims. So we did. Our analysis confirms: On GPUs, expert parallelism creates severe communication overheads that dwarf computation and make MoE training painfully slow. At Cerebras, we avoid model parallelism entirely, but sparsity subdivides batches and leaves experts I/O bound. With BTA, we fix it. By decoupling batch size requirements across experts and attention layers, we remove the bottleneck. @dmsobol breaks it down.

253

39.3K

Richard Kuzma@rskuzma·17 Jul

@LinkofSunshine San Diego residents must be paying for this ranking to push down housing costs

Basil🧡@LinkofSunshine·16 Jul

City tiers (objective) 1) NYC 2) Chicago, DC 3) SF, Seattle, Philly 4) Pittsburgh, LA, Boston, Portland, Denver, Baltimore 5) Minneapolis, Miami, Detroit, San Diego, Cincinnati, Austin 6) Dallas, Houston, Atlanta, Phoenix 7) Other cities

603

112

5.4K

674.4K

Richard Kuzma@rskuzma·14 Jul

Link: cloud.google.com/blog/products/…

Richard Kuzma@rskuzma·14 Jul

Enterprise companies are hoping GenAI agents change their business. But security and IP is a non-negotiable. Wrote a blog on how @googlecloud ensures security while bringing “chat with data” agents to customers

Richard Kuzma@rskuzma·7 Nov

@swyx @NeurIPSConf @latentspacepod @arxiv @vibhuuuus @draecomino this feels like a fit for you?

swyx@swyx·7 Nov

Going to @NeurIPSConf? the @latentspacepod crew is looking for partners to do unofficial side events. anything from dinners to fullblown @arxiv preprint presos that didn't make the cut but should have. Please hit me and @vibhuuuus up with ideas and see you in Vancouver Tuesday-Sunday!

6.5K

Richard Kuzma retweeted

Lydia Hylton@lyd_hylton·15 Oct

Thrilled to officially announce what I've been working on for the last year: Strella.io! At Strella, we believe that the customer’s needs should be a company’s North Star. Using Strella’s AI, we enable companies to make informed decisions in hours, not weeks ⭐️🌟🚀

Strella@strella_io

Today, we're excited to introduce Strella and announce $4M in seed funding led by Decibel Ventures with participation from Unusual Ventures to transform the future of customer research! 🚀 Read more about our story and the journey ahead here: hubs.la/Q02TqVlb0

6.4K

Richard Kuzma retweeted

Ted Mabrey@MabreyTed·12 Sep

Man so excited we could finally unveil this. This is THE applied AI project. Google walked away from it. We embraced it. The world is a different place because of it. It provides so many foundational learnings that we are now applying to the commercial world via AIP. The origin story of Silicon Valley innovation reincarnated.

Palantir@PalantirTech

We were honored to welcome Vice Admiral Frank Whitworth to the stage at AIPCon to discuss Maven Smart System, and the role of Palantir in @NGA_GEOINT's critical mission. Watch his full demonstration.

151

12.9K

Richard Kuzma@rskuzma·28 Aug

Crazy speed from the team at @cerebras! Unlocks lots of interesting use cases across fast agent tool calling, multi-agent systems, self-consistency, and more!

Cerebras@cerebras

Verified by @ArtificialAnlys, Cerebras Inference achieves 1,850 tokens/sec on Llama 3.1 8B and 450 tokens/sec on Llama 3.1 70B! By dramatically reducing processing time, we're enabling more complex AI workflows and enhancing real-time LLM intelligence. This includes a new class of intelligent agents that can “think faster” than ever before. Cerebras Inference will power a new era of Instant AI. 👉Try it today: inference.cerebras.ai 👉Read our blog: cerebras.ai/blog/introduci… 👉Check out Artificial Analysis for more data: artificialanalysis.ai/providers/cere…

228

Richard Kuzma retweeted

Ritwik Gupta 🇺🇦@Ritwik_G·5 Jun

I read @leopoldasch's essay on the future of AI research and geopolitical competition. It's well-researched, well-presented, and passionate. However, Leopold advocates for an unreasonably strict and exclusionary future for AI development—a view that's gaining traction. (1/9)

36.8K

Richard Kuzma@rskuzma·26 May

This sounds like the start of a wedding DJ empire

a local@jedwill

The DJ just played peanut butter jelly time and handed out uncrustables man

130

Richard Kuzma@rskuzma·30 Apr

@AlonBochman But users do not bear the training cost? And storing + serving a model that size (with many experts) on GPUs would be a huge pain?

Alon Bochman@AlonBochman·29 Apr

Building an enterprise LLM app? You are probably exploring the wrong LLM. Llama3 is getting a lot of press, but Snowflake’s Arctic model offers similar performance even though it was 17x cheaper to train (about $2M or 3K GPU weeks). Also compare this with DBRX, which cost ~$10M.

192

Richard Kuzma@rskuzma·17 Dec

@BWRadford Give the people the Spotify wrapped for this

Richard Kuzma retweeted

Greg Brockman@gdb·9 Dec

evals are surprisingly often all you need

1.4K

363.3K

Richard Kuzma@rskuzma·18 Aug

@pstAsiatech @jacobhelberg @Meta @DeptofDefense Military uses are expressly not permitted in the llama v2 license

800

Paul Triolo@pstAsiatech·17 Aug

@jacobhelberg @Meta @DeptofDefense Where is the DoD issue with LLama2 spelled out?

1.2K

Jacob Helberg@jacobhelberg·17 Aug

Very disheartening that @Meta basically made Llama2 (its LLM competitor to ChatGPT) open for use to everyone except the @DeptofDefense. The CCP will effectively be able to use Llama2, but not the DoD. Lots of questions for Congress to ask.

770

225.2K

Richard Kuzma@rskuzma·16 Aug

🥳 CerebrasGPT proved to the world in March how effectively you can train LLMs on @cerebras hardware. Now BTLM surpasses 1M downloads in ~3 weeks on @huggingface! 🚀

Cerebras@cerebras

Cerebras BTLM-3B-8K model crosses 1M downloads🤯 It's the #1 ranked 3B language model on @huggingface! A big thanks to all the devs out there building on top of open source models 🙌

283

Richard Kuzma retweeted

Cerebras@cerebras·27 Jul

The Cerebras team has had a great time sharing our work at #ICML23. Below is a summary of the posters we presented, let us know if you are interested in discussing any of them further!

2.7K

Richard Kuzma@rskuzma·25 Jul

@Suhail @jessejohnsohn Unless you use non-GPU hardware to train it…

553

Suhail@Suhail·25 Jul

@jessejohnsohn In about 2 mo or less, you won't be able to get a more than 128 interconnected A100/H100 GPUs without waiting 6 mo. Which means you can't train a SOTA foundation model.

230

59.3K

Suhail@Suhail·25 Jul

There's a full blown run on GPU compute on a level I think people do not fully comprehend right now. Holy cow.

130

194

2.4K

1.4M

Richard Kuzma retweeted

Openτensor Foundaτion@opentensor·24 Jul

The Opentensor Foundation and Cerebras are pleased to announce Bittensor Language Model (BTLM), a new state-of-the-art 3 billion parameter language model that achieves breakthrough accuracy across a dozen AI benchmarks

220

638

218.9K
