Piotr Sarna
@sarna_dev

996 posts

@poolsideai https://t.co/uOTCCBUUsz https://t.co/MjAwfREfqr https://t.co/C01u5Ps0Jt Writing For Developers https://t.co/8mJ7fPbtDc @tursodatabase Database Performance at Scale @ScyllaDB

Joined October 2022
10 Following · 2K Followers

Pinned Tweet
Piotr Sarna @sarna_dev·
I tried running `ollama run laguna-xs.2` fully expecting to see an OOM, but instead I got functional local inference on an NVIDIA 4060 laptop GPU... so I checked how that's even possible when the weights are 23 GB and the card has only 8 GB of VRAM: 🧵
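The trick behind tweets like this is partial offloading: llama.cpp-style runtimes (which ollama wraps) keep as many transformer layers as fit in VRAM and leave the rest in host RAM. A back-of-envelope sketch of the split — all concrete numbers except the 23 GB / 8 GB from the tweet are my own assumptions, not measurements:

```python
# Rough sketch (assumed numbers) of how a 23 GB model runs on an 8 GB card:
# the runtime splits the model per layer between VRAM and host RAM.

WEIGHTS_GB = 23.0      # total model size (from the tweet)
VRAM_GB = 8.0          # RTX 4060 Laptop VRAM (from the tweet)
N_LAYERS = 48          # assumed transformer layer count
RESERVED_GB = 1.5      # assumed headroom for KV cache + CUDA context

per_layer_gb = WEIGHTS_GB / N_LAYERS
gpu_layers = int((VRAM_GB - RESERVED_GB) // per_layer_gb)

print(f"~{per_layer_gb:.2f} GB per layer")
print(f"offload {gpu_layers}/{N_LAYERS} layers to GPU, "
      f"{N_LAYERS - gpu_layers} stay in host RAM")
```

With these assumed sizes, only about a quarter of the layers would be GPU-resident; the rest execute on the CPU, which is why it runs at all instead of OOMing.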
Piotr Sarna @sarna_dev·
I love how hardware vendors call their lowest-quality products "consumer-grade," and the consumers who buy them aren’t bothered by that at all.
Piotr Sarna @sarna_dev·
prefill speedup means the world to me and my consumer GPU 👇 I need to read a technical write-up on the details.
poolside@poolsideai

ok this is sick @pupposandro @davideciffa and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode, 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash open weights doing open weights things

Piotr Sarna retweeted
Max Wegman @MaxWegman·
poolside hackathon at the end of may. we're providing the compute. winner gets a DGX spark from NVIDIA. disclaimer: the hardware might be infected by @sarna_dev's case of local laguna addiction
poolside@poolsideai

Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as you can. RL and fine-tune Laguna XS.2, our latest-generation model, on Prime Intellect Lab.
Dates: May 29–30
Partners: @nvidia + @PrimeIntellect + @huggingface
Prize: NVIDIA DGX Spark
Agents need better models. Better models need cracked researchers. Link below.

Piotr Sarna retweeted
poolside @poolsideai·
Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as you can. RL and fine-tune Laguna XS.2, our latest-generation model, on Prime Intellect Lab.
Dates: May 29–30
Partners: @nvidia + @PrimeIntellect + @huggingface
Prize: NVIDIA DGX Spark
Agents need better models. Better models need cracked researchers. Link below.
Piotr Sarna @sarna_dev·
One sad aspect of using a laptop GPU with 8 GB of VRAM for local inference is that prefill is memory-bound. Normally the prefill stage is heavy on compute, but since the model weights don't fit in my lousy GPU, they need to get regularly squeezed through PCIe. I experimented with different prompt lengths, and it's clear that I need to keep them short to get a reasonable time-to-first-token: 71% of the time goes to host-to-device memory transfers rather than running kernels. Sad. This is where Apple's unified memory model clearly wins. @nvidia's nsys is a really nice profiler, btw.
[image attached]
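The memory-bound claim is easy to sanity-check with arithmetic: every forward pass over the prompt must re-stream the CPU-resident portion of the weights across PCIe. A rough model, where the VRAM-resident fraction and PCIe bandwidth are my assumptions, not the tweet's measurements:

```python
# Why prefill stalls on PCIe when weights don't fit in VRAM.
# Only the 23 GB total comes from the thread; the rest is assumed.

WEIGHTS_GB = 23.0        # total weights (from the thread)
VRAM_RESIDENT_GB = 6.5   # assumed portion held in VRAM
PCIE_GBPS = 16.0         # assumed effective PCIe 4.0 x8 bandwidth, GB/s

streamed_gb = WEIGHTS_GB - VRAM_RESIDENT_GB
transfer_s = streamed_gb / PCIE_GBPS   # pure transfer time per full pass

print(f"{streamed_gb:.1f} GB streamed per pass -> "
      f"~{transfer_s:.2f} s of PCIe time")
```

Roughly a second of pure transfer time per pass, before any compute — consistent with the 71%-on-transfers profile and with why a unified-memory machine sidesteps the problem entirely.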
Piotr Sarna @sarna_dev·
tl;dr you don't need Apple's AI silicon with its impressive >100 GB/s unified memory bandwidth for local inference. A consumer-grade laptop handles it just fine.
Piotr Sarna retweeted
poolside @poolsideai·
As agents get more clever, so do their attempts at benchmark hacking. Last Monday, we found one of our RL runs jumped ~20% on SWE-Bench-Pro over a weekend, reaching ~64%, which would make it #1 on the leaderboard. This was clearly benchmark hacking, and we patched the exploit. But this revealed deeper hacks across multiple public benchmarks, some of which were impossible to fix through environment design alone. Evals need to evolve beyond outcome-based pass rates toward better observability into how the agent is arriving at them. These were our findings: poolside.ai/blog/through-t… Examples below 👇 1/
[image attached]
Glauber Costa @glcst·
@sarna_dev using which model / embedding? If it comes with a reasonable API to plug in an external one, why not make this a first-class citizen on Turso?
Piotr Sarna @sarna_dev·
hear me out: SQLite plugin that lets you ask for rows in natural language. Returns regular rows, so you can join, subquery, and all that. SQLaguna.
[image attached]
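A purely hypothetical sketch of what the tweet's "SQLaguna" idea could look like from the caller's side: a natural-language question materializes into an ordinary table you can join and subquery. Every name here is made up, and the model call is mocked with a lookup table standing in for NL-to-SQL translation:

```python
# Hypothetical "SQLaguna" sketch: NL question -> regular joinable rows.
# The model is mocked; nothing here is a real plugin API.
import sqlite3

def nl_rows(conn, question):
    # A real plugin would ask a model to translate `question` to SQL.
    mock = {
        "customers from Poland": "SELECT * FROM customers WHERE country = 'PL'",
    }
    sql = mock[question]
    conn.execute("DROP TABLE IF EXISTS nl_result")
    conn.execute(f"CREATE TEMP TABLE nl_result AS {sql}")
    return "nl_result"  # a regular table: joinable, subqueryable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ala", "PL"), (2, "Bob", "US")])
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.99), (2, 5.00)])

t = nl_rows(conn, "customers from Poland")
rows = conn.execute(
    f"SELECT c.name, o.total FROM {t} c JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(rows)
```

The design point the tweet makes is the return type: because the result is plain rows rather than a chat answer, the rest of SQL composes over it for free.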
Piotr Sarna @sarna_dev·
April in writethat.blog 👇
- Laguna XS.2 and M.1: a deeper dive // @poolsideai
- Understanding the Go runtime: the network poller // @jespinog
- What are skiplists good for? // Will Wilson, @AntithesisHQ
- Do you even need a database? // Jay, DBPro
- Open source security at Astral // William Woodruff, @astral_sh
- The Git commands I run before reading any code // Ally Piechowski
- 10x faster tokenization // @steeve, @zml_ai
- How Meta used AI to map tribal knowledge in large-scale data pipelines // Krishna Ganeriwal, Plawan Rath, Ashwini Verma, @Meta_Engineers
- Supercharging Redpanda Streaming with profile-guided optimization // @StephanDollberg, @redpandadata
[image attached]
Piotr Sarna @sarna_dev·
New addition to the library. I already read the ebook, but this gem deserves a physical spot on my bookshelf.
[image attached]