Piotr Sarna
@sarna_dev

996 posts

@poolsideai https://t.co/uOTCCBUUsz https://t.co/MjAwfREfqr https://t.co/C01u5Ps0Jt Writing For Developers https://t.co/8mJ7fPbtDc @tursodatabase Database Performance at Scale @ScyllaDB

Joined October 2022
10 Following · 2K Followers

Pinned Tweet
Piotr Sarna @sarna_dev·
I tried running `ollama run laguna-xs.2` fully expecting to see an OOM, but instead I got functional local inference on an NVIDIA 4060 laptop GPU... so I checked how that's even possible when the weights are 23 GB and the card has only 8 GB of VRAM: 🧵
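The trick behind tweets like this is partial offloading: llama.cpp-style runtimes (which ollama wraps) keep as many transformer layers as fit in VRAM and leave the rest in host RAM. A back-of-envelope sketch of the split — all concrete numbers except the 23 GB / 8 GB from the tweet are my own assumptions, not measurements:

```python
# Rough sketch (assumed numbers) of how a 23 GB model runs on an 8 GB card:
# the runtime splits the model per layer between VRAM and host RAM.

WEIGHTS_GB = 23.0      # total model size (from the tweet)
VRAM_GB = 8.0          # RTX 4060 Laptop VRAM (from the tweet)
N_LAYERS = 48          # assumed transformer layer count
RESERVED_GB = 1.5      # assumed headroom for KV cache + CUDA context

per_layer_gb = WEIGHTS_GB / N_LAYERS
gpu_layers = int((VRAM_GB - RESERVED_GB) // per_layer_gb)

print(f"~{per_layer_gb:.2f} GB per layer")
print(f"offload {gpu_layers}/{N_LAYERS} layers to GPU, "
      f"{N_LAYERS - gpu_layers} stay in host RAM")
```

With these assumed sizes, only about a quarter of the layers would be GPU-resident; the rest execute on the CPU, which is why it runs at all instead of OOMing.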
Piotr Sarna @sarna_dev·
I love how hardware vendors call their lowest-quality products "consumer-grade," and the consumers who buy them aren’t bothered by that at all.
Piotr Sarna @sarna_dev·
prefill speedup means the world to me and my consumer GPU 👇 I need to read a technical write-up on the details.
poolside@poolsideai

ok this is sick @pupposandro @davideciffa and @luceboxai got Laguna XS.2 running on a single RTX 3090 with ~111 tok/s decode, 5.4x faster 128K prefill vs llama.cpp, and made it the first MoE target for PFlash open weights doing open weights things

Piotr Sarna retweeted
Max Wegman @MaxWegman·
poolside hackathon at the end of may. we're providing the compute. winner gets a DGX spark from NVIDIA. disclaimer: the hardware might be infected by @sarna_dev's case of local laguna addiction
poolside@poolsideai

Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as you can. RL and fine-tune Laguna XS.2, our latest-generation model, on Prime Intellect Lab.
Dates: May 29–30
Partners: @nvidia + @PrimeIntellect + @huggingface
Prize: NVIDIA DGX Spark
Agents need better models. Better models need cracked researchers. Link below.

Piotr Sarna retweeted
poolside @poolsideai·
Poolside is hosting a 2-day model research hackathon in London. Join us to push an open-weight agent model as far as you can. RL and fine-tune Laguna XS.2, our latest-generation model, on Prime Intellect Lab.
Dates: May 29–30
Partners: @nvidia + @PrimeIntellect + @huggingface
Prize: NVIDIA DGX Spark
Agents need better models. Better models need cracked researchers. Link below.
Piotr Sarna @sarna_dev·
One sad aspect of using a laptop GPU with 8 GB of VRAM for local inference is that prefill is memory-bound. Normally the prefill stage is heavy on compute, but since the model weights don't fit in my lousy GPU, they need to get regularly squeezed through PCIe. I experimented with different prompt lengths, and it's clear that I need to keep them short to get a reasonable time-to-first-token: 71% of the time goes to host-to-device memory transfers rather than running kernels. Sad. This is where Apple's unified memory model clearly wins. @nvidia's nsys is a really nice profiler, btw.
[image attached]
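The memory-bound claim is easy to sanity-check with arithmetic: every forward pass over the prompt must re-stream the CPU-resident portion of the weights across PCIe. A rough model, where the VRAM-resident fraction and PCIe bandwidth are my assumptions, not the tweet's measurements:

```python
# Why prefill stalls on PCIe when weights don't fit in VRAM.
# Only the 23 GB total comes from the thread; the rest is assumed.

WEIGHTS_GB = 23.0        # total weights (from the thread)
VRAM_RESIDENT_GB = 6.5   # assumed portion held in VRAM
PCIE_GBPS = 16.0         # assumed effective PCIe 4.0 x8 bandwidth, GB/s

streamed_gb = WEIGHTS_GB - VRAM_RESIDENT_GB
transfer_s = streamed_gb / PCIE_GBPS   # pure transfer time per full pass

print(f"{streamed_gb:.1f} GB streamed per pass -> "
      f"~{transfer_s:.2f} s of PCIe time")
```

Roughly a second of pure transfer time per pass, before any compute — consistent with the 71%-on-transfers profile and with why a unified-memory machine sidesteps the problem entirely.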
Piotr Sarna @sarna_dev·
tl;dr you don't need Apple's AI silicon with its impressive >100 GB/s unified memory bandwidth for local inference. A consumer-grade laptop handles it just fine.
Piotr Sarna retweeted
poolside @poolsideai·
As agents get more clever, so do their attempts at benchmark hacking. Last Monday, we found one of our RL runs jumped ~20% on SWE-Bench-Pro over a weekend, reaching ~64%, which would make it #1 on the leaderboard. This was clearly benchmark hacking, and we patched the exploit. But this revealed deeper hacks across multiple public benchmarks, some of which were impossible to fix through environment design alone. Evals need to evolve beyond outcome-based pass rates toward better observability into how the agent is arriving at them. These were our findings: poolside.ai/blog/through-t… Examples below 👇 1/
[image attached]
Glauber Costa @glcst·
@sarna_dev using which model / embedding? If it comes with a reasonable API to plug in an external one, why not make this a first-class citizen on Turso?
Piotr Sarna @sarna_dev·
hear me out: SQLite plugin that lets you ask for rows in natural language. Returns regular rows, so you can join, subquery, and all that. SQLaguna.
[image attached]
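A purely hypothetical sketch of what the tweet's "SQLaguna" idea could look like from the caller's side: a natural-language question materializes into an ordinary table you can join and subquery. Every name here is made up, and the model call is mocked with a lookup table standing in for NL-to-SQL translation:

```python
# Hypothetical "SQLaguna" sketch: NL question -> regular joinable rows.
# The model is mocked; nothing here is a real plugin API.
import sqlite3

def nl_rows(conn, question):
    # A real plugin would ask a model to translate `question` to SQL.
    mock = {
        "customers from Poland": "SELECT * FROM customers WHERE country = 'PL'",
    }
    sql = mock[question]
    conn.execute("DROP TABLE IF EXISTS nl_result")
    conn.execute(f"CREATE TEMP TABLE nl_result AS {sql}")
    return "nl_result"  # a regular table: joinable, subqueryable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ala", "PL"), (2, "Bob", "US")])
conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.99), (2, 5.00)])

t = nl_rows(conn, "customers from Poland")
rows = conn.execute(
    f"SELECT c.name, o.total FROM {t} c JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(rows)
```

The design point the tweet makes is the return type: because the result is plain rows rather than a chat answer, the rest of SQL composes over it for free.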
Piotr Sarna @sarna_dev·
April in writethat.blog 👇
- Laguna XS.2 and M.1: a deeper dive // @poolsideai
- Understanding the Go runtime: the network poller // @jespinog
- What are skiplists good for? // Will Wilson, @AntithesisHQ
- Do you even need a database? // Jay, DBPro
- Open source security at Astral // William Woodruff, @astral_sh
- The Git commands I run before reading any code // Ally Piechowski
- 10x faster tokenization // @steeve, @zml_ai
- How Meta used AI to map tribal knowledge in large-scale data pipelines // Krishna Ganeriwal, Plawan Rath, Ashwini Verma, @Meta_Engineers
- Supercharging Redpanda Streaming with profile-guided optimization // @StephanDollberg, @redpandadata
[image attached]
Piotr Sarna @sarna_dev·
New addition to the library. I already read the ebook, but this gem deserves a physical spot on my bookshelf.
[image attached]