Adam Prout

382 posts

@a_prout

Distinguished Engineer @ Microsoft. Ex-MemSQL/SingleStoreDB cofounder. Database builder. One trick pony.

Joined June 2012
348 Following · 1K Followers
Adam Prout@a_prout·
@JasonThorsness Yep, we are throwing agents indiscriminately at every bug filed right now. We're starting to indiscriminately try to have them find bugs more directly...
Jason Thorsness@JasonThorsness·
@a_prout At least with the token spigot turned to max here at work, all flaky tests and edge cases are getting a lot more attention from me than I used to be capable of 😊
Adam Prout@a_prout·
Broadly agree that there is some pressure to go faster via agents and hope your tests are good enough to catch problems. That said, I still have hope that the overall quality of software will go up with agents debugging, testing and fixing bugs. They are good at this.
Mitchell Hashimoto@mitchellh

I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out. I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really). It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely. The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture. We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying. I worry.
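The MTBF vs. MTTR trade-off in the quoted post can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function names and the hour figures are assumptions, not numbers from the post. It compares two hypothetical systems that report the same availability even though one fails about 100 times more often.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected number of failure/recovery cycles in one year."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


# Hypothetical (MTBF, MTTR) pairs in hours, chosen only for illustration.
resilient = (1000.0, 1.0)      # fails rarely, recovers slowly
fast_recovery = (10.0, 0.01)   # fails often, recovers quickly

for name, (mtbf, mttr) in (("resilient", resilient), ("fast recovery", fast_recovery)):
    print(
        f"{name}: availability={availability(mtbf, mttr):.6f}, "
        f"failures/year={failures_per_year(mtbf, mttr):.1f}"
    )
```

Both systems come out at roughly 99.9% availability, which is one way a local metric can look healthy while the number of failures differs by two orders of magnitude.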

Adam Prout@a_prout·
This approach gives you more CPU to run your workload. I talked about this in more detail recently in Andy's CMU tech talk series. HorizonDB uses Azure blob store as its durable storage (same design as the Socrates paper) youtu.be/EdgeqeW47_w?si… via @YouTube
Adam Prout@a_prout·
Agreed, shared storage (Aurora, AlloyDB, HorizonDB, Neon, etc.) is the dominant design for cloud OLTP. It's not the best in every scenario, but it is for most use cases. You can push a lot of work (replication, full page writes, dirty page writes, etc.) into the storage layer
Nikita | Scaling Postgres@nikitabase

There is a new era of data tech that is effectively "__ on object storage": Turbopuffer is "vector search on object storage" Warpstream is "kafka on object storage" Neon is "Postgres on Object Storage" or “Postgres on S3”. It doesn’t mean that every read and write goes directly to S3. That would be incredibly slow. I’m saying that a Postgres database in the "Postgres on object storage" category can be faster than one in the "Postgres on a cluster of servers with NVMe disks" category. No one is claiming that S3 is faster than NVMe but Postgres on S3 (with low latency storage in between) can be faster than Postgres running on NVMe with HA on. HA is important here, without HA you don’t do durable writes so it would be an unfair comparison. While Neon runs on S3, calls into S3 are almost never on the transaction reads or writes. Writes are sent into a consensus service and streamed into S3 asynchronously. So the claim can be expanded to Postgres running on disaggregated storage which implements a low latency tier on top of S3 is faster than Postgres with HA running on NVMe. We are not the only ones making this claim. For example AWS Aurora says "Aurora has 5x the throughput of MySQL and 3x of PostgreSQL with full PostgreSQL and MySQL compatibility." So why does disaggregating compute allow for higher throughput on Postgres and potentially lower latency as well? The reason is that we can offload a number of CPU and IO operations down to storage. We just published a blog post on how we can turn off full page writes which dramatically reduces WAL volume and saves on CPU cycles on the Postgres node. In many scenarios this may be a wash because for many workloads you might not be write throughput bound and therefore Postgres checkpoints and full page writes don’t impact overall throughput. However this is general purpose enough to impact a large swath of workloads. 
It’s also important to mention that scaling write throughput is more important since Postgres is a single write system and you can’t scale writes with read replicas. So is Postgres on S3 faster than Postgres on NVMe? We believe it can and will be. Postgres with disaggregated storage and several kernel performance optimizations has higher throughput than stock Postgres on NVMe with HA implemented via sync replication. We'll share more including latency impact as we gather insights after rollout. Lots of things to learn here if saving CPU on full page writes can have a material impact on latency under high throughput. The idea is that if CPU is all used up, freeing up some CPU will impact both latency and throughput - but we'll see! The statement is indeed provocative, but far from “shock value marketing” as some of the responses claim.

Adam Prout@a_prout·
If I had known database twitter was going to blow up about NVMe vs blob store for OLTP today, we would have hit on that in a little more detail! Still, if you're interested in a career working on databases hopefully this podcast will be of some interest to you.
Claire Giordano ✨@clairegiordano

🎙️ New #TalkingPostgres podcast Ep39 is out! @a_prout, distinguished engineer at Microsoft, talked about his engineering journey from MemSQL to HorizonDB, shared-storage, & why good systems programmers are paranoid 🎧 talkingpostgres.com/episodes/from-… 📺 youtu.be/L_2kyfL9LN0?si…

Adam Prout@a_prout·
@saisrirampur @kellabyte I'm talking specifically about the setup you benchmarked (No Postgres HA when running on local disks). Using HA on local NVMe disks is just fine. You did not benchmark that.
Sai Srirampur@saisrirampur·
True, fair point. However, replication round trips are a common denominator across most systems in the benchmark, and we expect performance to normalize with HA enabled. Even in that setup, you can argue that EBS offers stronger durability guarantees than local NVMe. But realistically, local NVMe setups in production have enough safeguards: HA across AZs with up to 2 standbys and continuous backups/WAL archival to S3. This isn’t some niche or unproven architecture either; companies like Datadog and Instacart have been operating successfully at massive scale with local NVMe–backed Postgres deployments for years. So sure, if the argument is purely about theoretical storage-layer durability semantics, EBS wins. But real-world local NVMe setups with proper safeguards have proven to be reliable and performant, at enterprise grade.
Kelly Sommers@kellabyte·
More and more I feel motivated to start doing independent benchmarks.
Adam Prout@a_prout·
@iavins Strong +1 on including durability guarantees. The folks doing single node Postgres on NVMe benchmarks with no HA and comparing that to systems running on remote disks with 5 9's or more durability need to be much clearer on the trade-offs (100x higher chance of data loss)
v@iavins·
I hate all the discussions around NVMe vs S3, because they miss all the nuances. The extremely obvious thing: the read or write latency from NVMe is going to be way smaller than S3 (or S3 Express). But... most discussions talk about database running in a single machine attached to NVMe. That comparison isn't fair, and here's why: When you write to S3, it is replicated to multiple machines. But your data is poof if your NVMe is gone. S3 Standard writes to multiple AZs. S3 Express One Zone writes to multiple machines in a single AZ. Only after the data is secured and durable is it acknowledged as successful. You can't get the same durability guarantees with NVMe attached to a single machine. If you implement your own quorum (within an AZ) to do the same, it will be around 2–3ms. I implemented a disaggregated storage engine for libSQL using FoundationDB. IIRC, the latency numbers were around 5–6ms with a 3 node FDB cluster in the same AZ. This was an out of the box FDB setup, without any config changes or tuning. I am really handwaving the numbers here, but the point is, when you write to a quorum the latency gonna be way higher than NVMe. The truth is somewhere in between, this is where DB vendors are fooling you: NVMe isn't as fast as they claim, but still faster than S3. So yeah, if you are comparing NVMe vs S3, include durability considerations. I'd love to see such benchmarks.
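The quorum point in the post above can be sketched in a few lines. Assuming the replicas are written in parallel, a write is acknowledged once the k-th fastest replica has responded, so the commit latency is the k-th smallest replica latency. The function name and the millisecond figures below are hypothetical and are not measurements from the post.

```python
from typing import Optional, Sequence


def quorum_ack_latency(replica_latencies_ms: Sequence[float],
                       quorum: Optional[int] = None) -> float:
    """Latency at which a parallel write reaches its quorum.

    replica_latencies_ms: time for each replica to persist and acknowledge.
    quorum: number of acks required; defaults to a simple majority.
    """
    n = len(replica_latencies_ms)
    if n == 0:
        raise ValueError("at least one replica is required")
    if quorum is None:
        quorum = n // 2 + 1
    if not 1 <= quorum <= n:
        raise ValueError("quorum must be between 1 and the replica count")
    # The write commits when the quorum-th fastest ack arrives.
    return sorted(replica_latencies_ms)[quorum - 1]


# Hypothetical numbers: one local NVMe write vs. a 3-replica quorum write.
local_nvme_ms = 0.1
replicas_ms = [1.8, 2.4, 6.0]
print(f"local NVMe: {local_nvme_ms} ms, quorum of 2/3: {quorum_ack_latency(replicas_ms)} ms")
```

With a majority quorum the slowest replica does not sit on the commit path, but the write still pays at least one network round trip that a single local disk never does.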
Adam Prout@a_prout·
@saisrirampur @kellabyte A remote disk (EBS, Aurora storage, Neon storage) will be at least 5 9's durable; a local disk will be 2 9's.
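As a rough worked example of the "nines" comparison, the sketch below converts a durability level into a loss probability. It assumes that "N nines" means a durability of 1 - 10^-N over some fixed period (typically a year); the reply does not define the period, and the function names are illustrative.

```python
def loss_probability(nines: int) -> float:
    """Probability of data loss at N nines of durability (5 -> 99.999% durable)."""
    return 10.0 ** -nines


def loss_ratio(low_nines: int, high_nines: int) -> float:
    """How many times more likely data loss is at the lower durability level."""
    return 10.0 ** (high_nines - low_nines)


local_disk_nines = 2    # figure quoted for a local disk
remote_disk_nines = 5   # lower bound quoted for a remote disk

print(f"local disk loss probability:  {loss_probability(local_disk_nines):.5f}")
print(f"remote disk loss probability: {loss_probability(remote_disk_nines):.5f}")
print(f"ratio: {loss_ratio(local_disk_nines, remote_disk_nines):.0f}x")
```

Under that assumption, each additional nine cuts the loss probability by a factor of ten, so the gap between 2 and 5 nines is three orders of magnitude.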
Adam Prout@a_prout·
@saisrirampur @kellabyte Yes, you didn't hide it, you're just not upfront about the harsh trade-offs with data loss risk in your setup. There is a big difference in durability guarantees between the local disks you're using vs the remote disks used by everyone you're comparing against (EBS, etc.).
Adam Prout@a_prout·
@saisrirampur @kellabyte hmm.. this from the folks at ClickHouse who published a Postgres benchmark that runs against locally attached disks with no high availability. That is far more egregious than this Lakebase article. Database customers typically care about not losing data...
Sai Srirampur@saisrirampur·
@kellabyte Totally makes sense. Telling that story in a real, enlightening and transparent way matters.
Adam Prout@a_prout·
If you're interested in:
- how databases (and database services) are built
- how building a database service at a startup compares to doing it at big tech
- how Postgres vs MySQL vs SQL Server are different/same (I've worked on/around all three!)
come check this out!
Claire Giordano ✨@clairegiordano

Looking forward to talking to database architect Adam Prout @a_prout today/Wed 6 May at 10am PDT on the #TalkingPostgres podcast! Ep39 topic: From MemSQL to HorizonDB, an engineer's journey Where? Live on the Microsoft Open Source Discord. Join us: aka.ms/talkingpostgre…

Adam Prout@a_prout·
Agents are great testers and bug fixers if focused on this work. For code bases with a high quality bar (databases) and massive test suites, agents root causing bugs and proposing fixes is a big time saver. I hope they keep improving as testers! medium.com/@adamprout/age…
Adam Prout retweeted
Glauber Costa@glcst·
It is personal now. I have a new archenemy. I was having a great time at @AntithesisHQ's conference (the lineup for this year is fantastic), but then @carlsverre ruined my day. I hate that guy now. How? he told me about Hegel, and then I ended up spending the whole day fixing stuff. Carl might have saved me some 3 months of work. But he ruined my day. Can I ever forgive him? Read more 👇
Adam Prout@a_prout·
@iavins LSMs typically have a WAL to make the in-memory layer durable, don't they? It's a simpler log than a B-tree's, but it's still there. I haven't done a broad study of LSMs, but MemSQL/SingleStore did things this way and so does RocksDB
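A minimal sketch of that idea, under stated assumptions: an in-memory table (the LSM "memtable") whose writes are first appended and fsynced to a log, so the table can be rebuilt after a crash. The class name, the JSON-lines record format, and the file layout are all hypothetical; this is not how RocksDB or SingleStore lay out their logs.

```python
import json
import os


class TinyMemtable:
    """Toy in-memory key/value table made durable by an append-only log."""

    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.data = {}
        self._replay()  # rebuild in-memory state from any existing log
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as log:
            for line in log:
                try:
                    key, value = json.loads(line)
                except json.JSONDecodeError:
                    break  # stop at a torn record left by a crash mid-write
                self.data[key] = value

    def put(self, key: str, value) -> None:
        # Log first, then apply: the write only counts once it is on disk.
        self._wal.write(json.dumps([key, value]) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self.data[key] = value

    def get(self, key: str):
        return self.data.get(key)

    def close(self) -> None:
        self._wal.close()
```

The log here only records logical key/value changes in arrival order, which is part of why it can be simpler than a B-tree WAL that has to describe in-place page modifications.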
v@iavins·
Most B-tree databases have a WAL (unless it's a CoW B-tree). But B-tree vs LSM comparisons online almost never factor it in.
Adam Prout@a_prout·
@wegrydnn Andy usually puts them online within a few days.
Adam Prout@a_prout·
Representing team Postgres... I'll talk about some of the changes we've made to Azure (and to Postgres) to improve the performance/reliability/security of running PostgreSQL in the cloud.
CMU Database Group@CMUDB

Today's Postgres vs. World Seminar Speaker: Adam Prout (@a_prout) will present the architecture of the newly released Microsoft Azure HorizonDB. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/pg-vs-w…

Adam Prout@a_prout·
@eatonphil It's good for the MySQL community. That said, they're a VC-backed startup that will (eventually) be under some pressure to deliver VC-level returns ($$ Billions with a B). I'm not sure how that will work out...
Phil Eaton@eatonphil·
New database (fork) dropped. Great to see paths forward for MySQL as a community like this.
Adam Prout@a_prout·
@andy_pavlo "Databases year in review" is a great read to start off the year for folks working on (or using) databases. Good to see the Azure HorizonDB launch made the cut!
Andy Pavlo (@andypavlo.bsky.social)@andy_pavlo

Here is my latest article on the world of databases: cs.cmu.edu/~pavlo/blog/20… All the hot topics from the last year: • More Postgres action! • MCP for everyone! • MongoDB gets litigious with FerretDB! • File formats! • Market movements! • The richest person in the world!
