Domas Mituzas 🤡

1.6K posts

Domas Mituzas 🤡

@mituzas

small data artisan. distinguished weaponized adhd at meta, survived hypergrowth at facebook(2009+)/wikipedia(2004-2011), dining fakefluencer on ig domas.eats

Menlo Park, CA Katılım Ağustos 2009

192 Takip Edilen1.7K Takipçiler

Domas Mituzas 🤡@mituzas·11h

@samlambert Thats about 1TB more memory for that SSD capacity than I’m used to. And I thought you were all about IOPS.

English

1.1K

Sam Lambert@samlambert·13h

we have a new Metal SKU (yes people use them)

English

393

62.5K

Domas Mituzas 🤡@mituzas·2d

@tszzl Plenty of stars with no CO2 planets around!

English

214

roon@tszzl·3d

(hail mary spoilers so please skip if you care) isn’t the situation by the end that every single star in galaxy dies (or dims enough to kill all planetary life) except the two that are saved and the third where the taomeba originate

English

260

54.5K

Domas Mituzas 🤡@mituzas·3d

Of course AI uses lots of sea water - you need to extract deuterium for fusion reactors, etc - the reason to colonize other planets that have oceans on them.

English

131

Domas Mituzas 🤡@mituzas·16 Mar

@jamesacowling Hate them every day!

English

107

James Cowling@jamesacowling·16 Mar

If you don't hate databases you don't love databases

English

125

5.1K

Domas Mituzas 🤡@mituzas·15 Mar

@shrydar @neogoose_btw Yes, ET: eternalterminal.dev

English

Christopher (shrydar.bsky.social)@shrydar·15 Mar

@neogoose_btw Wait, is there a way to persist sessions that survives laptop sleeps and VPN timeouts that *doesn't* involve tmux? (I spent enough hours attempting to get the latter to behave itself that I've drawn a line under that one, sorry tmux fans)

English

2.8K

Dmitriy Kovalenko@neogoose_btw·15 Mar

I can’t believe some people seriously running ssh for working full day without keep-alive wrapper and loosing connection every time you go to the toilet And some of you using this as a reason to ruin your terminal experience by using tmux 😭

English

142

27.1K

Domas Mituzas 🤡@mituzas·14 Mar

@brankopetric00 Cron jobs should run in the middle of the work day, so that engineers can respond to issues, if needed.

English

163

Branko@brankopetric00·14 Mar

Cron job was supposed to run at 2 AM. Server had wrong timezone. It ran at 8 AM instead. During peak traffic. It processed 6 million records.

English

1.6K

85.6K

Domas Mituzas 🤡@mituzas·10 Mar

@CNN It was supposed to be a lighthearted post, but it took a dark turn.

English

CNN@CNN·10 Mar

A post regarding the two individuals arrested for throwing homemade bombs outside of New York City Mayor Zohran Mamdani’s home failed to reflect the gravity of the incident thereby breaching the editorial standards we require for all our reporting. It has therefore been deleted.

English

14.4K

2.5K

20.7K

8.9M

Domas Mituzas 🤡@mituzas·9 Mar

@vguleria19 @brankopetric00 Or use transactions or other query side effects?

English

Vikrant Guleria@vguleria19·9 Mar

@brankopetric00 usually because someone forgot an index or let a rogue query run without a timeout. pooling just makes the failure more organized.

English

177

Branko@brankopetric00·8 Mar

Database connection pooling is great until all the connections are held by hanging queries and new requests timeout. Then it's a bottleneck.

English

101

9.3K

Domas Mituzas 🤡@mituzas·9 Mar

@brankopetric00 This is why we let in 5000 connections per database instance in by default (and support tens of thousands, if needed). Pooling only works when it works.

English

Domas Mituzas 🤡@mituzas·9 Mar

@avrldotdev @arpit_bhayani How much does the drive head move on SSDs? But generally, in concurrent load drive head moves anyway, and merge chances are low..

English

avrl ☘@avrldotdev·9 Mar

@arpit_bhayani Interesting, it's related to disk fragmentation as disk fragmentation (B+ Tree page gaps) forces more random I/O. Disk fragmentation (physical file gaps) forces the drive head to move more. Both kill performance by turning a simple straight line search into a door 2 door search.

English

389

Arpit Bhayani@arpit_bhayani·9 Mar

Your MySQL index isn't slow; it's fragmented :) Databases like MySQL that hold data in B+ trees suffer from index fragmentation, and this severely impacts the performance; hear me out... Index fragmentation happens when B+ tree index pages contain significant amounts of free space instead of being densely packed with data. But why would that happen? MySQL's InnoDB stores data in clustered indexes (organized by primary key). When the clustered index fragments due to random primary key inserts (like UUIDs), data retrieval performance takes a hit. Because the engine will try to keep the leaves ordered, this leads to a page split to insert the row in the middle. The page splits when there isn't enough space in a page, or it exceeds the split threshold. Over time, repeated random inserts cause more splits, wasting much space. Index fragmentation directly impacts query performance and memory utilization. When indexes are fragmented, the database must read more pages from disk to execute the same query, increasing I/O operations and reducing throughput. By the way, fragmentation is minimal when insertions are sequential, as InnoDB simply creates new pages without splitting existing ones, and placing them along the rightmost path of the tree. This helps in maintaining optimal page density. You can manage index fragmentation by tuning `innodb_fill_factor` and/or firing the following query ```sql ALTER TABLE tbl_name FORCE; ``` Hope you found this interesting :)

English

162

14.1K

Domas Mituzas 🤡@mituzas·9 Mar

@arpit_bhayani InnoDB merges neighboring pages, so you won't really have extreme fragmentation causing what you're writing here about. Sure, your fill factor will be slightly lower on random inserts, but it will rarely be the biggest performance problem.

English

204

Domas Mituzas 🤡@mituzas·5 Mar

@nateberkopec Just tells you don’t have enough app regions!!!

English

445

Nate Berkopec@nateberkopec·5 Mar

DB in a different region than compute. Fixed! Second time I've seen this in my career. The clue: if DB query latency is a _minimum_ of 5+ms, even for `SELECT 1`, your database is in the wrong region.

English

180

28.1K

Domas Mituzas 🤡@mituzas·4 Mar

@ChShersh @Heliocene who brings Boyer-Moore to Commentz-Walter fight?

English

Dmitrii Kovanikov@ChShersh·4 Mar

@Heliocene Nobody brings Boyer-Moore algorithm 😒

English

4.2K

Dmitrii Kovanikov@ChShersh·4 Mar

Interview question: You have 1B+ text messages of size 1-500 symbols stored in an SQL database. How do you quickly search for all messages containing all words from a given list?

English

342

70.8K

Domas Mituzas 🤡@mituzas·4 Mar

@ChShersh Parallel Commentz-Walter at SSD-speed. Data is compressed, so lets say footprint is 200GB, this puts us at few minutes on a single node.

English

169

Domas Mituzas 🤡@mituzas·3 Mar

@jhleath I understand differences/strengths somewhat, I am confused by how database indexes are fundamentally different from "Cassandra-like O(1)" - missed something somewhere, or how this becomes orders-of-magnitude different perf.

English

Hunter Leath@jhleath·3 Mar

not debating any of those things! they're just a poor fit for file systems. let's say that you were going to use a database as the backing storage for a file system, and you were stuffing every inode (multi-tenant, so for all file systems) in the same table. the O(log) properties of the index will degrade (yes, slowly) as you continue to add files into the system. this has an easy fix, right? you could just shard the database. so then we could move the database to something like vitess and basically have data replicated across 3 instances or something like that. you still have log lookups, but now you can distribute those across a large number of servers. only problem? databases can't do erasure coding in order to do efficient querying. so you're replicating something like 3:1 which isn't what you want to do for bulk data, so then you split the system into a bulk-data layer and a metadata layer -- so that you can do EC on the data. the database sharding doesn't come for free and transactions that involve multiple inodes need to do some kind of conensus protocol in order to coordinate writes across multiple inodes, which starts to slow down the file system, so you explore whether you need to add some kind of journaling layer *on top* of the database, etc etc they work, of course, but it's much simpler from a systems perspective to just build the right tool for the job, which is a storage system that doesn't have transactional semantics or like a logical bottleneck

English

194

Hunter Leath@jhleath·3 Mar

we do not use a transactional database anywhere in the Archil file system. why not? isn't it simpler to build a file system on top of a database? the reason why is performance and scalability. relational databases are generally broken into units of "tables" which allow you to insert data across specific indexes, so a naive file system implementation might have two tables: one for file metadata (keyed by inode id) and one for file data (keyed by inode id and offset). the problem with this approach is that stuffing everything into a single index means that the performance of the system is going to degrade the more files and data you add to it! this is super undesirable for a system that should be designed to hold *vast amounts of data*. the second problem is that the performance requirements of a file system are often an order of magnitude higher than what you might need out of a database -- where a single "npm install" could issue tens of thousands of requests super quickly. we're able to solve both of these problems by building a system that looks a lot more like Cassandra (giving us O(1) lookups into any amount of data), coalescing file metadata and data on the same storage hosts, and writing a customized storage layer to optimize for both metadata and bulk storage writes. relational databases have a huge efficiency cost to give users the ability to execute transactions and make relational queries. in domains (like ours), where you don't need those kind of semantics, you can get MUCH better performance by avoiding them entirely.

English

5.5K

Domas Mituzas 🤡@mituzas·2 Mar

Interviewer: How do you… Me: sorry, I never had any real technical interviews in my life.

English

902

Domas Mituzas 🤡@mituzas·2 Mar

@Oblivious9021 Your request gets routed to an edge pop, sent over to nearest* origin dc to check against caches, and if it is a cache miss a query is sent to another datacenter that has the data. Yes, you can go to three different datacenters that quickly. * global load balancing applies

English

157

Shreya@Oblivious9021·2 Mar

Interviewer: How do apps like Google, Instagram, or Twitter instantly know a username is already taken the moment you type it?

English

211

210

13.3K

2.3M

Domas Mituzas 🤡@mituzas·1 Mar

Sometimes it is quite hard to run a good database benchmark. A good MyRocks production-like workload simply won't fit in Postgres.

English

492

Domas Mituzas 🤡@mituzas·28 Şub

@__karnati -O0 and strip debug symbols, lol

English

389

Sri@__karnati·28 Şub

Your Docker image size is 2GB Your app image is 2GB. Deployments are getting slow. CI takes forever to build. How do you reduce the size of the image and improve deployment speed?

English

144

24.1K

Keşfet

@samlambert @tszzl @jamesacowling @shrydar @neogoose_btw @brankopetric00 @CNN @vguleria19