Domas Mituzas 🤡

1.6K posts

@mituzas

small data artisan. distinguished weaponized adhd at meta, survived hypergrowth at facebook(2009+)/wikipedia(2004-2011), dining fakefluencer on ig domas.eats

Menlo Park, CA · Joined August 2009
192 Following · 1.7K Followers
Domas Mituzas 🤡@mituzas·
@samlambert That’s about 1TB more memory for that SSD capacity than I’m used to. And I thought you were all about IOPS.
1 reply · 0 reposts · 1 like · 1.1K views
Sam Lambert@samlambert·
we have a new Metal SKU (yes people use them)
[image attached]
43 replies · 3 reposts · 393 likes · 62.5K views
roon@tszzl·
(hail mary spoilers so please skip if you care) isn’t the situation by the end that every single star in galaxy dies (or dims enough to kill all planetary life) except the two that are saved and the third where the taomeba originate
60 replies · 3 reposts · 260 likes · 54.5K views
Domas Mituzas 🤡@mituzas·
Of course AI uses lots of sea water - you need to extract deuterium for fusion reactors, etc - the reason to colonize other planets that have oceans on them.
0 replies · 0 reposts · 2 likes · 131 views
James Cowling@jamesacowling·
If you don't hate databases you don't love databases
8 replies · 4 reposts · 125 likes · 5.1K views
Christopher (shrydar.bsky.social)
@neogoose_btw Wait, is there a way to persist sessions that survives laptop sleeps and VPN timeouts that *doesn't* involve tmux? (I spent enough hours attempting to get the latter to behave itself that I've drawn a line under that one, sorry tmux fans)
4 replies · 0 reposts · 3 likes · 2.8K views
Dmitriy Kovalenko@neogoose_btw·
I can’t believe some people are seriously running ssh for a full work day without a keep-alive wrapper, losing the connection every time they go to the toilet. And some of you use this as a reason to ruin your terminal experience with tmux 😭
30 replies · 1 repost · 142 likes · 27.1K views
Domas Mituzas 🤡@mituzas·
@brankopetric00 Cron jobs should run in the middle of the work day, so that engineers can respond to issues, if needed.
0 replies · 1 repost · 0 likes · 163 views
Branko@brankopetric00·
Cron job was supposed to run at 2 AM. Server had wrong timezone. It ran at 8 AM instead. During peak traffic. It processed 6 million records.
38 replies · 24 reposts · 1.6K likes · 85.6K views
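The timezone trap in the tweet above can be avoided by computing the schedule in an explicit zone instead of trusting the server clock. A minimal Python sketch (the `next_run` helper is hypothetical, not anyone's actual cron setup):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_run(now_utc: datetime, hour: int, tz: str) -> datetime:
    """Next occurrence of `hour`:00 in timezone `tz`, returned in UTC."""
    local_now = now_utc.astimezone(ZoneInfo(tz))
    candidate = local_now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate.astimezone(ZoneInfo("UTC"))

# A box whose "2 AM" is in the wrong zone fires hours off target:
now = datetime(2024, 1, 10, 12, 0, tzinfo=ZoneInfo("UTC"))
print(next_run(now, 2, "America/New_York"))   # 2 AM Eastern -> 07:00 UTC
print(next_run(now, 2, "UTC"))                # 2 AM UTC
```

Pinning the zone makes the mismatch visible in code review instead of at 8 AM during peak traffic.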
Domas Mituzas 🤡@mituzas·
@CNN It was supposed to be a lighthearted post, but it took a dark turn.
0 replies · 0 reposts · 0 likes · 45 views
CNN@CNN·
A post regarding the two individuals arrested for throwing homemade bombs outside of New York City Mayor Zohran Mamdani’s home failed to reflect the gravity of the incident, thereby breaching the editorial standards we require for all our reporting. It has therefore been deleted.
14.4K replies · 2.5K reposts · 20.7K likes · 8.9M views
Vikrant Guleria@vguleria19·
@brankopetric00 usually because someone forgot an index or let a rogue query run without a timeout. pooling just makes the failure more organized.
1 reply · 0 reposts · 3 likes · 177 views
Branko@brankopetric00·
Database connection pooling is great until all the connections are held by hanging queries and new requests time out. Then it's a bottleneck.
9 replies · 3 reposts · 101 likes · 9.3K views
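The failure mode Branko describes fits in a few lines: a pool is essentially a bounded semaphore, and without an acquire timeout every new caller queues forever behind hung queries. A toy Python sketch, not any real driver's pool:

```python
import threading
from contextlib import contextmanager

class Pool:
    """Toy connection pool: a bounded semaphore plus an acquire timeout,
    so callers fail fast instead of queueing behind hanging queries."""
    def __init__(self, size: int, acquire_timeout: float):
        self._sem = threading.Semaphore(size)
        self._timeout = acquire_timeout

    @contextmanager
    def connection(self):
        if not self._sem.acquire(timeout=self._timeout):
            raise TimeoutError("pool exhausted; is a query hanging?")
        try:
            yield "conn"          # stand-in for a real DB handle
        finally:
            self._sem.release()

pool = Pool(size=2, acquire_timeout=0.05)
held = [pool.connection() for _ in range(2)]
for cm in held:
    cm.__enter__()                # two "hanging queries" hold every slot
try:
    with pool.connection():
        pass
except TimeoutError as e:
    print(e)                      # new requests time out at the pool
```

Pairing the pool with server-side statement timeouts is what actually frees the slots; the acquire timeout only makes the symptom visible.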
Domas Mituzas 🤡@mituzas·
@brankopetric00 This is why we let in 5,000 connections per database instance by default (and support tens of thousands, if needed). Pooling only works when it works.
0 replies · 0 reposts · 1 like · 92 views
avrl ☘@avrldotdev·
@arpit_bhayani Interesting, and related to disk fragmentation: index fragmentation (B+ tree page gaps) forces more random I/O, while disk fragmentation (physical file gaps) forces the drive head to move more. Both kill performance by turning a simple straight-line search into a door-to-door search.
1 reply · 0 reposts · 3 likes · 389 views
Arpit Bhayani@arpit_bhayani·
Your MySQL index isn't slow; it's fragmented :)

Databases like MySQL that hold data in B+ trees suffer from index fragmentation, and this severely impacts performance; hear me out...

Index fragmentation happens when B+ tree index pages contain significant amounts of free space instead of being densely packed with data. But why would that happen?

MySQL's InnoDB stores data in clustered indexes (organized by primary key). When the clustered index fragments due to random primary key inserts (like UUIDs), data retrieval performance takes a hit. Because the engine will try to keep the leaves ordered, this leads to a page split to insert the row in the middle. The page splits when there isn't enough space in a page, or it exceeds the split threshold. Over time, repeated random inserts cause more splits, wasting much space.

Index fragmentation directly impacts query performance and memory utilization. When indexes are fragmented, the database must read more pages from disk to execute the same query, increasing I/O operations and reducing throughput.

By the way, fragmentation is minimal when insertions are sequential, as InnoDB simply creates new pages without splitting existing ones, placing them along the rightmost path of the tree. This helps maintain optimal page density.

You can manage index fragmentation by tuning `innodb_fill_factor` and/or firing the following query:

```sql
ALTER TABLE tbl_name FORCE;
```

Hope you found this interesting :)
8 replies · 6 reposts · 162 likes · 14.1K views
Domas Mituzas 🤡@mituzas·
@arpit_bhayani InnoDB merges neighboring pages, so you won't really see the extreme fragmentation you're describing here. Sure, your fill factor will be slightly lower on random inserts, but it will rarely be the biggest performance problem.
0 replies · 0 reposts · 1 like · 204 views
Nate Berkopec@nateberkopec·
DB in a different region than compute. Fixed! Second time I've seen this in my career. The clue: if DB query latency is a _minimum_ of 5+ms, even for `SELECT 1`, your database is in the wrong region.
[image attached]
16 replies · 7 reposts · 180 likes · 28.1K views
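Nate's 5 ms floor is simple physics: light in fiber covers roughly 200 km per millisecond, so a persistent minimum latency puts a lower bound on distance. A back-of-envelope Python sketch (the 200 km/ms constant is the usual approximation for ~2/3 c in glass):

```python
FIBER_KM_PER_MS = 200  # light in fiber: ~2/3 c, about 200 km per millisecond

def min_rtt_ms(km_one_way: float) -> float:
    """Physics floor on round-trip time: propagation only, ignoring
    routing hops, serialization, and the database's own work."""
    return 2 * km_one_way / FIBER_KM_PER_MS

def implied_distance_km(rtt_ms: float) -> float:
    return rtt_ms * FIBER_KM_PER_MS / 2

print(min_rtt_ms(5))              # same metro (~5 km): ~0.05 ms
print(min_rtt_ms(4000))           # coast to coast (~4000 km): >= 40 ms
print(implied_distance_km(5))     # a 5 ms floor implies >= 500 km of fiber
```

So a `SELECT 1` that never goes below 5 ms means at least ~500 km of fiber each way, i.e. the database is not in the same region as the compute.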
Dmitrii Kovanikov@ChShersh·
Interview question: You have 1B+ text messages of size 1-500 symbols stored in an SQL database. How do you quickly search for all messages containing all words from a given list?
97 replies · 12 reposts · 342 likes · 70.8K views
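One common shape of answer to this interview question is an inverted index with intersection of posting lists, rarest word first. A toy in-memory Python sketch (a real system at 1B+ rows would use the database's full-text index or an external search engine, not a `LIKE` scan):

```python
from collections import defaultdict

def build_index(messages):
    """Inverted index: word -> set of message ids containing it."""
    index = defaultdict(set)
    for mid, text in enumerate(messages):
        for word in set(text.lower().split()):
            index[word].add(mid)
    return index

def search_all(index, words):
    """AND semantics: intersect posting sets, smallest set first,
    so the running result shrinks as early as possible."""
    postings = sorted((index.get(w.lower(), set()) for w in words), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

msgs = ["the quick brown fox", "quick brown dogs", "lazy brown fox"]
print(search_all(build_index(msgs), ["brown", "fox"]))   # {0, 2}
```

At the stated scale the same idea shards the index by word and stores posting lists compressed; the intersection logic is unchanged.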
Domas Mituzas 🤡@mituzas·
@ChShersh Parallel Commentz-Walter at SSD speed. Data is compressed, so let's say the footprint is 200GB; that puts us at a few minutes on a single node.
0 replies · 0 reposts · 0 likes · 169 views
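The arithmetic behind "a few minutes on a single node" is just footprint over scan bandwidth. A hedged sketch (the 3 GB/s figure is an assumed NVMe sequential read rate; decompression and matching overhead push the real number toward the tweet's estimate):

```python
def scan_seconds(footprint_gb, read_gbps, nodes=1):
    """Back-of-envelope for a brute-force multi-pattern scan:
    time = data / aggregate sequential read bandwidth, assuming
    the matcher (e.g. Commentz-Walter) keeps up with the SSD."""
    return footprint_gb / (read_gbps * nodes)

print(scan_seconds(200, 3))        # ~67 s of raw I/O on one 3 GB/s drive
print(scan_seconds(200, 3, 8))     # ~8 s across 8 drives/nodes
```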
Domas Mituzas 🤡@mituzas·
@jhleath I understand the differences/strengths somewhat; I am confused by how database indexes are fundamentally different from "Cassandra-like O(1)" - either I missed something somewhere, or I don't see how this becomes orders-of-magnitude different perf.
0 replies · 0 reposts · 0 likes · 33 views
Hunter Leath@jhleath·
not debating any of those things! they're just a poor fit for file systems.

let's say that you were going to use a database as the backing storage for a file system, and you were stuffing every inode (multi-tenant, so for all file systems) in the same table. the O(log) properties of the index will degrade (yes, slowly) as you continue to add files into the system.

this has an easy fix, right? you could just shard the database. so then we could move the database to something like vitess and basically have data replicated across 3 instances or something like that. you still have log lookups, but now you can distribute those across a large number of servers.

only problem? databases can't do erasure coding in order to do efficient querying. so you're replicating something like 3:1, which isn't what you want to do for bulk data. so then you split the system into a bulk-data layer and a metadata layer -- so that you can do EC on the data.

the database sharding doesn't come for free, and transactions that involve multiple inodes need to do some kind of consensus protocol in order to coordinate writes across multiple inodes, which starts to slow down the file system. so you explore whether you need to add some kind of journaling layer *on top* of the database, etc etc

they work, of course, but it's much simpler from a systems perspective to just build the right tool for the job, which is a storage system that doesn't have transactional semantics or like a logical bottleneck
2 replies · 0 reposts · 0 likes · 194 views
Hunter Leath@jhleath·
we do not use a transactional database anywhere in the Archil file system. why not? isn't it simpler to build a file system on top of a database?

the reason why is performance and scalability. relational databases are generally broken into units of "tables" which allow you to insert data across specific indexes, so a naive file system implementation might have two tables: one for file metadata (keyed by inode id) and one for file data (keyed by inode id and offset).

the problem with this approach is that stuffing everything into a single index means that the performance of the system is going to degrade the more files and data you add to it! this is super undesirable for a system that should be designed to hold *vast amounts of data*.

the second problem is that the performance requirements of a file system are often an order of magnitude higher than what you might need out of a database -- where a single "npm install" could issue tens of thousands of requests super quickly.

we're able to solve both of these problems by building a system that looks a lot more like Cassandra (giving us O(1) lookups into any amount of data), coalescing file metadata and data on the same storage hosts, and writing a customized storage layer to optimize for both metadata and bulk storage writes.

relational databases have a huge efficiency cost to give users the ability to execute transactions and make relational queries. in domains (like ours), where you don't need those kinds of semantics, you can get MUCH better performance by avoiding them entirely.
[image attached]
5 replies · 2 reposts · 86 likes · 5.5K views
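"Cassandra-like O(1)" in the post above usually means hash-based placement: the per-key lookup cost doesn't grow with the amount of stored data, unlike a B-tree's O(log n). A toy consistent-hash ring in Python (node names and vnode count are made up; this is not Archil's implementation):

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy consistent-hash ring: key -> owning node, with cost independent
    of how many keys are stored. Virtual nodes smooth the distribution."""
    def __init__(self, nodes, vnodes=64):
        self._points = sorted(
            (self._h(f"{n}:{i}"), n) for n in nodes for i in range(vnodes)
        )

    @staticmethod
    def _h(s):
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def owner(self, key):
        hashes = [h for h, _ in self._points]
        i = bisect(hashes, self._h(key)) % len(self._points)
        return self._points[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("inode:42"))     # deterministic node choice for this key
```

The routing table here scales with the node count, not the key count, which is why adding data never makes lookups slower.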
Domas Mituzas 🤡@mituzas·
Interviewer: How do you… Me: sorry, I never had any real technical interviews in my life.
1 reply · 0 reposts · 13 likes · 902 views
Domas Mituzas 🤡@mituzas·
@Oblivious9021 Your request gets routed to an edge pop, sent over to nearest* origin dc to check against caches, and if it is a cache miss a query is sent to another datacenter that has the data. Yes, you can go to three different datacenters that quickly. * global load balancing applies
0 replies · 0 reposts · 0 likes · 157 views
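The three-hop read path in the reply above (edge pop → origin cache → data datacenter) can be sketched as a tiered lookup that backfills the caches on a miss. Toy Python with plain dicts standing in for each tier:

```python
def lookup(key, layers):
    """Tiered read path: try each layer in order (edge cache, origin
    cache, source-of-truth store); on a hit, warm every layer that
    missed so the next request is served at the edge."""
    missed = []
    for name, store in layers:
        if key in store:
            for _, m in missed:       # backfill the caches we missed
                m[key] = store[key]
            return store[key], name
        missed.append((name, store))
    return None, None

edge, origin, db = {}, {}, {"mituzas": "taken"}
layers = [("edge-pop", edge), ("origin-cache", origin), ("data-dc", db)]
print(lookup("mituzas", layers))      # ('taken', 'data-dc'), caches warmed
print(lookup("mituzas", layers))      # ('taken', 'edge-pop') on the next hit
```

This is why the username check feels instant: after the first miss, the answer lives one round trip away at the edge.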
Shreya@Oblivious9021·
Interviewer: How do apps like Google, Instagram, or Twitter instantly know a username is already taken the moment you type it?
211 replies · 210 reposts · 13.3K likes · 2.3M views
Domas Mituzas 🤡@mituzas·
Sometimes it is quite hard to run a good database benchmark. A good MyRocks production-like workload simply won't fit in Postgres.
1 reply · 0 reposts · 9 likes · 492 views
Sri@__karnati·
Your Docker image is 2GB. Deployments are getting slow. CI takes forever to build. How do you reduce the size of the image and improve deployment speed?
25 replies · 6 reposts · 144 likes · 24.1K views