Nilesh 🐧
@knileshh · 3.1K posts
Spec Driven Developer | Distributed Systems | AI Integration & Workflows
India · Joined October 2017
239 Following · 658 Followers

@TreeApostle @avrldotdev I guess we can follow a simple rule: peel away one layer below and enjoy the abstraction the lower levels provide, so we know enough without going so deep that we leave breadth on the table.
Nilesh 🐧 retweeted

@Parinda_01 @SahilExec Then someone uses fixed window rate limiting and gets DDoSed again. /s
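Not from the thread, but a minimal sketch of a sliding-window limiter (class and parameter names are my own invention), which avoids the burst-at-the-boundary problem of a fixed window:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

# Example: 100 requests per minute per client IP.
limiter = SlidingWindowLimiter(limit=100, window=60.0)
print(limiter.allow("203.0.113.7"))  # True until the limit is hit
```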

@SahilExec Use a rate limiter that limits the number of requests.

12/12
Production Rules and Takeaways
Based on these experiments, here are the non-negotiable rules for production retry logic.
1/ Always bound your retry count to a maximum of 2-3 attempts.
2/ Always implement exponential backoff starting around 50-100ms.
3/ Always add jitter to prevent synchronized retries.
4/ Set your timeout based on p99 latency plus a reasonable buffer, not an arbitrary small number.
5/ Monitor your retry rates in production because a sudden spike in retries is often the first sign of an impending cascading failure.
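A minimal sketch tying these rules together (function and parameter names are my own, not from the thread; it assumes `call` accepts a `timeout` argument and raises on transient failure):

```python
import random
import time

def retry_with_backoff(call, max_attempts=3, base_delay=0.05, timeout=2.0):
    """Bounded retries with exponential backoff and jitter.

    `timeout` should come from your p99 latency plus a buffer,
    not an arbitrary small number.
    """
    for attempt in range(max_attempts):
        try:
            return call(timeout=timeout)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the failure.
            delay = base_delay * (2 ** attempt)   # 50ms, 100ms, 200ms, ...
            delay *= random.uniform(0.5, 1.5)     # +-50% jitter
            time.sleep(delay)
            # In production, also increment a retry-rate metric here;
            # a spike in retries is an early sign of cascading failure.
```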

Think retries are the heroes in your microservices? Think again: they are often the villain, turning minor glitches into total outages.
1/12
It is very common that retries in microservices are implemented incorrectly.
This is not a small mistake; it can have major repercussions such as causing complete system failure from a small downstream hiccup.
I built a small experiment that shows how naive retry logic creates cascading failures, and how to fix it with three simple patterns.
Here's what I learned by breaking things systematically.
A thread 🧵 ↓


@anirudhology Interesting, I was only using retries with exponential backoff; jitter with randomness is a new addition to my toolkit. Thanks!

8/12
Fix #1 - Exponential Backoff with Jitter
The most effective fix is combining exponential backoff with jitter.
In exponential backoff, we don't retry after a fixed interval; instead, the interval increases exponentially with each retry.
First retry after 50ms, second after 100ms, third after 200ms, and so on...
The jitter adds ±50% randomness to these delays.
Result: only 272 total retries and a queue depth of 7.
That's almost a 53% reduction in retries compared to the naive approach, and we are back to a near-baseline queue depth.
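A rough sketch of how that delay schedule could be computed (my own illustration, not the author's experiment code):

```python
import random

def jittered_backoff_schedule(attempts=3, base_ms=50, jitter=0.5):
    """Exponential backoff delays with +-50% jitter, in milliseconds."""
    delays = []
    for attempt in range(attempts):
        base = base_ms * (2 ** attempt)                       # 50, 100, 200, ...
        delays.append(base * random.uniform(1 - jitter, 1 + jitter))
    return delays

print(jittered_backoff_schedule())  # e.g. [41.2, 118.7, 173.9]
```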

The trap here: everyone says "normalize" without asking why the 2MB exists. I have watched this happen 3 times. Team A normalized and got 30% faster reads, but now every query joins 7 tables. Six months later they are over-normalized, and they never understood their actual access patterns. Team B kept it as JSON with caching: 95% cache hits, zero issues in 2 years. The question is not "Is 2MB bad?" It is "What are you querying, and how often?" Design the schema backwards from the access patterns, not from what you think a good schema looks like.

Ever wondered how your phone and your friend’s phone show the exact same time even in different places?
Behind the scenes, there’s a hero: Network Time Protocol (NTP).
It keeps billions of devices in sync, down to milliseconds.
Interesting article: medium.com/@RocketMeUpNetworking/understanding-the-role-of-network-time-protocol-ntp-96225a9dd4a7
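For a quick look at what NTP gives you, here's a tiny sketch using the third-party ntplib package, with pool.ntp.org as an example server:

```python
# pip install ntplib
import ntplib
from datetime import datetime, timezone

client = ntplib.NTPClient()
response = client.request("pool.ntp.org", version=3)

# Offset between the local clock and the NTP server, in seconds.
print(f"clock offset: {response.offset:+.6f}s")
print("server time:", datetime.fromtimestamp(response.tx_time, tz=timezone.utc))
```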

@KaiXCreator Depends on the file type. If you keep it uncompressed, it's easy to read, so it takes less time to process. If you compress it, it will take time to decompress and then process, e.g. with Huffman encoding.

@SumitM_X views are just saved queries you treat like a table.
real use case: fintech app. you don't want every dev querying raw transactions table directly. create a view with only what's needed, hide sensitive columns, control access.
security + simplicity in one shot 🤷♂️
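A minimal sketch of that idea using sqlite3 (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        amount_cents INTEGER,
        card_number TEXT,        -- sensitive: should not be widely visible
        created_at TEXT
    );

    -- Expose only what downstream devs actually need.
    CREATE VIEW transactions_public AS
    SELECT id, user_id, amount_cents, created_at
    FROM transactions;
""")

conn.execute(
    "INSERT INTO transactions (user_id, amount_cents, card_number, created_at) "
    "VALUES (1, 4999, '4111-1111-1111-1111', '2024-01-01')"
)

# Devs query the view like a table; the card number never shows up.
print(conn.execute("SELECT * FROM transactions_public").fetchall())
```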

@SumitM_X Oh man! I can think of many use cases I have seen in my team.
-> Without a view the customer sees all columns. Later, if the upstream team deprecates a column, you end up dropping that column and fetching it from a different metadata table. It's better to have a view, because later you can just join the tables behind it.

@TreeApostle @javarevisited Yeah, true. Also, personally, if there's an option to preprocess, 90% of the gains can come from it. But conditions apply: the data shouldn't be actively updated.

I'd say for HDDs a disk seek is roughly ~10 ms, whereas for SSDs random access is closer to ~0.1 ms or so. I know NVMe SSDs are even faster, but I'm not sure of the exact numbers.
When we are not considering mechanical drives, disk seeks become less expensive, but at the end of the day they still remain a significant part of any such process, so in my opinion all optimizations would still start there.

@BenjDicken @NarutoUzmaki201 I hope you mentioned the threshold at which we should start to shard. Most people start the scaling conversation with sharding. Vertical scaling can take you a really long way.

@TreeApostle @javarevisited Things would change when we're not considering a mechanical drive, right? What will happen with modern hardware, I wonder? Most servers now use NVMe SSDs.

Searching a sorted 50GB file that can't fully be loaded into memory is a classic system design question.
The way to answer it is to go sequentially; even if you think you know the answer, ask clarifying questions. Will there be multiple searches or a single search? How much RAM is available? Is the file on disk (HDD/SSD) or in remote S3 or blob storage? Is preprocessing allowed?
After we establish the problem state, we can think of solutions in increasing order of complexity.
Binary search would be the starting point since the file is sorted. It would be O(log N). One disadvantage is the disk seeks, which can be expensive.
If we are allowed to preprocess, then build a sparse index: an in-memory mapping of every Nth key to its file offset. For a few MB of memory used up, we can reduce the disk seeks drastically.
If we expect many misses, then a Bloom filter would be useful as a prefilter.
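A rough sketch of the sparse-index idea over a sorted file of fixed-size records (the record layout, key width, and sampling rate are my assumptions, not from the tweet):

```python
import bisect

RECORD_SIZE = 64          # assume fixed-size records: 16-byte key + payload
SAMPLE_EVERY = 1024       # keep every Nth key in memory

def build_sparse_index(path):
    """Preprocessing pass: sample every Nth key so a few MB of RAM spans the file."""
    keys, offsets = [], []
    with open(path, "rb") as f:
        i = 0
        while True:
            record = f.read(RECORD_SIZE)
            if not record:
                break
            if i % SAMPLE_EVERY == 0:
                keys.append(record[:16])          # first 16 bytes = key
                offsets.append(i * RECORD_SIZE)
            i += 1
    return keys, offsets

def lookup(path, key, keys, offsets):
    """One seek to the nearest sampled offset, then scan at most SAMPLE_EVERY records."""
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None
    with open(path, "rb") as f:
        f.seek(offsets[i])
        for _ in range(SAMPLE_EVERY):
            record = f.read(RECORD_SIZE)
            if not record or record[:16] > key:
                return None
            if record[:16] == key:
                return record
    return None
```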














