
DesignToBuild
571 posts

DesignToBuild
@infiniteloop44
Software Engineer @Flipkart || System Design || DSA || Agentic AI || Consistency is key; you cannot be motivated daily.
Joined November 2025
128 Following · 1.5K Followers
Pinned Tweet

> Because using an index doesn’t automatically mean the query is cheap.
> Here’s what’s really happening.
1. Low selectivity kills index benefits
> If a large percentage of rows have status = 'COMPLETED' (say 40–70%):
- The index does help find matching row IDs
- But it still has to return millions of matches
> At that point, an index scan is barely better than a full scan.
> Indexes shine when they eliminate most rows, not when they match half the table.
2. SELECT * forces heap lookups
> Your index is on status only.
> So the DB does:
- Scan the index to find matching row IDs
- For each match, fetch the full row from the table (heap)
> That means:
- Millions of random I/O reads
- Cache misses
- High latency
> The index helps filtering, but fetching the rows dominates the cost.
3. Index scan ≠ index-only scan
> EXPLAIN says Index Scan, not Index Only Scan.
> Why that matters:
- Index-only scans can avoid touching the table
- But SELECT * requires reading the heap
- So the DB can’t stay inside the index
4. The data may not be cache-resident
> If the working set doesn’t fit in memory:
> Every heap fetch hits disk
> Even SSDs will struggle at this scale
> 3–4 seconds is very believable
5. The index helps the planner, not the physics
> The planner chose the index because it’s slightly better than a full scan.
> But physics still applies: you’re reading a huge amount of data.
> What actually fixes it:
- Don’t SELECT *, fetch only needed columns
- Use a covering index (include needed columns)
- Filter on more selective predicates
- Partition by status or time
- Pre-aggregate or move “COMPLETED” orders to cold storage
Takeaway:
- Indexes are about reducing work.
- If your predicate matches millions of rows, the work is still massive.
> An index can be used and the query can still be slow, and this is exactly that case.
> Index scans are fast when they return few rows.
When they return many, they can be slower than a full table scan.
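The covering-index fix above can be sketched with SQLite's query planner. This is a minimal illustration, assuming a hypothetical `orders` table; the index and column names are illustrative only:

```python
import sqlite3

# Hypothetical orders table; names are illustrative, not from any real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.execute("CREATE INDEX idx_status ON orders (status)")

# SELECT * must visit the table (heap) for every matching row id.
plan_star = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE status = 'COMPLETED'"
).fetchall()

# A covering index holds every column the query needs,
# so the engine can answer without touching the table at all.
conn.execute("CREATE INDEX idx_status_total ON orders (status, total)")
plan_covered = conn.execute(
    "EXPLAIN QUERY PLAN SELECT status, total FROM orders WHERE status = 'COMPLETED'"
).fetchall()

print(plan_star)     # plain index search: still needs per-row table fetches
print(plan_covered)  # reports a COVERING INDEX: no heap lookups
```

The plan text changes from `USING INDEX` to `USING COVERING INDEX` once the query can stay inside the index, which is exactly the Index Scan vs Index Only Scan distinction above.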
DesignToBuild retweeted

Short Answer:
1. It’s synced backend state, not just local storage
2. Local storage can’t work across devices
3. Netflix stores progress per profile on the server
4. Client sends periodic heartbeat updates while watching
5. Backend handles massive write volume (Cassandra/DynamoDB)
6. Latest valid progress usually wins
7. Short accidental plays are ignored
8. Offline progress syncs when back online
9. Local storage is only a cache / UX optimization
10. Source of truth = server, not the device

• Small, static data → Offset.
Don't over-engineer. If the data isn't moving, LIMIT and OFFSET are quick to build and work perfectly fine.
• Large, growing data → Keyset (Cursor).
Offsets get slower the deeper you go. Cursors keep queries fast by jumping straight to a specific record.
• Real-time feeds → Cursor.
Offset pagination causes "item drift." If a new post drops while you're scrolling, you'll see the same content twice. Cursors fix this.
• Search results → Offset.
This is the one place users actually want page numbers. Search engines (like Elasticsearch) are optimized to handle these jumps.
• Infinite scroll → Cursor.
The UX is seamless. Just pass a next_token to the frontend and keep the data flowing without tracking page math.
• Sharded DBs → Keyset.
Offsets are a distributed systems nightmare. Cursors allow you to query across shards efficiently without massive data overhead.
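The keyset (cursor) pattern above can be sketched in a few lines. A minimal SQLite example, with an illustrative `posts` table and a `next_token` that is simply the last seen id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 101)])

def page_after(cursor_id, limit=10):
    """Keyset pagination: jump straight past the last seen id, no OFFSET."""
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, limit),
    ).fetchall()
    next_token = rows[-1][0] if rows else None  # becomes the next cursor
    return rows, next_token

first, token = page_after(0)
second, _ = page_after(token)
print(first[0])   # (1, 'post 1')
print(second[0])  # (11, 'post 11')
```

Because the `WHERE id > ?` predicate seeks directly into the index, page 1000 costs the same as page 1, and a row inserted mid-scroll can't cause the "item drift" duplicates that OFFSET produces.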

Both queries are different.
1. WHERE salary = NULL
• The Result: Returns nothing (Empty set).
• The Logic: In SQL, any comparison to NULL using a standard operator (=,>,<) results in a state called UNKNOWN.
• The "Why": Since NULL represents an unknown value, the database logic is: "I don't know what the salary is, and I don't know what this other thing is, so I can't say they are equal." Even NULL = NULL is false (or rather, unknown) in SQL.
2. WHERE salary IS NULL
• The Result: Returns all rows where the salary column is empty/null.
• The Logic: This uses a specialized comparison operator designed specifically to check for the presence of the NULL state.
• The "Why": This is the proper syntax to ask the database: "Does this cell have a value or not?"
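Both behaviors are easy to verify. A minimal SQLite demo, with an illustrative `emp` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("a", 100), ("b", None), ("c", None)])

# '= NULL' evaluates to UNKNOWN for every row, so no row qualifies.
eq_null = conn.execute("SELECT name FROM emp WHERE salary = NULL").fetchall()

# 'IS NULL' is the dedicated operator for testing the NULL state.
is_null = conn.execute("SELECT name FROM emp WHERE salary IS NULL").fetchall()

print(eq_null)  # []
print(is_null)  # [('b',), ('c',)]
```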

A preflight request is a safety check done by the browser before making certain cross-origin requests.
In simple words:
The browser asks the server, “Am I allowed to make this request?”
How:
1. Your frontend wants to call an API on a different origin
2. The request is not simple (e.g. uses PUT, DELETE, custom headers, or JSON)
3. The browser first sends an OPTIONS request (this is the preflight)
4. The server replies with CORS headers saying what’s allowed
5. If allowed → browser sends the real request
If not → browser blocks it
Why it exists:
1. Prevents malicious websites from abusing APIs
2. Enforces CORS security rules
3. Protects users without breaking legitimate APIs
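The "is the request simple?" decision in step 2 can be sketched as a small classifier. This is an illustrative simplification of the Fetch spec's simple-request rules, not browser source code:

```python
# Simplified version of the browser's "simple request" check.
# Anything failing these rules gets an OPTIONS preflight first.
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
}
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}

def needs_preflight(method, headers):
    if method.upper() not in SIMPLE_METHODS:
        return True  # e.g. PUT, DELETE
    for name, value in headers.items():
        if name.lower() not in SAFE_HEADERS:
            return True  # custom header triggers preflight
        if name.lower() == "content-type" and value.lower() not in SIMPLE_CONTENT_TYPES:
            return True  # e.g. application/json
    return False

print(needs_preflight("GET", {}))                                     # False
print(needs_preflight("PUT", {}))                                     # True
print(needs_preflight("POST", {"Content-Type": "application/json"}))  # True
```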
DesignToBuild retweeted

> SSL (TLS) Handshake is the process where your browser and a server introduce themselves, verify trust, and agree on how to talk securely.
> In simple words:
It’s the “security setup” that happens before any real data is sent.
> Flow:
1. Client says hello
- “I want to connect securely”
- Shares supported encryption methods
2. Server responds
- Sends its SSL certificate
- Says which encryption it chose
3. Client verifies the server
- Checks the certificate with trusted authorities
- Confirms: “Yes, this is really the server I wanted”
4. Secret key is created
- Client and server agree on a shared secret
- This key is known only to them
5. Secure communication starts
- All data is now encrypted
- Outsiders can’t read or modify it
Why it matters:
- Prevents eavesdropping
- Prevents man-in-the-middle attacks
- Ensures you’re talking to the right server
> One-line summary:
SSL Handshake establishes trust and encryption before data exchange.
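The client side of step 3 (verifying the server) is what Python's `ssl` module gives you by default. A minimal sketch; `example.com` in the comment is just a placeholder hostname:

```python
import ssl

# A default context loads the system's trusted certificate authorities
# and refuses connections whose certificate or hostname doesn't check out.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must validate
print(context.check_hostname)                    # name must match the cert

# Wrapping a socket, e.g.
#   context.wrap_socket(sock, server_hostname="example.com")
# runs the full handshake described above before any application data flows.
```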

DesignToBuild retweeted

Detailed Answer:
It’s synced state, not just local storage.
1. Local storage alone is insufficient
- Local storage exists only on one device. If Netflix relied on it, you couldn’t pause on your TV and resume on your phone. Cross-device resume requires a backend source of truth.
2. Playback progress is a backend concept
Netflix tracks viewing progress per profile, not per device. The backend stores:
- profile ID
- content ID
- last watched timestamp
- completion state
This is what enables seamless resume anywhere.
3. Continuous syncing via heartbeats
- The client doesn’t just update progress on pause. While you watch, the player sends periodic heartbeat events (every few seconds, on pause, exit, or app close). This avoids losing progress if the app crashes or the device dies.
4. Write-heavy backend design
- Millions of users constantly updating progress creates massive write load. Netflix uses write-optimized distributed stores (e.g., Cassandra/Dynamo-style systems) designed to absorb frequent updates and converge on the latest state.
5. Offline support
When offline:
- Progress is stored locally
- On reconnect, the client syncs and merges with backend state
6. Why local storage still matters
Local state improves UX:
- Instant resume on the same device
- Offline playback
- Smoother seeking
But it’s never the source of truth.
Takeaway:
Local storage = optimization
Backend state = truth
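The reconciliation rules above ("latest valid progress wins", "short accidental plays are ignored") can be sketched as a toy merge function. The threshold and field names are illustrative assumptions, not Netflix's actual values:

```python
# Toy heartbeat reconciliation. MIN_WATCH_SECONDS is an assumed threshold.
MIN_WATCH_SECONDS = 30

def merge_progress(server, event):
    """server/event: dicts with 'position', 'updated_at', 'watched_seconds'."""
    if event["watched_seconds"] < MIN_WATCH_SECONDS:
        return server  # accidental play: keep existing progress
    if server is None or event["updated_at"] > server["updated_at"]:
        return event   # latest valid progress wins
    return server

state = None
state = merge_progress(state, {"position": 600, "updated_at": 100, "watched_seconds": 600})
# A 5-second tap on the same title later should not clobber real progress:
state = merge_progress(state, {"position": 5, "updated_at": 200, "watched_seconds": 5})
print(state["position"])  # 600
```

Each device can apply the same rule when syncing offline progress, so all replicas converge on the newest valid state.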
DesignToBuild retweeted

- This is a classic OLTP vs OLAP mismatch.
- The database is optimized for OLTP (transactions).
The requirement is OLAP (analytics).
- Trying to make a transactional database do both at scale is why performance suffers.
- Here are 3 ways to fix it, ranked by complexity:
1. Quick fix: Materialized Views
If 1 - 5 min staleness is acceptable:
- Precompute expensive joins
- Store results as a materialized view
- Refresh periodically (concurrently)
- Dashboards query a flat table → fast
2. Real-time fix: Aggregate on write (Redis)
If “real-time” means sub-second:
- Don’t count rows, count events
- Update counters on every write
- Dashboard reads Redis → O(1)
- Reconcile periodically with DB (counters drift)
3. Scalable fix: CDC + Analytics DB
If slicing/filtering is needed at scale:
- Stream DB changes via CDC (e.g. Debezium)
- Load into ClickHouse / Elasticsearch / Snowflake
- Run analytics there, not on the primary DB
Summary:
- MVP / low traffic → Materialized Views
- Strict real-time → Redis counters
- Complex analytics → CDC + OLAP store
Normalize for writes.
Denormalize for reads.
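Option 1 (the materialized-view quick fix) can be emulated even in SQLite, which lacks native materialized views: precompute the expensive aggregate into a flat table and let a scheduler refresh it. Table and column names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders (status, total) VALUES (?, ?)",
                 [("COMPLETED", 10.0), ("COMPLETED", 20.0), ("PENDING", 5.0)])

def refresh_summary():
    """Rebuild the flat summary table; in production a cron/scheduler
    would call this every few minutes (the acceptable staleness window)."""
    conn.execute("DROP TABLE IF EXISTS order_summary")
    conn.execute("""
        CREATE TABLE order_summary AS
        SELECT status, COUNT(*) AS n, SUM(total) AS revenue
        FROM orders GROUP BY status
    """)

refresh_summary()
row = conn.execute(
    "SELECT n, revenue FROM order_summary WHERE status = 'COMPLETED'"
).fetchone()
print(row)  # (2, 30.0)
```

The dashboard now reads one pre-aggregated row instead of scanning and joining the transactional tables on every load.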
DesignToBuild retweeted

It’s not magic. It’s a pipeline:
1. Detection (The Locator)
First, it solves "is there a face?" It uses lightweight models (like MTCNN or BlazeFace) to scan the image.
• Input: Full raw image.
• Output: Bounding box coordinates around the face.
2. Alignment (The Standardizer)
Neural networks hate variation. If your head is tilted, the math changes. The system uses landmarks (eyes, nose) to rotate and crop the image.
• Goal: Ensure eyes are always at the same pixel coordinates.
• Result: A normalized, frontal-facing input.
3. Embedding (The Translation)
This is the core logic (FaceNet). The image passes through a Deep CNN that compresses the face into a 128-dimensional vector (an array of numbers).
• The logic: It maps visual features to numerical coordinates.
• Key concept: It uses "Triplet Loss" to ensure your face always yields similar numbers, regardless of age or lighting.
4. Clustering (The Grouping)
The system doesn't "match" Photo A to Photo B directly. It plots all your photos in a vector space.
• It looks for dense clouds of points (vectors) that are close together.
• Euclidean Distance: Small distance = Same person. Large distance = Different person.
To the AI, you aren't a face. You are just a cluster of points in a 128-D vector space.
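Step 4's distance check is the simplest part to sketch. A toy illustration with synthetic 128-d vectors; the threshold is an assumption (real systems tune it on validation data):

```python
import math
import random

THRESHOLD = 1.0  # illustrative cutoff, not a production value

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(a, b):
    return distance(a, b) < THRESHOLD

random.seed(0)
base = [random.random() for _ in range(128)]             # one face's embedding
same = [x + random.uniform(-0.01, 0.01) for x in base]   # same face, new photo
other = [random.random() for _ in range(128)]            # a different face

print(same_person(base, same))   # True: small distance, same cluster
print(same_person(base, other))  # False: far apart in 128-d space
```

Clustering libraries then just group all vectors whose pairwise distances fall under the threshold, which is why one person's photos form a "dense cloud of points".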

• Tables are Sets: In relational theory, a table is an unordered set; there is no "default" order.
• No ORDER BY: Without this clause, the database engine returns rows in the most efficient way it finds.
• Physical Storage: Rows are stored in data pages; if a row is updated or deleted, its physical position on disk may change.
• Full Table Scans: A sequential scan starts at the first page, but the "first" page can change after maintenance or vacuuming.
• Multithreading: Modern DBs use parallel workers; whichever thread fetches its segment first determines the initial rows.
• Buffer Cache: If some rows are already in memory, the DB might serve those first to fulfill the LIMIT quickly.
• Index Selection: The optimizer might use a different index today than it did yesterday based on updated statistics.
• Insert Fragmentation: New data isn't always appended to the end; it often fills gaps left by deleted records.
• Database Type: Different engines (Postgres vs. MySQL vs. Oracle) have different background processes that shuffle row visibility.
• Deterministic Results: To guarantee the same order every time, you must explicitly use ORDER BY on a unique column
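The last bullet is the whole fix. A minimal SQLite demo of pinning the order with a unique column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(3, "c"), (1, "a"), (2, "b")])

# Without ORDER BY, the engine may return rows however is cheapest today.
# Ordering by a unique column makes the result deterministic every run.
ordered = conn.execute("SELECT id FROM t ORDER BY id").fetchall()
print(ordered)  # [(1,), (2,), (3,)]
```

If the sort column isn't unique, add a unique tiebreaker (e.g. `ORDER BY created_at, id`) so ties can't reshuffle between runs.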

Instagram’s Story ranking prioritizes Relationship Scoring over recency to maximize your time spent on the app.
Here’s the logic breakdown:
• Affinity > Recency: Relationship scoring is the heaviest weight. A "Best Friend" post from 10 hours ago will almost always beat a stranger's post from 1 minute ago.
• The DM Signal: Direct Messages are the strongest indicator of a real world connection. High DM frequency = permanent front-row seat.
• Completion Rates: The model tracks if you watch a creator’s full Story or swipe "Next." High completion % boosts their rank in your tray.
• Intentionality: If you search for a profile specifically, the algorithm interprets this as high intent and moves them up.
• Engagement Loops: Interacting with polls, sliders, or "Likes" on a Story acts as a massive feedback signal to the personalization engine.
• Close Friends Boost: This is a manual override. Adding someone to this list provides a static multiplier to their proximity score.
• The Goal: It’s an ensemble model predicting watch time. The goal is to minimize "churn" (leaving the app) by showing you the most relevant content first.
Instagram doesn't care when it was posted; it cares who posted it and how much you care about them.
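The "Affinity > Recency" claim can be sketched as a weighted score. All weights, signals, and the multiplier here are illustrative assumptions, not Instagram's actual model (which the post describes as an ensemble predicting watch time):

```python
# Toy ranking score: the affinity weight deliberately dominates recency.
WEIGHTS = {"affinity": 5.0, "completion": 2.0, "recency": 0.5}

def story_score(affinity, completion_rate, hours_old, close_friend=False):
    recency = max(0.0, 1.0 - hours_old / 24.0)  # decays to zero over a day
    score = (WEIGHTS["affinity"] * affinity
             + WEIGHTS["completion"] * completion_rate
             + WEIGHTS["recency"] * recency)
    if close_friend:
        score *= 1.5  # manual override: static multiplier, as described above
    return score

best_friend_old = story_score(affinity=0.9, completion_rate=0.8, hours_old=10)
stranger_fresh = story_score(affinity=0.05, completion_rate=0.1, hours_old=0)
print(best_friend_old > stranger_fresh)  # True: affinity beats recency
```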

The private key is never sent during the SSL handshake.
The server sends its public certificate (public key only)
Flow:
- The private key stays on the server, always
- The client verifies the certificate
- A shared session key is established using asymmetric crypto (e.g. RSA key exchange or Diffie-Hellman)
- Only that session key is used for encryption

@infiniteloop44 @SumitM_X What if the private key that is sent over gets intercepted?



