
prerak
@prerak_011
Strolling through big tech and centralized infrastructures. On hiatus from decentraland $ADA, $ERG, $BTC

🚨 India is working on a UPI-style digital addressing system built on unique usernames, which would let people send and receive parcels, letters, food deliveries, and other services without sharing a conventional physical address. 🤯 (ET)

🚨 Oracle is laying off a few employees and withdrawing many placement offers in India.

Autonomous AI agents are about to become more than just “chatbots.” They will buy data, rent compute, call APIs, sell outputs, and pay other agents. But regular payments are not enough for them. They need money with logic.

Okay so I wanna share something I’ve been building for the past few days. Built a Rust-based distributed crawler targeting 100M+ pages/day. Right now it’s still single-node, but the core architecture is finally coming together.

Structured it as a modular Rust workspace with crates like: crawler-core, crawler-fetch, crawler-parse, crawler-storage, crawler-frontier, crawler-cli.

A lot of the work went into making the crawler actually production-grade instead of just “fetch pages in a loop”. Implemented:

* URL normalization (utm stripping, host normalization, query sorting, canonicalization, registrable domain extraction)
* Real robots.txt support with caching + longest-prefix matching
* Domain-level politeness scheduling
* SQLite-backed persistent frontier
* Lease-based task recovery so worker crashes don’t lose crawl state
* Retry system with exponential backoff
* Async fetcher with compression, redirects, latency tracking, SHA-256 body hashing
* HTML parser for title/canonical/outlinks extraction
* Durable dedupe across restarts
* Priority scheduling over the crawl frontier
* Crawl safety limits + static asset filtering

URLs now move through an actual lifecycle state machine:

queued → leased → fetched | blocked | failed

and expired leases can safely recover after crashes.

The interesting part is that the crawler is slowly turning into a distributed systems problem: scheduling, fault tolerance, fairness, backpressure, leases, retries, adaptive politeness, durable state, content identity, etc.

Current result: a restart-safe, polite, persistent crawler core that can crawl, parse, dedupe, retry, schedule, and recover leased work reliably. There’s still plenty more to engineer. Rust has honestly been insanely good for this kind of systems engineering.
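
To make the lifecycle concrete, here’s a minimal sketch of how a lease-based state machine like this can be modeled in Rust. This is not the actual crawler-frontier code; the type names, the worker id, and the 30-second TTL are all illustrative:

```rust
use std::time::{Duration, Instant};

/// Lifecycle of a URL in the frontier: queued → leased → fetched | blocked | failed.
#[derive(Debug, Clone, PartialEq)]
enum UrlState {
    Queued,
    /// A worker holds this URL until the deadline; if it crashes, the lease expires.
    Leased { worker: u32, expires: Instant },
    Fetched,
    Blocked,
    Failed,
}

struct Task {
    url: String,
    state: UrlState,
}

impl Task {
    /// Hand the task to a worker for a bounded amount of time.
    fn lease(&mut self, worker: u32, ttl: Duration) {
        assert!(matches!(self.state, UrlState::Queued));
        self.state = UrlState::Leased { worker, expires: Instant::now() + ttl };
    }

    /// Crash recovery: an expired lease returns the task to the queue,
    /// so no URL is ever lost to a dead worker.
    fn recover_if_expired(&mut self, now: Instant) {
        if let UrlState::Leased { expires, .. } = self.state {
            if now >= expires {
                self.state = UrlState::Queued;
            }
        }
    }
}

fn main() {
    let mut task = Task { url: "https://example.com/".into(), state: UrlState::Queued };
    task.lease(1, Duration::from_secs(30));
    // ...worker 1 crashes; a later sweeper pass (simulated 60s in the future) reclaims it:
    task.recover_if_expired(Instant::now() + Duration::from_secs(60));
    assert_eq!(task.state, UrlState::Queued);
}
```

In a persistent frontier the same idea becomes a timestamp column plus a periodic sweep, which is what makes the “restart-safe” claim work.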

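The retry policy is the simplest of these pieces. Here’s a std-only sketch of capped exponential backoff; the base delay, cap, and attempt limit are made-up numbers, not the project’s actual config:

```rust
use std::time::Duration;

/// Capped exponential backoff: 500ms, 1s, 2s, 4s, ... up to a ceiling.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 60_000;
    let delay = base_ms.saturating_mul(1u64 << attempt.min(20));
    Duration::from_millis(delay.min(cap_ms))
}

/// Retry a fallible fetch up to `max_attempts` times, sleeping between tries.
fn fetch_with_retry<T, E>(
    max_attempts: u32,
    mut fetch: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match fetch() {
            Ok(v) => return Ok(v),
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = fetch_with_retry(4, || {
        calls += 1;
        if calls < 3 { Err("transient error") } else { Ok("page body") }
    });
    assert_eq!(result, Ok("page body"));
    println!("succeeded after {calls} attempts");
}
```

In the actual async fetcher the sleep would presumably be an async timer (e.g. tokio::time::sleep) rather than std::thread::sleep; the blocking version just keeps the sketch dependency-free.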

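And roughly what the normalization step looks like. This is a sketch using the url crate (an assumption; the post doesn’t say which parser the project uses), covering utm stripping and query sorting but skipping canonicalization and registrable-domain extraction:

```rust
// Cargo.toml: url = "2"
use url::Url;

/// Normalize a URL for dedupe: lowercase scheme/host (the url crate does this
/// at parse time), drop the fragment, strip utm_* tracking params, and sort
/// the remaining query pairs so equivalent URLs compare identically.
fn normalize(raw: &str) -> Option<String> {
    let mut url = Url::parse(raw).ok()?;
    url.set_fragment(None);

    let mut pairs: Vec<(String, String)> = url
        .query_pairs()
        .filter(|(k, _)| !k.starts_with("utm_"))
        .map(|(k, v)| (k.into_owned(), v.into_owned()))
        .collect();
    pairs.sort();

    if pairs.is_empty() {
        url.set_query(None);
    } else {
        // Rebuild the query in sorted order with proper percent-encoding.
        url.query_pairs_mut().clear().extend_pairs(pairs);
    }
    Some(url.to_string())
}

fn main() {
    let a = normalize("HTTPS://Example.com/path?b=2&utm_source=x&a=1#frag");
    assert_eq!(a.as_deref(), Some("https://example.com/path?a=1&b=2"));
}
```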