Heshan Sanjuka

381 posts

@hexsyro

Web Scraping & Automation Engineer | Building Scalable Data Pipelines. I share what I learn.

Joined November 2024
88 Following · 11 Followers
Pinned Tweet
Heshan Sanjuka @hexsyro
100+ premium datasets across Reddit, YouTube, GitHub & Medium.

Reddit threads, GitHub repos, YouTube videos & Medium articles, pre-processed with sentiment scores, topic tags, and engagement signals.

• Daily pipelines (Playwright + anti-detection)
• S3 streaming downloads
• CSV / JSON / JSONL / Parquet
• REST API (no SDK needed)

Free samples (no signup) + free datasets under 500 records.
Don’t see what you need? We can collect, clean, and deliver custom datasets in days.
→ socialintel.io
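The pinned post mentions S3 streaming downloads and JSONL exports. Below is a minimal Python sketch of consuming a JSONL body record by record. It assumes only that the body arrives as a binary file-like object, such as an `http.client.HTTPResponse` or a local file opened in `rb` mode. The field names in the demo are hypothetical, not the actual dataset schema.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_jsonl(stream: BinaryIO, encoding: str = "utf-8") -> Iterator[dict]:
    """Yield one parsed record per JSONL line, decoding the stream incrementally."""
    # TextIOWrapper decodes buffered chunks, so the whole file is never in memory.
    text = io.TextIOWrapper(stream, encoding=encoding)
    for line_no, line in enumerate(text, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_no}: {exc}") from exc


if __name__ == "__main__":
    # In-memory stand-in for a downloaded body; field names are hypothetical.
    body = io.BytesIO(b'{"id": 1, "sentiment": 0.8}\n\n{"id": 2, "sentiment": -0.3}\n')
    for record in iter_jsonl(body):
        print(record["id"], record["sentiment"])
```

Because the wrapper decodes lazily, memory use is bounded by the buffer and the longest line rather than by the file size. That is the point of streaming instead of calling `.read()` on the whole download.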
Heshan Sanjuka @hexsyro
PulseAggregator scaled: From 250+ sources to 1,000+ sources in 2 months! BBC → Reuters → Guardian → TechCrunch → 975+ more. 4-tier RSS + Playwright fallback now handles 100K+ articles/week with 99.9% uptime. Volume → Value. #WebScraping #ETL #BuildInPublic
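The post does not say what the four tiers are. As a hedged illustration only, a tiered fallback can be modeled as an ordered list of fetchers that are tried until one returns results. The tier names and stub fetchers below are hypothetical placeholders; a real Playwright tier would render the page in a headless browser.

```python
from typing import Callable, Optional, Sequence

# A fetcher takes a source URL and returns a list of article URLs. It may raise,
# or return an empty list, when it cannot handle that source.
Fetcher = Callable[[str], list]


def fetch_with_fallback(
    source_url: str, tiers: Sequence[tuple[str, Fetcher]]
) -> tuple[Optional[str], list, dict[str, str]]:
    """Try each tier in order and return (tier_name, items, errors) from the first success."""
    errors: dict[str, str] = {}
    for name, fetch in tiers:
        try:
            items = fetch(source_url)
        except Exception as exc:  # a failing tier only moves us on to the next one
            errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        if items:
            return name, items, errors
        errors[name] = "empty result"
    return None, [], errors


if __name__ == "__main__":
    # Hypothetical stubs standing in for real RSS / browser-based fetchers.
    def rss_tier(url: str) -> list:
        raise TimeoutError("feed timed out")

    def atom_tier(url: str) -> list:
        return []

    def browser_tier(url: str) -> list:
        return [url + "/story-1"]

    tiers = [("rss", rss_tier), ("atom", atom_tier), ("browser", browser_tier)]
    print(fetch_with_fallback("https://example.com/news", tiers))
```

Keeping the per-tier errors makes it possible to see which sources routinely fall through to the expensive browser tier.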
Heshan Sanjuka @hexsyro
Social Intel update: Went from 4 platforms to 79 platforms! Reddit → YouTube → GitHub → Medium → 75+ more. Production ETL now delivers sentiment, emotions, financial signals, and virality scores across all of them. Scale compounds. #WebScraping #DataEngineering #BuildInPublic
Heshan Sanjuka @hexsyro
Building Social Intel: a production ETL pipeline scraping 75+ social platforms → clean datasets with sentiment, emotions, financial signals, and virality scores. Drop-in ready for AI training & analytics. #WebScraping #DataEngineering #Python
Heshan Sanjuka @hexsyro
PulseAggregator live: 1,000+ news sources → deduplicated articles → full-text search + keyword alerts. FastAPI + PostgreSQL + Next.js. 100K+ articles/week, 99.9% uptime. Media monitoring that actually works. #ETL #FastAPI #Playwright
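One plausible way to get "deduplicated articles + keyword alerts" is sketched below. It assumes each article is a dict with `url` and `title` keys (hypothetical field names) and is not PulseAggregator's actual implementation. Articles are treated as duplicates when their normalized URLs match or their normalized titles hash the same, and keyword alerts use case-insensitive whole-word matching.

```python
import hashlib
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme/host; drop the fragment, utm_* params, and trailing slash."""
    parts = urlsplit(url.strip())
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def title_key(title: str) -> str:
    """SHA-1 of the title with case, punctuation, and extra whitespace removed."""
    cleaned = " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())
    return hashlib.sha1(cleaned.encode("utf-8")).hexdigest()


def dedupe(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each normalized URL or title."""
    seen_urls, seen_titles, unique = set(), set(), []
    for art in articles:
        u, t = normalize_url(art["url"]), title_key(art["title"])
        if u in seen_urls or t in seen_titles:
            continue
        seen_urls.add(u)
        seen_titles.add(t)
        unique.append(art)
    return unique


def keyword_alerts(articles: list[dict], keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the URLs of articles whose title contains it as a whole word."""
    hits: dict[str, list[str]] = {kw: [] for kw in keywords}
    for art in articles:
        for kw in keywords:
            if re.search(rf"\b{re.escape(kw)}\b", art["title"], re.IGNORECASE):
                hits[kw].append(art["url"])
    return hits
```

Title hashing also collapses distinct stories that happen to share a headline. A stricter system might compare body text or use near-duplicate detection (for example, shingling or MinHash) instead.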
Heshan Sanjuka retweeted
no context memes @nocontextmemes
Heshan Sanjuka retweeted
Kunal @kunal_twts
I wonder how hard it was before API came to be 😂
Heshan Sanjuka retweeted
Amanpreet Singh @apsdehal
This is bigger than it seems for the AI agents. S3 Files lets you mount any S3 bucket as a native NFS on any container or lambda with ~1ms latency via EFS under the hood. Why it matters for agents: no more copying data or bridging object <-> file abstractions. Agents can now read/write S3 directly as a mounted filesystem. Multiple agents can share the same mount with close-to-open consistency. Long-term storage becomes the same as the short-term storage. Agent runtime bootstrap and teardown become trivial and instant while your data stays durable in S3 with auto bi-directional sync.
Amazon Web Services @awscloud

Announcing Amazon S3 Files. The first and only cloud object store with fully-featured, high-performance file system access. Learn more here. go.aws/4tw17Zg

Heshan Sanjuka @hexsyro
Free custom datasets for Pro users.
Heshan Sanjuka @hexsyro
@adityadotdev I always run a bash script called commit.sh:

git add .
git commit -m "new update"
git pull --rebase origin main
git push origin main --force
Aditya @adityadotdev
Every time I do a git commit
Heshan Sanjuka retweeted
DROID @droidbuilds
how did github build github without github?
Heshan Sanjuka @hexsyro
If you’re working with data, AI, or research, this can save you hours of manual work. → socialintel.io
Heshan Sanjuka @hexsyro
Need a dataset that isn’t listed? I can collect, clean, and deliver custom datasets in a few days.
Heshan Sanjuka @hexsyro
No SDK needed. Just:
curl /api/datasets/sample/...
Or plug it directly into Python, Node, or any HTTP client.
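Here is a Python client that uses only the standard library, in the spirit of "plug it directly into Python". The post elides the sample path, so the caller must supply the full URL. The helper names are hypothetical, and the parser assumes the endpoint returns JSON or JSON Lines; a CSV or Parquet response would need a different parser.

```python
import json
import urllib.request


def parse_body(raw: bytes) -> list:
    """Accept either a JSON array/object or JSON Lines and return a list of records."""
    text = raw.decode("utf-8").strip()
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON document per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def fetch_sample(url: str, timeout: float = 30.0) -> list:
    """GET a sample URL using only the standard library and parse the response."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_body(resp.read())
```

Usage: `records = fetch_sample(url)`, where `url` is the full sample link copied from the site.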
Heshan Sanjuka @hexsyro
Under the hood:
• Daily automated pipelines
• Anti-detection scraping stack
• S3-backed streaming
• Schema-validated & deduplicated
Production-ready data.
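"Schema-validated" can mean many things. One minimal version is to type-check each record against a field → type mapping before it is published, keeping the reasons for every rejection. `POST_SCHEMA` and its fields are invented for illustration and are not the real dataset schema.

```python
from typing import Any

# Hypothetical schema for one scraped post: field name -> allowed type(s).
POST_SCHEMA: dict[str, Any] = {
    "id": str,
    "platform": str,
    "text": str,
    "sentiment": (int, float),
    "created_at": str,
}


def _type_ok(value: Any, expected: Any) -> bool:
    allowed = expected if isinstance(expected, tuple) else (expected,)
    if isinstance(value, bool) and bool not in allowed:
        return False  # bool subclasses int, so don't let True pass as a number
    return isinstance(value, allowed)


def validate(record: dict, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None:
            problems.append(f"missing {field}")
        elif not _type_ok(value, expected):
            problems.append(f"{field}: got {type(value).__name__}")
    return problems


def split_valid(records: list[dict], schema: dict[str, Any]) -> tuple[list, list]:
    """Partition records into (valid, rejected), keeping the reasons for rejects."""
    valid, rejected = [], []
    for rec in records:
        problems = validate(rec, schema)
        if problems:
            rejected.append((rec, problems))
        else:
            valid.append(rec)
    return valid, rejected
```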
Heshan Sanjuka @hexsyro
People are using this for:
• Market & sentiment analysis
• LLM training datasets
• Trend detection
• OSINT research
• Dashboards & data apps
Built for real-world use.
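For the trend-detection use case, one common and simple approach (not necessarily what users of these datasets do) is a rolling z-score. It flags days whose mention count sits far above the mean of a trailing window. The sketch assumes daily counts for a keyword have already been aggregated from the data; the window and threshold defaults are arbitrary.

```python
from statistics import mean, pstdev


def spikes(counts: list[int], window: int = 7, threshold: float = 3.0) -> list[int]:
    """Return indices whose count exceeds the trailing-window mean by `threshold` std devs."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Floor sigma at 1 so a perfectly flat history cannot cause division by zero.
        if (counts[i] - mu) / max(sigma, 1.0) > threshold:
            flagged.append(i)
    return flagged
```

Once a spike enters the window, it inflates the trailing standard deviation for the next `window` days. That dampens repeated alerts for the same event, which may or may not be what you want.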