engineer Projects
28 posts

@engineerPr94901
Passionate Big Data Engineer student and AI enthusiast | Python & SQL & R | Machine Learning & Statistics | Musician & Programmer | #BigData #AI
France · Joined March 2024
137 Following · 6 Followers

@DataChaz Brilliant approach for data-heavy pages with charts and tables! However, I wonder about the resource consumption. Processing 30M+ screenshots for Wikipedia sounds incredibly heavy compared to pure text embeddings. Is this viable for small-scale production apps yet?

STOP PARSING HTML FOR RAG. JUST SCREENSHOT IT 🔥
Researchers from UC Berkeley just released PixelRAG, an open-source system that skips HTML parsing entirely.
Why is it changing web scraping for good?
Well, instead of scraping a page into text and embedding chunks:
#1 it screenshots that page and retrieves the image
#2 a vision-language model then reads the answer straight off the pixels
This solves a real problem. Parsing is where RAG quietly loses context:
→ A single HTML-to-text parser can drop 40%+ of a page
→ Tables, charts, and complex layouts get flattened or deleted
→ Changing parsers alone can swing accuracy by 10 points
By indexing the page exactly as a human sees it, PixelRAG beats the strongest text RAG baseline by 18.1% on text-only QA.
To prove it scales, the team built a visual index of all 8.28M Wikipedia articles using 30M+ screenshots.
It even includes a Claude Code plugin (pixelbrowse) that gives Claude native "eyes" to read live pages locally. How cool is that?
100% free and open-source (Apache-2.0).
Here's the repo → github.com/StarTrail-org/…
BONUS: If you're building retrieval systems, my friend @akshay_pachaar recently wrote a great guide on slashing your RAG corpus by 40x and tokens by 3x.
Check it out below ↓

Akshay 🚀 @akshay_pachaar

@PythonPr Why compare them? Each programming language has its own strengths.
engineer Projects reposted

@ElonMuskAOC It will really be nice to have one.
Looks beautiful ❤️
engineer Projects reposted

I just removed the background from a photo in only 5 seconds with picwish.com/fr/. Try the simplest AI photo editor now! via @picwishcom
#PicWish #PicWishReferral
engineer Projects reposted

Python Coding challenge - Day 166 | What is the output of the following Python Code?
Solution and Explanation: clcoding.com/2024/04/python…

engineer Projects reposted

Python Coding challenge - Day 165 | What is the output of the following Python Code?
Solution and Explanation: clcoding.com/2024/04/python…
