Andrea Soria Jimenez

218 posts

@andrejanysa

Software Engineer @huggingface 🤗

Bolivia · Joined September 2013
755 Following · 157 Followers
Andrea Soria Jimenez retweeted
merve
merve@mervenoyann·
Dataset Viewer for PDFs just landed on @huggingface 🤗 check all the document datasets on Hub🤝
5
12
137
7.2K
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
📄 New on Hugging Face Hub: native PDF dataset support! You can now render PDFs directly in the Dataset Viewer — with thumbnails, in-browser previews, and full integration with datasets + pdfplumber. Perfect for document-based ML workflows → huggingface.co/blog/asoria/pd…
1
3
6
840
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
🚀 Synthetic data is revolutionizing AI & ML! DataDreamer, an open-source Python library, makes generating synthetic data seamless & integrates effortlessly with @huggingface. Easily push datasets to the Hub and share them with the community 🔍 Learn how: huggingface.co/blog/asoria/da…
1
9
28
1.5K
Andrea Soria Jimenez retweeted
Quentin Lhoest 🤗
Quentin Lhoest 🤗@lhoestq·
Hugging Face is now officially in the pandas Ecosystem page 🎉 Let me know what you'd like to see next for HF + pandas
3
24
197
19.8K
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
Synthetic data generation has never been easier! 🎉 Generate structured output effortlessly with #fastdata and @huggingface 🚀 Steps: 1️⃣ Define your schema 📝 2️⃣ Add a generation prompt 💡 3️⃣ Input your data 🔄 4️⃣ Share it freely on Hugging Face 🌍
3
15
123
6.3K
Andrea Soria Jimenez retweeted
Quentin Lhoest 🤗
Quentin Lhoest 🤗@lhoestq·
Damn this is cool. Semantic operations for pandas dataframes using open models from @huggingface. Brought to you by @lianapatel_ and the LOTUS team at Stanford and Berkeley. Semantic search, group by topic, top-K semantic sorting etc. with Llama 3.3 70B
4
9
33
3.9K
Andrea Soria Jimenez retweeted
Quentin Lhoest 🤗
Quentin Lhoest 🤗@lhoestq·
🤗 Datasets 3.2 is out ! With faster Parquet streaming (up to +100% speed) and faster filtering via predicate pushdown ⚡ Example: fast streaming of recent FineWeb-2 data from @huggingface
2
11
88
5.4K
Andrea Soria Jimenez retweeted
Quentin Lhoest 🤗
Quentin Lhoest 🤗@lhoestq·
Things are getting interesting 🤗✨👀
0
1
4
209
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
@huggingface has released a new feature that makes interacting with datasets even easier. 🌟 Introducing the #Text2SQL feature for the SQL Console – now you can talk to your dataset like never before! 🗣️💻
0
0
1
70
Andrea Soria Jimenez retweeted
Caleb
Caleb@calebfahlgren·
The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @huggingface dataset ✨
9
38
192
30.1K
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
✨ How it works: 1️⃣ Define your output schema 📜 2️⃣ Craft your data generation prompt 🛠️ 3️⃣ Prepare your inputs 🎯 4️⃣ Generate and push to Hugging Face Hub directly 🚀
2
0
2
40
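The four steps listed in the "How it works" post above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use the fastdata API, the `Translation` schema, the prompt text, and the `stub_model` function are hypothetical, and the model call is replaced by a canned lookup so the sketch runs offline.

```python
from dataclasses import dataclass, asdict
from typing import Callable


# Step 1: define the output schema (hypothetical example fields).
@dataclass
class Translation:
    english: str
    spanish: str


# Step 2: craft the data generation prompt.
PROMPT = "Translate the following English phrase into Spanish: {text}"

# Step 3: prepare the inputs that fill the prompt.
INPUTS = [{"text": "Hello"}, {"text": "Thank you"}]


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned answers."""
    canned = {"Hello": "Hola", "Thank you": "Gracias"}
    phrase = prompt.rsplit(": ", 1)[-1]  # text after the final ": "
    return canned.get(phrase, "")


def generate(inputs, prompt_template: str, model: Callable[[str], str]):
    """Step 4 (generation half): build one schema-shaped row per input."""
    rows = []
    for item in inputs:
        answer = model(prompt_template.format(**item))
        rows.append(asdict(Translation(english=item["text"], spanish=answer)))
    return rows


rows = generate(INPUTS, PROMPT, stub_model)
print(rows)
```

To cover the "push to Hub" half of step 4, one option (an assumption, not necessarily what fastdata does internally) would be to wrap `rows` with `datasets.Dataset.from_list(rows)` and call `push_to_hub` with a repository name of your own, which requires being logged in to the Hub.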
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
🚀 Fastdata (by @answerdotai) + @huggingface: Synthetic Data Made Simple! 🤖📊 Generate data for deep learning 📜🛠️🎯 and push it directly to Hugging Face Hub 🌐. With Incremental Uploads, fastdata handles large-scale projects effortlessly!
1
5
17
1.4K
Andrea Soria Jimenez
Andrea Soria Jimenez@andrejanysa·
💡 Pro Tip: With Incremental Uploads, fastdata can automatically push updates to the Hub every N minutes, making it perfect for large-scale synthetic data projects.
0
0
2
36
Andrea Soria Jimenez retweeted
Quentin Lhoest 🤗
Quentin Lhoest 🤗@lhoestq·
My new app is out !! ✨ The Common Crawl Pipeline Creator ✨
Create your pipeline easily:
✔ Run Text Extraction ✂️
✔ Define Language Filters 🌐
✔ Customize text quality 💯
✔ See Live Results 👀
✔ Get Python code 🐍
Based on famous LLM research like Gopher, C4 or FineWeb
5
24
105
15K
Andrea Soria Jimenez retweeted
SomosNLP
SomosNLP@SomosNLP_·
🔥 Introducing #LaLeadeboard, the first open-source leaderboard for automatically evaluating #LLM across the varieties of Spanish and the official languages of Spain and LATAM. huggingface.co/spaces/la-lead…
6
75
239
53.7K