Andrea Soria Jimenez retweeted

Dataset Viewer for PDFs just landed on @huggingface 🤗
Check out all the document datasets on the Hub 🤝

Andrea Soria Jimenez
218 posts

@andrejanysa
Software Engineer @huggingface 🤗

We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data is essential for training and evaluating LLMs, agents, and RAG pipelines these days, but tooling for generating it is still entirely lacking! So we built Curator, and I think it has easily 10x'ed our productivity in creating post-training datasets! We are cooking up a lot of good features and examples to release in the future, so you don't want to miss this! 🧵 More info below!