

Great annotation work from @ellamindAI / OpenEuroLLM on French-Science-Commons less than 24 hours after release!
Max Idahl







And a new data release: French-Science-Commons, the largest open-access scientific corpus in French, with 1.25 million documents (42 million pages) re-digitized with a VLM (dots ocr).


Breaking: @pleiasfr and @nvidia release the first open synthetic dataset for personas in Europe: Nemotron-Personas-France. 1M synthetic French personas, with rich imaginary lives grounded in (complex) demographic distributions.








Time to propel open LLM training data curation to the next level. Releasing propella-1: small multilingual LLMs that annotate text documents for dataset curation at scale. 🧵👇

On December 8, the Perseverance rover safely trundled across the surface of Mars. This was the first AI-planned drive on another planet. And it was planned by Claude.
