Javier de la Rosa @[email protected]

16.4K posts

Research Scientist (NLP) at @Nasjonalbibl AI-Lab. Formerly @UNED, @stanfordCIDR, @CulturePlex. «no remarkable incidents»

Madrid, Spain · Joined April 2007
946 Following · 1K Followers
Javier de la Rosa @[email protected] retweeted
Siva Reddy @sivareddyg
McGill University (@mcgillu) has many open faculty and postdoctoral positions with generous funding packages, thanks to Impact+ grants, which are investing $2 billion to attract global talent to Canada 🇨🇦🇨🇦🇨🇦.
Associate/Full Professor: $8 million startup package
Assistant Professor: $600K startup package
Postdoc: $70K (starting salary)
If you are interested and work in the space of AI/ML/NLP/LLMs, please reach out to me. #AI #NLProc #ML
45 replies · 297 retweets · 1.4K likes · 194.9K views
Javier de la Rosa @[email protected] retweeted
Mistral AI @MistralAI
Full stack devs, SWEs, MLEs, forward deployed engineers, research engineers, applied scientists: we are hiring! Join us and tackle cutting-edge challenges including physical AI, time series, material sciences, cybersecurity and many more. Positions available in Paris, London, Singapore, Amsterdam, NYC, SF, or remote. jobs.lever.co/mistral
89 replies · 100 retweets · 1.2K likes · 154.4K views
Daniel van Strien @vanstriendaniel
DeepSeek-OCR just got @vllm_project support 🚀
Currently processing @natlibscot's 27,915-page handbook collection with one command:
Processing at ~350 images/sec on A100
Using @huggingface Jobs + @astral_sh uv - zero-setup batch OCR!
Will share final time + cost when done!
16 replies · 41 retweets · 443 likes · 58.1K views
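Under the hood, a batch OCR pass like this over vLLM's offline API boils down to a few lines. A minimal sketch, not van Strien's actual script: the prompt string is an assumption to check against the DeepSeek-OCR model card, and the pages/*.jpg paths are hypothetical.

```python
# Hedged sketch: offline batch OCR with vLLM. The model ID is the public
# DeepSeek-OCR release; the prompt format and paths are assumptions.
from pathlib import Path

from PIL import Image
from vllm import LLM, SamplingParams

image_paths = sorted(Path("pages").glob("*.jpg"))  # hypothetical page scans

llm = LLM(model="deepseek-ai/DeepSeek-OCR", trust_remote_code=True)
params = SamplingParams(temperature=0.0, max_tokens=4096)

# vLLM batches and schedules these requests itself, which is where
# throughput numbers like ~350 images/sec on an A100 would come from.
requests = [
    {"prompt": "<image>\nFree OCR.",  # assumed prompt template
     "multi_modal_data": {"image": Image.open(p)}}
    for p in image_paths
]

for path, out in zip(image_paths, llm.generate(requests, params)):
    path.with_suffix(".txt").write_text(out.outputs[0].text)
```

The zero-setup part would come from running this as a uv script on Hugging Face Jobs, so dependencies resolve at launch rather than in a prebuilt image.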
Daniel van Strien @vanstriendaniel
465 people. 122 languages. 58,185 annotations! FineWeb-C v1 is complete! Communities worldwide have built their own educational quality datasets, proving that we don't need to wait for big tech to support languages. Huge thanks to all who contributed! huggingface.co/blog/davanstri…
4 replies · 28 retweets · 107 likes · 20.1K views
Javier de la Rosa @[email protected] retweeted
NLP_SINAI @NLP_SINAI
Would you like to join the team that will build the next LLMs in Spanish? We are looking for computer engineers eager to get involved in an exciting, transformative project. More info here: linkedin.com/posts/nlp-sina…
0 replies · 9 retweets · 6 likes · 574 views
Lucas Beyer (bl16) @giffmana
"wow 0.06% per book, so with just 1667 books we should get 100%!" You're either: (a) poor at stats (b) never ran experiments (c) intentionally obtuse/just memeing. I'll give you the benefit of the doubt and assume it's (c). Think about it: what experiment needs to be conducted to come to such number? You need to train the same model twice, with only a single book removed as difference. But a single pair of runs doesn't mean much. Do the same pair of runs with a different init seed, or different data ordering seed, or different dataset mix, or... and you will most likely get a difference >0.06% for each run. Just look at these two figures below from "ResNet Strikes Back" showing 100 identical ResNet ImageNet trainings only changing seeds. 0.5% score range in one metric and 1.0% score range in another metric. You would need hundreds of runs with and hundreds of runs without one book to be able to reliably measure that book's impact (below the base "noise" level) while removing the other sources of variation in results. That would be very interesting, but also crazy expensive. And the result would differ per book. And differ per model scale. And differ per training duration. And differ per data mixture. And differ per eval looked at. So even ONE such (crazy expensive) experiment wouldn't mean much in general. So what they are saying is, a single book's influence is below the noise level. But again, even this would depend a lot on the setting. If the eval was "how good is model at niche topic X" and there's only two existing write-ups of topic X one of which being the book, the impact would probably be more than 0.06%. Btw, this is mostly a comment on people's reaction to their statement, not on their statement itself.
[media: two figures from "ResNet Strikes Back" showing the score spread across 100 seed-only reruns]
Andrew Curran @AndrewCurran_

Interesting legal argument from META; the use of a single book for pretraining boosts model performance by 'less than 0.06%.' Therefore, taken individually, a work has no economic value as training data.

18 replies · 14 retweets · 301 likes · 47.4K views
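The "hundreds of runs" claim checks out with a textbook two-sample power calculation. A minimal sketch, assuming seed-to-seed variation is roughly Gaussian and that the quoted 0.5%/1.0% ranges over 100 reruns correspond to standard deviations of about 0.1/0.2 accuracy points (an assumption; only the ranges are given):

```python
# How many training runs per group (with vs. without one book) are
# needed to detect a 0.06-point effect under seed-level noise?
from math import ceil
from statistics import NormalDist

def runs_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Standard two-sample z-test sample size for detecting a mean
    difference `delta` when each run's score has noise std `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
    z_beta = NormalDist().inv_cdf(power)           # ~0.84
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

delta = 0.06  # claimed per-book effect, in accuracy points
for sigma in (0.1, 0.2):  # assumed noise std per metric (see above)
    n = runs_per_group(sigma, delta)
    print(f"sigma={sigma} pts -> {n} runs per group ({2 * n} total)")
# sigma=0.1 pts -> 44 runs per group (88 total)
# sigma=0.2 pts -> 175 runs per group (350 total)
```

So even under the milder noise estimate the experiment needs on the order of a hundred runs, and several hundred under the noisier metric, before a per-book effect separates from seed noise.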
Michael Hu @michahu8
@versae that's awesome, love that. please reach out here or @ my nyu email with what you find!!
1 reply · 0 retweets · 1 like · 106 views
Michael Hu @michahu8
Training on a little 🤏 formal language BEFORE natural language can make pretraining more efficient! How and why does this work? The answer lies… Between Circuits and Chomsky. 🧵1/6👇
23 replies · 125 retweets · 929 likes · 132.6K views
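The mechanism being teased is a data-ordering curriculum: a short formal-language warm-up phase before the natural-language corpus. A minimal sketch of what such a schedule could look like; the 5% fraction, the Dyck-string suggestion, and the names are illustrative assumptions, not the paper's actual recipe:

```python
# Hedged sketch of a two-phase pretraining curriculum: a small amount
# of formal-language data first, then natural language for the rest.
from itertools import islice

def curriculum(formal_docs, natural_docs, total_docs, formal_fraction=0.05):
    """Yield documents in curriculum order: formal-language warm-up,
    then natural-language text. Only the ordering matters here."""
    n_formal = int(total_docs * formal_fraction)
    yield from islice(formal_docs, n_formal)
    yield from islice(natural_docs, total_docs - n_formal)

# Usage: feed the stream to an ordinary LM pretraining loop.
# formal_docs could be, e.g., synthetic Dyck-language (balanced-bracket)
# strings; natural_docs a web-text corpus.
```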
Javier de la Rosa @[email protected]
@michahu8 Awesome! I didn't get to the appendices yet 😅 Thanks for pointing that out. I'll be testing it on languages other than English, including extremely low-resource ones.
1 reply · 0 retweets · 1 like · 19 views
Javier de la Rosa @[email protected] retweeted
Hanna Hajishirzi @HannaHajishirzi
Excited to drive innovation and push the boundaries of open, scientific AI research & development! 🚀 Join us at @allen_ai to shape the future of OLMo, Molmo, Tulu, and more. We're hiring at all levels—apply now! 👇 #AI #Hiring
Research Engineer: job-boards.greenhouse.io/thealleninstit…
Research Scientist: job-boards.greenhouse.io/thealleninstit…
Young Investigator: job-boards.greenhouse.io/thealleninstit…
2 replies · 16 retweets · 61 likes · 57.2K views
Javier de la Rosa @[email protected] retweeted
Manu Romero @mrm8488
🚀 We're Hiring Applied AI Engineers! 🚀 Do you write clean, efficient Python? Are you familiar with AI frameworks? Do you thrive in a collaborative team? If that sounds like you, DM me now! Let's build the future of AI together. 💡🤖
0 replies · 6 retweets · 30 likes · 1.8K views
Javier de la Rosa @[email protected]
@maballesterosv In general, yes. But it depends on the specific capability expected of an LLM nowadays. Most of these models are base models (pre-training only), with no ability to follow instructions or hold a dialogue (post-training).
0 replies · 0 retweets · 1 like · 45 views
Mike Ballesteros @maballesterosv
@versae Really interesting work, Javier. Congratulations. If I've understood correctly, it's the quality of the "edited" material (linguistic richness, coherence, and rigor) that works the magic, right?
1 reply · 0 retweets · 1 like · 31 views