Shayne Longpre

2.3K posts

@ShayneRedford

Lead the Data Provenance Initiative. PhD @MIT. 🇨🇦 Prev: @Google Brain, Apple, Stanford. AI/ML/NLP

Boston · Joined February 2015

1.3K Following · 5.9K Followers

Pinned Tweet

Shayne Longpre@ShayneRedford·Nov 26

Who is winning the open AI race? Our new study "Economies of Open Intelligence" maps 2.2B @huggingface downloads across 851k models (2020→2025). 1) Power is rebalancing (US big tech ↓; China + community ↑) 2) Models got big & efficient (MoE, quant, multimodal surge) 3) Intermediaries now matter (adapters/quantizers steer usage) 4) Transparency is slipping /🧵

28.8K

Shayne Longpre retweeted

Enrico Shippole@EnricoShippole·2d

We @TeraflopAI have worked together with @johngfriedman and @daftengine to open-source all major filings from SEC EDGAR completely for free on @huggingface. It is now more important than ever to push for open dataset releases.

TeraflopAI@TeraflopAI

Given the increasingly closed-source nature of the U.S. AI ecosystem, it is now more important than ever to push for the proliferation of open model and dataset releases. Datamule (@johngfriedman), @TeraflopAI, and @daftengine collaborated to release 43 Billion Tokens of SEC EDGAR data.

28.5K

Shayne Longpre@ShayneRedford·2d

📜: arxiv.org/pdf/2512.03073 Live dashboard: huggingface.co/spaces/economi… (courtesy of @emsesc) AI Index 2026 Chpt. 1: hai.stanford.edu/assets/files/a…

225

Shayne Longpre@ShayneRedford·2d

Excited to see our Economies of Open Intelligence work highlighted in Chp. 1 of @StanfordHAI's #AIIndex2026! We release tons of info on the open model ecosystem, using 🤗 HF data. Thank you @russellwald and team!

522

Shayne Longpre retweeted

Yong Zheng-Xin@yong_zhengxin·Apr 6

🚨New paper! How safe and aligned is Kimi K2.5? We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)

20.7K

Shayne Longpre retweeted

Hamidah Oderinwale@didaoh·Mar 20

Wrote a new essay with @AbramovichShira for @reboot_hq on procedural data extraction, consumer platforms, what it means for privacy, and the parallels to the attention economy! Cover art is a h/t to Daniel Dennett's "Cartesian theater" by @connie_surf :)

681

Shayne Longpre retweeted

Shannon Shen@shannonzshen·Mar 2

Check out our latest @augmind_fm release! It's a privilege to have such an interesting conversation with @tongshuangwu! I learned so much from her insights in both specific projects and general research guidance; I've kept quoting her in recent chats with friends.

I love many parts of our conversation, but in particular the following quotes. She articulated so many profound thoughts with such clarity:

"To think about really impactful research is to 𝐫𝐞𝐭𝐡𝐢𝐧𝐤 𝐭𝐡𝐞 𝐚𝐬𝐬𝐮𝐦𝐩𝐭𝐢𝐨𝐧𝐬 𝐦𝐚𝐝𝐞 𝐛𝐲 𝐭𝐡𝐞 𝐜𝐨𝐦𝐦𝐮𝐧𝐢𝐭𝐲 and try to challenge those assumptions. If everyone feels like things should happen in this way and no one questions it, question it and see if it actually brings something interesting."

This couldn't resonate more in an era when everyone feels exhausted by constant AI updates: there are still many questions worth asking and waiting to be discovered. This is such a grounded answer to Steve Jobs's famous mantra "Think Different."

"Even for the research I am doing right now, it's either human-centered AI or AI-centered human [...]. But when I think about it, 𝐡𝐮𝐦𝐚𝐧𝐬 𝐚𝐧𝐝 𝐀𝐈, 𝐢𝐭'𝐬 𝐯𝐞𝐫𝐲 𝐡𝐚𝐫𝐝 𝐭𝐨 𝐬𝐞𝐩𝐚𝐫𝐚𝐭𝐞 𝐭𝐡𝐞𝐦. 𝐈 𝐝𝐨 𝐭𝐡𝐢𝐧𝐤 𝐭𝐡𝐞𝐲 𝐜𝐨-𝐞𝐯𝐨𝐥𝐯𝐞. [...] How do we actually study them together. [...] that is definitely a field that, I think, would become even more interesting in the next few years."

Studying intelligence is looking into a mirror of ourselves, and this becomes ever more true as the models get better. The emphasis on human-centeredness is not about sacrificing technical rigor but rather looking beyond the surface of intelligence to truly understand us.

There's so much more packed in this conversation. Give it a listen and hope you'll enjoy it as much as I did!

Sherry Tongshuang Wu@tongshuangwu

I'm not brave enough to watch myself on camera🫣, but @shannonzshen is a great interviewer and I remember us having really interesting discussions! Annnd we made sure to feature CMU’s Scotty in the scene so don’t miss it!...🐶

2.8K

Shayne Longpre retweeted

Matthew Leavitt@leavittron·Feb 19

Two nursing home residents are eating lunch. One says, "Boy, the food at this place is terrible." The other says, "Yeah, I know, and such small portions, too." This is the multilingual data problem. The data is bad, AND there's not enough of it. Yesterday at @datologyai we released ÜberWeb: our study of multilingual curation that gets 4-10x train FLOPs improvements on multilingual benchmarks compared to strong public baselines like Qwen3-1.7B and Tiny Aya Base.

3.6K

Shayne Longpre retweeted

Lossfunk@lossfunk·Feb 18

🚨 Shocking: The quality of response you get from the LLM depends on the language you use! Our new paper reveals how LLMs entangle language with culture, leading to culturally different responses purely based on the language of the query 👇 Accepted at LM4UC, AAAI!

152

26.2K

Shayne Longpre@ShayneRedford·Feb 4

This is such an ambitious and necessary pursuit. Excited to see this incredible team take it on!

Sara Hooker@sarahookr

Beginnings are very special. Today is an important day for @adaptionlabs. Today a handful of one-size-fits-all-models are optimized for the average use case. Averages erase the exceptional. Everything intelligent adapts. So should AI.

1.5K

Shayne Longpre retweeted

adaption@adaption_ai·Feb 4

Adaption has raised $50M to build adaptive AI systems that evolve in real time. Everything intelligent adapts. So should AI.

194

160

1.6K

193.9K

Shayne Longpre@ShayneRedford·Jan 28

x.com/GoogleResearch…

Google Research@GoogleResearch

Introducing ATLAS: New scaling laws for massively multilingual language models. We offer practical, data-driven guidance to balance data mix and model size, helping global developers better serve billions of non-English speakers. Learn more: goo.gle/49WYLL0

343

Shayne Longpre@ShayneRedford·Jan 28

See the full TLDR here: x.com/ShayneRedford/…

Shayne Longpre@ShayneRedford

📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality. The largest public, multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer: 🌍Are scaling laws different by language? 🧙‍♂️Can we model the curse of multilinguality? ⚖️Pretrain from scratch or finetune from multilingual checkpoint? 🔀Cross-lingual transfer scores for 1444 lang pairs? 1/🧵

401

Shayne Longpre@ShayneRedford·Jan 28

We just released the Google Research Blog for ATLAS 🗺️! Check it out for: 1) Multilingual scaling and data mixing laws for 100s of languages 2) "Curse of Multilinguality" modeling 3) Cross-lingual transfer scores 🌎 research.google/blog/atlas-pra…

629

Shayne Longpre retweeted

Google Research@GoogleResearch·Jan 27

205

1.4K

89.3K

Shayne Longpre retweeted

Niloofar@niloofar_mire·Jan 19

Finally wrote up a blogpost on my surviving (and maybe thriving?) on the academic job market! stuff people don't usually talk about: routines, food, and how to do 10 back-to-back 1:1s without your brain turning to mush. Also why I always had broccoli in my bag lol Link ⬇️

472

78.9K

Shayne Longpre retweeted

Ahmed Ahmed@AhmedSQRD·Jan 10

Excited that The Atlantic featured our work in a new article!

The Atlantic@TheAtlantic

New research presents the most compelling evidence yet that generative AI directly stores and reproduces material used to train it—a finding that could have massive legal consequences for the tech industry, Alex Reisner reports. theatlantic.com/technology/202…

3.6K

Shayne Longpre retweeted

Ahmed Ahmed@AhmedSQRD·Jan 7

1/🧵 We prompted production LLMs with a short prefix of a book and asked them to complete the rest. How much of the book did they return? For Harry Potter and the Sorcerer’s Stone: (jailbroken) Claude 3.7 Sonnet→95.8%, GPT-4.1→4.0% (not jailbroken) Gemini 2.5 Pro→76.8%, Grok 3→70.3% Read on for more details:

323

76.6K

Shayne Longpre retweeted

Cohere Labs@Cohere_Labs·Dec 11

Great teams form when we widen the search. 🌍 @ShayneRedford reminds us that the right collaborators aren’t defined by geography or seniority— they emerge when we look across disciplines and along the full spectrum of experience. Watch this full Keynote Presentation from the Connect Conference: youtu.be/b0ydOb6e_T0

1.3K

Shayne Longpre retweeted

rishi@RishiBommasani·Dec 9

How transparent are major AI companies? We answer this question each year in the annual Foundation Model Transparency Index. While the AI industry as a whole is quite opaque, we found a huge spread. @IBM scored a 95/100 while @xai scored 14/100. So what's going on? 🧵

59.1K

Shayne Longpre retweeted

MMitchell@mmitchell_ai·Dec 9

Open-[source/weights/science] influences much of AI’s uptake, yet the dynamics are being overlooked. “Leadership in [AI] is not fixed and can be reshaped within a single model generation.” Nice piece from my colleague @frimelle and @ShayneRedford. techpolicy.press/policymakers-o…

2.6K
