
We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers is pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. buff.ly/riGLyyj

Anthropic's Mythos has been accessed by a small group of unauthorized users, raising questions about control of the AI model bloomberg.com/news/articles/…

We’re releasing LongCoT, an incredibly hard benchmark to measure long-horizon reasoning capabilities over tens to hundreds of thousands of tokens. LongCoT consists of 2.5K questions across chemistry, math, chess, logic, and computer science. Frontier models score less than 10%🧵

I say this all the time, but a large reason why LLMs are detectable is that they have preferences instilled into them through training data and RL. Asking a model to rewrite something gives the LLM an opportunity to apply its own preferences to your text! Sometimes the preferences are helpful, like proper grammar and spelling. But other times, it actively erases the author's intent and voice:
- softening language to bring statements closer to what the LLM is comfortable with (see below)
- replacing the author's metaphors with the LLM's preferred metaphors
- replacing the author's voice (tics, sentence structure, vocabulary choice) with a more "default" voice the LLM prefers

Please, I'm begging you, try to critically examine the differences between these two pieces of writing. ChatGPT editing did not improve this. Every single change only served to weaken your claims significantly. Everything is now hedged into oblivion: no longer have you outlined a "problem," now it's merely a "flaw." "It is true" now demoted to "it appears to be the case." "Is" gets a "usually" tacked on. A thesis statement at the end of the first paragraph gets run over by noisy, out-of-context example-whittling. All for fear of being misconstrued. And at the end, the argument that gets spat out isn't even yours anymore! You argued that Graeber failed to create a true account of work because he did not understand Chesterton's Fence. ChatGPT is arguing that it is possible some apparently bullshit jobs could be secretly load-bearing if you squint. These are two different statements. The second is weaker and less compelling. It says less. And it's fucking longer! Don't do this anymore! Stop doing this! It's worse!!!

NEWS: Massive budget cuts for US science proposed again by Trump administration. "It's an extinction-level event for science." The US government is proposing massive cuts to almost every branch of science, from NASA to the National Institutes of Health. At NSF, the proposal would completely eliminate the social, economic and behavioral sciences directorate. This would decimate the world's leading scientific system. nature.com/articles/d4158…


The paper I’ve been most obsessed with lately is finally out: nbcnews.com/tech/tech-news…! Check out this beautiful plot: it shows how much LLMs distort human writing when making edits, compared to how humans would revise the same content. We take a dataset of human-written essays from 2021, before the release of ChatGPT. We compare how people revise a draft (v1 -> v2) given expert feedback with how an LLM revises the same v1 given the same feedback. This enables a counterfactual comparison: how much does the LLM alter the essay compared to what the human originally intended to write? We find LLMs consistently induce massive distortions, even changing the actual meaning and conclusions argued for.
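To make the counterfactual setup concrete, here is a minimal Python sketch of the comparison it describes. The embedding model, the cosine-distance metric, and the function names are my assumptions for illustration, not the paper's actual method: the idea is just that both revisions start from the same v1 and the same feedback, so the gap between the LLM's output and the human's own v2 can be read as distortion.

```python
# Minimal sketch (assumed details, not the paper's implementation):
# score how far an LLM's revision lands from the author's own revision,
# using sentence embeddings and cosine distance as the distortion measure.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def distortion(human_v2: str, llm_revision: str) -> float:
    """Cosine distance between the human's own revision (v2) and the LLM's
    revision of the same draft given the same expert feedback.
    0 means the LLM produced what the author intended; larger means
    the LLM pulled the text further from the author's own revision."""
    emb = model.encode([human_v2, llm_revision])
    a, b = emb[0], emb[1]
    cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cos_sim

# Usage: because the draft (v1) and the feedback are held fixed, any gap
# is attributable to the LLM's own preferences rather than the feedback.
# score = distortion(human_v2_text, llm_revised_v1_text)
```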
