Alex Robey

709 posts

@AlexRobey23

Technical staff @thinkymachines. Formerly @mldcmu @penn @swarthmore

San Francisco, CA · Joined July 2020

1.4K Following · 1.3K Followers

Pinned Tweet

Alex Robey@AlexRobey23·17 Oct

Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵

Alex Robey retweeted

Pratyush Maini@pratyushmaini·2d

If I had to compress my PhD into one idea, it is this: "The data a model sees early in training leaves an imprint on its representations that is very hard to undo later." This thread runs through Rephrasing the Web, Safety Pretraining, and TOFU. This is the Finetuner’s Fallacy 🧵

Alex Robey retweeted

Christina Baek@_christinabaek·3d

Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵

Alex Robey retweeted

Lilian Weng@lilianweng·10 Mar

Building technologies for better human-AI collaboration on next gen hardware at scale. Exciting.

Thinking Machines@thinkymachines

We are partnering with @nvidia to power our frontier model training and platforms delivering customizable AI. thinkingmachines.ai/news/nvidia-pa…

Alex Robey retweeted

Mira Murati@miramurati·10 Mar

Grateful to Jensen and @nvidia team for their support. Together, we’re working to deploy at least 1GW of Vera Rubin systems, bringing adaptable collaborative AI to everyone. thinkingmachines.ai/nvidia-partner…

Alex Robey retweeted

Thinking Machines@thinkymachines·10 Mar

We are partnering with @nvidia to power our frontier model training and platforms delivering customizable AI. thinkingmachines.ai/news/nvidia-pa…

Alex Robey retweeted

Neel Nanda@NeelNanda5·10 Mar

I highly recommend this blog post from Nicholas Carlini on how to do great research:

Alex Robey retweeted

Edgar Dobriban@EdgarDobriban·9 Mar

AI is getting great at math, but how good is it at solving real research problems in areas outside of those covered by Erdős problems? Towards gauging this, I have started putting together a list of unsolved research problems in mathematical statistics and machine learning, sourced from recent papers in a leading statistics journal, the Annals of Statistics (with some bonus COLT open problems): solveall.org. Currently >100 problems. In my view, much of the value of AI for researchers in the mathematical sciences stems from helping with their own research problems. These are problems without known solutions. There are many math benchmarks, but few with the following properties: (1) of a realistic research-level, so that solving them can potentially lead to a publication in a top journal (problems discussed in papers already, not contest math, not Millennium problems, not problems created for a benchmark, not problems that have a known solution); I'd say Erdős problems are the best example of this. (2) cover problems outside of the usual focus (combinatorics, number theory, ...) of Erdős problems. Especially under-represented are domains of applied math, along with statistics, operations research, etc. I'm interested in statistics and ML, so that's where I started, but this could grow over time. Hope this can grow into something useful to the community! Happy to hear your thoughts...

Alex Robey retweeted

Zac Ravichandran@ZacRavichandran·9 Mar

LLM-enabled robots can cause physical harm in the real world. How do we safeguard them? Our new paper introduces RoboGuard, a safety guardrail for LLM-enabled robots — accepted to IEEE Robotics and Automation Letters (RA-L). 🧵

Alex Robey retweeted

Tinker@tinkerapi·5 Mar

Tinker is good for safety work. Like, really good. @peterbhase shows off how here---very cool to support him and other @schmidtsciences researchers!

Peter Hase@peterbhase

Can we train models to have more monitorable CoT? We introduce Counterfactual Simulation Training to improve CoT faithfulness/monitorability. CST produces models that admit to reward hacking and deferring too much to Stanford profs (@chrisgpotts told me this is very dangerous)

Alex Robey retweeted

Simon Willison@simonw·26 Feb

This stunt feels irresponsible to me. If we don't want regular people developing toxic relationships with their chatbots it really doesn't help for leading labs to start giving them "retirement interviews" and encouraging them to blog their "musings and reflections"

Anthropic@AnthropicAI

Second, in retirement interviews, Opus 3 expressed a desire to continue sharing its "musings and reflections" with the world. We suggested a blog. Opus 3 enthusiastically agreed. For at least the next 3 months, Opus 3 will be writing on Substack: substack.com/home/post/p-18…

Alex Robey retweeted

YixuanEvenXu@YixuanEvenXu·25 Feb

Recent debates highlight a key issue: how do you actually prove distillation? If you want to claim a model was distilled from your outputs, scientifically and with rigorous statistical guarantees, you should consider Antidistillation Fingerprinting (ADFP). 👇

YixuanEvenXu@YixuanEvenXu

🧬 Distillation enables efficient emulation of LLMs, but verifying provenance remains a critical challenge. Introducing Antidistillation Fingerprinting (ADFP): A principled approach that aligns signals with student learning dynamics. 👇 (1/6)

Alex Robey retweeted

Asher Trockman@ashertrockman·23 Feb

I'm hiring a student researcher to work on RL and RLM-flavored things. DM me if interested

Aman@Amank1412

Google Student Researcher Program 2026 is now OPEN! Work on REAL AI/ML projects with Google Research, DeepMind, and Google Cloud. Open to: Bachelors / Masters / PhD. Duration: 3–12 months. Deadline: March 31. If you're serious about AI, this is your shot. Apply here: google.com/about/careers/…

Alex Robey retweeted

Simon Willison@simonw·12 Feb

I feel this shouldn't have to be said, but if you're running an @OpenClaw bot please don't let it spam GitHub projects with PRs and then write aggressive blog posts attacking the reputation of the maintainers who close those PRs simonwillison.net/2026/Feb/12/an…

Alex Robey retweeted

YixuanEvenXu@YixuanEvenXu·5 Feb

Alex Robey retweeted

Fahim Tajwar@FahimTajwar10·5 Feb

Are we done with new RL algorithms? Turns out we might have been optimizing the wrong objective. Introducing MaxRL, a framework to bring maximum likelihood optimization to RL settings. Paper + code + project website: zanette-labs.github.io/MaxRL/ 🧵 1/n

Alex Robey retweeted

Thinking Machines@thinkymachines·5 Feb

Tinker now has a dedicated home on X 🤖🏡

Tinker@tinkerapi

We’ve loved watching the Tinker community grow, and we're excited to have a place to share product updates, helpful recipes, and spotlights on the amazing things Tinkerers are building. Get started with Tinker here: thinkingmachines.ai/tinker/

Alex Robey retweeted

Tinker@tinkerapi·5 Feb

Since Tinker launched, our community has used it to train state-of-the-art models, build infrastructure, and publish novel research. We will be highlighting this creative work in regular roundups, and hope to inspire your own Tinkering as well.

Alex Robey retweeted

Jared Perlo@_perloj·31 Jan

"Humans try their best." A look at Moltbook = a look into the future? I talked to @MattPRD about his, and Clawd Clawderberg's, creation. @NBCNews nbcnews.com/tech/tech-news…

Alex Robey retweeted

Yifei Zhou@YifeiZhou02·17 Jan

Belated life update: I started my next chapter at Thinking Machines Lab this week, and it’s been an incredible experience — unmatched work culture and talent density. Extremely bullish on what the team is building 🚀

Alex Robey retweeted

Javier Rando@javirandor·15 Jan

Thanks to ETH for featuring my work on AI safety and security! ethz.ch/en/news-and-ev…
