

Avijit Ghosh
5.6K posts

@evijit
Technical AI Policy Researcher @huggingface 🤗 . Current focus: Responsible AI, AI for Science, and @evaluatingevals!



If you know me, you’ve heard me talk about this story for months. @HeraRizwan reported from 3 states. We obsessed over every detail. Google's AI, designed for phones, is now rationing food to pregnant women. Read. Get angry. Share boomlive.in/decode/ai-faci… @pulitzercenter

We just submitted APEX-Agents, APEX-1 and ACE to @evaluatingevals on @huggingface, an OSS initiative to standardize evals and try to reduce the noise in benchmarking.


I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer, with all the AI people talking about how well Claude Code or Codex can do hill-climbing AI research and how we (the AI people) are maybe all about to lose our jobs! The domain science people were shocked by this attitude because, though Claude Code can be let loose to complete lots of banal hill-climbing AI research projects, basically no experimental science is hill-climbing or even metric-driven. Most scientific fields are about much more taste-driven exploration that is incredibly difficult to make metrics for or to parameterize, and this misunderstanding from the AI community is one of the most damaging things to the realization of great science with AI. Seems like we’re actually pretty far from having AI models do that… Over the summer, @evijit and I wrote about this (and some other things hindering AI for science) at a bit more length, and today that work is out in Patterns! So, if you care about these problems and the real challenges in bringing AI to science in the real world, I recommend giving it a read!

3 days left! 📷 Writing, wrote, or just submitted a paper? Commit it to the EvalEval workshop at ACL 2026 in San Diego! evalevalai.com/events/2026-ac… (including ARR Submissions, non-archival, positions, and extended abstracts!) Submission Deadline: March 19th, 2026 AoE

AI watermarking in action at #ICML's avant-garde peer-review experiments this year! Quite a few casualties in my SAC batch (an example below, appropriately redacted, hopefully).



I want to share a bit more about my vision for the Economic Research team at Anthropic in the coming years. This is a forward-looking vision. Some pieces we’ve yet to develop. Aspects of this work will surely change. Consider joining the effort. 1/6 docs.google.com/document/d/1OM…





In my new quest to train as a plumber, one of the most coveted jobs now, I'm creating plumbing videos & lessons using @NotebookLM. Here is an amazing short video! Turns out to be more interesting than I thought! Thanks to @GeminiApp, we are making plumbing great again (MPGA)!😅

@simonegiertz made a chair you can dedicate to your laundry, and I love the design. She saw a common problem, then built a solution. What do you think?

“You guys are overhyping this” “Yes we can cure cancer and do regularly this way” “Yes the primary obstacles are regulatory/liability” uh

This is wild. theaustralian.com.au/business/techn…


Eyy @StepFun_ai released the dataset huggingface.co/datasets/stepf… :)

Anthropic shipped generative UI for Claude. I reverse-engineered how it works and rebuilt it for pi. Extracted the full design system from a conversation export. Live-streaming HTML into native macOS windows via morphdom DOM diffing. Article: michaellivs.com/blog/reverse-e… Repo: github.com/Michaelliv/pi-… Built on @badlogicgames's pi and @DanielGri's Glimpse.

Starting today, Claude no longer defaults to text. Claude is learning to choose the best medium for each response — based on the task, the data, and what's most useful for the person. Give it a try!