dan bateyko

1.5K posts

@dbateyko

maybe the hard stuff's inside, hidden — like bones, as opposed to an exoskeleton. @CornellInfoSci

NY · Joined January 2012

2.4K Following · 1.3K Followers

dan bateyko retweeted

sev field@sevdeawesome·3h

An interesting epistemic divide became clearer: researchers at leading companies (e.g. OpenAI, Anthropic, Google Deepmind) discuss recursive improvement more regularly and are encouraged by leadership. Academics were more likely to consider it professionally risky; one said raising these ideas risks being seen as "a crackpot."

dan bateyko retweeted

Ian Arawjo@IanArawjo·3d

Position paper: "Science of AI Evaluation Requires Item-level Benchmark Data": "broader access to and analysis of item-level AI benchmark data are essential for establishing a more scientifically grounded, evidence-centered approach to AI evaluation" arxiv.org/abs/2604.03244

dan bateyko@dbateyko·4d

Lovely talk on bottlenecks to imaginative design work. Hadn't heard of these augmented reading systems. I liked the Scim idea (highlighting passages by their rhetorical role) so much that I asked Claude to remake it for Zotero, with a 2B LLM swapped in. andymatuschak.org/tat/refs/Fok20…

Andy Matuschak@andy_matuschak

⭐ New talk! andymatuschak.org/tat Coding agents might help us finally break out of two cages: the app model, which traps computing in one-size-fits-all silos; and programming as a specialization, which has crowded out cultures of imagination and domain insight.

dan bateyko@dbateyko·12 Apr

I'll worry about Mythos once it starts collecting Knuth reward checks en.wikipedia.org/wiki/Knuth_rew…

dan bateyko@dbateyko·8 Apr

Do you get pleasure out of being part of the system? System. When you're not performing your duties do they keep you scrolling reels? System. Let’s move on. Slop. Noguchi, Hay Matin, Snoopy, Cesca. Slop. Books on the floor? Slop. Your most shameful moment? Slop. We're done.

Good Girl Gone Mad 🧜‍♀️@GGGoneMad

The most provocative part of THE DRAMA is the concept of a 35yo being a curatorial chair (?) who makes enough money with his librarian (?) fiancée to furnish the most stunning apartment in all of Boston (?!)

dan bateyko retweeted

Neel Guha@NeelGuha·7 Apr

I built a leaderboard tracking LLM performance on a suite of academic legal benchmarks. This includes LegalBench, LEXAm, Housing QA, BarExam, and some Hallucination benchmarks. Some fun findings:

dan bateyko retweeted

Ian Arawjo@IanArawjo·7 Apr

Stats for Evals is now live, and we got a site, too: statsforevals.com We'll be posting regular investigations across the summer. For now, we're starting with the basics: comparing models and prompts. Also has resources, principles, example code, and guidance for others:

dan bateyko retweeted

AI4Law@ICML@ai4law_workshop·6 Apr

🚨 Call for Papers: AI for Law Workshop @icmlconf (July 10, Seoul 🇰🇷), welcomes submissions across three themes: ⚖️ AI for Legal Reasoning 📊 AI Evaluation for Law 🌍 AI for Access to Justice ⏰ Deadline: May 22 (AoE) 📄Full paper submission via: openreview.net/group?id=ICML.…

dan bateyko retweeted

Kevin Wei@kevinlwei·6 Apr

I'll be a mentor for the @cbai_ai Summer Research Fellowship in AI Safety this year! If you're interested in evaluations/science of evaluations, legal alignment, or other technical AI governance topics, apply for my workstream by April 12: cbai.ai/summer-researc…

Cambridge Boston Alignment Initiative@cbai_ai

Applications are open for the CBAI Summer Research Fellowship in AI Safety, a fully funded, in-person research program in Harvard Square for those interested in starting a career in AI alignment or governance research. You'll receive a stipend of $10k, housing, 24/7 office access, up to $10k in compute, and direct mentorship from some of the best minds in the field.

dan bateyko retweeted

Lama Ahmad لمى احمد@_lamaahmad·6 Apr

Very excited about the launch of the OpenAI Safety Fellowship - read more and apply by May 3rd!

OpenAI@OpenAI

Introducing the OpenAI Safety Fellowship, a new program supporting independent research on AI safety and alignment—and the next generation of talent. openai.com/index/introduc…

dan bateyko retweeted

Peter Henderson@PeterHndrsn·29 Mar

Btw, did a bit of a rebranding of the substack. Will endeavor to post more there. h/t @dbateyko on the Trials & Errors name. Super fitting name for a group whose focus is both in reinforcement learning and in law/governance research. trialserrors.ai

Peter Henderson@PeterHndrsn

This is a challenging legal problem for NeurIPS (and other conference participants)! You might be wondering how this is possible given the First Amendment? I wrote a quick explainer on the current status quo of relevant First Amendment cases & law to get you up to speed. 🔗👇

dan bateyko@dbateyko·28 Mar

Wild Anna’s Archive bounty, reads like a heist. They want someone to front tens of thousands of dollars to buy Library of Congress files for a 3k bounty. I imagine a leak investigation would have a very short suspect list

dan bateyko@dbateyko·28 Mar

@jasminewsun @TheAtlantic Congratulations!!

jasmine sun@jasminewsun·27 Mar

Personal news: I’m joining @TheAtlantic as a contributing writer!

It drives me nuts how wide of an understanding gap there is between SF AI world and everywhere else, especially given the immense public stakes. There's so much AI hype, anxiety, and misinformation; so doing translation and synthesis feels more important than ever. (This role is in addition to Subst*ck, where I’ll keep writing at the same cadence.)

I'm using this excuse to share some rambly media thoughts: namely that tech journalism can & must be great again.

The problem with “old media” is that it often refuses to take tech bros at their word, and the problem with “new media” is that it’s often just advertising, which is boring even for the subjects. There’s a doom loop where some reporters write poorly-informed stories, so insiders won’t talk to them, so sourcing is worse; not to mention that most journalists are not based in the communities they cover. This makes people bad-faith, but it also means a lot of AI reporting is 6-12 months behind.

Yes, fantastic blogs/podcasts abound (these are the bulk of my info diet), but they are largely insiders talking to insiders, too niche to recommend to policymakers or smart non-AI friends. These fractures are a disaster for shared public knowledge, and make us less prepared to navigate AI well.

Magazine writing offers the ability to rise above the hourly play-by-play (squinting at every new model release, every new jobs report) and to the bigger questions. I actually think the most impactful AI writing has *months*, not days, of longevity! Rather than over-anchoring to any particular forecast, it offers generalized frames for operating under uncertainty.

A few types of pieces I’m especially keen to write:

1) AI culture: A few people’s idiosyncratic personal beliefs regularly change the world. It thus matters tremendously how AI builders view their work, politics, philosophy, and the future. I think most individuals in the AI industry are good and want their tech to do good. Journalists can portray AI workers’ earnest beliefs while being appropriately skeptical of how that can clash with or be shaped by industry incentives, and how it might diverge from the public. "Smart people confront hard moral/intellectual problem" is one of my favorite genres.

2) AI diffusion: AI discourse disproportionately focuses on its impact on software and writing because those are the jobs the messengers do (obviously I’m guilty of this). That makes me want to do more field reporting on AI in education, manufacturing, healthcare, etc. For example, can I ride along with a team trying to integrate AI tutors into a school? Diffusion is rarely as smooth as economic models predict, and “how AI will go” depends largely on the speed, and where it hits first. Relatedly: AI in the non-western world.

3) AI superusers: Polls show people are highly anxious about AI’s speculative effects but sanguine about their personal use. I think more people should experiment with AI to feel both the pace of progress *and* its jagged edges. While AI can produce slop/surveillance/etc., it can also extend human ability & creativity. I want to paint portraits of people already “living in the future” so we can ask: is that a life we want? The tech is here, but we can choose how to relate to it.

If you have ideas/feedback/etc., my DMs are open, and my Signal is jws.27. For me, 1-1 conversations are *not* on the record unless we say so. (I always thought this was a weird norm, and in general am happy to answer people's questions about “how journalism works” from my POV because it can be quite opaque.)

(also I'm replacing my blurry macbook selfie with a b&w portrait profile picture to signify reluctant induction into the label of “capital-j Journalist.” I spent most of last year pretending to be funemployed, but I suppose this is graduation. end of an era!)

dan bateyko@dbateyko·28 Mar

@GaoShanghua Nice! I’ve already started to play around with it using open models, congrats on this work

Shanghua Gao@GaoShanghua·28 Mar

@dbateyko That’s a good question! We don’t see a clear pattern of what kinds of questions are not covered. And we do observe that with more rounds of expansion in our method or using better base LLMs, the coverage is getting better.

Shanghua Gao@GaoShanghua·26 Mar

Are we even measuring the right things when we evaluate LLMs? We introduce QWorld, a framework where every question generates its own evaluation world through recursive expansion tree. One question becomes 45+ fine-grained criteria. On HealthBench alone: 200k+ criteria across 530+ dimensions. 79% of QWorld's criteria are entirely novel. No expert had ever written them down, yet human judges validate they matter. It surfaces blind spots in every frontier model: sustainability, equity, emergency recognition. Dimensions standard benchmarks don't even have. Built with @YuchangSu456733, @sui67713, @CurtGinder, and @marinkazitnik Paper: arxiv.org/abs/2603.23522 Code: github.com/mims-harvard/q… Demo: qworld.openscientist.ai @Harvard @HarvardDBMI @KempnerInst @harvardmed

dan bateyko retweeted

Kenny Peng@kennylpeng·26 Mar

Excited to share our new research demo. We trained an SAE on 28M social media posts to generate 20K browsable trails, spanning “analysis of fictional tropes” to “rotisserie chicken” to “zoning and land use policy.” Try it out! skytrails.org

dan bateyko retweeted

Neel Guha@NeelGuha·25 Mar

I wrote a blogpost about writing machine learning research papers (e.g., NeurIPS, ICML, ICLR, etc.). The core idea is that most papers follow one of a predetermined set of templates. The post talks about each template, describes their rules, and offers examples...

dan bateyko retweeted

Institute for Law & AI (LawAI)@law_ai_·23 Mar

We’re excited to announce the second annual Summer Institute on Law and AI — a gathering of law students, professionals, and academics eager to explore pressing issues at the intersection of AI, law, and policy.

dan bateyko retweeted

Natasha Jaques@natashajaques·20 Mar

Why am I obsessed with this? LLMs do not preserve our intentions or diversity of thought in writing, and they’re already being adopted en masse. More than 1 billion people worldwide use them on a weekly basis. Existing work has shown that for individual scientists, using LLMs to generate papers increases your productivity and impact, even though it constricts science’s overall focus. In our study we show that even though participants who rely on LLMs say their writing is significantly less creative and not in their voice, they are paradoxically equally satisfied with the output. So, the adoption of LLMs is not going to slow any time soon. But it’s already affecting our cultural institutions and the way we conduct science. We urgently need more research into how massive, widespread LLM adoption will affect our science, politics, and culture.

dan bateyko retweeted

maxwell neely-cohen@maxnc·20 Mar

Issue Five of The HTML Review is here! Get to your browser for 15 excellent new pieces of web-based literature and art
