dan bateyko

1.5K posts

@dbateyko

maybe the hard stuff's inside, hidden — like bones, as opposed to an exoskeleton. @CornellInfoSci

NY · Joined January 2012
2.4K Following · 1.3K Followers
dan bateyko retweeted
sev field @sevdeawesome
An interesting epistemic divide became clearer: researchers at leading companies (e.g. OpenAI, Anthropic, Google DeepMind) discuss recursive improvement more regularly and are encouraged by leadership. Academics were more likely to consider it professionally risky; one said raising these ideas risks being seen as "a crackpot."
1 reply · 2 retweets · 37 likes · 3.3K views
dan bateyko retweeted
Ian Arawjo @IanArawjo
Position paper: "Science of AI Evaluation Requires Item-level Benchmark Data": "broader access to and analysis of item-level AI benchmark data are essential for establishing a more scientifically grounded, evidence-centered approach to AI evaluation" arxiv.org/abs/2604.03244
1 reply · 4 retweets · 20 likes · 2.2K views
dan bateyko @dbateyko
Lovely talk on bottlenecks to imaginative design work. Hadn't heard of these augmented reading systems. I liked the Scim idea (highlighting passages by their rhetorical role) so much that I asked Claude to remake it for Zotero, with a 2B LLM swapped in. andymatuschak.org/tat/refs/Fok20…
[image]
Andy Matuschak @andy_matuschak

⭐ New talk! andymatuschak.org/tat Coding agents might help us finally break out of two cages: the app model, which traps computing in one-size-fits-all silos; and programming as a specialization, which has crowded out cultures of imagination and domain insight.

0 replies · 0 retweets · 3 likes · 173 views
dan bateyko @dbateyko
Do you get pleasure out of being part of the system? System. When you're not performing your duties do they keep you scrolling reels? System. Let’s move on. Slop. Noguchi, Hay Matin, Snoopy, Cesca. Slop. Books on the floor? Slop. Your most shameful moment? Slop. We're done.
[3 images]
Good Girl Gone Mad 🧜‍♀️ @GGGoneMad

The most provocative part of THE DRAMA is the concept of a 35yo being a curatorial chair (?) who makes enough money with his librarian (?) fiancée to furnish the most stunning apartment in all of Boston (?!)

0 replies · 0 retweets · 2 likes · 234 views
dan bateyko retweeted
Neel Guha @NeelGuha
I built a leaderboard tracking LLM performance on a suite of academic legal benchmarks. This includes LegalBench, LEXAm, Housing QA, BarExam, and some Hallucination benchmarks. Some fun findings:
[image]
9 replies · 10 retweets · 71 likes · 6.9K views
dan bateyko retweeted
Ian Arawjo @IanArawjo
Stats for Evals is now live, and we got a site, too: statsforevals.com We'll be posting regular investigations across the summer. For now, we're starting with the basics: comparing models and prompts. Also has resources, principles, example code, and guidance for others:
[image]
1 reply · 8 retweets · 22 likes · 1.3K views
dan bateyko retweeted
AI4Law@ICML @ai4law_workshop
🚨 Call for Papers: AI for Law Workshop @icmlconf (July 10, Seoul 🇰🇷), welcomes submissions across three themes: ⚖️ AI for Legal Reasoning 📊 AI Evaluation for Law 🌍 AI for Access to Justice ⏰ Deadline: May 22 (AoE) 📄Full paper submission via: openreview.net/group?id=ICML.…
[image]
2 replies · 19 retweets · 37 likes · 4.8K views
dan bateyko retweeted
Peter Henderson @PeterHndrsn
Btw, did a bit of a rebranding of the substack. Will endeavor to post more there. h/t @dbateyko on the Trials & Errors name. Super fitting name for a group whose focus is both in reinforcement learning and in law/governance research. trialserrors.ai
[image]
Peter Henderson @PeterHndrsn

This is a challenging legal problem for NeurIPS (and other conference participants)! You might be wondering how this is possible given the First Amendment? I wrote a quick explainer on the current status quo of relevant First Amendment cases & law to get you up to speed. 🔗👇

0 replies · 2 retweets · 13 likes · 1.3K views
dan bateyko @dbateyko
Wild Anna’s Archive bounty, reads like a heist. They want someone to front tens of thousands of dollars to buy Library of Congress files for a 3k bounty. I imagine a leak investigation would have a very short suspect list
[image]
0 replies · 0 retweets · 1 like · 144 views
jasmine sun @jasminewsun
Personal news: I’m joining @TheAtlantic as a contributing writer!
It drives me nuts how wide of an understanding gap there is between SF AI world and everywhere else — especially given the immense public stakes. There's so much AI hype, anxiety, and misinformation; so doing translation and synthesis feels more important than ever. (This role is in addition to Subst*ck, where I’ll keep writing at the same cadence.)
I'm using this excuse to share some rambly media thoughts: namely that tech journalism can & must be great again.
The problem with “old media” is that it often refuses to take tech bros at their word, and the problem with “new media” is that it’s often just advertising, which is boring even for the subjects. There’s a doom loop where some reporters write poorly-informed stories, so insiders won’t talk to them, so sourcing is worse; not to mention that most journalists are not based in the communities they cover. This makes people bad-faith, but it also means a lot of AI reporting is 6-12 months behind. Yes, fantastic blogs/podcasts abound — these are the bulk of my info diet — but they are largely insiders talking to insiders, too niche to recommend to policymakers or smart non-AI friends. These fractures are a disaster for shared public knowledge, and make us less prepared to navigate AI well.
Magazine writing offers the ability to rise above the hourly play-by-play (squinting at every new model release, every new jobs report) and to the bigger questions. I actually think the most impactful AI writing has *months*, not days of longevity! Rather than over-anchoring to any particular forecast, it offers generalized frames for operating under uncertainty.
A few types of pieces I’m especially keen to write:
1) AI culture: A few people’s idiosyncratic personal beliefs regularly change the world. It thus matters tremendously how AI builders view their work, politics, philosophy, and the future. I think most individuals in the AI industry are good and want their tech to do good. Journalists can portray AI workers’ earnest beliefs while being appropriately skeptical of how that can clash with or be shaped by industry incentives, and how it might diverge from the public. "Smart people confront hard moral/intellectual problem" is one of my favorite genres.
2) AI diffusion: AI discourse disproportionately focuses on its impact on software and writing because those are the jobs the messengers do (obviously I’m guilty of this). That makes me want to do more field reporting on AI in education, manufacturing, healthcare, etc: e.g. can I ride along with a team trying to integrate AI tutors into a school? Diffusion is rarely as smooth as economic models predict, and “how AI will go” depends largely on the speed, and where it hits first. Relatedly: AI in the non-western world.
3) AI superusers: Polls show people are highly anxious about AI’s speculative effects but sanguine about their personal use. I think more people should experiment with AI to feel both the pace of progress *and* its jagged edges. While AI can produce slop/surveillance/etc, it can also extend human ability & creativity. I want to paint portraits of people already “living in the future" so we can ask: is that a life we want? The tech is here, but we can choose how to relate to it.
If you have ideas/feedback/etc my DMs are open, and my Signal is jws.27. For me 1-1 conversations are *not* on the record unless we say so. (I always thought this was a weird norm, and in general am happy to answer people's questions about “how journalism works” from my POV because it can be quite opaque.)
(also I'm replacing my blurry macbook selfie with a b&w portrait profile picture to signify reluctant induction into the label of "capital-j Journalist.” I spent most of last year pretending to be funemployed, but I suppose this is graduation. end of an era!)
149 replies · 49 retweets · 1.4K likes · 131.4K views
dan bateyko @dbateyko
@GaoShanghua Nice! I’ve already started to play around with it using open models, congrats on this work
0 replies · 0 retweets · 0 likes · 12 views
Shanghua Gao @GaoShanghua
@dbateyko That’s a good question! We don’t see a clear pattern of what kinds of questions are not covered. And we do observe that with more rounds of expansion in our method or using better base LLMs, the coverage is getting better.
1 reply · 0 retweets · 0 likes · 24 views
Shanghua Gao @GaoShanghua
Are we even measuring the right things when we evaluate LLMs? We introduce QWorld, a framework where every question generates its own evaluation world through a recursive expansion tree. One question becomes 45+ fine-grained criteria. On HealthBench alone: 200k+ criteria across 530+ dimensions. 79% of QWorld's criteria are entirely novel. No expert had ever written them down, yet human judges validate they matter. It surfaces blind spots in every frontier model: sustainability, equity, emergency recognition. Dimensions standard benchmarks don't even have. Built with @YuchangSu456733, @sui67713, @CurtGinder, and @marinkazitnik Paper: arxiv.org/abs/2603.23522 Code: github.com/mims-harvard/q… Demo: qworld.openscientist.ai @Harvard @HarvardDBMI @KempnerInst @harvardmed
1 reply · 2 retweets · 17 likes · 5.7K views
dan bateyko retweeted
Kenny Peng @kennylpeng
Excited to share our new research demo. We trained an SAE on 28M social media posts to generate 20K browsable trails, spanning “analysis of fictional tropes” to “rotisserie chicken” to “zoning and land use policy.” Try it out! skytrails.org
2 replies · 3 retweets · 7 likes · 450 views
dan bateyko retweeted
Neel Guha @NeelGuha
I wrote a blogpost about writing machine learning research papers (e.g., NeurIPS, ICML, ICLR, etc.). The core idea is that most papers follow one of a predetermined set of templates. The post talks about each template, describes their rules, and offers examples...
[image]
7 replies · 83 retweets · 622 likes · 79.3K views
dan bateyko retweeted
Institute for Law & AI (LawAI)
We’re excited to announce the second annual Summer Institute on Law and AI — a gathering of law students, professionals, and academics eager to explore pressing issues at the intersection of AI, law, and policy.
2 replies · 1 retweet · 12 likes · 954 views
dan bateyko retweeted
Natasha Jaques @natashajaques
Why am I obsessed with this? LLMs do not preserve our intentions or diversity of thought in writing, and they’re already being adopted en masse. More than 1 billion people worldwide use them on a weekly basis. Existing work has shown that for individual scientists, using LLMs to generate papers increases your productivity and impact, even though it constricts science’s overall focus. In our study we show that even though participants who rely on LLMs say their writing is significantly less creative and not in their voice, they are paradoxically equally satisfied with the output. So, the adoption of LLMs is not going to slow any time soon. But it’s already affecting our cultural institutions and the way we conduct science. We urgently need more research into how massive, widespread LLM adoption will affect our science, politics, and culture.
[image]
6 replies · 20 retweets · 135 likes · 17K views
dan bateyko retweeted
maxwell neely-cohen @maxnc
Issue Five of The HTML Review is here! Get to your browser for 15 excellent new pieces of web-based literature and art
30 replies · 316 retweets · 2.5K likes · 278.7K views