

Justin Baeder, PhD
45.9K posts

@eduleadership
Education philosopher & instructional leadership author. Creator of Repertoire, the professional writing app for instructional leaders.







Rather than attending to the hysterical panic of some educators, I'm paying close attention to USC research showing two-thirds of teens use AI for homework even in schools that ban it. Most parents don't know their school's AI rules, and only 9% realize how much their kids use it. Students mainly use AI to search information and brainstorm. Both teens and parents worry about AI's harms, though frequent users are more optimistic. Bans clearly don't work. open.substack.com/pub/psychoftec…

This research is basically clickbait... These 'esoteric' languages (Brainfuck, Befunge-98, Whitespace, Unlambda, and Shakespeare) in the benchmark are not just ones with less training data online, they are also just **much harder** and **less efficient** to do anything productive with, and failing to even discuss this is crazy.

Saying that if you can solve something in Python you should be able to generalize to these languages is akin to saying that you should be able to generalize from tasks in Python to assembly. It's obviously not the same difficulty level to do tasks in Python vs assembly.

So are low scores on the benchmark due to lacking "ability to generalize computational reasoning to novel domains", or due to the increased difficulty of the task due to the language of choice? Somehow this question is neither addressed in the paper nor noted in the limitations, as far as I could find.

For reference, here are the languages (info from Wikipedia):

* Brainfuck: The language consists of only 8 operators: ><+-.,[]. Here's 'hello world':
>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<+
+.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-
]<+.
* Whitespace: 'only whitespace characters (space, tab and newline) have meaning – contrasting typical languages that largely ignore whitespace characters.' See first attached image for 'hello world' code.
* Befunge-98: a stack-based, reflective language in which programs are arranged on a two-dimensional grid. "Arrow" instructions direct the control flow to the left, right, up or down, and loops are constructed by sending the control flow in a cycle. Hello world:
>25*"!dlroW olleH":v
                v:,_@
                >  ^
* Unlambda: 'a minimal functional programming language based on combinatory logic, an expression system without the lambda operator or free variables. It relies mainly on two built-in functions (s and k) and an apply operator (written `, the backquote character).' Hello world:
`r```````````.H.e.l.l.o. .w.o.r.l.di
* Shakespeare: 'A character list in the beginning of the program declares a number of stacks, naturally with names like "Romeo" and "Juliet". These characters enter into dialogue with each other in which they manipulate each other's topmost values, push and pop each other, and do I/O. The characters can also ask each other questions which behave as conditional statements. On the whole, the programming model is very similar to assembly language but much more verbose.' See second image for just part of the hello world.

I don't want to be mean to the researchers, and I do like the idea behind the research, but the way it's presented feels so misleading to me that I can't help but feel the entire effort is either in bad faith or very poorly thought out.
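To make the Brainfuck description above concrete, here is a minimal sketch of an interpreter in Python that runs the 'hello world' program quoted in the post. The function name `run_brainfuck`, the 30,000-cell tape, and the 8-bit wrapping cells are assumptions chosen for illustration (they are common conventions, not something the post or the benchmark specifies).

```python
def run_brainfuck(code: str, input_data: str = "") -> str:
    """Interpret a Brainfuck program and return everything it prints."""
    # Pre-compute the matching position for every [ and ].
    jump = {}
    open_brackets = []
    for pos, ch in enumerate(code):
        if ch == "[":
            open_brackets.append(pos)
        elif ch == "]":
            start = open_brackets.pop()
            jump[start] = pos
            jump[pos] = start

    tape = [0] * 30000   # assumed tape size (a common convention)
    ptr = 0              # data pointer
    pc = 0               # program counter
    pending_input = iter(input_data)
    output = []

    while pc < len(code):
        ch = code[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256   # assumed 8-bit wrapping cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            output.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(pending_input, "\0"))
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]   # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]   # jump back to the loop start
        # Any other character (spaces, newlines) is ignored.
        pc += 1

    return "".join(output)


# The 'hello world' program quoted in the post, split over three lines.
HELLO = (
    ">++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<+"
    "+.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-"
    "]<+."
)

if __name__ == "__main__":
    print(run_brainfuck(HELLO))  # prints "Hello, World!"
```

The interpreter itself is short, which is the point of the language. The effort is in the program: printing thirteen characters takes over a hundred operators of hand-managed cell arithmetic.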


🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵


APs are getting easier, but they are still highly meritocratic compared to not just high school grades but college grades. Most students who get an A in a college class would be lucky to get a 3 on the corresponding AP.





0.15 SD on a course final exam means nothing. Noise.





“Culturally responsive mathematics teaching interrogates and innovates mathematics instruction to be a transformative and humanizing experience for everyone.” #NCTMNOLA26 is kicked off with Julia Aguirre and Maria Zavala as the Opening Session speakers!



@eduleadership So, you think that instead of becoming part of the solution, because you have no solution, you'd rather hate on the others who are trying to develop a solution. Because there's clearly a problem, and many are trying to develop a solution... except you.


AI really can help education: Randomized controlled experiment on high school students found a GPT-4o powered tutor that personalized problems for students raised final test scores by .15 SD, "equivalent to as much as six to nine months of additional schooling by some estimates"
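One way to put the 0.15 SD figure in concrete terms is to convert it to a percentile shift. This is a rough sketch that assumes test scores are normally distributed, which the post does not state; the function name `shifted_percentile` is made up for illustration.

```python
from statistics import NormalDist


def shifted_percentile(baseline: float, effect_size_sd: float) -> float:
    """Percentile reached by a student who starts at `baseline` (0-1)
    and improves by `effect_size_sd` standard deviations.

    Assumes normally distributed scores (an assumption, not from the study).
    """
    standard_normal = NormalDist()
    z_before = standard_normal.inv_cdf(baseline)
    return standard_normal.cdf(z_before + effect_size_sd)


if __name__ == "__main__":
    after = shifted_percentile(0.50, 0.15)
    print(f"{after:.1%}")  # prints "56.0%"
```

Under that assumption, a 0.15 SD gain moves a median student from the 50th to roughly the 56th percentile.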












