Justin Baeder, PhD

45.9K posts

@eduleadership

Education philosopher & instructional leadership author. Creator of Repertoire, the professional writing app for instructional leaders.

Heber Springs, AR · Joined March 2009
11.8K Following · 26.5K Followers

Pinned Tweet
Justin Baeder, PhD @eduleadership
I’m thrilled to announce that my new book with Keith Fickel is available for pre-order! Cultivate & Activate: Building Teacher Capacity for Instructional Leadership ships May 2026: a.co/d/09pc4IKs
[image]

Justin Baeder, PhD @eduleadership
@griswold Who do you think is teaching these dual credit classes? It’s either adjuncts or high school teachers. It’s not tenure-track faculty.

Matt Griswold @griswold
Yes, that's surely correct in *some* cases... and those colleges should (care to) solve it. But what is the goal of a subject-specific proxy vs. dual credit for college courses?

According to the College Board, approximately one-third of public high school graduates took an AP test in 2024, and two-thirds of them scored 3+. It's likely that those ~22% of high school grads *would* get an A in an introductory college course. If instead 50% of the college class gets an A, the AP students are getting the same diluted credit from that school... it's an issue for the college. If the goal is to be another filter for the admissions department, wouldn't adding more headroom to the SAT provide more signal than a high school musical version of a college course?

Also, we tend to think of AP as rigor, but there are 40 AP subjects now, including AP Drawing, AP Art History, and AP Precalculus! "Show me the incentives" strikes again. If you took the same course load six months after high school graduation, on the college side, nobody would call it a rigorous schedule for a college freshman.

It is possible that *some* high schools could have a better teacher for AP Microeconomics, but probably not in most cases. AP Physics? Probably not better than the physics department. AP Computer Science? Probably not. The only AP subject that I'd believe the average high school could have a genuine systemic advantage in teaching is history. "Some high schools are better than some colleges" is not a great justification for the AP franchise, even if it's true.
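The ~22% figure in the tweet above is the product of the two approximate College Board proportions it cites. These are the share of graduates who took an AP test and the share of those test-takers who scored 3+:

$$\frac{1}{3} \times \frac{2}{3} = \frac{2}{9} \approx 0.222 \approx 22\%$$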

Matt Griswold @griswold
Hot take: eliminate AP classes and credits entirely. Instead, teens who want college credit should have free access to dual enrollment programs in their state university system so they can earn college credit through college classes.

AP test incentives are misaligned in the worst way: "Students and families are happier because they get college credit. . . . Schools are happier because they look good. Governors and state agencies are happier because they get to brag about it."

Dual enrollment does not resolve every incentive problem, but it at least eliminates one layer of abstraction. It can also help reduce the cost of college by shifting some general requirements into the academic stagnation that is high school. This could be the simplest way to pivot American high schools toward tracking, too, so long as students not looking at college can also access courses that lead toward their interests and industry certification.

To anyone who defends AP classes as the last leg of meritocracy in schools, I'd argue it's more meritocratic not to gatekeep the real thing just because someone is slightly younger than a college freshman. The ceiling can be much higher.

It is possible, as the College Board suggests, that "AP standards for qualifying scores remain more stringent than grading standards in many college classrooms." Sure, but that's an issue for the state colleges to resolve. College students regularly transfer in with community college credits for these introductory courses, so the colleges have already determined that AP tests are unnecessary.

AP tests are the signature of a decades-long evolution of high schools choosing college prep over life prep. That makes them a fair target if we want to begin moving in another direction. I welcome any steel man against this idea.
[image]

Justin Baeder, PhD @eduleadership
Andrey Kurenkov @andrey_kurenkov

This research is basically clickbait... These 'esoteric' languages in the benchmark (Brainfuck, Befunge-98, Whitespace, Unlambda, and Shakespeare) are not just ones with less training data online. They are also just **much harder** and **less efficient** to do anything productive with, and failing to even discuss this is crazy. Saying that if you can solve something in Python you should be able to generalize to these languages is akin to saying that you should be able to generalize from tasks in Python to assembly. It's obviously not the same difficulty level to do tasks in Python vs. assembly. So are the low scores on the benchmark due to a lack of "ability to generalize computational reasoning to novel domains", or due to the extra difficulty that the language of choice adds to the task? Somehow this question is not addressed in the paper nor noted in the limitations, as far as I could find.

For reference, here are the languages (info from Wikipedia):
* Brainfuck: The language consists of only 8 operators: <>+-.,[]. Here's 'hello world':
  >++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<++.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-]<+.
* Whitespace: 'only whitespace characters (space, tab and newline) have meaning – contrasting typical languages that largely ignore whitespace characters.' See first attached image for 'hello world' code.
* Befunge-98: a stack-based, reflective language in which programs are arranged on a two-dimensional grid. "Arrow" instructions direct the control flow to the left, right, up or down, and loops are constructed by sending the control flow in a cycle. Hello world:
  >25*"!dlroW olleH":v
                     v:,_@
                     >  ^
* Unlambda: 'a minimal functional programming language based on combinatory logic, an expression system without the lambda operator or free variables. It relies mainly on two built-in functions (s and k) and an apply operator (written `, the backquote character).' Hello world:
  `r```````````.H.e.l.l.o. .w.o.r.l.di
* Shakespeare: 'A character list in the beginning of the program declares a number of stacks, naturally with names like "Romeo" and "Juliet". These characters enter into dialogue with each other in which they manipulate each other's topmost values, push and pop each other, and do I/O. The characters can also ask each other questions which behave as conditional statements. On the whole, the programming model is very similar to assembly language but much more verbose.' See second image for just part of the hello world.

I don't want to be mean to the researchers, and I do like the idea behind the research. But the way it's presented feels so misleading to me that I can't help but feel the entire effort is either in bad faith or very poorly thought out.

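To make the Brainfuck description concrete, here is a minimal interpreter sketch in Python. It comes from neither the tweet nor the EsoLang-Bench paper. It assumes one common convention: 30,000 byte cells that wrap modulo 256, and `,` storing 0 once input runs out. Real implementations differ on these details. The function name `run_brainfuck` and the `HELLO_WORLD` constant are illustrative. The constant is the hello-world program quoted above, split across lines for readability.

```python
def run_brainfuck(program: str, stdin: str = "") -> str:
    """Interpret a Brainfuck program and return everything it prints.

    Assumed conventions (not from the tweet): 30,000 cells, each wrapping
    modulo 256, and ',' stores 0 once the input string is exhausted.
    """
    # Only the 8 commands matter; every other character is a comment.
    code = [c for c in program if c in "<>+-.,[]"]

    # Pre-compute matching bracket positions so loops jump in O(1).
    jump = {}
    stack = []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at command {i}")
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30000
    ptr = 0            # data pointer
    pc = 0             # program counter
    inp = iter(stdin)
    out = []
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1   # running past the last cell raises IndexError on access
        elif c == "<":
            if ptr == 0:
                raise IndexError("data pointer moved left of cell 0")
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip to the matching ']' (pc += 1 then exits the loop)
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]  # back to the matching '[' (pc += 1 re-enters the body)
        pc += 1
    return "".join(out)


# The hello-world program from the tweet, with the stray spaces removed.
HELLO_WORLD = (
    ">++++++++[<+++++++++>-]<."
    ">++++[<+++++++>-]<+.+++++++..+++."
    ">>++++++[<+++++++>-]<++.------------."
    ">++++++[<+++++++++>-]<+.<.+++.------.--------."
    ">>>++++[<++++++++>-]<+."
)

if __name__ == "__main__":
    print(run_brainfuck(HELLO_WORLD))  # Hello, World!
```

Matching the brackets once up front keeps the main loop simple. Even so, producing 13 characters takes nested multiply-and-add loops over tape cells, which illustrates the tweet's point about how much less efficient these languages are to work in.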

Justin Baeder, PhD retweeted
Justin Baeder, PhD @eduleadership
@fchollet @lymanstoneky I’m unfamiliar with the AI research in this area, but K-12 wasted 15 years only to rediscover that there is no higher-level generalizable knowledge. Facts are what we think with. Curious what the current thinking in AI is.

François Chollet @fchollet
This is more evidence that current frontier models remain completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge (such as metalearning knowledge, problem-solving strategies...)
Lossfunk @lossfunk

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

Justin Baeder, PhD retweeted
Tom Loveless @tomloveless99
Double-dose Algebra I is one of the few interventions at the high school level to show real promise. Positive effects last into college. But students must be grouped by math skills. When double-dose students near the national median were placed in classes with much lower achieving peers, the positive outcomes disappeared. pnas.org/doi/epdf/10.10…

Matt Griswold @griswold
I'm proposing students take the same college class, to be clear... not a proxy. It's the same credit, so any inflation will be the same whether you're a high school senior or a college freshman a few months later.

I see two potential backstops to grade inflation in college, since employers don't trust the grades anyway:
1. Transparency: list the class average on the transcript to devalue inflated grades.
2. Transferability: if a college's credits aren't broadly accepted because of perception, then there is at least some market pressure to meet the standards of transferability.

Ultimately, the accreditation cartel needs to step up, but they have perverse incentives, too! All of education is just self-dealing from one level to the next (college grades are for grad schools). Industry demand is the only force properly motivated to restore rigor, but it will take a decade to couple industry and education more tightly.

Allowing teens to acquire transferable college credits early won't fix college-level grade inflation, but it obviates the need for a $500M/yr proxy franchise and could benefit students beyond the top ranks.

Justin Baeder, PhD @eduleadership
@griswold Like, what’s the backstop to prevent college classes from being dumbed down even more? There’s ~no standardized testing at all to compare with grades.

Justin Baeder, PhD @eduleadership
@griswold I’m sad to see AP standards declining, but it seems like that could be fixed with the stroke of a pen. Grade inflation is a much harder problem, and it affects college classes as well. I don’t see how switching from a national exam to college grades is good for rigor.

3rdMoment 🏛️ @3rdMoment
@eduleadership The measurement error is not what matters for evaluating the size/importance of the average treatment effect. You really don't seem to get this.

Justin Baeder, PhD @eduleadership
@3rdMoment Do you think the measurement error of this test is zero questions? Because if it’s not, it’s absolutely fair to call an effect equivalent to roughly one exam question “noise.”

3rdMoment 🏛️ @3rdMoment
@eduleadership Whether you call an effect of this size "large" or "small" is a matter of judgement, but comparing it to "noise" is not a valid way to do this.

Justin Baeder, PhD @eduleadership
@CreditFlex @PSkinnerTech It deserves to be lumped together, in the absence of any evidence that a particular product is different in some meaningful way. Every developer seems to think they’ve unlocked the secret, but nobody has any notable results.

Teacher EcoSystems Matter
@eduleadership @PSkinnerTech Honestly, Patrick—and I often agree with Justin on some things—having a discussion about the efficacy of “edtech”, like that was an actual thing, seems a waste of time. “Edtech” has become a bugaboo for some people in the way “religion” has been. Ignore anyone using the word.

Justin Baeder, PhD @eduleadership
The problem is that a lot of smart people are working on a solution category that has been proven not to work. I’m not saying it can never work, but it would be truly extraordinary, and people don’t seem to have any particularly extraordinary ideas.
Patrick Skinner - edu/acc @PSkinnerTech

@eduleadership So, you think that instead of becoming part of the solution, because you have no solution, you'd rather hate on the others who are trying to develop a solution. Because there's clearly a problem, and many are trying to develop a solution... except you.

Thucides @Thucides
@eduleadership @PSkinnerTech The MET study asserts that teachers have a direct impact on student learning. Either you're misreading it, or being purposefully obtuse

Justin Baeder, PhD @eduleadership
@Thucides @PSkinnerTech I'm not cynical about education at all. I'm just well-informed about the various fads that constantly attempt to disrupt the profession with bad ideas that have already failed. Also, you:
[image]

Thucides @Thucides
@eduleadership @PSkinnerTech And your persona of being an edu cynic-nihilist doesn't make you sound sage, it's just a professional obtuseness act with a hint of appeal to authority that lets you claim intellectual primacy without delivering anything original or insightful

Justin Baeder, PhD @eduleadership
@Thucides @PSkinnerTech I mean assignment to specific teachers, not at the school level. MET studied exactly what you are talking about to address this precise issue of teacher impact, and found only that it's not a stable construct.

Thucides @Thucides
@eduleadership @PSkinnerTech Student assignments in many schools do not change year over year, and the data is glaring to anyone looking at it. You keep mentioning MET, and it's unrelated. They couldn't find scalable identifiers; that doesn't mean particular teachers don't move the needle tremendously on student learning (SL).

Justin Baeder, PhD @eduleadership
@3rdMoment This sample was 770 students. That’s plenty. I’m sure this experiment could be repeated across more tests to see if the effect is bigger or smaller. But we would probably just get further confirmation that the effect is very small.

3rdMoment 🏛️ @3rdMoment
@eduleadership That doesn't matter! You are very confused. The reason we want a large sample is to average out the noise and find the signal. Lots of noise means we need a bigger sample, but it doesn't make the signal any more or less important.
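The noise-versus-signal distinction in this exchange can be written out with the textbook standard error for a difference in group means. This is a sketch under illustrative assumptions, not the study's actual design. It assumes two equal groups of $n$ students each, a per-student score standard deviation $\sigma$, and an estimated effect $\hat{\tau}$ equal to the difference between the group means $\bar{y}_T$ and $\bar{y}_C$:

$$\hat{\tau} = \bar{y}_T - \bar{y}_C, \qquad \mathrm{SE}(\hat{\tau}) = \sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sigma\sqrt{\frac{2}{n}}$$

Doubling $n$ shrinks the standard error by a factor of $\sqrt{2}$ and so sharpens the estimate of the true effect $\tau$. It does not make $\tau$ itself any larger or smaller.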

Justin Baeder, PhD @eduleadership
@Thucides @PSkinnerTech My ancient brother, the brightest minds in our field spent three years and $45 million on MET trying to find a signal in this noise, and all they found was that it’s all noise. Student assignment is extremely far from random—that’s the primary source of noise.

Thucides @Thucides
@eduleadership @PSkinnerTech The MET project is a non sequitur. If teacher A has 80% proficiency with a cohort of 28 students, and that same cohort has 50% proficiency the year prior and 60% the subsequent year, that isn't just "noise." And this happens in nearly every school.

3rdMoment 🏛️ @3rdMoment
@eduleadership Improving by an average of one question, for every student? From an intervention that takes little or no resources? You think that wouldn't matter? (I am not defending the 6-9 months claim.)

Justin Baeder, PhD @eduleadership
@Thucides @PSkinnerTech Circular reasoning. If you define the best teachers as the ones with the best test scores, then this will appear to work. Gates’ MET project tried to measure this, and got only noise.

Thucides @Thucides
@eduleadership @PSkinnerTech I think you're right about ed tech, but a year with your school's best 2nd-grade teacher versus the worst accounts for more than 10% of the variance in benchmark assessment results.