
@AdamRodmanMD @GStetsonMD @FutureDocs @JenniferSpicer4 @cjchiu @drjfrank @LaraVarpio @ETSshow @WrayCharles @mededdoc @ShreyaTrivediMD Re: validity - it's interesting because in the paper they counted entities using NLP (not humans!). Ofc the target was human density preferences, but GPT-4 also rated the summaries. As we see more RLAIF, I'll be interested to see what kind of validity measures come out


