
So representation matters: amino-acid tokens lose chemistry that SMILES/chemical language models can retain.
Takeaway: smaller domain-specific models can work, but only if the input format preserves the molecular information the assay actually sees.
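A toy sketch of the representation point, not taken from the linked preprint: the serine/phosphoserine pair, the `RESIDUES` table, and the `distinguishable` helper are all illustrative assumptions. It assumes a plain 20-letter amino-acid vocabulary that writes a modified residue as its parent letter, and uses non-stereo SMILES for simplicity.

```python
# Illustrative only: residue choice and helper names are assumptions,
# not the method or data of the linked preprint.
RESIDUES = {
    # Non-stereo SMILES; both residues share the one-letter code "S"
    # under the assumed plain 20-letter vocabulary.
    "serine": {"one_letter": "S", "smiles": "NC(CO)C(=O)O"},
    "phosphoserine": {"one_letter": "S", "smiles": "NC(COP(=O)(O)O)C(=O)O"},
}


def distinguishable(a: str, b: str, representation: str) -> bool:
    """Return True if the two residues differ under the given representation."""
    return RESIDUES[a][representation] != RESIDUES[b][representation]


if __name__ == "__main__":
    for rep in ("one_letter", "smiles"):
        print(rep, distinguishable("serine", "phosphoserine", rep))
```

Under these assumptions the one-letter encoding collapses the two residues into the same token, while the SMILES strings keep the phosphate group visible to the model.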
More: biorxiv.org/content/10.648…