Publications
- May 2026: Detecting and quantifying overparametrization in RNA language models with REDIAL (bioRxiv)
RNA foundation models are increasingly used for structure prediction and design, but downstream benchmarks can conflate genuine biological learning with task-specific memorization. REDIAL addresses this with a zero-shot, unsupervised diagnostic that extracts coevolutionary signals directly from RNA language model embeddings. By probing models layer by layer, it reveals what they have internalized about RNA structure, showing that current RNA LMs are often overparameterized relative to the available sequence diversity, while structure-guided pretraining improves learned base-pair coupling signals.