Publications

  • May 2026
    Detecting and quantifying overparametrization in RNA language models with REDIAL
    bioRxiv

    RNA foundation models are increasingly used for structure prediction and design, but downstream benchmarks can blur the line between genuine biological learning and task-specific memorization. REDIAL addresses this with a zero-shot, unsupervised diagnostic that extracts coevolutionary signals directly from RNA language model embeddings. By probing models layer by layer, it reveals what they have internalized about RNA structure, showing that current RNA LMs are often overparameterized for the available sequence diversity, while structure-guided pretraining improves learned base-pair coupling signals.