Two questions about the RNA secondary structure prediction task

by chiyazzz - opened Dec 19, 2024

Dec 19, 2024

First, thanks for the excellent work!
I have a question about the RNA secondary structure prediction task. The performance reported in Supplementary Table 1 is inconsistent with the main text of the paper. For example, PlantRNA-FM's performance on ArchiveII is 0.855 in Supplementary Table 1 but 0.924 in the paper. The data in Source Data Extended Data Fig. 1 is also inconsistent with both Supplementary Table 1 and the main text. Which one is correct?
My other question: what does PlantRNA-FM-RNA-Only mean? I cannot find an explanation in the paper.

chiyazzz

Dec 19, 2024

Also, it seems that the model in this Hugging Face repository does not include the MLM, secondary structure, or annotation prediction heads.

yangheng

Owner Dec 21, 2024

Both scores are valid. The lower score is the one obtained after we removed similar structures. We have only released the MLM model, as shown in the example.

yangheng

Owner Dec 21, 2024

from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name_or_path = "yangheng/PlantRNA-FM"

model = AutoModelForMaskedLM.from_pretrained(model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
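
On the two scores mentioned earlier: this thread does not say how similarity was measured, what cutoff was used, or which set the structures were compared against. The following is only a toy sketch of the general idea of dropping evaluation structures that closely resemble a reference set, which typically makes a benchmark harder. The function name `filter_similar`, the 0.8 threshold, and the use of `difflib` string similarity on dot-bracket notation are all assumptions made for illustration, not the procedure used in the paper.

```python
from difflib import SequenceMatcher


def filter_similar(eval_structs, reference_structs, threshold=0.8):
    """Keep evaluation structures whose best similarity to any
    reference structure is below `threshold` (hypothetical cutoff)."""
    kept = []
    for s in eval_structs:
        # Best string similarity against the reference set; 0.0 if it is empty.
        best = max(
            (SequenceMatcher(None, s, r).ratio() for r in reference_structs),
            default=0.0,
        )
        if best < threshold:
            kept.append(s)
    return kept


# Made-up dot-bracket structures, for illustration only.
reference = ["((((....))))"]
evaluation = [
    "((((....))))",      # identical to a reference structure
    "(((((....)))))",    # same hairpin with one extra base pair
    "..(...)..(...)..",  # two small hairpins, clearly different
]

kept = filter_similar(evaluation, reference)
print(kept)  # ['..(...)..(...)..']
```

With a filter like this, only the structure that differs clearly from the reference set remains in the evaluation set, so a score computed on the filtered set can be lower than one computed on the full set.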

chiyazzz

Dec 22, 2024

Thanks! And what does PlantRNA-FM-RNA-Only mean?

yangheng

Owner Dec 22, 2024

It refers to the model without structure pretraining.

yangheng changed discussion status to closed Dec 22, 2024
