# FishNALM-20L_prom_300_tata

FishNALM-20L_prom_300_tata is a fine-tuned version of FishNALM-20L_pretrain for promoter prediction (300 bp, TATA promoters) in fish genomics.
## Model description

This repository contains a task-specific fine-tuned checkpoint from the FishNALM model family. The model was initialized from the pretrained base model FishNALM-20L_pretrain and then fine-tuned for promoter prediction (300 bp, TATA promoters).
## Task

- Task name: promoter prediction (300 bp, TATA promoters)
- Task type: binary classification
- Prediction target: promoter vs. non-promoter sequences for the TATA promoters subset

Examples of downstream tasks in the FishNALM family:

- CTCF TFBS prediction
- Pou5f1 TFBS prediction
- Sox2 TFBS prediction
- histone modification prediction
- promoter prediction
- splice donor prediction
- splice acceptor prediction
- splice classification
## Base model

- Base model repository: xia-lab/FishNALM-20L_pretrain
- Model family: FishNALM
- Initialization type: pretrained checkpoint + downstream fine-tuning
## Training data

This model was fine-tuned on promoter prediction (300 bp, TATA promoters) data from FishGUE.
## Evaluation

- Primary metric: MCC (Matthews correlation coefficient)
- Evaluation split / strategy: predefined train/validation/test split
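For reference, MCC on a binary task can be computed directly from the confusion-matrix counts. A minimal pure-Python sketch (the helper name `mcc` is ours, not part of this repository; `sklearn.metrics.matthews_corrcoef` is the usual library equivalent):

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any confusion-matrix margin is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays informative on imbalanced promoter/non-promoter splits, which is why it is the primary metric here.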
## Intended uses

This model is intended for:
- fish genomics sequence classification
- downstream task inference on sequences similar to the fine-tuning setting
- comparative benchmarking within fish genomic prediction tasks
## Limitations

- This is a task-specific fine-tuned model and should be used within the scope of promoter prediction (300 bp, TATA promoters).
- Generalization to other species, tasks, or sequence lengths may be limited.
- This is a research model and is not intended for clinical or diagnostic use.
## How to use

### Load tokenizer and model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-20L_prom_300_tata"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)
```
### Example inference

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo_name = "xia-lab/FishNALM-20L_prom_300_tata"
sequence = "ATGCGTACGTTAGCTAGCTAGCTAGCTAGCTA"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForSequenceClassification.from_pretrained(repo_name)

# Tokenize the DNA sequence, padding/truncating to the model's input length
inputs = tokenizer(
    sequence,
    return_tensors="pt",
    truncation=True,
    padding="max_length",
    max_length=512,
)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

probabilities = torch.softmax(logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1)

print("logits:", logits)
print("probabilities:", probabilities)
print("prediction:", prediction)
```
## Label mapping

- 0: negative
- 1: positive
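To turn raw logits into the labels above without depending on torch, the softmax and argmax steps can be sketched in plain Python (the helper `logits_to_label` and the `ID2LABEL` dict are illustrative, not part of the checkpoint; in practice `model.config.id2label` carries the same mapping):

```python
import math

# Mirrors the label mapping documented above
ID2LABEL = {0: "negative", 1: "positive"}

def logits_to_label(logits):
    """Convert one example's raw logits to (label, probability)."""
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]
```

For example, logits of `[-1.2, 2.3]` map to the "positive" (promoter) class with high probability.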
## Files in this repository

Typical files in this repository may include:

- config.json
- model.safetensors
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- vocab.txt
- README.md
## Citation

If you use this model, please cite the FishNALM manuscript.
## Contact

For questions, please contact: xqxia@ihb.ac.cn