tokenizer = AutoTokenizer.from_pretrained("adibvafa/CodonTransformer")
model = BigBirdForMaskedLM.from_pretrained("adibvafa/CodonTransformer").to(device)

protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
organism = "Escherichia coli general"

output = predict_dna_sequence(
    protein=protein,
    organism=organism,
    device=device,
    tokenizer=tokenizer,
    model=model,
    attention_type="original_full",
    deterministic=True,
)
print(format_model_output(output))
The output is:
<br>
```python
-----------------------------
|          Organism         |
-----------------------------
Escherichia coli general

-----------------------------
|       Input Protein       |
-----------------------------
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG

-----------------------------
|      Processed Input      |
-----------------------------
M_UNK A_UNK L_UNK W_UNK M_UNK R_UNK L_UNK L_UNK P_UNK L_UNK L_UNK A_UNK L_UNK L_UNK A_UNK L_UNK W_UNK G_UNK P_UNK D_UNK P_UNK A_UNK A_UNK A_UNK F_UNK V_UNK N_UNK Q_UNK H_UNK L_UNK C_UNK G_UNK S_UNK H_UNK L_UNK V_UNK E_UNK A_UNK L_UNK Y_UNK L_UNK V_UNK C_UNK G_UNK E_UNK R_UNK G_UNK F_UNK F_UNK Y_UNK T_UNK P_UNK K_UNK T_UNK R_UNK R_UNK E_UNK A_UNK E_UNK D_UNK L_UNK Q_UNK V_UNK G_UNK Q_UNK V_UNK E_UNK L_UNK G_UNK G_UNK __UNK

-----------------------------
|       Predicted DNA       |
-----------------------------
ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCGTTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTCTTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA
```
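As a quick sanity check, the predicted DNA can be translated back and compared with the input protein. The sketch below is not part of the CodonTransformer package: `CODON_TABLE` and `translate` are illustrative names, and it assumes the standard genetic code (NCBI translation table 1). The sequences are copied from the example output above.

```python
from itertools import product

# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
# Assumption: the target organism uses the standard code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon; stop codons become '*'."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


protein = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
predicted_dna = (
    "ATGGCTTTATGGATGCGTCTGCTGCCGCTGCTGGCGCTGCTGGCGCTGTGGGGCCCGGACCCGGCGGCGGCG"
    "TTTGTGAATCAGCACCTGTGCGGCAGCCACCTGGTGGAAGCGCTGTATCTGGTGTGCGGTGAGCGCGGCTTC"
    "TTCTACACGCCCAAAACCCGCCGCGAAGCGGAAGATCTGCAGGTGGGCCAGGTGGAGCTGGGCGGCTAA"
)

# The prediction ends with a stop codon, so the translation is the protein plus '*'.
translated = translate(predicted_dna)
assert translated == protein + "*"
print("Round-trip check passed")
```

A check like this only confirms that the codons encode the intended amino acids; it says nothing about how well the codon choices suit the host organism.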
- Project Website: https://adibvafa.github.io/CodonTransformer/
- GitHub Repository: https://github.com/Adibvafa/CodonTransformer
- Google Colab Demo: https://adibvafa.github.io/CodonTransformer/GoogleColab
- PyPI Package: https://pypi.org/project/CodonTransformer/

@article{Fallahpour_Gureghian_Filion_Lindner_Pandi_2025,
title={CodonTransformer: a multispecies codon optimizer using context-aware neural networks},
volume={16},
ISSN={2041-1723},
url={https://www.nature.com/articles/s41467-025-58588-7},
DOI={10.1038/s41467-025-58588-7},
number={1},
journal={Nature Communications},
author={Fallahpour, Adibvafa and Gureghian, Vincent and Filion, Guillaume J. and Lindner, Ariel B. and Pandi, Amir},
year={2025},
month=apr,
pages={3205},
language={en}
}