(1) Use the BioMart ID converter to find the Ensembl protein ID for your protein of interest.
(2) Use the Ensembl REST endpoint GET map/translation/:id/:region to find the genomic coordinates of the codon of interest using the following command:
wget -q --header='Content-type:application/json' 'http://rest.ensembl.org/map/translation/ENSP00000288602/100..100?' -O -
ENSP00000288602 is the Ensembl protein ID for your protein of interest (example: BRAF gene)
100..100 is the range to map, given as the first and last codon (amino acid) positions in the protein (example: just codon 100)
The result is a JSON-formatted string like this:
{"mappings":[{"assembly_name":"GRCh38","end":140834815,"seq_region_name":"7","gap":0,"strand":-1,"coord_system":"chromosome","rank":0,"start":140834813}]}
This indicates that codon 100 of the BRAF gene (for this protein transcript) is located at chr7:140834813-140834815, on the reverse strand (strand is -1). The coordinates are on GRCh38, as shown by assembly_name. If you need another build of the genome, use liftOver to convert them.
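If you would rather not read the coordinates out of the JSON by eye, here is a minimal Python sketch of how the response above could be turned into a region string. The function name `mapping_to_regions` is made up for illustration, and the `chr` prefix is my assumption (UCSC-style naming); Ensembl itself reports the chromosome as just "7".

```python
import json

def mapping_to_regions(response_text):
    """Convert a map/translation JSON response into 'chrN:start-end (strand)' strings."""
    data = json.loads(response_text)
    regions = []
    # Loop over all mappings rather than assuming exactly one entry.
    for m in data["mappings"]:
        strand = "+" if m["strand"] == 1 else "-"
        regions.append(f"chr{m['seq_region_name']}:{m['start']}-{m['end']} ({strand})")
    return regions

# The example response shown above for codon 100 of ENSP00000288602
example = '{"mappings":[{"assembly_name":"GRCh38","end":140834815,"seq_region_name":"7","gap":0,"strand":-1,"coord_system":"chromosome","rank":0,"start":140834813}]}'
print(mapping_to_regions(example))  # ['chr7:140834813-140834815 (-)']
```

The function returns a list because the response holds a list of mappings, so the caller should be ready for more than one region per request.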
There are probably more automated ways to do this, but this worked for the small subset of codons I needed to check in the design panel. If you have a better way to do this, please share it in the comments section.
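One possible way to automate the lookup for several codons is sketched below in Python, using only the standard library. This is a sketch under assumptions, not a tested pipeline: the helper names `build_url`, `fetch_mappings` and `lookup_codons` are hypothetical, I am assuming the same endpoint and header as the wget command above (over https), and the codon list is a placeholder.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"  # same REST server as the wget command (assumed https)

def build_url(protein_id, start, end=None):
    """Build the map/translation URL for a codon range (a single codon if end is omitted)."""
    end = start if end is None else end
    return f"{SERVER}/map/translation/{protein_id}/{start}..{end}"

def fetch_mappings(protein_id, start, end=None):
    """Query Ensembl and return the list of mappings for the given codon range."""
    request = urllib.request.Request(
        build_url(protein_id, start, end),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["mappings"]

def lookup_codons(protein_id, codons):
    """Print one tab-separated line per mapping for each codon in the list."""
    for codon in codons:
        for m in fetch_mappings(protein_id, codon):
            print(codon, m["seq_region_name"], m["start"], m["end"], m["strand"], sep="\t")
```

Calling `lookup_codons("ENSP00000288602", [100, 101, 102])` would query each codon in turn. For a long list it would be polite to pause between requests, since the public server limits request rates.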