Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Tuesday, May 12, 2015

Convert Protein Codon to Genome Coordinates

I needed a way to check whether a few codons from different proteins were covered by a next-generation sequencing panel. This sounds relatively easy, but proved to be a bit difficult. Here are the steps I used.

(1) Use the BioMart ID converter to find the Ensembl protein ID for your protein of interest.
(2) Use the Ensembl REST endpoint GET map/translation/:id/:region to find the genomic coordinates of the codon of interest. For the example in this post, the request path is map/translation/ENSP00000288602/100..100.
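A minimal sketch of such a request in Python, using only the standard library. The server address https://rest.ensembl.org and the helper names `translation_map_url` and `fetch_mappings` are assumptions made for illustration, not part of the original post:

```python
import json
import urllib.request

# Assumption: the public Ensembl REST server.
ENSEMBL_REST = "https://rest.ensembl.org"


def translation_map_url(protein_id, first_codon, last_codon=None):
    """Build the map/translation/:id/:region URL for a range of codons."""
    if last_codon is None:
        last_codon = first_codon  # a single codon is the range N..N
    return f"{ENSEMBL_REST}/map/translation/{protein_id}/{first_codon}..{last_codon}"


def fetch_mappings(protein_id, first_codon, last_codon=None):
    """Send the GET request and return the parsed JSON response as a dict."""
    request = urllib.request.Request(
        translation_map_url(protein_id, first_codon, last_codon),
        headers={"Content-Type": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_mappings("ENSP00000288602", 100)` needs network access and should return a dictionary with a `mappings` list.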


In the request, ENSP00000288602 is the Ensembl protein ID for your protein of interest (example: BRAF gene)
100..100 is the range of codon positions to map, written as first..last (example: just codon 100)

The result is a JSON-formatted string like this:
{"mappings":[{"assembly_name":"GRCh38","end":140834815,"seq_region_name":"7","gap":0,"strand":-1,"coord_system":"chromosome","rank":0,"start":140834813}]}

This indicates that codon 100 of BRAF (for this particular protein isoform) is located at chr7:140834813-140834815 on the reverse strand (strand -1). The coordinates are on GRCh38, the assembly Ensembl uses. If you need another genome build, convert them with liftOver.
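To go from that JSON string to something you can compare against a panel design, a short sketch like the following may help. It parses the example response from this post and checks whether the codon falls entirely inside a target interval. The helper names, the "chr" prefix convention, and the panel interval are all hypothetical, and the check assumes BED-style (0-based, half-open) targets on the same assembly as the mapping:

```python
import json

# Response text copied from the example in this post.
EXAMPLE = (
    '{"mappings":[{"assembly_name":"GRCh38","end":140834815,'
    '"seq_region_name":"7","gap":0,"strand":-1,'
    '"coord_system":"chromosome","rank":0,"start":140834813}]}'
)


def to_regions(response_text):
    """Return (chrom, start, end, strand) tuples, 1-based inclusive as reported."""
    regions = []
    for m in json.loads(response_text)["mappings"]:
        chrom = "chr" + m["seq_region_name"]  # assumed UCSC-style naming
        strand = "+" if m["strand"] == 1 else "-"
        regions.append((chrom, m["start"], m["end"], strand))
    return regions


def is_covered(region, panel_intervals):
    """True if one BED-style (0-based, half-open) interval spans the whole region."""
    chrom, start, end, _strand = region
    return any(
        c == chrom and s <= start - 1 and end <= e
        for c, s, e in panel_intervals
    )


regions = to_regions(EXAMPLE)
print(regions)  # [('chr7', 140834813, 140834815, '-')]

# Hypothetical panel target, not taken from any real design.
panel = [("chr7", 140834700, 140834900)]
print(is_covered(regions[0], panel))  # True
```

The response holds a list of mappings, so the sketch loops over all of them instead of assuming a single entry.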

There are probably more automated ways to do this, but this worked for the small subset of codons I needed to check in the design panel. If you have a better way to do this, please share in the comments section.
