Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Tuesday, July 16, 2013

How to get Phased Imputed Genotypes for Haplotype Analysis

I wanted to compare haplotypes from a set of cases genotyped on one platform to a set of reference controls genotyped on another platform for a small chromosomal region.  I looked into a few methods and found that phasing the genotypes with SHAPEIT and then imputing against the 1000 Genomes reference panel with IMPUTE2 worked best for getting a set of overlapping genotypes for a haplotype analysis.  Phasing from the .ped and .map files was easy using SHAPEIT.  The command I used with my .ped and .map files was:

Other file types can also be used as direct input to SHAPEIT (e.g., .bed/.bim/.fam and .gen/.sample).  Genetic recombination maps can be downloaded here for build36 and build37.  Also, the backslashes (\) are not necessary.  They are line continuations that tell the shell to keep reading the command on the next line, which just helps organize the code.
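As a rough illustration of the phasing step, here is a minimal Python sketch that assembles a SHAPEIT call.  It is not necessarily the exact command used for this post: the file names (cases.ped, cases.map, genetic_map_chr.txt, cases.phased) and the thread count are placeholders, and the flags are the SHAPEIT v2 options for .ped/.map input, the recombination map, and .haps/.sample output.

```python
import shlex

def shapeit_command(ped, map_file, genetic_map, out_prefix, threads=1):
    """Build a SHAPEIT v2 command line that phases PED/MAP genotypes."""
    return [
        "shapeit",
        "--input-ped", ped, map_file,        # unphased genotypes (.ped + .map)
        "--input-map", genetic_map,          # recombination map for the chromosome
        "--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample",
        "--thread", str(threads),
    ]

# Placeholder file names; substitute your own.
cmd = shapeit_command("cases.ped", "cases.map", "genetic_map_chr.txt",
                      "cases.phased", threads=4)
print(shlex.join(cmd))
# To actually run it (requires shapeit on your PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check the arguments before handing them to the shell or to subprocess.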

Once you have generated your .haps and .sample files, you are ready to impute using IMPUTE2.  I used the command below:

Again, the backslashes are just for a cleaner visualization of the code.  When using the 1000 Genomes as your reference panel, it is important that your SNP positions are in hg19 (build 37) coordinates.  LiftOver can help you convert from one build to another.  If you need the 1000 Genomes reference panel, it can be downloaded here.  The genetic map file is from the above SHAPEIT download.  Finally, if you want phased imputed results, the -phase option is essential to include.
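For the imputation step, a similar sketch assembles an IMPUTE2 call on pre-phased haplotypes.  Again, this is an illustration rather than the exact command from this post: the file names and the interval (1,000,000 to 2,000,000) are made-up placeholders, and -Ne 20000 is simply the value suggested in the IMPUTE2 documentation.

```python
import shlex

def impute2_command(haps, ref_hap, ref_legend, genetic_map,
                    start, end, out, ne=20000):
    """Build an IMPUTE2 command line for imputing from pre-phased haplotypes."""
    return [
        "impute2",
        "-use_prephased_g",            # study data were already phased (e.g. by SHAPEIT)
        "-known_haps_g", haps,         # phased study haplotypes (.haps)
        "-h", ref_hap,                 # reference panel haplotypes
        "-l", ref_legend,              # reference panel legend
        "-m", genetic_map,             # recombination map (same build as the panel)
        "-int", str(start), str(end),  # region to impute, in base pairs
        "-Ne", str(ne),                # effective population size
        "-o", out,
        "-phase",                      # also write phased (haplotype) output
    ]

# Placeholder file names and coordinates; substitute your own.
cmd = impute2_command("cases.phased.haps", "ref_chr.hap.gz", "ref_chr.legend.gz",
                      "genetic_map_chr.txt", 1_000_000, 2_000_000, "cases.imputed")
print(shlex.join(cmd))
```

Keeping the region small with -int matches the use case here, where only one chromosomal region is of interest.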

You should now have phased imputed genotypes to use however you wish.  I chose to convert the phased imputed genotypes into a .fasta file with one entry per haplotype (see Python script).  Then I created parsimony trees with MEGA and visualized them in HapView.
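A conversion along these lines can be sketched in a few lines of Python.  This is only an illustration of the idea, not the script linked above.  It assumes each row of the phased output has five leading columns (SNP ID, rsID, position, allele 0, allele 1) followed by two 0/1 columns per individual, and that all variants are single-base SNPs so the sequences stay aligned.  The function name and sample IDs are hypothetical.

```python
def haps_to_fasta(haps_lines, sample_ids):
    """Turn phased haplotype rows into FASTA text, one record per haplotype.

    Assumes rows of: snp_id rsid position allele0 allele1 h1 h2 h1 h2 ...
    where each h is "0" (allele0) or "1" (allele1). Anything else becomes "N".
    """
    n_haps = 2 * len(sample_ids)
    seqs = [[] for _ in range(n_haps)]
    for line in haps_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        allele = {"0": fields[3], "1": fields[4]}
        calls = fields[5:]
        if len(calls) != n_haps:
            raise ValueError(f"expected {n_haps} haplotype columns, got {len(calls)}")
        for i, call in enumerate(calls):
            seqs[i].append(allele.get(call, "N"))
    records = []
    for i, seq in enumerate(seqs):
        name = f"{sample_ids[i // 2]}_hap{i % 2 + 1}"
        records.append(f">{name}\n{''.join(seq)}")
    return "\n".join(records) + "\n"

# Tiny made-up example: two SNPs, two individuals.
rows = ["--- rs1 100 A G 0 1 1 1",
        "--- rs2 200 C T 1 0 0 0"]
fasta = haps_to_fasta(rows, ["case1", "case2"])
print(fasta)
```

Each individual contributes two records (hap1 and hap2), which is the layout tree-building tools expect when every haplotype is treated as its own sequence.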
