Genome Toolbox: liftOver

Friday, November 22, 2013

Install liftOver Locally on UNIX

Many of the UCSC genome tools can be downloaded and run locally on your UNIX system, and liftOver is one of them. To download it, go to the UCSC apps download page, select your operating system, and then click on the liftOver link. Here are links for liftOver_linux.x86_64 and liftOver_linux.x86_64.v287. For some reason, I could only get the linux.x86_64.v287 version to work on my system. Once downloaded, make the file executable (for example, with chmod +x liftOver).

The next step is to get the chain files needed to convert from one genome build to another. All UCSC genome builds are listed here, and you can select the "LiftOver files" link under the genome build that you want to convert from. This will take you to a downloads page with links to the chain files. Chain files are named after the builds they convert from and to. For example, hg19ToHg18.over.chain.gz is the chain file needed to convert from hg19 to hg18. Once downloaded, unzip the file for use (for example, with gunzip hg19ToHg18.over.chain.gz).

To run liftOver, the usage is:
liftOver oldFile map.chain newFile unMapped

where:
oldFile is the file of positions you want to convert (BED format by default)
map.chain is the chain file used to convert from one build to another
newFile is the converted file you want to create
unMapped is a file that contains all the unmapped positions

For more details on usage, just type liftOver with no arguments at the command line.
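To make the usage line concrete, here is a minimal Python sketch that builds the chain file name from the two builds and assembles the liftOver argument list. The function names and the file names (snps.hg19.bed and so on) are made up for illustration, and the sketch assumes liftOver is on your PATH and the chain file has already been unzipped.

```python
import shlex


def chain_file_name(from_build, to_build):
    """Follow the UCSC naming pattern, e.g. hg19 + hg18 -> hg19ToHg18.over.chain."""
    return f"{from_build}To{to_build[0].upper()}{to_build[1:]}.over.chain"


def liftover_command(old_file, from_build, to_build, new_file, unmapped_file):
    """Return the argument list: liftOver oldFile map.chain newFile unMapped."""
    chain = chain_file_name(from_build, to_build)
    return ["liftOver", old_file, chain, new_file, unmapped_file]


# Placeholder file names; replace them with your own.
command = liftover_command(
    "snps.hg19.bed", "hg19", "hg18", "snps.hg18.bed", "snps.unmapped.bed"
)
print(shlex.join(command))
# prints: liftOver snps.hg19.bed hg19ToHg18.over.chain snps.hg18.bed snps.unmapped.bed
```

To actually run the conversion, the list can be handed to subprocess.run(command, check=True).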

Tuesday, July 16, 2013

How to get Phased Imputed Genotypes for Haplotype Analysis

I wanted to compare haplotypes from a set of cases genotyped on one platform to a set of reference controls genotyped on another platform for a small chromosomal region. I looked into a few methods to do this and found that phasing the genotypes with SHAPEIT and then imputing against the 1000 Genomes reference panel with IMPUTE2 was the best way I found to get a set of overlapping genotypes for a haplotype analysis. Phasing from the .ped and .map files was easy using SHAPEIT. The command I used with my .ped and .map files was:

Other file types can also be used as direct input to SHAPEIT (e.g., .bed/.bim/.fam and .gen/.sample). Genetic recombination maps can be downloaded here for build36 and build37. Also, the backslashes (\) are not necessary. They just help organize the code by telling the shell to keep reading the command on the next line.
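As a rough sketch of what a phasing call like this looks like, the argument list can be put together in Python. This is not the exact command from my run: the file names are placeholders, the function name is made up, and the flags (--input-ped, --input-map, --output-max, --thread) are the SHAPEIT v2 options as I understand them, so check them against the help output of your installed version.

```python
import shlex


def shapeit_command(ped, map_file, genetic_map, out_prefix, threads=1):
    """Assemble a SHAPEIT v2 style call for .ped/.map input (flags are assumptions)."""
    return [
        "shapeit",
        "--input-ped", ped, map_file,        # unphased genotypes
        "--input-map", genetic_map,          # recombination map for the chromosome
        "--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample",
        "--thread", str(threads),
    ]


# Placeholder file names for a chromosome 8 region.
command = shapeit_command(
    "cases.ped", "cases.map", "genetic_map_chr8.txt", "cases.phased", threads=4
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the real file names are filled in.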

Once you have your .haps and .sample file generated, you are then ready to impute using IMPUTE2. I used the command below:

Again, the backslashes are just there to make the code easier to read. When using 1000 Genomes as your reference panel, it is important that your SNP positions are in hg19 coordinates. LiftOver can help you convert from one build to another. If you need the 1000 Genomes reference panel, it can be downloaded here. The genetic map file is from the above SHAPEIT download. Finally, if you want phased imputed results, the -phase option is essential to include.
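For orientation, here is a hedged sketch of how an IMPUTE2 call on pre-phased haplotypes can be assembled in Python. It is not the exact command from my run: all file names, the interval, and the function name are placeholders, the effective population size of 20000 is just a commonly used value, and the flags are the IMPUTE2 options as I understand them, so verify them against the IMPUTE2 documentation before use.

```python
import shlex


def impute2_command(haps, genetic_map, ref_hap, ref_legend,
                    start, end, out_prefix, ne=20000):
    """Assemble an IMPUTE2 style call for pre-phased input (flags are assumptions)."""
    return [
        "impute2",
        "-use_prephased_g",
        "-known_haps_g", haps,           # .haps file produced by SHAPEIT
        "-m", genetic_map,               # same genetic map used for phasing
        "-h", ref_hap,                   # reference panel haplotypes
        "-l", ref_legend,                # reference panel legend
        "-int", str(start), str(end),    # region to impute, hg19 positions
        "-Ne", str(ne),
        "-o", out_prefix,
        "-phase",                        # request phased imputed output
    ]


# Placeholder inputs for a small chromosome 8 region.
command = impute2_command(
    "cases.phased.haps", "genetic_map_chr8.txt",
    "ref_chr8.hap.gz", "ref_chr8.legend.gz",
    1000, 50000, "cases.imputed",
)
print(shlex.join(command))
```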

You should now have your phased imputed genotypes to do whatever you wish with. I chose to convert the phased imputed genotypes into a .fasta file with entries for each haplotype (see Python script). Then I created parsimony trees with MEGA and visualized them in HapView.
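The conversion to .fasta can be sketched roughly as follows. This is a simplified illustration rather than the linked script. It assumes a haplotype file with five leading columns (SNP ID, rsID, position, first allele, second allele) followed by two 0/1 columns per individual, and it assumes every variant is a single-base SNP so that all sequences have the same length. The function name and sample IDs are made up.

```python
def haps_to_fasta(haps_lines, sample_ids):
    """Return FASTA text with one entry per haplotype (two per individual)."""
    columns = None
    for line in haps_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        allele0, allele1 = fields[3], fields[4]
        calls = fields[5:]
        if columns is None:
            columns = [[] for _ in calls]
        # A call of 0 means the first allele, anything else the second.
        for i, call in enumerate(calls):
            columns[i].append(allele0 if call == "0" else allele1)
    records = []
    for i, bases in enumerate(columns or []):
        sample = sample_ids[i // 2]
        records.append(f">{sample}_hap{i % 2 + 1}\n{''.join(bases)}")
    return "\n".join(records) + "\n"


# Two SNPs, two individuals (four haplotypes); values are invented.
example = [
    "8 rs1 1000 A G 0 1 1 1",
    "8 rs2 2000 C T 0 0 1 0",
]
fasta = haps_to_fasta(example, ["case1", "case2"])
print(fasta, end="")
```

For the toy input above, the first individual gets the haplotypes AC and GC, and the second gets GT and GC.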

Thursday, May 23, 2013

Download One Thousand Genome Data for Haploview

Haploview has a built-in portal to download HapMap data, but its development hasn't kept pace, and there is no equivalent way to automatically download 1000G SNP data. While looking for a way to visualize the higher-density SNP coverage of the 1000G project, I found it was not too difficult to do this manually. It just involves a couple of extra steps.

First, determine the genomic coordinates of the region you are interested in. These need to be hg19 coordinates. If you have hg18 coordinates, liftOver is a useful tool to convert coordinates from one human genome build to another (liftOver format is chr:start-end, for example: chr8:1000-50000).

Next, go to this 1000G website and plug in your genomic coordinates of interest. Here the coordinates should not include chr in the chromosome name (for example: 8:1000-50000). Then, on the next page, select the ancestral populations you are interested in (you can select multiple populations by holding down Ctrl). Give the website a few seconds to generate the files. Eventually, links to a marker information file (.info) and a linkage pedigree file (.ped) will appear. Right-click on each of these files and save them to your computer.
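If you have many regions to look up, switching between the two coordinate styles is easy to script. Here is a small sketch with a made-up helper name that drops the chr prefix from a liftOver-style region string:

```python
def to_1000g_region(region):
    """Convert 'chr8:1000-50000' to '8:1000-50000' (no chr prefix)."""
    chrom, span = region.strip().split(":")
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    # Allow thousands separators such as 1,000-50,000.
    start, end = (int(value.replace(",", "")) for value in span.split("-"))
    if start > end:
        raise ValueError(f"start is after end in region: {region}")
    return f"{chrom}:{start}-{end}"


print(to_1000g_region("chr8:1000-50000"))
# prints: 8:1000-50000
```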

Now, fire up Haploview and select Open new data. Go to the Linkage Format tab and browse for your .ped file in the Data File field and your .info file in the Locus Information File field (the .info field is usually filled in automatically after you select your .ped file, as long as the .ped and .info files have the same prefix). Haploview will load the files and you should be ready to visualize the LD structure. Enjoy using 1000G data in Haploview!
