Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Thursday, May 23, 2013

Download One Thousand Genome Data for Haploview

Haploview has a built-in portal for downloading HapMap data, but it offers no equivalent way to automatically download 1000 Genomes (1000G) SNP data.  While looking for a way to visualize the higher-density SNP coverage of the 1000G project, I found that doing this manually is not difficult; it just involves a couple of extra steps.

First, determine the genomic coordinates of the region you are interested in.  These need to be hg19 coordinates.  If you have hg18 coordinates, liftOver is a useful tool for converting coordinates from one human genome build to another (liftOver expects the format chr:start-end, for example: chr8:1000-50000).

Next, go to this 1000G website and enter your genomic coordinates of interest.  Here the coordinates should not include chr in the chromosome name (for example: 8:1000-50000).  On the next page, select the ancestral populations you are interested in (you can select multiple populations by holding down Ctrl).  Give the website a few seconds to generate the files.  Eventually, links to a marker information file (.info) and a linkage pedigree file (.ped) will appear.  Right-click each of these links and save the files to your computer.
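If you convert regions often, the switch from the liftOver style (chr8:1000-50000) to the style the 1000G site expects (8:1000-50000) is easy to script.  Below is a minimal Python sketch; the function name to_1000g_region is my own invention, and I am assuming chromosome names are limited to 1-22, X, Y, and M/MT.

```python
import re

# Accepts an optional "chr" prefix, then a chromosome name, then start-end.
# Assumption: chromosomes are 1-22, X, Y, or M/MT.
_REGION = re.compile(r"(?:chr)?(\d{1,2}|X|Y|MT?):(\d+)-(\d+)")


def to_1000g_region(region: str) -> str:
    """Convert 'chr8:1000-50000' (liftOver style) to '8:1000-50000'."""
    # Drop surrounding whitespace and any thousands separators.
    cleaned = region.strip().replace(",", "")
    match = _REGION.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"Unrecognized region: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"Start is greater than end in {region!r}")
    return f"{chrom}:{start}-{end}"


if __name__ == "__main__":
    print(to_1000g_region("chr8:1000-50000"))
```

A region that already lacks the chr prefix passes through unchanged, so the function is safe to apply to either style.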

Now, fire up Haploview and select Open new data.  Go to the Linkage Format tab and browse for your .ped file in the Data File field and your .info file in the Locus Information File field (the Locus Information File field is usually filled in automatically after you select your .ped file, provided your .ped and .info files have the same prefix).  Haploview will load the files and you should be ready to visualize the LD structure.  Enjoy using 1000G data in Haploview!
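If Haploview refuses to load the files, a quick sanity check can help narrow down why.  The Python sketch below is only a rough check, not part of Haploview: it assumes the .ped file is whitespace-delimited linkage format (six leading columns, then two allele columns per marker, with 0 meaning missing) and that each .info line starts with the marker name.  The function name check_linkage_files and the file names in the usage line are hypothetical.  It confirms that the two files agree on the number of markers and lists any markers showing more than two distinct alleles, which as far as I know Haploview will not accept.

```python
from pathlib import Path


def check_linkage_files(ped_path, info_path):
    """Return (number of markers, names of markers with more than two alleles)."""
    # One marker per non-empty .info line; the first field is the marker name.
    info_rows = [ln.split() for ln in Path(info_path).read_text().splitlines() if ln.strip()]
    marker_names = [row[0] for row in info_rows]
    alleles_seen = [set() for _ in marker_names]

    for line_no, line in enumerate(Path(ped_path).read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        # Skip the six pedigree columns (family, individual, father,
        # mother, sex, affection status); the rest are allele pairs.
        genotypes = fields[6:]
        if len(genotypes) != 2 * len(marker_names):
            raise ValueError(
                f"{ped_path} line {line_no}: found {len(genotypes)} allele columns, "
                f"expected {2 * len(marker_names)}"
            )
        for i in range(len(marker_names)):
            for allele in genotypes[2 * i:2 * i + 2]:
                if allele != "0":  # assumption: 0 codes a missing allele
                    alleles_seen[i].add(allele)

    multi_allelic = [name for name, seen in zip(marker_names, alleles_seen) if len(seen) > 2]
    return len(marker_names), multi_allelic


if __name__ == "__main__":
    # Hypothetical file names; substitute the files you downloaded.
    n_markers, bad = check_linkage_files("region.ped", "region.info")
    print(f"{n_markers} markers; more than two alleles at: {bad}")
```

Any markers reported in the second value would need to be removed from both the .ped and .info files before loading them.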

1 comment:

  1. Hi, I tried this but couldn't get it to work because (I think) the region contains a variant with more than two alleles, which meant the file couldn't be loaded by Haploview.

    I used the region: 12:96394531-96437298

    and the CEU population.

    Has this ever occurred to you?

    A solution might be to use plink to remove the multi-allelic (e.g. triallelic) variants, but if I'm going to do that I might as well do everything from the command line!