Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Showing posts with label haplotype. Show all posts

Tuesday, July 16, 2013

Create FASTA sequences for Phased Haplotypes

Here is some Python code I put together to convert a .haps file (and associated .sample file) into a .fasta file with an entry for each haplotype sequence.  Haplotypes are designated >ID_A and >ID_B for each ID in the .sample file.  The program can easily be modified to accept a list of SNPs or IDs that you would like to extract from the .haps file.  Also, this program removes indels that may be present in the .haps file to avoid alignment issues.  This program was useful for feeding haplotype input into phylogenetic tree programs, such as MEGA.  Just run the program by typing python make_fasta.py data.haps at the command prompt and you will get a data.fasta file as output.  Hope it is useful.
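The conversion described above can be sketched roughly as follows. This is not the original make_fasta.py, just a minimal illustration. It assumes the SHAPEIT-style layout: a .sample file with two header lines and the individual ID in the second column, and a .haps file with five leading columns (chromosome, SNP ID, position, allele 0, allele 1) followed by two 0/1 columns per individual. The function name haps_to_fasta is hypothetical.

```python
def haps_to_fasta(haps_lines, sample_lines):
    """Return FASTA text with two records (>ID_A, >ID_B) per sample."""
    # Assumption: two header lines, then one row per individual with the
    # individual ID in the second column.
    ids = [row.split()[1] for row in sample_lines[2:] if row.strip()]

    # One allele list per haplotype: index 2*i is ID_A, 2*i + 1 is ID_B.
    seqs = [[] for _ in range(2 * len(ids))]

    for line in haps_lines:
        fields = line.split()
        if not fields:
            continue
        # Assumption: chrom, SNP ID, position, allele0, allele1, then calls.
        alleles = fields[3:5]
        calls = fields[5:]
        # Skip indels and anything else that is not a single A/C/G/T base.
        if any(len(a) != 1 or a.upper() not in "ACGT" for a in alleles):
            continue
        if len(calls) != len(seqs):
            raise ValueError("haplotype columns do not match .sample IDs")
        for seq, call in zip(seqs, calls):
            seq.append(alleles[int(call)])

    records = []
    for i, sample_id in enumerate(ids):
        records.append(">%s_A\n%s" % (sample_id, "".join(seqs[2 * i])))
        records.append(">%s_B\n%s" % (sample_id, "".join(seqs[2 * i + 1])))
    return "\n".join(records) + "\n"
```

To use it on real files, read data.haps and data.sample into lists of lines, pass them in, and write the returned string to data.fasta.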

How to get Phased Imputed Genotypes for Haplotype Analysis

I wanted to compare haplotypes from a set of cases genotyped on one platform to a set of reference controls genotyped on another platform for a small chromosomal region.  I looked into a few methods to do this and found that phasing the genotypes using SHAPEIT and then imputing off the 1000 Genomes reference panel using IMPUTE2 was the optimal approach to have a set of overlapping genotypes for a haplotype analysis.  Phasing from the .ped and .map files was easy using SHAPEIT.  The command I used with my .ped and .map files was:


Other file types can also be used as direct input to SHAPEIT (e.g., .bed/.bim/.fam and .gen/.sample).  Genetic recombination maps can be downloaded here for build36 and build37.  Also, the backslashes (\) are not necessary.  They just help organize the code by telling the shell that the command continues on the next line.
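If you would rather drive the phasing step from a script, here is a minimal Python sketch that assembles a SHAPEIT call. It is not the exact command I ran. It assumes the SHAPEIT2 options --input-ped, --input-map and --output-max, and the helper name and file names are placeholders for illustration only.

```python
import subprocess


def build_shapeit_command(prefix, genetic_map, out_prefix):
    """Return an argument list that phases <prefix>.ped/.map with SHAPEIT."""
    return [
        "shapeit",
        # Unphased genotypes in PED/MAP format.
        "--input-ped", prefix + ".ped", prefix + ".map",
        # Genetic recombination map for the chromosome being phased.
        "--input-map", genetic_map,
        # Phased output: a .haps file and its matching .sample file.
        "--output-max", out_prefix + ".haps", out_prefix + ".sample",
    ]


def run_shapeit(prefix, genetic_map, out_prefix):
    """Run SHAPEIT and raise CalledProcessError if it exits with an error."""
    subprocess.check_call(build_shapeit_command(prefix, genetic_map, out_prefix))
```

Passing the arguments as a list means no backslashes or shell quoting are needed. Calling run_shapeit("cases", "genetic_map_chr8.txt", "cases.phased") would require shapeit to be on your PATH.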

Once you have your .haps and .sample file generated, you are then ready to impute using IMPUTE2.  I used the command below:


Again, the backslashes are just for a cleaner visualization of the code.  When using the 1000 Genomes as your reference panel, it is important to have your SNP positions in hg19 coordinates.  LiftOver can help you convert from one build to another.  If you need the 1000 Genomes reference panel, it can be downloaded here.  The genetic map file is from the above SHAPEIT download.  Finally, if you want phased imputed results, the -phase option is essential to include.
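For scripting the imputation step, here is a similar Python sketch that assembles an IMPUTE2 call on pre-phased haplotypes. Again, this is not the exact command I used. It assumes the IMPUTE2 options -use_prephased_g, -known_haps_g, -m, -h, -l, -int, -o and -phase, and the helper name and all file names are placeholders.

```python
def build_impute2_command(haps, genetic_map, ref_hap, ref_legend,
                          start, end, out):
    """Return an argument list that imputes a region from phased haplotypes."""
    return [
        "impute2",
        # Tell IMPUTE2 the study genotypes are already phased (from SHAPEIT).
        "-use_prephased_g",
        "-known_haps_g", haps,
        # Genetic map, the same one used for phasing.
        "-m", genetic_map,
        # 1000 Genomes reference haplotypes and legend (hg19 positions).
        "-h", ref_hap,
        "-l", ref_legend,
        # Region to impute, as start and end base-pair positions.
        "-int", str(start), str(end),
        "-o", out,
        # Also write phased imputed haplotypes.
        "-phase",
    ]
```

The list can be passed straight to subprocess.check_call, as long as impute2 is on your PATH.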

You should now have your phased imputed genotypes to do whatever you wish with.  I chose to convert the phased imputed genotypes into a .fasta file with entries for each haplotype (see Python script).  Then I created parsimony trees with MEGA and visualized them in HapView.

Thursday, May 23, 2013

Download One Thousand Genome Data for Haploview

Haploview has a built-in portal to download HapMap data, but Haploview development hasn't kept pace and there is no built-in way to automatically download 1000G SNP data.  While searching for a way to visualize the higher density SNP coverage of the 1000G project, I found it was not too difficult to do this manually.  It just involves a couple of extra steps.

First, determine the genomic coordinates of the region you are interested in.  These need to be in hg19 coordinates.  If you have hg18 coordinates, liftOver is a useful tool to convert coordinates from one human genome build to another (liftOver format is chr:start-end, for example: chr8:1000-50000).

Next, go to this 1000G website, and plug in your genomic coordinates of interest.  Here the coordinates should not include chr in the chromosome name (for example: 8:1000-50000).  Then on the next page select ancestral populations you are interested in (you can select multiple populations by holding down Ctrl).  Give the website a few seconds to generate the files.  Eventually a link to a marker information file (.info) and linkage pedigree file (.ped) will appear.  Right click on each of these files and save them to your computer.
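If you are handling many regions, the switch from the liftOver-style format (chr8:1000-50000) to the format the 1000G site expects (8:1000-50000) is easy to script. Below is a small sketch. The function name is made up for illustration, and it assumes chromosomes are named 1-22, X, Y or M/MT.

```python
import re

# Optional "chr" prefix, a chromosome name, then start-end (commas allowed).
REGION_PATTERN = re.compile(r"^(?:chr)?(\d{1,2}|[XY]|MT?):([\d,]+)-([\d,]+)$")


def ucsc_to_1000g(region):
    """Convert 'chr8:1000-50000' to '8:1000-50000'."""
    match = REGION_PATTERN.match(region.strip())
    if match is None:
        raise ValueError("unrecognized region: %r" % region)
    chrom, start, end = match.groups()
    # Drop thousands separators that are often pasted in from browsers.
    start = int(start.replace(",", ""))
    end = int(end.replace(",", ""))
    if start > end:
        raise ValueError("start is greater than end: %r" % region)
    return "%s:%d-%d" % (chrom, start, end)


print(ucsc_to_1000g("chr8:1000-50000"))  # prints 8:1000-50000
```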

Now, fire up Haploview and select Open new data.  Go to the Linkage Format tab and browse for your .ped file in the Data File field and your .info file in the Locus Information File field (the .info file field is usually automatically generated after selecting your .ped file if your .ped and .info files have the same prefix).  Haploview will load the files and you should be ready to visualize the LD structure.  Enjoy using 1000G data in Haploview!

Tuesday, May 21, 2013

How to Make a Phylogenetic Tree

I wanted to cluster some genetic haplotypes I have into a phylogenetic structure to look for similarities between haplotypes.  I stumbled across the program MEGA.  Molecular Evolutionary Genetics Analysis (MEGA) is a relatively easy to use program that offers a wide variety of cluster building algorithms.  The input data format is relatively easy to generate based on the provided examples and the output trees are quite customizable.  Check it out for yourself.

Wednesday, May 15, 2013

Transform fastPHASE Output to Haploview Input

After successfully phasing with fastPHASE, I wanted to view the resulting haplotypes in Haploview.  Since I couldn't find any scripts online to convert the hapguess_switch.out file from fastPHASE into the .haps file that Haploview requires, I wrote a brief Python script to do the heavy lifting.


This script is run from the command line by typing python make_haploview_input.py hapguess_switch.out and will produce a hapguess_switch.haps file.  Because of the use of generators, this will only work on Python 2.6 or higher.  You will also need to create a .info file to match the .haps file.  The .info file lists the marker loci in the same order as the .haps file and consists of two columns: the first is the marker name and the second is the marker position.
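For readers who want to see the general idea, here is a rough sketch of this kind of conversion. It is not the original make_haploview_input.py. It makes several assumptions: the haplotypes sit between "BEGIN GENOTYPES" and "END GENOTYPES" lines, each individual is introduced by a line starting with "#" whose last token is the ID, that line is followed by two lines of space-separated A/C/G/T alleles, and Haploview is given alleles coded 1-4 with 0 for missing. The individual ID is reused as the family ID, and the function name is hypothetical.

```python
# Numeric allele coding assumed for the Haploview .haps file (0 = missing).
ALLELE_CODES = {"A": "1", "C": "2", "G": "3", "T": "4"}


def fastphase_to_haploview(lines):
    """Return Haploview .haps rows built from fastPHASE output lines."""
    rows = []
    in_block = False
    current_id = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN GENOTYPES":
            in_block = True
        elif line == "END GENOTYPES":
            break
        elif not in_block or not line:
            # Ignore everything outside the genotype block and blank lines.
            continue
        elif line.startswith("#"):
            # Assumption: the individual ID is the last token on the "#" line.
            current_id = line.split()[-1]
        else:
            if current_id is None:
                raise ValueError("haplotype line found before any ID line")
            # Anything that is not A/C/G/T is written as missing (0).
            codes = [ALLELE_CODES.get(a.upper(), "0") for a in line.split()]
            # The individual ID doubles as the family ID in this sketch.
            rows.append(" ".join([current_id, current_id] + codes))
    return rows
```

Each returned row is one haplotype, so every individual contributes two consecutive rows. Writing the rows to hapguess_switch.haps, one per line, gives the file to load alongside the .info file.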

Install Haploview on Vista / Windows 7 / Windows 8

I wanted to use Haploview to visualize the haplotype structure around a genomic area of interest.  Thinking it would be a quick, simple task I downloaded the Windows installer and installed it on my computer.  When I tried to open the program, I kept getting the error: Cannot run program "c:\program": CreateProcess error=2, The system cannot find the file specified.  After tinkering with things myself and getting some assistance from the IT staff, here is the workaround we developed to install Haploview on newer Windows operating systems.

1) Download the newest Haploview.jar file and save it to a location where you want to permanently keep it, for example: "C:\Haploview\".  This .jar file contains all the Java code needed to run Haploview.

2) For most 64-bit Windows operating systems, the newest version of Java will usually run Haploview.  The most current version is available for download here.  On 32-bit versions of Windows, Haploview seems to be a bit pickier about which version of Java it will run on.  I have had the best luck with Java version 6 update 43 and earlier.  These can be downloaded from the Oracle Java archive.  Click the radio button to accept the license agreement and select the Windows x86 Offline version for download.  The next page will ask you to sign in or sign up for a free account.  Just complete the form and the download will begin after you log in.

3) Use Notepad (or Notepad++) to create a Haploview.bat file in the same directory you placed the Haploview.jar file with the following code in it.  The Haploview.bat file is simply created by pasting the below code into Notepad and then saving it as Haploview.bat. This is just a quick and easy way to open a command prompt in the background and run the Haploview.jar file in Java.

For 64-bit Windows operating systems use the code:

For 32-bit Windows operating systems use the code:
(Note: The code for the 32-bit Windows operating system explicitly states the version of Java to run, i.e. "C:\Program Files\Java\jre1.6.0_43\bin\java.exe".  This allows us to avoid having to set up a path variable in Windows (which I have found buggy and difficult to set up).  The "jre1.6.0_43" portion in the path is an example of where your version 6 Java is located.  Lower versions of Java 6 will be "jre1.6.0_##", where ## is the update number.  If you only have one version 6 of Java on your computer, the folder will be named "jre6".)

4) Right click on the Haploview.bat file you just created and choose create a shortcut.  This shortcut is now what you can use to open Haploview.  Cut and paste this shortcut into a location that is easy to access.  I put it in my Start Menu under All Programs, but pasting it on the Desktop works well too.

Hope this was helpful and saves those interested in using Haploview on newer 32- and 64-bit Windows operating systems a lot of time.  While these tips should be useful for getting Haploview to run on a majority of newer Windows operating systems, it may still take a bit of trial and error to get Haploview up and running.  If you are still having difficulty getting Haploview to work with newer versions of Java, try older versions from the Oracle Java archive and follow the above instructions for 32-bit Windows operating systems.  Also, check out the comments section below to see what has worked for others.  If something has worked for you and it's not posted below, please share!