Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Thursday, December 12, 2013

Create Triangle Plot from Inferred Genetic Ancestry

I previously posted on how to infer ancestry for a group of study participants using SNP genotypes. Today, I want to plot some of that output in R. Two informative plots can be generated from it: a standard scatter plot of the first two eigenvectors with percent ancestry overlaid, and a triangle (or ternary) plot in which each axis represents the percentage of one of the three ancestral populations (e.g., European, African, and Asian).

Here is some simple code to plot this in R. Base R has no function for drawing a triangle plot, so the plotrix package needs to be installed first. The ancestry.txt file is the output file from SNPWEIGHTS, but output from other programs could be formatted to work as well.
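For readers who want to see what a triangle plot actually computes before reaching for plotrix, here is a minimal sketch in Python (chosen only because SNPWEIGHTS already requires it). It maps three ancestry percentages to a point inside an equilateral triangle. The function name and the choice of which population sits at which corner are my own assumptions for illustration, not part of plotrix or SNPWEIGHTS.

```python
import math


def ternary_xy(european, african, asian):
    """Map three ancestry proportions to (x, y) inside a unit triangle.

    Corner placement is an arbitrary choice for this sketch:
    European at (0, 0), African at (1, 0), Asian at (0.5, sqrt(3)/2).
    Inputs may be fractions or percentages; they are rescaled to sum to 1.
    """
    total = european + african + asian
    if total <= 0:
        raise ValueError("ancestry proportions must sum to a positive value")
    african_frac = african / total
    asian_frac = asian / total
    # The European share is implied by the other two (the three sum to 1).
    x = african_frac + 0.5 * asian_frac
    y = (math.sqrt(3) / 2) * asian_frac
    return x, y


if __name__ == "__main__":
    # Toy values only: a sample estimated as 70% European, 20% African, 10% Asian.
    print(ternary_xy(70, 20, 10))
```

Each sample becomes one point, so the resulting coordinates can be passed to any ordinary scatter plot function. A sample with 100% of one ancestry lands exactly on that population's corner.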


The output should look similar to the plots below.



Wednesday, July 24, 2013

How to Infer Ancestry from SNP Genotypes

Self-reported ancestry is a poor metric to use when attempting to statistically adjust for the effects of ancestry, since some individuals misreport their ancestry or are simply unaware of their true ancestry. Worse yet, sometimes no ancestry information was collected at all. As many of you know, if you have SNP genotyping data you can estimate an individual's ancestry rather precisely. Classically, to do this one needed to combine genotypes from the study sample with genotypes from a reference panel (e.g., HapMap or 1000 Genomes), find the intersection of SNPs in the two datasets, and then run a clustering program to see which samples clustered with the reference ancestral populations. Not a ton of work, but a minor annoyance nonetheless.

Luckily, a relatively new program does a lot of this groundwork for you. It is called SNPWEIGHTS and can be downloaded here. Essentially, the program takes SNP genotypes as input, finds the intersection of the sample genotypes with the reference genotypes, weights them based on pre-configured parameters to construct the first couple of principal components (a.k.a. eigenvectors), and then calculates an individual's percentage ancestry for each ancestral population. Here is how to run the program.
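To make the intersect, weight, and convert steps concrete, here is a deliberately simplified Python sketch. It is not the SNPWEIGHTS source code. The function names, the SNP IDs, the weights, and the reference centroids are all made-up illustrations, and I am assuming that ancestry fractions can be read off as barycentric coordinates of a sample relative to three reference population centroids in PC1/PC2 space.

```python
# Simplified illustration only; not the SNPWEIGHTS implementation.
# All SNP IDs, weights, and centroids used here are hypothetical toy values.


def predict_pcs(genotypes, weights):
    """Project one sample onto two PCs using only the SNPs in both inputs.

    genotypes: dict of SNP id -> allele count (0, 1, or 2)
    weights:   dict of SNP id -> (mean allele count, PC1 weight, PC2 weight)
    Returns (pc1, pc2, number of SNPs used).
    """
    shared = sorted(set(genotypes) & set(weights))  # intersection of SNP sets
    pc1 = sum((genotypes[s] - weights[s][0]) * weights[s][1] for s in shared)
    pc2 = sum((genotypes[s] - weights[s][0]) * weights[s][2] for s in shared)
    return pc1, pc2, len(shared)


def ancestry_fractions(point, centroids):
    """Barycentric coordinates of a PC point relative to three centroids.

    The three returned fractions sum to 1; a point sitting exactly on a
    centroid gets fraction 1 for that population and 0 for the others.
    """
    x, y = point
    (x1, y1), (x2, y2), (x3, y3) = centroids
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    first = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    second = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return first, second, 1.0 - first - second


if __name__ == "__main__":
    sample = {"rs1": 2, "rs2": 0, "rs3": 1}           # rs3 has no weight
    snp_weights = {"rs1": (1.0, 0.5, 0.0),
                   "rs2": (1.0, 0.0, 0.5),
                   "rs9": (1.0, 0.2, 0.2)}            # rs9 was not genotyped
    pc1, pc2, n_used = predict_pcs(sample, snp_weights)
    toy_centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(n_used, ancestry_fractions((0.25, 0.25), toy_centroids))
```

The point of the sketch is that, once the per-SNP weights exist, scoring a new sample is just a weighted sum over the shared SNPs followed by a small linear transformation. That is why no reference genotypes need to be merged with your own data.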

First, make sure you have Python installed on your system and that your genotyping data is in EIGENSTRAT format.  A brief tutorial to convert to EIGENSTRAT format using the convertf tool is here.

Next, download the SNPWEIGHTS package here and a reference panel. I usually use the European, West African and East Asian ancestral populations, but there are other options on the SNPWEIGHTS webpage as well.

Then, create a parameter file with the paths to your input files, your input population (designated "AA", "CO", or "EA"), and an output file.  An example of one is below:


Finally, run the program using the command inferancestry.py --par par.SNPWEIGHTS.  For the program to run correctly, make sure the inferancestry.info and snpwt.co files are in the same directory as your inferancestry.py file.
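Since a missing companion file is an easy mistake to make, a small pre-flight check can help. The sketch below is a hypothetical helper of my own, not part of SNPWEIGHTS. It only looks for the three file names mentioned above in a given directory and reports any that are absent.

```python
from pathlib import Path

# File names are taken from the post; the helper itself is hypothetical.
REQUIRED_FILES = ("inferancestry.py", "inferancestry.info", "snpwt.co")


def missing_snpweights_files(directory):
    """Return the required file names that are absent from `directory`."""
    folder = Path(directory)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_snpweights_files(".")
    if missing:
        print("Missing from the current directory:", ", ".join(missing))
    else:
        print("All required files are present.")
```

Run it from the folder that holds inferancestry.py. An empty result means the three files sit side by side as the program expects.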

For more details, see the SNPWEIGHTS paper or the README file included in the SNPWEIGHTS zip folder.  For code on generating eigenvector plots with overlaid ancestry percentages and triangle plots according to percent ancestry, see this post.