Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, please share them with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Thursday, December 12, 2013

Create Triangle Plot from Inferred Genetic Ancestry

I previously posted on how to infer ancestry for a group of study participants using SNP genotypes. Today, I want to plot some of that output in R. Two informative plots can be generated from it: a scatter plot of the two eigenvectors with percent ancestry overlaid, and a triangle (or ternary) plot in which each axis represents the percentage of one of the three ancestral populations (for example, European, African, and Asian).

Here is some simple code to plot this in R. Base R has no function for drawing a triangle plot, so the plotrix package needs to be installed first. The ancestry.txt file is the output file from SNPWEIGHTS, but output from other programs could be formatted to work as well.
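As a minimal sketch of the two plots described above (not necessarily identical to the original script), the example below assumes the column names "ID", "Case", "SNPs", "EV1", "EV2", "EUR", "AFR", "ASN" and a whitespace-delimited ancestry.txt without a header row. To run on its own it uses a small made-up data frame in place of the real file, and it colors each point by mixing the three ancestry proportions into an RGB value, which is just one illustrative way to overlay percent ancestry.

```r
# Assumed column layout; adjust to match your own SNPWEIGHTS output.
cols <- c("ID", "Case", "SNPs", "EV1", "EV2", "EUR", "AFR", "ASN")

# For real data, something like this is assumed to work:
#   data <- read.table("ancestry.txt", header = FALSE, col.names = cols)
# Toy stand-in so the sketch is self-contained (values are made up).
data <- data.frame(
  ID   = paste0("S", 1:6),
  Case = "Control",
  SNPs = 1000L,
  EV1  = c(-0.02, -0.01, 0.05, -0.01, 0.00, 0.01),
  EV2  = c(0.01, 0.01, 0.00, -0.05, -0.04, -0.01),
  EUR  = c(0.95, 0.80, 0.10, 0.05, 0.05, 0.40),
  AFR  = c(0.03, 0.15, 0.85, 0.05, 0.15, 0.30),
  ASN  = c(0.02, 0.05, 0.05, 0.90, 0.80, 0.30)
)
names(data) <- cols

# One color per sample: red = EUR, green = AFR, blue = ASN proportion.
pt_col <- rgb(data$EUR, data$AFR, data$ASN)

# Plot 1: the two eigenvectors, colored by estimated ancestry.
plot(data$EV1, data$EV2, col = pt_col, pch = 19,
     xlab = "EV1", ylab = "EV2", main = "Eigenvectors colored by ancestry")

# Plot 2: triangle plot; needs plotrix (install.packages("plotrix") once).
if (requireNamespace("plotrix", quietly = TRUE)) {
  plotrix::triax.plot(
    as.matrix(data[, c("EUR", "AFR", "ASN")]),
    axis.labels = c("European", "African", "Asian"),
    col.symbols = pt_col, pch = 19, show.grid = TRUE,
    main = "Estimated ancestry proportions"
  )
}
```

The triangle plot expects each row of proportions to sum to 1 (or 100), so it is worth checking the three ancestry columns before plotting real data.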

The output should look similar to the plots below.


  1. How to filter or select samples based on ancestry.txt?

    1. R can filter the dataset (ancestry.txt) on any of its variables. For example, to create a new dataset with only individuals having more than 90 percent estimated European ancestry, you can use the following one-liner: data2 <- data[data$EUR>0.9,]. Hope this helps.

    2. Thanks for the reply. It looks like the column order in the ancestry file is incorrect.
      It has to be "ID", "Case", "SNPs", "EV1", "EV2", "AFR", "EUR", "ASN". At least my data supports that. What do you say?

    3. I just double-checked. The order of the columns is correct in the above blog script, with the order being "EUR", "AFR", and "ASN". It looks like there has been an update to SNPWEIGHTS since I last used it, so this may account for the difference. Alternatively, the columns may be assigned in order of the most common ancestry in your input data, which could also explain it.
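The column-order question and the filtering step discussed in this thread can be sketched together. The example below is only an illustration on a made-up table: it applies the column order reported by the commenter ("AFR", "EUR", "ASN"), uses a hypothetical reference sample of known ancestry to sanity-check the labels, and then filters with subset(), which, unlike plain bracket indexing, drops rows where the condition is NA. Which order is right for a given file is an assumption to verify against your own data.

```r
# Toy table as it might look right after reading a file with no header
# (values are made up; V6-V8 hold the three ancestry proportions).
raw <- data.frame(
  V1 = c("S1", "S2", "S3"),
  V2 = "Control",
  V3 = 1000L,
  V4 = c(-0.02, 0.05, -0.01),
  V5 = c(0.01, 0.00, -0.05),
  V6 = c(0.03, 0.85, 0.05),
  V7 = c(0.95, 0.10, 0.05),
  V8 = c(0.02, 0.05, 0.90)
)

# Order reported by the commenter; use c(..., "EUR", "AFR", "ASN")
# instead if your file matches the order in the blog script.
names(raw) <- c("ID", "Case", "SNPs", "EV1", "EV2", "AFR", "EUR", "ASN")

# Proportions should sum to roughly 1 per sample whatever the labels are.
anc <- raw[, c("EUR", "AFR", "ASN")]
stopifnot(all(abs(rowSums(anc) - 1) < 0.01))

# Row sums cannot reveal swapped labels, so check a sample whose ancestry
# is known. Hypothetical assumption here: S1 is known to be European.
top_label <- names(anc)[which.max(unlist(anc[raw$ID == "S1", ]))]

# Keep individuals with more than 90 percent estimated European ancestry.
eur90 <- subset(raw, EUR > 0.9)
```

If top_label does not match the known ancestry of the reference sample, the column names are probably assigned in the wrong order for that file.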