I previously posted on how to infer ancestry for a group of study participants using SNP genotypes. Today, I want to plot some of the output in R. Two informative plots can be generated from the output: a standard plot of the two eigenvectors with percent ancestry overlaid, and a triangle (or ternary) plot in which each axis represents the percentage of one of the three ancestral populations (e.g., European, African, and Asian).
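To make the triangle plot a bit more concrete: each person's three ancestry proportions sum to 1, so they can be drawn as a single point inside an equilateral triangle. The helper below (the name ternary_coords is hypothetical, not part of any package) shows one common mapping, with the first population at the bottom-left corner, the second at the bottom-right and the third at the top. plotrix may orient its triangle differently, so treat this only as an illustration of the idea.

```r
# Hypothetical helper: map three ancestry proportions to x/y positions inside an
# equilateral triangle with corners (0, 0), (1, 0) and (0.5, sqrt(3) / 2).
ternary_coords <- function(p1, p2, p3) {
  total <- p1 + p2 + p3
  # Rescale so each person's proportions sum to exactly 1 (guards against rounding)
  p1 <- p1 / total
  p2 <- p2 / total
  p3 <- p3 / total
  # p1 = 1 lands on (0, 0), p2 = 1 on (1, 0), p3 = 1 on the apex
  data.frame(x = p2 + p3 / 2, y = p3 * sqrt(3) / 2)
}

# Three "pure" individuals and one evenly mixed individual (toy values)
ternary_coords(c(1, 0, 0, 1/3), c(0, 1, 0, 1/3), c(0, 0, 1, 1/3))
```

A person with equal thirds of each ancestry lands on the centre of the triangle, (0.5, sqrt(3) / 6).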
Here is some simple code to plot this in R. Base R has no function for drawing a triangle plot, so the plotrix package needs to be installed first. The ancestry.txt file is the output file from SNPWEIGHTS, but output from other programs could be formatted to work as well.
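Here is a minimal sketch of the two plots, written under a few assumptions: ancestry.txt is whitespace-delimited with no header row, and its columns are ID, Case, SNPs, EV1, EV2 followed by three ancestry proportions (this layout is taken from the comment thread below; check it against your SNPWEIGHTS version). The function names read_ancestry and plot_ancestry are hypothetical, and colouring points by mixing red/green/blue from the proportions is just one way to overlay percent ancestry. The triangle plot uses plotrix's triax.plot.

```r
# Sketch only: assumes a whitespace-delimited file with no header and the layout
# ID, Case, SNPs, EV1, EV2, then three ancestry columns (order may vary by version).
read_ancestry <- function(path, ancestry_cols = c("EUR", "AFR", "ASN")) {
  anc <- read.table(path, header = FALSE, stringsAsFactors = FALSE)
  names(anc) <- c("ID", "Case", "SNPs", "EV1", "EV2", ancestry_cols)
  anc
}

# Draws both plots side by side: EV1 vs EV2 coloured by ancestry, then a triangle plot.
plot_ancestry <- function(anc, pops = c("EUR", "AFR", "ASN")) {
  props <- as.matrix(anc[, pops])
  props <- pmax(props, 0)          # clip any small negative estimates
  props <- props / rowSums(props)  # rescale so each row sums to 1
  # Mix each point's colour from its proportions: red, green and blue correspond
  # to the first, second and third entries of `pops`
  cols <- rgb(props[, 1], props[, 2], props[, 3])

  op <- par(no.readonly = TRUE)
  on.exit(par(op))
  par(mfrow = c(1, 2))

  plot(anc$EV1, anc$EV2, col = cols, pch = 19,
       xlab = "EV1", ylab = "EV2", main = "Eigenvectors")
  legend("topright", legend = pops, col = c("red", "green", "blue"), pch = 19)

  # Triangle plot; needs plotrix: install.packages("plotrix")
  if (requireNamespace("plotrix", quietly = TRUE)) {
    plotrix::triax.plot(props, main = "Ancestry proportions",
                        pch = 19, col.symbols = cols)
  } else {
    message("Install plotrix to draw the triangle plot.")
  }
  invisible(cols)
}

# Usage (assumes ancestry.txt is in the working directory):
# anc <- read_ancestry("ancestry.txt")
# plot_ancestry(anc)
```

Keeping the reading and plotting steps separate makes it easy to swap in output from another program: as long as you can build a data frame with EV1, EV2 and the three ancestry columns, plot_ancestry does not care where it came from.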
The output should look similar to the plots below.
How can I filter or select samples based on ancestry.txt?
R can filter the dataset (ancestry.txt) on any of its variables. For example, to create a new dataset containing only individuals with >90 percent estimated European ancestry, you can use the following one-liner: data2 <- data[data$EUR>0.9,]. Hope this helps.
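The one-liner above can be wrapped into a small reusable function. This is a sketch: filter_by_ancestry and majority_ancestry are hypothetical names, and like the one-liner it assumes the ancestry columns hold proportions between 0 and 1. The second function is an optional extra that labels each person by their largest ancestry component.

```r
# Keep individuals whose estimated proportion for `pop` exceeds `threshold`
# (same logic as data[data$EUR > 0.9, ], but for any ancestry column).
filter_by_ancestry <- function(anc, pop = "EUR", threshold = 0.9) {
  anc[anc[[pop]] > threshold, , drop = FALSE]
}

# Hypothetical extension: label each individual by their largest ancestry component.
majority_ancestry <- function(anc, pops = c("EUR", "AFR", "ASN")) {
  pops[max.col(as.matrix(anc[, pops]), ties.method = "first")]
}

# Toy data for illustration only
toy <- data.frame(ID  = c("s1", "s2", "s3"),
                  EUR = c(0.95, 0.10, 0.50),
                  AFR = c(0.03, 0.85, 0.45),
                  ASN = c(0.02, 0.05, 0.05),
                  stringsAsFactors = FALSE)
filter_by_ancestry(toy)$ID   # "s1"
majority_ancestry(toy)       # "EUR" "AFR" "EUR"
```

Base R's subset(data, EUR > 0.9) gives the same result as the one-liner and can read a little more naturally when combining conditions.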
Thanks for the reply. It looks like the column order in the ancestry file is incorrect.
It has to be "ID", "Case", "SNPs", "EV1", "EV2", "AFR", "EUR", "ASN". At least my data supports that. What do you think?
I just double-checked. The column order in the blog script above is correct, with the ancestry columns in the order "EUR", "AFR", and "ASN". It looks like SNPWEIGHTS has been updated since I last used it, which may account for the difference. Also, the columns may be assigned in order of the most common ancestry in your input data, which could also explain it.
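Since the ancestry column order may differ between SNPWEIGHTS versions or datasets, one option is to state the order explicitly and then reorder to a fixed layout, so later code always finds EUR, AFR and ASN in the same place. The sketch below assumes the eight-column layout discussed above; standardize_ancestry is a hypothetical helper, not part of SNPWEIGHTS.

```r
# Hypothetical helper: name the three ancestry columns in the order your file
# writes them, then reorder to a fixed EUR/AFR/ASN layout.
standardize_ancestry <- function(anc, file_order,
                                 target_order = c("EUR", "AFR", "ASN")) {
  stopifnot(ncol(anc) == 8, setequal(file_order, target_order))
  names(anc) <- c("ID", "Case", "SNPs", "EV1", "EV2", file_order)
  # Sanity check on the layout: the three proportions should sum to about 1.
  # Note this cannot detect two ancestry columns that are merely swapped.
  sums <- rowSums(anc[, target_order])
  if (any(abs(sums - 1) > 0.01)) {
    warning("Ancestry columns do not sum to 1; check the assumed column layout.")
  }
  anc[, c("ID", "Case", "SNPs", "EV1", "EV2", target_order)]
}

# Example using the AFR, EUR, ASN order reported in the comment above (toy values)
raw <- read.table(text = "s1 case 1000 0.01 -0.02 0.03 0.95 0.02",
                  stringsAsFactors = FALSE)
std <- standardize_ancestry(raw, file_order = c("AFR", "EUR", "ASN"))
std$EUR  # 0.95
```

Because the row sums are the same whichever way the columns are labelled, a practical way to confirm the order is to look at a few samples whose ancestry you already know.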